Appendix L: Information content functions
editAppendix L: Information content functions
editThe information content functions detect anomalies in the amount of information that is contained in strings within a bucket. These functions can be used as a more sophisticated method to identify incidences of data exfiltration or C2C activity, when analyzing the size in bytes of the data might not be sufficient.
The machine learning features include the following information content functions:
-
info_content
,high_info_content
,low_info_content
Info_content, High_info_content, Low_info_content
editThe info_content
function detects anomalies in the amount of information that
is contained in strings in a bucket.
If you want to monitor for unusually high amounts of information,
use high_info_content
.
If want to look at drops in information content, use low_info_content
.
These functions support the following properties:
-
field_name
(required) -
by_field_name
(optional) -
over_field_name
(optional) -
partition_field_name
(optional)
For more information about those properties, see the create anomaly detection jobs API.
Example 1: Analyzing subdomain strings with the info_content function.
{ "function" : "info_content", "field_name" : "subdomain", "over_field_name" : "highest_registered_domain" }
If you use this info_content
function in a detector in your anomaly detection job, it
models information that is present in the subdomain
string. It detects
anomalies where the information content is unusual compared to the other
highest_registered_domain
values. An anomaly could indicate an abuse of the
DNS protocol, such as malicious command and control activity.
In this example, both high and low values are considered anomalous.
In many use cases, the high_info_content
function is often a more appropriate
choice.
Example 2: Analyzing query strings with the high_info_content function.
{ "function" : "high_info_content", "field_name" : "query", "over_field_name" : "src_ip" }
If you use this high_info_content
function in a detector in your anomaly detection job,
it models information content that is held in the DNS query string. It detects
src_ip
values where the information content is unusually high compared to
other src_ip
values. This example is similar to the example for the
info_content
function, but it reports anomalies only where the amount of
information content is higher than expected.
Example 3: Analyzing message strings with the low_info_content function.
{ "function" : "low_info_content", "field_name" : "message", "by_field_name" : "logfilename" }
If you use this low_info_content
function in a detector in your anomaly detection job,
it models information content that is present in the message string for each
logfilename
. It detects anomalies where the information content is low
compared to its past behavior. For example, this function detects unusually low
amounts of information in a collection of rolling log files. Low information
might indicate that a process has entered an infinite loop or that logging
features have been disabled.