Entity Analytics Input

This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

The Entity Analytics input collects identity assets, such as users, from external identity providers.

The following identity providers are supported:

  • Azure Active Directory (azure-ad)
  • Okta (okta)

Configuration options

The entity-analytics input supports the following configuration options plus the Common options described later.

provider

The identity provider. Must be one of: azure-ad or okta.

Common options

The following configuration options are supported by all inputs.

enabled

Use the enabled option to enable and disable inputs. By default, enabled is set to true.

tags

A list of tags that Filebeat includes in the tags field of each published event. Tags make it easy to select specific events in Kibana or apply conditional filtering in Logstash. These tags will be appended to the list of tags specified in the general configuration.

Example:

filebeat.inputs:
- type: entity-analytics
  . . .
  tags: ["json"]

fields

Optional fields that you can specify to add additional information to the output. For example, you might add fields that you can use for filtering log data. Fields can be scalar values, arrays, dictionaries, or any nested combination of these. By default, the fields that you specify here will be grouped under a fields sub-dictionary in the output document. To store the custom fields as top-level fields, set the fields_under_root option to true. If a duplicate field is declared in the general configuration, then its value will be overwritten by the value declared here.

filebeat.inputs:
- type: entity-analytics
  . . .
  fields:
    app_id: query_engine_12

fields_under_root

If this option is set to true, the custom fields are stored as top-level fields in the output document instead of being grouped under a fields sub-dictionary. If the custom field names conflict with other field names added by Filebeat, then the custom fields overwrite the other fields.
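As a rough illustration of the two layouts, the following Python sketch mimics how custom fields could end up in an event. The apply_fields helper and the sample event are hypothetical and are not part of Filebeat; they only model the behavior described above.

```python
def apply_fields(event: dict, fields: dict, fields_under_root: bool = False) -> dict:
    """Return a copy of event with the custom fields attached."""
    out = dict(event)
    if fields_under_root:
        # Top-level placement: custom fields overwrite conflicting names.
        out.update(fields)
    else:
        # Default placement: custom fields are grouped under "fields".
        out["fields"] = dict(fields)
    return out


event = {"message": "hello", "app_id": "set-by-filebeat"}
custom = {"app_id": "query_engine_12"}

grouped = apply_fields(event, custom)
at_root = apply_fields(event, custom, fields_under_root=True)
print(grouped)
print(at_root)
```

In the grouped form the original app_id value survives next to the fields sub-dictionary, while in the top-level form the custom value replaces it.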

processors

A list of processors to apply to the input data.

See Processors for information about specifying processors in your config.

pipeline

The ingest pipeline ID to set for the events generated by this input.

The pipeline ID can also be configured in the Elasticsearch output, but this option usually results in simpler configuration files. If the pipeline is configured both in the input and output, the option from the input is used.

keep_null

If this option is set to true, fields with null values will be published in the output document. By default, keep_null is set to false.

index

If present, this formatted string overrides the index for events from this input (for elasticsearch outputs), or sets the raw_index field of the event’s metadata (for other outputs). This string can only refer to the agent name and version and the event timestamp; for access to dynamic fields, use output.elasticsearch.index or a processor.

Example value: "%{[agent.name]}-myindex-%{+yyyy.MM.dd}" might expand to "filebeat-myindex-2019.11.01".
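To make the expansion concrete, here is a minimal Python sketch that expands the example value. It is not the formatter Filebeat uses: the expand_index function is hypothetical, and it only understands field references in square brackets and the yyyy, MM and dd date tokens.

```python
import re
from datetime import datetime, timezone


def expand_index(fmt: str, agent: dict, timestamp: datetime) -> str:
    """Expand %{[field]} and %{+date-pattern} placeholders in fmt."""
    def replace(match: re.Match) -> str:
        token = match.group(1)
        if token.startswith("+"):
            # Date pattern: only yyyy, MM and dd are handled in this sketch.
            pattern = token[1:]
            return (pattern.replace("yyyy", f"{timestamp.year:04d}")
                           .replace("MM", f"{timestamp.month:02d}")
                           .replace("dd", f"{timestamp.day:02d}"))
        # Field reference such as "[agent.name]".
        return agent[token.strip("[]")]

    return re.sub(r"%\{([^}]+)\}", replace, fmt)


index = expand_index(
    "%{[agent.name]}-myindex-%{+yyyy.MM.dd}",
    {"agent.name": "filebeat"},
    datetime(2019, 11, 1, tzinfo=timezone.utc),
)
print(index)
```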

publisher_pipeline.disable_host

By default, all events contain host.name. This option can be set to true to disable the addition of this field to all events. The default value is false.

Providers

Azure Active Directory (azure-ad)

The azure-ad provider allows the input to retrieve users, with group memberships, from Azure Active Directory (AD).

Setup

The necessary API permissions need to be granted in Azure in order for the provider to function properly:

Permission             Type
GroupMember.Read.All   Application
User.Read.All          Application
Device.Read.All        Application

For a full guide on how to set up the necessary App Registration, permission granting, and secret configuration, follow this guide.

How It Works

Overview

The Azure AD provider periodically contacts Azure Active Directory to retrieve updates for users, devices and groups. It then updates its internal cache of user and device metadata and group membership information, and ships the updated user and device metadata to Elasticsearch.

Fetching and shipping updates occurs in one of two processes: full synchronizations and incremental updates. Full synchronizations send the entire list of users and devices in state, along with write markers to indicate the start and end of the synchronization event. Incremental updates only send data for users and devices that changed during that event. A change to a user or device can take several forms: a change to its metadata, the addition or deletion of a user or device, or a change in group membership (either direct or transitive).

API Interactions

The provider periodically retrieves changes to user, device and group metadata from the Microsoft Graph API for Azure Active Directory. This is done through calls to three delta endpoints, one each for users (/users/delta), devices (/devices/delta) and groups (/groups/delta).

The /delta endpoint provides the changes that have occurred since the last call, with state tracked through a delta token. If the /delta endpoint is called without a delta token, it provides a full listing of users, devices or groups, similar to the non-delta endpoint. Because many results may be returned, responses are paged. Two fields may appear in the response body: @odata.nextLink and @odata.deltaLink.

  • If a @odata.nextLink is returned, then there are more results to fetch, and the value of this field will contain the URL which should be immediately fetched.
  • If a @odata.deltaLink is returned, then there are currently no more results, and the value of this field (a URL) should be saved for the next time updates need to be fetched (the delta token).
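The paging rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the provider's implementation: fetch_changes is a hypothetical helper, and FAKE_PAGES with its example.invalid URLs stands in for authenticated HTTP calls to the Microsoft Graph API.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical canned responses keyed by URL, used instead of real HTTP calls.
FAKE_PAGES: Dict[str, dict] = {
    "https://example.invalid/users/delta": {
        "value": [{"id": "u1"}, {"id": "u2"}],
        "@odata.nextLink": "https://example.invalid/users/delta?page=2",
    },
    "https://example.invalid/users/delta?page=2": {
        "value": [{"id": "u3"}],
        "@odata.deltaLink": "https://example.invalid/users/delta?token=abc",
    },
}


def fetch_changes(start_url: str,
                  get: Callable[[str], dict]) -> Tuple[List[dict], Optional[str]]:
    """Follow @odata.nextLink pages until an @odata.deltaLink is returned."""
    items: List[dict] = []
    delta_link: Optional[str] = None
    url: Optional[str] = start_url
    while url:
        page = get(url)
        items.extend(page.get("value", []))
        # A deltaLink ends the current result set; keep it for the next run.
        delta_link = page.get("@odata.deltaLink", delta_link)
        # A nextLink means more results are waiting and is fetched right away.
        url = page.get("@odata.nextLink")
    return items, delta_link


items, saved_link = fetch_changes("https://example.invalid/users/delta",
                                  FAKE_PAGES.__getitem__)
print([item["id"] for item in items])
print(saved_link)
```

On the next incremental update, the saved link would be used as the starting URL so that only changes since this call are returned.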

The group metadata will be used to enrich users and devices with group membership information. Direct memberships, along with transitive memberships, will be provided for users and devices.
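One way to picture the difference between direct and transitive memberships is the Python sketch below. The transitive_groups helper and the sample group data are hypothetical; the sketch assumes group metadata has already been reduced to a mapping from each group to its direct members.

```python
from typing import Dict, Set


def transitive_groups(entity_id: str, members: Dict[str, Set[str]]) -> Set[str]:
    """Return every group containing entity_id directly or via nested groups."""
    # Invert "group -> members" into "member -> direct parent groups".
    parents: Dict[str, Set[str]] = {}
    for group, group_members in members.items():
        for member in group_members:
            parents.setdefault(member, set()).add(group)

    found: Set[str] = set()
    stack = list(parents.get(entity_id, ()))
    while stack:
        group = stack.pop()
        if group in found:
            continue  # already visited; also guards against membership cycles
        found.add(group)
        stack.extend(parents.get(group, ()))
    return found


group_members = {
    "group1": {"user-a", "group2"},  # group2 is nested inside group1
    "group2": {"user-b"},
    "group3": {"group1"},            # group1 is nested inside group3
}
print(sorted(transitive_groups("user-a", group_members)))
print(sorted(transitive_groups("user-b", group_members)))
```

Here user-b is a direct member of group2 only, but is reported in group1 and group3 as well because of the nesting.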

Sending User and Device Metadata to Elasticsearch

During a full synchronization, all users and devices stored in state will be sent to the output, while incremental updates will only send users and devices that have been updated. Full synchronizations will be bounded on either side by write marker documents, which will look something like this:

{
    "@timestamp": "2022-11-04T09:57:19.786056-05:00",
    "event": {
        "action": "started",
        "start": "2022-11-04T09:57:19.786056-05:00"
    },
    "labels": {
        "identity_source": "azure-1"
    }
}
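The shape of a full synchronization can be sketched in Python as a list of documents bounded by two markers. This is only an illustration: full_sync_documents is a hypothetical helper, and the "completed" action and "end" field used for the closing marker are assumptions made for the example, since only the start marker is shown above.

```python
from datetime import datetime, timezone
from typing import Iterable, List


def full_sync_documents(users: Iterable[dict], source: str) -> List[dict]:
    """Wrap one document per user between a start and an end write marker."""
    def now() -> str:
        return datetime.now(timezone.utc).isoformat()

    labels = {"identity_source": source}
    started = now()
    docs = [{"@timestamp": started,
             "event": {"action": "started", "start": started},
             "labels": labels}]
    for user in users:
        docs.append({"@timestamp": now(),
                     "event": {"action": "user-discovered"},
                     "user": {"id": user["id"]},
                     "labels": labels})
    # Assumption: the end marker mirrors the start marker; the "completed"
    # action and "end" field names are illustrative only.
    ended = now()
    docs.append({"@timestamp": ended,
                 "event": {"action": "completed", "end": ended},
                 "labels": labels})
    return docs


docs = full_sync_documents([{"id": "user-1"}, {"id": "user-2"}], "azure-1")
actions = [doc["event"]["action"] for doc in docs]
print(actions)
```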

User documents will show the current state of the user.

Example user document:

{
    "@timestamp": "2022-11-04T09:57:19.786056-05:00",
    "event": {
        "action": "user-discovered"
    },
    "azure_ad": {
        "userPrincipalName": "example.user@example.com",
        "mail": "example.user@example.com",
        "displayName": "Example User",
        "givenName": "Example",
        "surname": "User",
        "jobTitle": "Software Engineer",
        "mobilePhone": "123-555-1000",
        "businessPhones": ["123-555-0122"]
    },
    "user": {
        "id": "5ebc6a0f-05b7-4f42-9c8a-682bbc75d0fc",
        "group": [
            {
                "id": "331676df-b8fd-4492-82ed-02b927f8dd80",
                "name": "group1"
            },
            {
                "id": "d140978f-d641-4f01-802f-4ecc1acf8935",
                "name": "group2"
            }
        ]
    },
    "labels": {
        "identity_source": "azure-1"
    }
}

Device documents will show the current state of the device.

Example device document:

{
    "@timestamp": "2022-11-04T09:57:19.786056-05:00",
    "event": {
        "action": "device-discovered"
    },
    "azure_ad": {
        "accountEnabled": true,
        "deviceId": "2fbbb8f9-ff67-4a21-b867-a344d18a4198",
        "displayName": "DESKTOP-LETW452G",
        "operatingSystem": "Windows",
        "operatingSystemVersion": "10.0.19043.1337",
        "physicalIds": {
            "extensionAttributes": {
                "extensionAttribute1": "BYOD-Device"
            }
        },
        "alternativeSecurityIds": [
            {
                "type": 2,
                "identityProvider": null,
                "key": "DGFSGHSGGTH345A...35DSFH0A"
            }
        ]
    },
    "device": {
        "id": "adbbe40a-0627-4328-89f1-88cac84dbc7f",
        "group": [
            {
                "id": "331676df-b8fd-4492-82ed-02b927f8dd80",
                "name": "group1"
            }
        ],
        "registered_owners": [
            {
                "id": "5ebc6a0f-05b7-4f42-9c8a-682bbc75d0fc",
                "userPrincipalName": "example.user@example.com",
                "mail": "example.user@example.com",
                "displayName": "Example User",
                "givenName": "Example",
                "surname": "User",
                "jobTitle": "Software Engineer",
                "mobilePhone": "123-555-1000",
                "businessPhones": ["123-555-0122"]
            }
        ],
        "registered_users": [
            {
                "id": "5ebc6a0f-05b7-4f42-9c8a-682bbc75d0fc",
                "userPrincipalName": "example.user@example.com",
                "mail": "example.user@example.com",
                "displayName": "Example User",
                "givenName": "Example",
                "surname": "User",
                "jobTitle": "Software Engineer",
                "mobilePhone": "123-555-1000",
                "businessPhones": ["123-555-0122"]
            }
        ]
    },
    "labels": {
        "identity_source": "azure-1"
    }
}

Configuration

Example configuration:

filebeat.inputs:
- type: entity-analytics
  enabled: true
  id: azure-1
  provider: azure-ad
  dataset: "all"
  sync_interval: "12h"
  update_interval: "30m"
  client_id: "CLIENT_ID"
  tenant_id: "TENANT_ID"
  secret: "SECRET"

The azure-ad provider supports the following configuration:

tenant_id

The Tenant ID. Field is required.

client_id

The client/application ID. Used for authentication. Field is required.

secret

The secret value, used for authentication. Field is required.

dataset

The datasets to collect from the API. This can be one of "all", "users" or "devices", or may be left empty for the default behavior which is to collect all entities. When the dataset is set to "devices", some user entity data is collected in order to populate the registered users and registered owner fields for each device.

sync_interval

The interval in which full synchronizations should occur. The interval must be longer than the update interval (update_interval). Expressed as a duration string (e.g., 1m, 3h, 24h). Defaults to 24h (24 hours).

update_interval

The interval in which incremental updates should occur. The interval must be shorter than the full synchronization interval (sync_interval). Expressed as a duration string (e.g., 1m, 3h, 24h). Defaults to 15m (15 minutes).
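The relationship between the two intervals can be checked with a short Python sketch. It is not the input's own validation code: parse_duration and check_intervals are hypothetical helpers, and the parser only accepts the s, m and h units used in the examples above.

```python
import re
from datetime import timedelta
from typing import Tuple

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text: str) -> timedelta:
    """Parse duration strings such as "15m", "24h" or "1h30m"."""
    parts = re.findall(r"(\d+)([smh])", text)
    # Reject input with anything left over between or around the parts.
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"invalid duration: {text!r}")
    total = timedelta()
    for number, unit in parts:
        total += timedelta(**{_UNITS[unit]: int(number)})
    return total


def check_intervals(sync_interval: str = "24h",
                    update_interval: str = "15m") -> Tuple[timedelta, timedelta]:
    """Return both intervals, or raise if sync is not longer than update."""
    sync = parse_duration(sync_interval)
    update = parse_duration(update_interval)
    if sync <= update:
        raise ValueError("sync_interval must be longer than update_interval")
    return sync, update


sync, update = check_intervals("12h", "30m")
print(sync, update)
```

With the values from the example configuration (12h and 30m) the check passes; equal or inverted values raise a ValueError.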

login_endpoint

Override the default authentication login endpoint. Only change if directed to do so. Altering this value will also require a change to login_scopes.

login_scopes

Override the default authentication scopes. Only change if directed to do so.

select.users

Override the default user query selections. This is a list of optional query parameters. The default is ["accountEnabled", "userPrincipalName", "mail", "displayName", "givenName", "surname", "jobTitle", "officeLocation", "mobilePhone", "businessPhones"].

select.groups

Override the default group query selections. This is a list of optional query parameters. The default is ["displayName", "members"].

select.devices

Override the default device query selections. This is a list of optional query parameters. The default is ["accountEnabled", "deviceId", "displayName", "operatingSystem", "operatingSystemVersion", "physicalIds", "extensionAttributes", "alternativeSecurityIds"].

Okta User Identities (okta)

The okta provider allows the input to retrieve users and devices from the Okta user API.

Setup

The necessary API permissions need to be granted in Okta in order for the provider to function properly. In the administration dashboard for your Okta account, navigate to Security > API and, in the Tokens tab, click the "Create token" button to create a new token. Copy the token value and retain it to configure the provider. Note that the token will not be presented again, so it must be copied now. This value will be given to the provider via the okta_token configuration field.

Devices API access needs to be activated by Okta support.

How It Works

Overview

The Okta provider periodically contacts the Okta API to retrieve updates for users and devices. It then updates its internal cache of user metadata and ships the updated user and device metadata to Elasticsearch.

Fetching and shipping updates occurs in one of two processes: full synchronizations and incremental updates. Full synchronizations send the entire list of users and devices in state, along with write markers to indicate the start and end of the synchronization event. Incremental updates only send data for users and devices that changed during that event. A change to a user or device can take several forms, such as a change to the user's metadata or the addition or deletion of a user.

API Interactions

The provider periodically retrieves changes to user and device metadata from the Okta User and Device APIs. This is done through calls to the /api/v1/users and /api/v1/devices endpoints.

Updates are tracked by the provider by retaining a record of the time of the last noted update in the returned user list. During provider updates the Okta provider makes use of the Okta API’s query filtering to only request records updated at or since the provider’s recorded last update.
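The update tracking described above can be sketched in Python with an in-memory list standing in for API results. The select_updates helper and the sample records are hypothetical; a real provider would push the time comparison into the Okta API request as a query filter rather than filtering locally.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def select_updates(records: List[dict],
                   last_update: Optional[datetime]) -> Tuple[List[dict], Optional[datetime]]:
    """Keep records updated at or since last_update and advance the marker."""
    changed: List[dict] = []
    newest = last_update
    for record in records:
        updated = datetime.fromisoformat(record["lastUpdated"])
        # No recorded time yet means everything counts as changed.
        if last_update is None or updated >= last_update:
            changed.append(record)
        if newest is None or updated > newest:
            newest = updated
    return changed, newest


records = [
    {"id": "user-a", "lastUpdated": "2023-05-15T01:50:32+00:00"},
    {"id": "user-b", "lastUpdated": "2023-06-02T00:03:00+00:00"},
]

first, marker = select_updates(records, None)
second, marker_after = select_updates(records, marker)
print([r["id"] for r in first], marker)
print([r["id"] for r in second], marker_after)
```

Because the comparison is "at or since", the record that set the marker is returned again on the following update, so a consumer of this sketch would need to tolerate repeated documents.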

Sending User Metadata to Elasticsearch

During a full synchronization, all users/devices stored in state will be sent to the output, while incremental updates will only send users and devices that have been updated. Full synchronizations will be bounded on either side by write marker documents, which will look something like this:

{
    "@timestamp": "2022-11-04T09:57:19.786056-05:00",
    "event": {
        "action": "started",
        "start": "2022-11-04T09:57:19.786056-05:00"
    },
    "labels": {
        "identity_source": "okta-1"
    }
}

User documents will show the current state of the user.

Example user document:

{
    "@timestamp": "2023-07-04T09:57:19.786056-05:00",
    "event": {
        "action": "user-discovered"
    },
    "okta": {
        "id": "userid",
        "status": "RECOVERY",
        "created": "2023-06-02T09:33:00.189752+09:30",
        "activated": "0001-01-01T00:00:00Z",
        "statusChanged": "2023-06-02T09:33:00.189752+09:30",
        "lastLogin": "2023-06-02T09:33:00.189752+09:30",
        "lastUpdated": "2023-06-02T09:33:00.189753+09:30",
        "passwordChanged": "2023-06-02T09:33:00.189753+09:30",
        "type": {
            "id": "typeid"
        },
        "profile": {
            "login": "name.surname@example.com",
            "email": "name.surname@example.com",
            "firstName": "name",
            "lastName": "surname"
        },
        "credentials": {
            "password": {},
            "provider": {
                "type": "OKTA",
                "name": "OKTA"
            }
        },
        "_links": {
            "self": {
                "href": "https://localhost/api/v1/users/userid"
            }
        }
    },
    "user": {
        "id": "userid"
    },
    "labels": {
        "identity_source": "okta-1"
    }
}

Device documents will show the current state of the device, including any associated users.

Example device document:

{
    "@timestamp": "2023-07-04T09:57:19.786056-05:00",
    "event": {
        "action": "device-discovered"
    },
    "okta": {
        "created": "2019-10-02T18:03:07Z",
        "id": "deviceid",
        "lastUpdated": "2019-10-02T18:03:07Z",
        "profile": {
            "diskEncryptionType": "ALL_INTERNAL_VOLUMES",
            "displayName": "Example Device name 1",
            "platform": "WINDOWS",
            "registered": true,
            "secureHardwarePresent": false,
            "serialNumber": "XXDDRFCFRGF3M8MD6D",
            "sid": "S-1-11-111"
        },
        "resourceAlternateID": "",
        "resourceDisplayName": {
            "sensitive": false,
            "value": "Example Device name 1"
        },
        "resourceID": "deviceid",
        "resourceType": "UDDevice",
        "status": "ACTIVE",
        "_links": {
            "activate": {
                "hints": {
                    "allow": [
                        "POST"
                    ]
                },
                "href": "https://localhost/api/v1/devices/deviceid/lifecycle/activate"
            },
            "self": {
                "hints": {
                    "allow": [
                        "GET",
                        "PATCH",
                        "PUT"
                    ]
                },
                "href": "https://localhost/api/v1/devices/deviceid"
            },
            "users": {
                "hints": {
                    "allow": [
                        "GET"
                    ]
                },
                "href": "https://localhost/api/v1/devices/deviceid/users"
            }
        },
        "users": [
            {
                "id": "userid",
                "status": "RECOVERY",
                "created": "2023-05-14T13:37:20Z",
                "activated": "0001-01-01T00:00:00Z",
                "statusChanged": "2023-05-15T01:50:30Z",
                "lastLogin": "2023-05-15T01:59:20Z",
                "lastUpdated": "2023-05-15T01:50:32Z",
                "passwordChanged": "2023-05-15T01:50:32Z",
                "type": {
                    "id": "typeid"
                },
                "profile": {
                    "login": "name.surname@example.com",
                    "email": "name.surname@example.com",
                    "firstName": "name",
                    "lastName": "surname"
                },
                "credentials": {
                    "password": {},
                    "provider": {
                        "type": "OKTA",
                        "name": "OKTA"
                    }
                },
                "_links": {
                    "self": {
                        "href": "https://localhost/api/v1/users/userid"
                    }
                }
            }
        ]
    },
    "device": {
        "id": "deviceid"
    },
    "labels": {
        "identity_source": "okta-1"
    }
}

Configuration

Example configuration:

filebeat.inputs:
- type: entity-analytics
  enabled: true
  id: okta-1
  provider: okta
  dataset: "all"
  sync_interval: "12h"
  update_interval: "30m"
  okta_domain: "OKTA_DOMAIN"
  okta_token: "OKTA_TOKEN"

The okta provider supports the following configuration:

okta_domain

The Okta domain. Field is required.

okta_token

The Okta secret token, used for authentication. Field is required.

collect_device_details

Whether the input should collect device and device-associated user details from the Okta API. Device details must be activated on the Okta account for this option.

dataset

The datasets to collect from the API. This can be one of "all", "users" or "devices", or may be left empty for the default behavior which is to collect all entities. When the dataset is set to "devices", some user entity data is collected in order to populate the registered users and registered owner fields for each device.

sync_interval

The interval in which full synchronizations should occur. The interval must be longer than the update interval (update_interval). Expressed as a duration string (e.g., 1m, 3h, 24h). Defaults to 24h (24 hours).

update_interval

The interval in which incremental updates should occur. The interval must be shorter than the full synchronization interval (sync_interval). Expressed as a duration string (e.g., 1m, 3h, 24h). Defaults to 15m (15 minutes).

Metrics

This input exposes metrics under the HTTP monitoring endpoint. These metrics are exposed under the /inputs path. They can be used to observe the activity of the input.

Metric                   Description
sync_total               The total number of full synchronizations.
sync_error               The number of full synchronizations that failed due to an error.
sync_processing_time     Histogram of the elapsed full synchronization times in nanoseconds (time of API contact to items sent to output).
update_total             The total number of incremental updates.
update_error             The number of incremental updates that failed due to an error.
update_processing_time   Histogram of the elapsed incremental update times in nanoseconds (time of API contact to items sent to output).
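As one example of observing the input with these counters, the Python sketch below derives a failure ratio from the total and error metrics. The error_ratio helper and the sample values are hypothetical, and the flat dictionary is an assumed simplification of whatever the monitoring endpoint returns for this input.

```python
def error_ratio(metrics: dict, kind: str) -> float:
    """Return failed/total for kind "sync" or "update" (0.0 if none ran)."""
    total = metrics.get(f"{kind}_total", 0)
    errors = metrics.get(f"{kind}_error", 0)
    return errors / total if total else 0.0


# Hypothetical counter values for a single input.
sample = {"sync_total": 4, "sync_error": 1, "update_total": 0, "update_error": 0}
print(error_ratio(sample, "sync"))
print(error_ratio(sample, "update"))
```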

This input is experimental and is under active development. Configuration options and behaviors may change without warning. Use with caution and do not use in production environments.