How to consume audit logs from Elastic Cloud Enterprise


In many organisations, the Elastic Stack runs on the Elastic Cloud Enterprise (ECE) platform. This platform enables teams to create, manage, and update Elastic Stack deployments without extensive prior knowledge.

Because ECE is such a powerful tool, organisations want to be informed about changes to the platform. Many users may access the ECE web user interface (UI) to create deployments, upgrade them, and add or remove specific settings. All of this activity is recorded in the built-in logging and metrics cluster, so we can leverage it and use Kibana rules and alerting to create alerts for specific events.

Below, I’ll cover how to use the audit functionality of ECE to be alerted when a critical event happens, as well as review topics and questions that often come up in the audit process.


An API-first approach

Luckily, ECE is built with an API-first approach: every click in the UI is backed by an API (application programming interface) call. These API calls go through a proxy, which logs every request to the internal logging and metrics cluster. In Kibana, you can explore these logs through the data view called services-logs.

All API paths are prefixed with /api/v1; the endpoints mentioned below omit this prefix for brevity.
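Because every sample document stores the full URL (scheme, host, port, and query string) in its path field, it helps to normalise it before matching on endpoints. The following is a minimal Python sketch, assuming documents have already been fetched from services-logs as plain dictionaries; the function name split_api_path is hypothetical and not part of any Elastic API.

```python
from urllib.parse import urlsplit, parse_qs

API_PREFIX = "/api/v1"

def split_api_path(full_url):
    """Split a services-logs `path` value into (endpoint, query params).

    The endpoint is returned without the /api/v1 prefix, for example
    "https://host:12443/api/v1/users/auth/_login" -> "/users/auth/_login".
    """
    parts = urlsplit(full_url)
    endpoint = parts.path
    if endpoint.startswith(API_PREFIX):
        endpoint = endpoint[len(API_PREFIX):] or "/"
    # parse_qs returns a list per parameter; keep only the first value
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return endpoint, params

endpoint, params = split_api_path(
    "https://34.140.124.31:12443/api/v1/deployments?validate_only=false"
)
print(endpoint, params)  # /deployments {'validate_only': 'false'}
```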

Who logged in and when?

Every login is a POST request against the /users/auth/_login endpoint. A sample document looks like this:

{
   "request_id": "8f01392d1ab029f5340eeac60c965e88",
   "method": "POST",
   "path": "https://34.140.124.31:12443/api/v1/users/auth/_login",
   "headers": [...],
   "payload": {
       "username": "philipp",
       "password": "-"
   },
   "return_code": 200
}

The path field contains the request URL, and payload.username tells us who tried to log in. Don’t worry: the password is always masked as "-". At the bottom of the document is the return_code. If it is 200, the login succeeded. If the return_code is missing, the API did not answer, which means a login was attempted but was unsuccessful.
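The success/failure rule above can be expressed as a small classifier. This is a sketch, assuming documents are plain dictionaries shaped like the sample; login_outcome is a hypothetical helper. It treats any return_code other than 200 as a failure, which is slightly broader than the "missing return_code" case described above.

```python
from urllib.parse import urlsplit

LOGIN_ENDPOINT = "/api/v1/users/auth/_login"

def login_outcome(doc):
    """Return (username, "success" | "failure") for a login document, else None."""
    if doc.get("method") != "POST":
        return None
    if urlsplit(doc.get("path", "")).path != LOGIN_ENDPOINT:
        return None
    username = doc.get("payload", {}).get("username")
    # 200 means the login worked; anything else, including no return_code, is a failure
    outcome = "success" if doc.get("return_code") == 200 else "failure"
    return username, outcome
```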

Based on the document above, we can create a simple alerting rule that notifies us if someone tries to log in as philipp and fails at least three times within a given time window.
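To make the "three failures within a time window" logic concrete, here is a sliding-window sketch of what such a rule evaluates. It assumes failed and successful logins have already been extracted into (timestamp_in_seconds, username, succeeded) tuples; the timestamps would come from each document's timestamp field, which the samples above do not show. The function name, the 300-second window, and the threshold default are illustrative assumptions, not Kibana settings.

```python
from collections import deque

def failed_login_alert_times(events, username, threshold=3, window_seconds=300):
    """Return the timestamps at which `username` reached `threshold`
    failed logins within the trailing `window_seconds`."""
    recent = deque()
    alerts = []
    for ts, user, succeeded in sorted(events):
        if user != username or succeeded:
            continue
        recent.append(ts)
        # drop failures that have fallen out of the trailing window
        while recent and ts - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) >= threshold:
            alerts.append(ts)
    return alerts
```

In Kibana itself, the same idea maps onto a rule that counts matching documents over a look-back window and fires when the count reaches the threshold.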

A deployment was added, altered, and deleted

This section covers everything that happens to a deployment. Any change, such as adding nodes, upgrading to a newer version, or removing nodes, triggers a plan change, which also goes through the API. All deployment-related tasks use the /deployments endpoint.

A deployment is created by sending a POST request to /deployments with a payload that describes the deployment. We won’t go into the payload details here; examples can be found in our documentation. The interesting parts are admin_id and return_code. In the sample below, admin_id shows that philipp created a new deployment, and payload.name shows that it is called demo. The return_code is 201 (Created), so the API call succeeded and the deployment was created.

{
   "request_id": "f315f3ebb45b6dd0bbdccde6373aca14",
   "method": "POST",
   "path": "https://34.140.124.31:12443/api/v1/deployments?validate_only=false",
   "headers": [...],
   "payload": {
       "name": "demo",
       "resources": {
           "elasticsearch": [...],
           "kibana": [...],
           "apm": [...],
           "enterprise_search": [...]
       },
       "metadata": {
           "system_owned": false
       }
   },
   "organization_id": "fb212ede0d7b4beb8ffdafbecd3630b7",
   "admin_id": "philipp",
   "return_code": 201
}
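A creation event can be recognised from three fields: the POST method, the /deployments endpoint, and a 201 return_code. The sketch below assumes dictionary-shaped documents like the sample above; deployment_created is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def deployment_created(doc):
    """Return (admin_id, deployment_name) for a successful creation, else None."""
    path = urlsplit(doc.get("path", "")).path  # drops host and ?validate_only=...
    if (
        doc.get("method") == "POST"
        and path == "/api/v1/deployments"
        and doc.get("return_code") == 201
    ):
        return doc.get("admin_id"), doc.get("payload", {}).get("name")
    return None
```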

Once the deployment exists, changes will inevitably be made to it. We can identify any change to a deployment using the same approach as above. Changes use the same /deployments endpoint, but with PUT instead of POST, and include the deployment id in the URL path (/deployments/<deployment_id>). Don’t worry, you don’t need a lookup table mapping deployment ids to deployment names, as the payload contains the name.

{
   "request_id": "db854d4a5ca697284d4d144ef8d5ab5a",
   "method": "PUT",
   "path": "https://34.140.124.31:12443/api/v1/deployments/dd88aa1e66694df0bae7fbbf79ee09db?hide_pruned_orphans=false&skip_snapshot=false&validate_only=true",
   "headers": [...],
   "payload": {
       "name": "demo",
       "prune_orphans": true,
       "resources": {
           "elasticsearch": [...],
           "kibana": [...],
           "apm": [...],
           "enterprise_search": [...]
       },
       "metadata": {
           "system_owned": false,
           "hidden": false
       }
   },
   "organization_id": "fb212ede0d7b4beb8ffdafbecd3630b7",
   "admin_id": "philipp",
   "return_code": 200
}
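The sample above carries validate_only=true in its query string. Our understanding of the ECE API reference is that such a request only validates the plan without applying it, so the sketch below skips it; please verify this against the documentation for your ECE version. It assumes dictionary-shaped documents, and deployment_changed is a hypothetical helper.

```python
from urllib.parse import urlsplit, parse_qs

def deployment_changed(doc):
    """Return (admin_id, deployment_id, name) for an applied change, else None."""
    parts = urlsplit(doc.get("path", ""))
    # expected path shape: /api/v1/deployments/<deployment_id>
    segments = parts.path.strip("/").split("/")
    if doc.get("method") != "PUT" or segments[:3] != ["api", "v1", "deployments"]:
        return None
    if len(segments) != 4 or doc.get("return_code") != 200:
        return None
    params = parse_qs(parts.query)
    # assumption: validate_only=true means the plan was only validated, not applied
    if params.get("validate_only", ["false"])[0] == "true":
        return None
    return doc.get("admin_id"), segments[3], doc.get("payload", {}).get("name")
```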

Deleting a deployment uses the DELETE HTTP verb against the same /deployments endpoint, again with the deployment id in the URL path. Unfortunately, this call does not record the deployment’s name, but you can search the logs for earlier requests with the same deployment id to find it.

{
   "request_id": "9fc836a6f3d53a96e296d57745b03f25",
   "method": "DELETE",
   "path": "http://34.140.124.31:12443/api/v1/deployments/5c8490871e5a4f74a8ab95552e6ae5fc",
   "headers": [...],
   "organization_id": "fb212ede0d7b4beb8ffdafbecd3630b7",
   "admin_id": "philipp",
   "return_code": 200
}
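The id-to-name lookup can be sketched by remembering the name from earlier PUT requests, since those carry both the id (in the path) and the name (in the payload). The creation POST does not include the id in its path, so it cannot seed the lookup. This is a sketch over chronologically ordered, dictionary-shaped documents; both function names are hypothetical.

```python
from urllib.parse import urlsplit

def deployment_id_from_path(path):
    """Return the id from /api/v1/deployments/<id>, else None."""
    segments = urlsplit(path).path.strip("/").split("/")
    if segments[:3] == ["api", "v1", "deployments"] and len(segments) == 4:
        return segments[3]
    return None

def resolve_deleted_deployments(docs):
    """Return (admin_id, deployment_id, name_or_None) for each successful DELETE."""
    names = {}
    deleted = []
    for doc in docs:  # assumed to be sorted chronologically
        dep_id = deployment_id_from_path(doc.get("path", ""))
        if dep_id is None:
            continue
        if doc.get("method") == "PUT":
            name = doc.get("payload", {}).get("name")
            if name:
                names[dep_id] = name  # remember the latest known name
        elif doc.get("method") == "DELETE" and doc.get("return_code") == 200:
            deleted.append((doc.get("admin_id"), dep_id, names.get(dep_id)))
    return deleted
```

A None name simply means no earlier change to that deployment appeared in the searched time range.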

With Kibana rules and alerting, you can either handcraft Elasticsearch queries or use threshold rules to be alerted on any of these events.
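As an example of a handcrafted query, the Python function below builds Query DSL that could be pasted into a Kibana Elasticsearch query rule to catch deployment deletions. It assumes that method and path are mapped as keyword fields in services-logs; if they are mapped as text, the term and wildcard clauses will need adjusting. The rule's own time window and time field settings handle the time range, so no range clause is included. The function name is hypothetical.

```python
import json

def deployment_delete_query():
    """Query DSL matching DELETE requests against /api/v1/deployments/<id>.

    Assumes `method` and `path` are keyword fields. The wildcard also
    matches DELETEs on sub-resources below a deployment.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"method": "DELETE"}},
                    {"wildcard": {"path": {"value": "*/api/v1/deployments/*"}}},
                ]
            }
        }
    }

print(json.dumps(deployment_delete_query(), indent=2))
```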

User created, modified, and deleted

The user API endpoint is /users. As with deployments, creating a user is a POST request.

{
   "request_id": "f40ce1fc8c46bd37c04bb6ae35757be0",
   "method": "POST",
   "path": "https://34.140.124.31:12443/api/v1/users",
   "headers": [
   ],
   "payload": {
       "user_name": "philipp",
       "security": {
           "roles": [
               "ece_platform_admin"
           ],
           "password": "-",
           "enabled": true
       },
       "full_name": "Philipp Kahr (Elastic)",
       "email": "philipp.kahr@elastic.co"
   },
   "return_code": 200
}

This payload contains the user_name, the assigned roles, the full_name, and the email. A common alert trigger is the ece_platform_admin role being assigned to a new or existing user.
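A sketch of that trigger follows, covering both user creation (POST /users) and user changes (PATCH /users/<id>). It assumes dictionary-shaped documents; platform_admin_granted is a hypothetical helper. Because PATCH payloads repeat the user's full role list (the password-change sample later in this post does too), this flags any successful change to a user who holds ece_platform_admin afterwards, not only brand-new grants.

```python
from urllib.parse import urlsplit

ADMIN_ROLE = "ece_platform_admin"

def platform_admin_granted(doc):
    """Return the user_name that holds ece_platform_admin after the request, else None."""
    segments = urlsplit(doc.get("path", "")).path.strip("/").split("/")
    if segments[:3] != ["api", "v1", "users"] or doc.get("return_code") != 200:
        return None
    is_create = doc.get("method") == "POST" and len(segments) == 3  # /users
    is_update = doc.get("method") == "PATCH" and len(segments) == 4  # /users/<id>
    if not (is_create or is_update):
        return None  # also excludes POST /users/auth/_login
    payload = doc.get("payload", {})
    roles = payload.get("security", {}).get("roles", [])
    return payload.get("user_name") if ADMIN_ROLE in roles else None
```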

Changing an existing user is a PATCH call against the /users/<id> endpoint, where the id is the user_name. In the example below, the user philipp was changed to have the ece_deployment_manager role.

{
   "request_id": "4216f0ca22bffdc7b30ec375a3d759d8",
   "method": "PATCH",
   "path": "https://34.140.124.31:12443/api/v1/users/philipp",
   "headers": [],
   "payload": {
       "user_name": "philipp",
       "full_name": "",
       "email": "",
       "security": {
           "roles": [
               "ece_deployment_manager"
           ],
           "enabled": true
       }
   },
   "return_code": 200
}
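Since each user document carries the complete role list, role changes can be reconstructed by tracking the last known roles per user and diffing. This is a sketch over chronologically ordered, dictionary-shaped documents; role_changes is a hypothetical helper. The first event seen for a user is diffed against an empty set, because earlier state is unknown.

```python
from urllib.parse import urlsplit

def role_changes(docs):
    """Return (user_name, added_roles, removed_roles) for each role change."""
    current = {}
    changes = []
    for doc in docs:  # assumed to be sorted chronologically
        segments = urlsplit(doc.get("path", "")).path.strip("/").split("/")
        if segments[:3] != ["api", "v1", "users"] or len(segments) not in (3, 4):
            continue
        if doc.get("method") not in ("POST", "PATCH") or doc.get("return_code") != 200:
            continue
        payload = doc.get("payload", {})
        user = payload.get("user_name")
        roles = payload.get("security", {}).get("roles")
        if user is None or roles is None:
            continue
        before, after = current.get(user, set()), set(roles)
        if after != before:
            changes.append((user, sorted(after - before), sorted(before - after)))
        current[user] = after
    return changes
```

Applied to the creation sample followed by the PATCH sample above, this reports ece_platform_admin being added and then replaced by ece_deployment_manager.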

Password changes for existing users

In ECE, any admin can change the password of an existing user. A password change looks like any other user change: a PATCH call against the /users/<id> endpoint, where the id is the user_name. The crucial difference is that payload.security contains a password field. As in the other payloads, admin_id shows which user initiated the password change.

{
   "request_id": "05a3672a1712b3acde75659734f7e0ba",
   "method": "PATCH",
   "path": "https://34.140.124.31:12443/api/v1/users/philipp",
   "headers": [
   ],
   "payload": {
       "user_name": "philipp",
       "full_name": "",
       "email": "",
       "security": {
           "password": "-",
           "roles": [
               "ece_deployment_manager"
           ],
           "enabled": true
       }
   },
   "organization_id": "fb212ede0d7b4beb8ffdafbecd3630b7",
   "admin_id": "admin",
   "return_code": 200
}
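Because the password value is always masked as "-", only the presence of the field matters. The sketch below assumes dictionary-shaped documents; password_change is a hypothetical helper, and it reports attempts regardless of return_code.

```python
from urllib.parse import urlsplit

def password_change(doc):
    """Return (admin_id, target_user) if the document is a password change, else None."""
    segments = urlsplit(doc.get("path", "")).path.strip("/").split("/")
    if doc.get("method") != "PATCH" or segments[:3] != ["api", "v1", "users"]:
        return None
    if len(segments) != 4:  # expected shape: /api/v1/users/<user_name>
        return None
    security = doc.get("payload", {}).get("security", {})
    # the value is masked, so the presence of the key is the signal
    if "password" not in security:
        return None
    return doc.get("admin_id"), segments[3]
```

A natural alert condition is a result where admin_id differs from the target user, meaning someone changed another user's password.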

Role mappings

Many organisations integrate ECE login with a third-party authentication realm, such as Active Directory; the setup is described in the ECE documentation. The vital part is that we can map groups to roles. Role mapping changes are PUT requests against endpoints under /regions/ece-region/platform/configuration/security, such as the realms/active-directory/myad path in the sample below. If someone maps a widely available group to an ECE admin role, this should trigger an alert. The payload.role_mappings object contains the relevant information: the default_roles and the rules that map a group (value) to roles.

{
   "request_id": "0c6ab8b1b0f16bf9bfca168be3fb5e51",
   "method": "PUT",
   "path": "https://34.140.124.31:12443/api/v1/regions/ece-region/platform/configuration/security/realms/active-directory/myad",
   "headers": [],
   "payload": {
       "id": "myad",
       "name": "myad",
       "urls": [
           "ldaps://10.11.11.0:233"
       ],
       "domain_name": "myad",
       "bind_anonymously": true,
       "group_search": {
           "base_dn": "cn=users,dc=example,dc=com",
           "scope": "sub_tree"
       },
       "user_search": {
           "base_dn": "cn=users,dc=example,dc=com",
           "scope": "sub_tree"
       },
       "load_balance": {
           "type": "failover"
       },
       "role_mappings": {
           "default_roles": [
               "ece_deployment_viewer"
           ],
           "rules": [
               {
                   "type": "group_dn",
                   "roles": [
                       "ece_platform_admin"
                   ],
                   "value": "cn=myadmins,dc=example,dc=com"
               }
           ]
       },
       "order": 2
   },
   "admin_id": "philipp",
   "return_code": 200
}
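To check a realm change for risky mappings, we can scan payload.role_mappings for ece_platform_admin, both in the group rules and in default_roles, which (as we understand the realm settings) apply to every user of the realm. This is a sketch over dictionary-shaped documents; admin_role_mappings and the marker string for default roles are hypothetical.

```python
from urllib.parse import urlsplit

SENSITIVE_ROLE = "ece_platform_admin"
SECURITY_PATH = "/platform/configuration/security/"

def admin_role_mappings(doc):
    """Return the group DNs mapped to ece_platform_admin by a realm change."""
    path = urlsplit(doc.get("path", "")).path
    if doc.get("method") != "PUT" or not path.startswith("/api/v1/regions/"):
        return []
    if SECURITY_PATH not in path:
        return []
    mappings = doc.get("payload", {}).get("role_mappings", {})
    groups = [
        rule.get("value")
        for rule in mappings.get("rules", [])
        if SENSITIVE_ROLE in rule.get("roles", [])
    ]
    # default_roles are granted to all realm users, which is the widest possible group
    if SENSITIVE_ROLE in mappings.get("default_roles", []):
        groups.append("<all realm users via default_roles>")
    return groups
```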

Next step: Dashboard

In this blog, we showed how to use the audit functionality of ECE to be alerted when critical events happen. By combining all these events, we can build a dashboard that shows what is happening in our ECE installation.