03 February 2016 Engineering

Securing fields and documents with Shield

By Martijn van Groningen

Securing data with Shield was already possible at the index level by defining privileges for indices and aliases via Shield's role based access control. Since Shield 2.0, data can be secured at an even lower level: down to individual fields and documents inside an index. Let's set up field level security for an imaginary ticketing platform. Assume tickets are stored as documents like the following:

{
  "subject" : "Missing emails",
  "message" : "Last week when...",
  "severity" : "low",
  "time_spent_in_minutes" : 5,
  "escalated" : false,
  "private_notes" : ["This is likely caused by a bug"]
}

Two kinds of users work with this data: the customers who create tickets, and the support engineers who interact with customers via the ticket system and eventually resolve the tickets. Both have access to the same tickets, but customers shouldn't see all properties of a ticket. The 'time_spent_in_minutes', 'escalated' and 'private_notes' properties are private to the support engineers. These fields can be hidden and made inaccessible by enabling field level security for the 'customer' role:

customer:
  indices:
    'ticket_index':
      privileges: read
      fields:
        - subject
        - message
        - severity

As the example 'roles.yml' file shows, enabling field level security is as easy as defining the list of fields that are accessible for a role. Configuring document level security is similar, except that a query is defined that matches the documents that are accessible. More details about how to configure field and document level security can be found in the Shield documentation.

The need for lower level access control

Many other use cases can benefit from controlling access to data at the field and document level, for example when data is shared across many organizations, or across departments within an organization, but not all parties involved are allowed to see all properties of the data. Before Shield 2.0 the data would have to be duplicated: each department would have its own index, and each department role in Shield would only allow access to that department's index. With field and document level security, duplicating data is no longer needed. All departments share the same index, and each department role has a list of allowed fields and optionally a query, which together dictate the visible fields and documents.

When field and document level security is enabled, it has to be applied to all Elasticsearch read APIs in a secure manner. So how could this be implemented?

Filter response approach

A naive approach would be to filter responses. Each API that returns data would need to check whether the keys or values inside the response are allowed to be returned. This might work for filtering out fields or documents that aren't visible in the search and get APIs, but it doesn't prevent someone from running a query or an aggregation on a field that isn't visible. For example, the total hits key in the search response would still indicate that more matched than is visible.
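The leak described above can be shown with a small toy model. This is only a sketch: the in-memory list of tickets, the `search` and `filter_response` helpers and the `ALLOWED_FIELDS` set are invented for illustration and are not Elasticsearch or Shield APIs.

```python
# Toy illustration only: an in-memory "index" and a naive response filter.
# None of these names correspond to Elasticsearch or Shield APIs.

TICKETS = [
    {"subject": "Missing emails", "severity": "low", "escalated": False},
    {"subject": "Login fails", "severity": "high", "escalated": True},
    {"subject": "Slow search", "severity": "low", "escalated": False},
]

# Fields the hypothetical customer role is allowed to see.
ALLOWED_FIELDS = {"subject", "severity"}


def search(docs, field, value):
    """Run an unrestricted exact-match search over any field."""
    hits = [doc for doc in docs if doc.get(field) == value]
    return {"total": len(hits), "hits": hits}


def filter_response(response, allowed):
    """Naive approach: strip disallowed keys from each hit after the search ran."""
    stripped = [
        {key: value for key, value in hit.items() if key in allowed}
        for hit in response["hits"]
    ]
    return {"total": response["total"], "hits": stripped}


# The customer queries a field that is supposed to be hidden.
leaky = filter_response(search(TICKETS, "escalated", True), ALLOWED_FIELDS)
print(leaky)
```

The 'escalated' key is stripped from the hit, yet the hit itself and the total still reveal which ticket was escalated, so the hidden field leaks through the query.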

So in order for field and document access control to work correctly, the request needs to be filtered too, which means that queries and aggregations on disallowed fields need to be removed. Removing them means the request has to be rewritten before execution, and this is harder than it looks, especially if a search request contains many compound queries and complex aggregations. With what query should a disallowed query be replaced? If the modified request is run through the explain or profile API, how would that look? And how would field or document level access control be implemented in other read APIs?

There are many APIs in Elasticsearch and not all APIs are structured in the same way. On the Elasticsearch side this would require access control checks in many different places. Data leaks should be avoided at all costs, and with this approach there is a high chance of an accidental leak because of a mistake during development, now or in the future. If instead access control is applied in only one place, then the chance of mistakes is much slimmer.

This clearly shows that filtering out keys and values in the response applies field and document level access control at the wrong level. A better approach is to apply field and document level access control at the Lucene level. In fact, this is how Shield implements it.

Securing data with Lucene

Each Elasticsearch index has one or more shards and each shard is a Lucene index. In Lucene the inverted index, stored fields, doc values and term vectors are independent data structures that are separated and accessible on a per field basis. Applying field level access control here requires a deep understanding of how Lucene works: for every field that a role is not allowed to see, each of these data structures has to be hidden so that none of them is exposed.

This means queries and aggregations on disallowed fields are skipped, because these queries and aggregations think that the required field doesn't exist. For the end user there is no difference between querying a field that doesn't exist and querying a field that he or she isn't allowed to see, because the end result is the same: no results. This makes it very convenient to apply field level access control at the Lucene level, instead of filtering queries and aggregations at the request level and keys and values at the response level, which doesn't provide the same level of security.
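The idea of hiding fields below the query layer can be sketched with a toy wrapper. The `ToyIndex` and `FieldRestrictedIndex` classes and the `term_query` helper are hypothetical names made up for this sketch; they only mimic the principle and are not Lucene's or Shield's actual classes.

```python
# Toy model only: a per-field "index" and a wrapper that hides fields.
# The class and method names are invented for illustration; they are not Lucene's API.

class ToyIndex:
    def __init__(self, docs):
        self.docs = docs

    def field_names(self):
        return sorted({name for doc in self.docs for name in doc})

    def values(self, field):
        """Return (doc_id, value) pairs for one field, like a per-field data structure."""
        return [(i, doc[field]) for i, doc in enumerate(self.docs) if field in doc]

    def stored_fields(self, doc_id):
        return dict(self.docs[doc_id])


class FieldRestrictedIndex:
    """Wraps an index and pretends that disallowed fields were never indexed."""

    def __init__(self, inner, allowed):
        self.inner = inner
        self.allowed = set(allowed)

    def field_names(self):
        return [f for f in self.inner.field_names() if f in self.allowed]

    def values(self, field):
        return self.inner.values(field) if field in self.allowed else []

    def stored_fields(self, doc_id):
        stored = self.inner.stored_fields(doc_id)
        return {k: v for k, v in stored.items() if k in self.allowed}


def term_query(index, field, value):
    """Query code is unaware of security; it only sees what the index exposes."""
    ids = [doc_id for doc_id, v in index.values(field) if v == value]
    return {"total": len(ids), "hits": [index.stored_fields(i) for i in ids]}


tickets = ToyIndex([
    {"subject": "Missing emails", "severity": "low", "escalated": False},
    {"subject": "Login fails", "severity": "high", "escalated": True},
])
customer_view = FieldRestrictedIndex(tickets, {"subject", "severity"})

# Hidden field: behaves exactly like a field that does not exist.
print(term_query(customer_view, "escalated", True))
# Allowed field: matches normally, and only allowed keys come back.
print(term_query(customer_view, "severity", "low"))
```

Because `term_query` never learns that a restriction exists, the same wrapper would cover any other code path that reads through the index, which is the property the article is after.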

Implementing access control at the Lucene level has another benefit: the logic is applied once for all Elasticsearch APIs through which the data is exposed. The same logic is triggered when a search request with a query and an aggregation is executed, when the get API is executed to fetch a particular document, when term vectors are requested for a particular field via the term vectors API, or when field stats are requested via the field stats API.

But wait, what about document level access control? The Lucene level is also the right place to apply it. A similar low level solution can be applied there: effectively, we can “hide” documents from the rest of the Elasticsearch infrastructure, making them inaccessible regardless of which API is used to reach them.
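Document hiding can be sketched in the same toy style. Everything here is an assumption made for illustration: the `customer_id` field, the `visible_ids` helper and the `DocumentRestrictedIndex` class are hypothetical and do not reflect how Shield or Lucene name or structure this.

```python
# Toy model only: document level security as a set of visible document ids
# computed from the role's query. Names are invented for illustration.

DOCS = [
    {"subject": "Missing emails", "severity": "low", "customer_id": 7},
    {"subject": "Login fails", "severity": "high", "customer_id": 9},
    {"subject": "Slow search", "severity": "low", "customer_id": 7},
]


def visible_ids(docs, role_query):
    """Evaluate the role's query once; only matching documents stay visible."""
    return {i for i, doc in enumerate(docs) if role_query(doc)}


class DocumentRestrictedIndex:
    def __init__(self, docs, visible):
        self.docs = docs
        self.visible = visible

    def get(self, doc_id):
        """Get-by-id path: hidden documents look like missing documents."""
        return self.docs[doc_id] if doc_id in self.visible else None

    def search(self, field, value):
        """Search path: hidden documents are never considered, so totals stay honest."""
        hits = [
            self.docs[i]
            for i in sorted(self.visible)
            if self.docs[i].get(field) == value
        ]
        return {"total": len(hits), "hits": hits}


# A role whose (hypothetical) query only matches tickets of customer 7.
view = DocumentRestrictedIndex(DOCS, visible_ids(DOCS, lambda d: d["customer_id"] == 7))

print(view.search("severity", "high"))          # the only match is hidden
print(view.get(1))                              # hidden document looks missing
print(view.search("severity", "low")["total"])  # visible documents still match
```

Both the search path and the get path go through the same visibility set, so a hidden document cannot be reached by switching to a different access path.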

Secure and share

Since we released field and document level security in Shield, it has been widely adopted, because it allows our users to share data between different types of users at a level that wasn't possible before. We are very excited about the opportunities it opens up for our users, both in Elasticsearch and in Kibana.

Demo

If you want to see Shield's field level security in action, follow the demo below to see how fields are secured.

Step 1
Download the latest Elasticsearch version, extract it into a convenient directory and make this directory your current directory in the console.

Step 2
Install the License and Shield plugins by running the following commands:

bin/plugin install license
bin/plugin install shield

Step 3
Add a support engineer user that has the built-in admin role:

bin/shield/esusers useradd support_engineer1 -r admin -p changeme

The -r option assigns the user the admin role, which is a predefined role. The -p option sets the password of the user to changeme.

Step 4
Add the following setting to the elasticsearch.yml file, which is located in the config directory:

shield.dls_fls.enabled: true

Step 5
Start Elasticsearch:

bin/elasticsearch

Step 6
Add a sample document:

curl -XPUT "http://support_engineer1:changeme@localhost:9200/ticket_index/ticket/1" -d'
{
  "subject" : "Missing emails",
  "message" : "Last week when...",
  "severity" : "low",
  "time_spent_in_minutes" : 5,
  "escalated" : false,
  "private_notes" : ["This is likely caused by a bug"]
}'

Step 7
Run a sample query as user support_engineer1:

curl -XGET "http://support_engineer1:changeme@localhost:9200/ticket_index/_search?pretty" -d'
{
  "query": {
    "match": {
      "severity": "low"
    }
  }
}'

All fields are visible as can be seen in this response:

{
  "took": 98,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "ticket_index",
        "_type": "ticket",
        "_id": "1",
        "_score": 0.30685282,
        "_source": {
          "subject": "Missing emails",
          "message": "Last week when...",
          "severity": "low",
          "time_spent_in_minutes": 5,
          "escalated": false,
          "private_notes": [
            "This is likely caused by a bug"
          ]
        }
      }
    ]
  }
}

Step 8
Add the customer role by adding the following YAML snippet with your favourite editor to the roles.yml file in the config/shield directory:

customer:
  indices:
    'ticket_index':
      privileges: read
      fields:
        - subject
        - message
        - severity

Step 9
Add a customer user:

bin/shield/esusers useradd customer1 -r customer -p changeme

Step 10
Rerun the same sample query, but now as the customer1 user:

curl -XGET "http://customer1:changeme@localhost:9200/ticket_index/_search?pretty" -d'
{
  "query": {
    "match": {
      "severity": "low"
    }
  }
}'

The following response will be returned:

{
  "took": 98,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "ticket_index",
        "_type": "ticket",
        "_id": "1",
        "_score": 0.30685282,
        "_source": {
          "subject": "Missing emails",
          "message": "Last week when...",
          "severity": "low"
        }
      }
    ]
  }
}

As can be seen, field level security is active: only the allowed fields are returned. You can also try to query or aggregate on the other fields as customer1, and you will find that no results are returned. If you switch back to support_engineer1, all fields are visible and accessible again.
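For the last experiment, a request that queries and aggregates on hidden fields could be put together as in the following sketch. It assumes the demo node from the steps above is listening on localhost:9200; the `build_search_request` helper and the aggregation name `minutes` are made up for this example, and the sum aggregation is just one possible choice.

```python
import base64
import json
import urllib.request


def build_search_request(user, password, body,
                         url="http://localhost:9200/ticket_index/_search?pretty"):
    """Build a search request with HTTP basic auth, mirroring the curl calls of the demo."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": "Basic " + token},
        method="POST",
    )


# Query and aggregate on fields that the customer role does not list.
hidden_field_body = {
    "query": {"match": {"escalated": False}},
    "aggs": {"minutes": {"sum": {"field": "time_spent_in_minutes"}}},
}

request = build_search_request("customer1", "changeme", hidden_field_body)

# To send it against the running demo node, uncomment the following lines:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the same request with support_engineer1 instead of customer1 allows the two responses to be compared side by side.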