Document-Level Attribute-Based Access Control in Elasticsearch

Editor's Note: This post refers to X-Pack. Starting with the 6.3 release, the X-Pack code is now open and fully integrated as features into the Elastic Stack. The guidance and examples below still work with the current version.

Thanks to a new feature in Lucene 7.1, the CoveringQuery, and the exposure of that feature in the new terms_set query released with Elasticsearch 6.1, it is possible to set up an attribute-based access control (ABAC) scheme for documents stored in Elasticsearch. This works by leveraging the templated role query mechanism for document-level security within the X-Pack security role-based access control (RBAC) feature.

Background

There is a long, gnarly, and branching history of trying to describe and implement a complete and coherent scheme to control access to things. Cryptography to protect access to information, physical security to protect access to spaces, forts and walls to protect access to lands, etcetera, etcetera, and so forth. Standards, best practices, and trade secrets exist around all of them, as does on-going research: perfect security is yet to be found.

Humans are generally the weakest link...

Along with authentication — do you have a key to get in the door? — is authorization: what can you do once you get inside?

RBAC

In computer security, authorization, i.e. access control, comes in a number of flavors. Role-based access control (RBAC) is one such scheme and is the primary one used by X-Pack. RBAC is characterized by the collection of privileges (i.e. the specific list of things you can do) into roles such that the only way for a user to obtain a privilege is to be assigned membership in one or more roles. This method makes sense in a traditional hierarchical environment where there are clear lines of authority and responsibility around a relatively small number of different job types. A simple example is with employee records: the HR director role can view and edit all employee records, the manager role can view employee records for those they manage, and employees can only view their own records.

RBAC is good, but has some limitations:

  • The number of roles balloons as an organization grows and as the number of kinds of data grows, making their management unwieldy
  • Keeping roles mutually exclusive and collectively exhaustive (MECE) is hard, meaning it'd be possible to grant someone a number of contradictory roles that could result in a leak of data or other unintended behaviors
  • Roles are meant to be generic and applicable to many people: they don't take user-specific information into account

An example of the last point is with health records: to view a person's private health information, a number of conditions must be met, including holding a recent HIPAA training certificate. Since everyone can take the training on different dates, no single role can take into account a person's training status. Training status is an attribute of the user.

ABAC

Attribute-based access control depends on attributes assigned to users, things, and actions, and a policy to make decisions based on them. For a user, attributes could include projects they work on, team memberships, certifications, years of service, and physical location. For a thing (i.e. resource), attributes could be sensitivity level, PII status, time-to-live (TTL), or physical location.

A real-world control policy that is easier to model in ABAC is printing information in a secured environment: you can only print (action) from a specific printer (resource) if you are allowed to print things (action attribute + user attribute), that printer is near your workspace (resource attribute + user attribute), and your security training is up to date (contextual information: current date + user attribute). In RBAC, you'd need a printing role and a role for each printer (how many thousands in a big org?), and you'd have to update printing role membership every day as users slip out of training compliance and update the membership in the printers' roles each day as people join, leave, and move throughout the organization.
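To make the printer policy concrete, here is a minimal Python sketch of an attribute-based decision. It is only an illustration of the idea, not anything Elasticsearch or X-Pack provides: the User and Printer classes, the may_print function, and the location strings are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class User:
    can_print: bool          # user attribute: allowed to print at all
    workspace: str           # user attribute: physical location
    training_expires: date   # user attribute: last day training is valid


@dataclass
class Printer:
    location: str            # resource attribute: physical location


def may_print(user: User, printer: Printer, today: date) -> bool:
    """Allow the print action only when every attribute condition holds."""
    return (
        user.can_print
        and printer.location == user.workspace   # printer is near the user
        and today <= user.training_expires       # training is up to date
    )


# Hypothetical user and printers for illustration.
alice = User(can_print=True, workspace="bldg-4/floor-2",
             training_expires=date(2018, 12, 31))
near = Printer(location="bldg-4/floor-2")
far = Printer(location="bldg-9/floor-1")

print(may_print(alice, near, date(2018, 6, 1)))   # True
print(may_print(alice, far, date(2018, 6, 1)))    # False: wrong location
print(may_print(alice, near, date(2019, 1, 15)))  # False: training lapsed
```

Note that no role membership has to be updated when Alice moves desks or her training lapses; only her attributes change.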

ABAC in Elasticsearch

terms_set

Why couldn't this be done before? The primary reason had to do with how lists of values — lists being a very common type of ABAC attribute — were handled. With its roots deep in information retrieval, Lucene was tuned to be greedy in finding things. A list of values for a single field was used as a logical OR; there was no logical AND. To be clear, I'm not referring to analyzed fields where you would use full-text search, but rather structured fields like int and keyword.

For example:

PUT my_index
{
    "mappings" : {
        "doc" : {
            "properties" : {
                "body" : { "type" : "text" },
                "security_attributes":{"type": "keyword"}
            }
        }
    }
}
PUT my_index/doc/1
{ 
    "security_attributes": ["living", "in a van", "down by the river"],
    "body": "you're not going to amount to jack squat"
}
PUT my_index/doc/2
{
    "security_attributes": ["living", "in a house", "down by the river"],
    "body": "keep calm, carry on"
}
GET my_index/_search
{
    "query": {
        "terms": {
            "security_attributes": ["living", "in a van", "down by the river"]
        }
    }
}

…would return both documents.

With terms_set, you can now enforce the existence of all three attributes. Given the two docs we created in the previous example, the following would only return the first document:

GET my_index/_search
{
    "query": { 
        "terms_set": {
            "security_attributes": {
                "terms": ["living", "in a van", "down by the river"],
                "minimum_should_match_script": {
                  "source": "params.num_terms"
                }
            }            
        }
    }
}

NOTE: While I used minimum_should_match_script in the above example, it isn't a very efficient pattern. The alternative, minimum_should_match_field, is the better approach, but using it in the example would have meant a couple more PUTs to add the necessary field to the documents, so I went with brevity.
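If it helps to see the difference between the two queries outside of Elasticsearch, the following Python sketch models their matching rules on the two example documents. It is a simplified model of the semantics, not how Lucene evaluates them. The required_matches field is a hypothetical name added here only to illustrate the minimum_should_match_field variant.

```python
# The two example documents, plus a hypothetical per-document count field.
docs = {
    1: {"security_attributes": ["living", "in a van", "down by the river"],
        "required_matches": 3},
    2: {"security_attributes": ["living", "in a house", "down by the river"],
        "required_matches": 3},
}


def terms(field, values):
    """Model of `terms`: a document matches if ANY listed value is present."""
    wanted = set(values)
    return [doc_id for doc_id, doc in docs.items() if wanted & set(doc[field])]


def terms_set(field, values, minimum_should_match):
    """Model of `terms_set`: the number of matching values must reach the
    minimum, which is computed per document by `minimum_should_match`."""
    wanted = set(values)
    return [
        doc_id for doc_id, doc in docs.items()
        if len(wanted & set(doc[field])) >= minimum_should_match(doc, len(wanted))
    ]


query = ["living", "in a van", "down by the river"]

# Logical OR: both documents share at least one value with the query.
print(terms("security_attributes", query))                        # [1, 2]

# minimum_should_match_script with params.num_terms: all terms must match.
print(terms_set("security_attributes", query,
                lambda doc, num_terms: num_terms))                # [1]

# minimum_should_match_field: the required count is read from the document.
print(terms_set("security_attributes", query,
                lambda doc, num_terms: doc["required_matches"]))  # [1]
```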

terms_set + templated role query

When defining a role using X-Pack security features, you can optionally specify a query template that will apply to every query made by users with that role. It is a document-level security control that restricts access to documents in search queries as well as aggregations. The template can make use of user attributes via a Mustache template. Yes, it's templates all the way down. By combining user attributes with role query templates, it's possible to create ABAC logic on top of X-Pack's RBAC scheme. Injecting user attributes into role queries via a template has always been possible, but most security policies needed this "list ANDed" logic.

Let's expand our example. We'll keep the same two documents and add two users and a role:

PUT _xpack/security/role/my_policy
{ 
    "indices": [{
        "names": ["my_index"],
        "privileges": ["read"],
        "query": {
            "template": {
                "source": "{\"bool\": {\"filter\": [{\"terms_set\": {\"security_attributes\": {\"terms\": {{#toJson}}_user.metadata.security_attributes{{/toJson}},\"minimum_should_match_script\":{\"source\":\"params.num_terms\"}}}}]}}"
            }
        }
    }]
}
PUT _xpack/security/user/matt_foley
{
    "username": "matt_foley",
    "password":"testtest",
    "roles": ["my_policy"],
    "full_name": "Matt Foley",
    "email": "mf@rivervan.com",
    "metadata": {
        "security_attributes": ["living", "in a van", "down by the river"]
    }
}
PUT _xpack/security/user/jack_black
{
    "username": "jack_black",
    "password":"testtest",
    "roles": ["my_policy"],
    "full_name": "Jack Black",
    "email": "jb@tenaciousd.com",
    "metadata": {
        "security_attributes": ["living", "in a house", "down by the river"]
    }
}

…Yes, decoding that role template query is like seeing the matrix (see this issue for some idea as to why it is this way and a proposal to make it go away), but it's essentially the same as the terms_set query from above. The only difference is the use of the {{_user.metadata.security_attributes}} Mustache template in place of the hard-coded attribute list. To be clear, by adding in those security attributes from the user metadata, we've made this role apply user-specific attributes to each query a user with that role makes: an attribute-based access control query.

If Matt Foley were to log in and run a query, the only doc he'd be able to see is document 1. He wouldn't see document 2 because he only has two of the three security attributes and the terms_set filter in the role query template says the minimum number that must match is all of them (params.num_terms is equal to the number of terms in the list, 3 in this case). Similarly, Jack Black would only be able to see document 2.
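To see what the role query looks like once the user's attributes are filled in, here is a rough Python sketch that expands the template for Matt Foley. It is not the Mustache engine that Elasticsearch uses: the render function is a hypothetical stand-in that does a plain string substitution of the toJson section with the JSON form of the user's attribute list.

```python
import json

# The role query template from the example above, unescaped for readability.
TEMPLATE = (
    '{"bool": {"filter": [{"terms_set": {"security_attributes": {'
    '"terms": {{#toJson}}_user.metadata.security_attributes{{/toJson}},'
    '"minimum_should_match_script": {"source": "params.num_terms"}}}}]}}'
)

PLACEHOLDER = "{{#toJson}}_user.metadata.security_attributes{{/toJson}}"


def render(template, user):
    """Hypothetical stand-in for Mustache: swap the toJson section for the
    JSON encoding of the user's security_attributes list."""
    attributes = user["metadata"]["security_attributes"]
    return template.replace(PLACEHOLDER, json.dumps(attributes))


matt = {"metadata": {"security_attributes":
                     ["living", "in a van", "down by the river"]}}

# The rendered string is valid JSON: the same terms_set query as before,
# with Matt's attributes in place of the hard-coded list.
query = json.loads(render(TEMPLATE, matt))
print(json.dumps(query, indent=2))
```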

But couldn't I already do ANDs with list values using bool?

True! There has long been a way to do list ANDs by splitting each list item into its own must clause of a bool query. With X-Pack, the main problem was templating that query: how can you write a single query template that includes the right number of must clauses for each of the documents? Document 1 might have three required attributes and document 2 might have four. But what about building your own ABAC logic on top of open source Elasticsearch that generated the right queries for each user and document? The problem there is that user attributes can be both a subset and a superset of document attributes. In the cases where it's a subset, all is well. But in cases where it's a superset, doing a naive multi-must bool query (one must for each user attribute) would result in no documents back. In the above example, imagine a user has ["I am 35", "living", "in a van", "down by the river"] as attributes: a superset of document attributes. If I do a must for each one, no documents would come back. But an access control policy is almost always "at least these attributes" rather than "exactly this list and nothing more". To make this work, we'd need to split out each possible attribute list value into its own attribute, removing lists entirely. The logic then gets complicated as you have to do a bunch of existence checks to get around the same superset problem; the resulting combination of bool, must, should, and exists clauses is something to be feared. You can see an example of this on my colleague Dave Erickson's blog.
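The superset problem is easy to reproduce with plain sets. The Python sketch below is only a model of the logic, not an Elasticsearch query; the two function names are made up for illustration.

```python
doc_attrs = {"living", "in a van", "down by the river"}
user_attrs = {"I am 35", "living", "in a van", "down by the river"}


def naive_multi_must(user, doc):
    """One must clause per USER attribute: the document has to contain
    every attribute the user holds."""
    return all(attr in doc for attr in user)


def at_least_these(user, doc):
    """The policy usually intended: the user holds every attribute the
    DOCUMENT requires, and may hold extras."""
    return doc <= user


# The extra "I am 35" attribute makes the naive query reject the document...
print(naive_multi_must(user_attrs, doc_attrs))  # False
# ...even though the user has everything the document asks for.
print(at_least_these(user_attrs, doc_attrs))    # True
```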

One last example

Let's see it all come together with something slightly more complicated, combining three kinds of logic. There's a security level to ensure a user has a level greater than or equal to that of the document, a program list to check if the user has access to the necessary programs, and a date to determine if they have taken the mandatory certification training within the past year.

NOTE: The date comparison is done via an embedded script, which is _not_ the most efficient solution (and it uses LocalDateTime vs ZonedDateTime), but I think it illustrates the point.

NOTE 2: Considering that the documents themselves contain the security "policy", care should be given to permissions to update those documents. My recommendation would be to secure the security fields using Field Level Security...5 of the last 8 words are "field" or "security" - not quite as good as buffalo buffalo

Check out this gist for a bash script version.

PUT my_index
{
    "mappings": {
        "doc": {
            "properties": {
                "security_attributes": {
                    "properties": {
                        "level": {"type":"short"},
                        "programs": {"type":"keyword"},
                        "min_programs": {"type":"short"}
                    }
                },
                "body":{"type":"text"}
            }
        }
    }
}
PUT my_index/doc/1
{
    "security_attributes": {
        "level": 2,
        "programs": ["alpha", "beta"],
        "min_programs": 2
    },
    "body": "This document contains information that should only be visible to those at level 2 or higher, with access to both the alpha and beta programs"
}
PUT my_index/doc/2
{
    "security_attributes": {
        "level": 2,
        "programs": ["alpha", "beta", "charlie"],
        "min_programs": 3
    },
    "body": "This document contains information that should only be visible to those at level 2 or higher, with access to the alpha, beta, and charlie programs"
}
PUT my_index/doc/3
{
    "security_attributes": {
        "level": 3,
        "programs": ["charlie"],
        "min_programs": 1
    },
    "body": "This document contains information that should only be visible to those at level 3 or higher, with access to the charlie program"
}
PUT _xpack/security/role/my_policy
{    
    "indices": [
    {
        "names": ["my_index"],
        "privileges": ["read"],
        "query": {
            "template": {
                "source": "{\"bool\": {\"filter\": [{\"range\": {\"security_attributes.level\": {\"lte\": \"{{_user.metadata.level}}\"}}},{\"terms_set\": {\"security_attributes.programs\": {\"terms\": {{#toJson}}_user.metadata.programs{{/toJson}},\"minimum_should_match_field\": \"security_attributes.min_programs\"}}}, {\"script\": {\"script\": {\"inline\": \"!LocalDateTime.ofInstant(Calendar.getInstance().toInstant(), ZoneId.systemDefault()).isAfter(LocalDateTime.parse('{{_user.metadata.certification_date}}').plusYears(1))\"}}}]}}"
            } 
        }
    }]
}
PUT _xpack/security/user/jack_black
{
    "username": "jack_black",
    "password": "testtest",
    "roles": ["my_policy"],
    "full_name": "Jack Black",
    "email": "jb@tenaciousd.com",
    "metadata": {
        "programs": ["alpha", "beta"],
        "level": 2,
        "certification_date": "2018-01-02T00:00:00"
    }
}
PUT _xpack/security/user/barry_white
{
    "username": "barry_white",
    "password": "testtest",
    "roles": ["my_policy"],
    "full_name": "Barry White",
    "email": "bw@cantgetenough.com",
    "metadata": {
        "programs": ["alpha", "beta", "charlie"],
        "level": 2,
        "certification_date": "2018-01-02T00:00:00"
    }
}
PUT _xpack/security/user/earl_grey
{
    "username": "earl_grey",
    "password": "testtest",
    "roles": ["my_policy"],
    "full_name": "Earl Grey",
    "email": "eg@hot.com",
    "metadata": {
        "programs": ["charlie"],
        "level": 3,
        "certification_date": "2018-01-02T00:00:00"
    }
}
PUT _xpack/security/user/james_brown
{
    "username": "james_brown",
    "password": "testtest",
    "roles": ["my_policy"],
    "full_name": "James Brown",
    "email": "jb2@newbag.com",
    "metadata": {
        "programs": ["alpha", "beta", "charlie"],
        "level": 5,
        "certification_date": "2017-01-02T00:00:00"
    }
}

Expected results:

curl -u jack_black:testtest http://localhost:9200/my_index/_search
hits: 1
ids: [1]
curl -u barry_white:testtest http://localhost:9200/my_index/_search
hits: 2
ids: [1, 2]
curl -u earl_grey:testtest http://localhost:9200/my_index/_search
hits: 1
ids: [3]
curl -u james_brown:testtest http://localhost:9200/my_index/_search
hits: 0

Do you know why James Brown doesn't get any documents back? You can see the answer here.
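If you want to check the expected results without a cluster, the three filters in the role query can be modeled in a few lines of Python. This is a sketch of the policy logic only, under two assumptions that are not in the original: the "current" date is fixed at 2018-06-01 so the output is repeatable, and the helper names (plus_one_year, visible_ids) are made up for this example.

```python
from datetime import datetime

# security_attributes of the three documents from the example.
DOCS = {
    1: {"level": 2, "programs": ["alpha", "beta"], "min_programs": 2},
    2: {"level": 2, "programs": ["alpha", "beta", "charlie"], "min_programs": 3},
    3: {"level": 3, "programs": ["charlie"], "min_programs": 1},
}

# metadata of the four users from the example.
USERS = {
    "jack_black": {"programs": ["alpha", "beta"], "level": 2,
                   "certification_date": "2018-01-02T00:00:00"},
    "barry_white": {"programs": ["alpha", "beta", "charlie"], "level": 2,
                    "certification_date": "2018-01-02T00:00:00"},
    "earl_grey": {"programs": ["charlie"], "level": 3,
                  "certification_date": "2018-01-02T00:00:00"},
    "james_brown": {"programs": ["alpha", "beta", "charlie"], "level": 5,
                    "certification_date": "2017-01-02T00:00:00"},
}


def plus_one_year(moment):
    """Add a year; Feb 29 falls back to Feb 28, like Java's plusYears."""
    try:
        return moment.replace(year=moment.year + 1)
    except ValueError:
        return moment.replace(year=moment.year + 1, day=28)


def visible_ids(user, now):
    """Apply the three filters of the role query to every document."""
    certified = datetime.fromisoformat(user["certification_date"])
    if now > plus_one_year(certified):           # script filter
        return []
    return [
        doc_id for doc_id, attrs in DOCS.items()
        if attrs["level"] <= user["level"]       # range filter
        and len(set(user["programs"]) & set(attrs["programs"]))
        >= attrs["min_programs"]                 # terms_set filter
    ]


NOW = datetime(2018, 6, 1)  # assumed current date, fixed for repeatability
for name, user in USERS.items():
    print(name, visible_ids(user, NOW))
```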

Have fun with this new-found power, and as always, please let us know how it goes!