25 July 2017 Engineering

Strict Content-Type Checking for Elasticsearch REST Requests

By Tim Vernum

The Elasticsearch engineering team is busy working on features for Elasticsearch 6.0. One of the changes that is coming in Elasticsearch 6.0 is strict content-type checking.

What’s changing?

Starting from Elasticsearch 6.0, all REST requests that include a body must also provide the correct content-type for that body.

In earlier releases of Elasticsearch, the content-type was optional, and if it was missing or not recognised, then the server would sniff the content and make an educated guess regarding the content-type. That will no longer happen in Elasticsearch 6.0 - every incoming request needs to have the correct content-type for the body it contains.

This ability to enforce strict content-type checking has existed since Elasticsearch 5.3 via the http.content_type.required configuration setting. In 5.x it is optional and defaults to false; in Elasticsearch 6.0 that setting defaults to true, and there is no way to disable it.

Why are we changing this?

We know that the content-type sniffing has been quite convenient when using basic HTTP tools such as curl. Many of us are quite accustomed to searching a cluster by running something like this:

curl 'http://localhost:9200/_search' -d'
{
    "query" : {
        "match_all" : {}
    }
}'

But we need to make that sort of operation slightly more verbose, and include the content-type, in the interests of clarity and security.

Clarity

As Elasticsearch has evolved, we’ve made a conscious decision to favour reliability and predictability over leniency. And while being lenient with content-types has been convenient, it has also produced some surprising results.

For example, if you tried to send plain text content to an API that didn’t support it, then you would usually receive a clear error like this:

Content-Type header [text/plain] is not supported

But under the covers Elasticsearch was doing its best to try and guess what you might have meant. So, if your body started with “{” then it would guess that your content was actually JSON, but when it tried to parse that, it would fail and the error message would look more like:

Unexpected character ('a' (code 97)): was expecting double-quote to start field name

And, while most of our APIs support YAML formatted requests, the content-type sniffing required that the body start with a start-of-document marker (“---”), which is not what users expected.
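
To make the surprise concrete, here is a minimal sketch of the kind of guessing described above. It is a simplified model written for illustration, not Elasticsearch's actual code: the function names are hypothetical, and skipping leading whitespace is an assumption.

```python
import json


def sniff_content_type(body):
    """Guess a content-type from the first characters of the body.

    Simplified model of the old sniffing behaviour (hypothetical helper).
    Skipping leading whitespace is an assumption made for this sketch.
    """
    stripped = body.lstrip()
    if stripped.startswith("{"):
        return "application/json"
    if stripped.startswith("---"):
        # YAML is only recognised when the start-of-document marker is present.
        return "application/yaml"
    return None


def parse_after_sniffing(body):
    """Parse the body using the guessed type; raise ValueError on failure."""
    guessed = sniff_content_type(body)
    if guessed == "application/json":
        try:
            return json.loads(body)
        except json.JSONDecodeError as err:
            # The caller sees a parser error instead of "type not supported".
            raise ValueError("sniffed as JSON, but parsing failed: " + err.msg) from err
    raise ValueError("could not parse body (guessed content-type: %s)" % guessed)
```

In this model, a plain-text body that happens to start with “{” is treated as JSON and fails inside the parser, while a YAML body without the “---” marker is not recognised at all.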

When it comes to content-type, we’ve come to the conclusion that “Say what you mean” provides a more reliable and predictable outcome than guessing. Being explicit is the safer, clearer and more consistent approach.

Security

Strict checking of content-type is also useful as a layer of protection against Cross Site Request Forgery attacks.

Because the Elasticsearch REST API uses simple HTTP requests, what’s easy to do with curl is often easy to do with your web browser. If your internal network allows it, you can point your favourite browser at the /_cluster/settings endpoint on one of your Elasticsearch nodes and see the settings for your cluster.

Unfortunately, if an attacker has the right knowledge about your internal network and Elasticsearch cluster, they can craft a malicious webpage that would use that same technique to perform unwanted updates to your cluster. Web browsers implement a number of security policies that help protect from such attacks, and part of that protection is to place limits on the content-types that may be used when sending data to remote servers.

As an example, consider this very simple web page:

<html>
  <body>
    <script src="https://code.jquery.com/jquery-3.2.1.min.js"
            type="text/javascript"></script>
    <script type="text/javascript">
      $(function() {
        $.ajax({
          url: "http://localhost:9200/visitors/doc/",
          type:'POST',
          data: JSON.stringify({ browser: navigator.userAgent,
                                 date: new Date() }),
          contentType: 'text/plain'
        });
      });
    </script>
  </body>
</html>

If you run an out-of-the-box install of Elasticsearch 5 on your local computer, and then open up that page in your web browser, it will add a new document in your Elasticsearch cluster that stores your browser’s User Agent and the time that you loaded the page. That’s not so scary, but with minimal changes we could make that same page overwrite or delete existing documents.

If you try to run that code in your browser, you will find that an error message is displayed in the developer console. Here’s what it looks like in Google Chrome:

XMLHttpRequest cannot load http://localhost:9200/visitors/doc/. No 'Access-Control-Allow-Origin' header is present on the requested resource.

That error is due to the Same Origin Policy [1], which is one of the security features of the web. By default, a web page loaded from one site may only access content from that same site. This policy prevented our sample web page from reading the JSON that was sent as a response when it stored the document in Elasticsearch.

But why does the browser even allow us to send data to the Elasticsearch server if we’re not allowed to read the result? The answer lies in a second browser feature called Cross Origin Resource Sharing (CORS) [2]. While the Same Origin Policy acts as a very useful default for securing the web, there are also many times where it is helpful for two otherwise independent sites to be able to share resources. CORS defines a mechanism by which a site can optionally grant other sites access to its resources.

Due to the history of the web, and the way it has evolved over time, CORS assumes that some types of requests can always be sent safely. For example, web browsers have always allowed cross-origin form submission - a form on my web page can be configured to send its data to your server. If the browser determines that a cross-origin request meets certain requirements, then it will declare it to be safe and will send that request off to the third-party server. It is only when the response comes back from that server, that the browser checks to see whether the original web page is allowed to access the provided content.

In our example above, the request sets the content-type to be text/plain, which browsers treat as a safe value [3], so the request is sent off to the Elasticsearch server. When Elasticsearch responds, the browser looks for special CORS-related headers, so that it can decide whether the calling script is allowed to process the content of the response. By default, an Elasticsearch server does not include any of those CORS headers in the response, so the cross-origin request fails, and our web page is prevented from seeing the results of the POST. But by then the damage has already been done - the request was sent to the Elasticsearch cluster and the document has been stored.

The strict content-type checking in Elasticsearch 6.0 helps prevent that damage. The Index API that is being used in this example does not support a content-type of text/plain, so Elasticsearch 6.0 will reject the request without performing any updates.
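
As a rough illustration of that rejection, the sketch below models a strict check that runs before any request body is processed. It is not Elasticsearch's implementation: the function name, the set of supported types and the wording for a missing header are assumptions made for this example. Only the "is not supported" message is taken from the error shown earlier.

```python
# Assumed set of supported media types, for illustration only.
SUPPORTED_MEDIA_TYPES = {
    "application/json",
    "application/yaml",
    "application/cbor",
    "application/smile",
}


def check_content_type(header_value, has_body):
    """Return an error message if the request must be rejected, else None.

    Hypothetical helper: a simplified model of strict content-type checking.
    """
    if not has_body:
        # Requests without a body have nothing to declare.
        return None
    if header_value is None:
        # Assumed wording for a missing header.
        return "Content-Type header is missing"
    # Ignore parameters such as "; charset=UTF-8" when comparing.
    media_type = header_value.split(";", 1)[0].strip().lower()
    if media_type not in SUPPORTED_MEDIA_TYPES:
        return "Content-Type header [%s] is not supported" % header_value
    return None
```

Under this model the forged text/plain POST from the example page is turned away before the document is indexed, because the check depends only on the declared header and never on the content of the body.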

We might attempt to work around those content-type checks by changing our test page to send a valid content-type such as application/json:

<html>
  <body>
    <script src="https://code.jquery.com/jquery-3.2.1.min.js"
            type="text/javascript"></script>
    <script type="text/javascript">
      $(function() {
        $.ajax({
          url: "http://localhost:9200/visitors/doc/",
          type:'POST',
          data: JSON.stringify({ browser: navigator.userAgent,
                                 date: new Date() }),
          contentType: 'application/json'
        });
      });
    </script>
  </body>
</html>

However, the CORS security policy does not treat application/json as a safe content-type, so the browser performs what is known as a preflight request. That request is sent to the same URL on the Elasticsearch server, but the HTTP method is set to OPTIONS and no data is sent in the request body. Once again the web browser looks for the special CORS response headers, and since Elasticsearch doesn’t send them, the cross-origin request is refused and the POST body is never sent to the Elasticsearch server.
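
The browser's decision in the two test pages can be sketched as follows. This is a deliberately simplified model of the CORS rules: it looks only at the HTTP method and the Content-Type header and ignores the other conditions in the specification (custom headers, for example). The function name is hypothetical.

```python
# Methods and content-types that browsers send cross-origin without a preflight.
SAFELISTED_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}


def needs_preflight(method, content_type=None):
    """Return True if a cross-origin request would trigger an OPTIONS preflight.

    Simplified: only the method and the Content-Type header are considered.
    """
    if method.upper() not in SAFELISTED_METHODS:
        return True
    if content_type is None:
        return False
    # Compare the media type only, ignoring parameters such as charset.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type not in SAFELISTED_CONTENT_TYPES
```

For the first page, needs_preflight("POST", "text/plain") is False, so the body is sent straight away. For the second, needs_preflight("POST", "application/json") is True, so the browser asks permission first and, without CORS headers in the reply, never sends the body.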

By enforcing strict content-type checks in Elasticsearch 6.0, we reduce the risk of Cross Site Request Forgery attacks and help protect against accidental or malicious destruction of data.

What do I need to do?

For most users there’s nothing you need to do - everything has been taken care of.

All the components of the Elastic Stack, as well as our official REST clients, will send the correct content-type for each request - just make sure that you’re on a recent version. If you are using a third-party client, or one that you built yourself, then you’ll need to check whether that client sends a valid content-type for each request.

If you regularly use curl or another command line tool to send data into Elasticsearch, you’ll need to add the Content-Type header to any request that contains a body. For curl, that means adding -H'Content-Type: application/json' to the command line of any request that has a JSON body [4].
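
If you have written your own client, the same rule applies. As one possible sketch, here is the match_all search from earlier built with Python's standard urllib module and an explicit Content-Type header. The helper names and the localhost URL are assumptions for illustration, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_search_request(base_url="http://localhost:9200"):
    """Build a POST to /_search with a JSON body and an explicit Content-Type."""
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/_search",
        data=body,
        # Declare the type of the body explicitly rather than relying on defaults.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request):
    """Send the request and decode the JSON response (needs a running cluster)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_search_request()
```

Calling run_search(request) against a running node would perform the search. Setting the header explicitly matters with urllib in particular: when a body is supplied without a Content-Type, it falls back to application/x-www-form-urlencoded, which is not a correct description of a JSON body.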

If you’re interested in testing your application or client library, we’ve released 6.0.0-alpha2, and it includes content-type checking. Alternatively, you can turn on strict content-type checking in recent versions of Elasticsearch 5 (see below).

What about Elasticsearch 5.x?

I mentioned earlier that you can enable strict content-type checking in recent releases of Elasticsearch 5 by enabling the http.content_type.required configuration option. Given the security reasons mentioned above, you should consider whether that is something that would be of value to you right now.

If you’re deploying a brand new Elasticsearch cluster, it’s probably a good idea to require strict content-types from the start. It will be one less thing to worry about when you do upgrade to 6.x, and it gives you an added layer of protection against Cross Site Request Forgery attacks.

If you have an existing Elasticsearch installation, then turning on that setting may be a little trickier - you need to know that all of your clients are sending the correct content-type. But if you can tackle that problem now, that will get you one step closer to being able to migrate to Elasticsearch 6 when it is officially available.

Conclusion

This is not a decision that we made lightly. We had a lot of conversation about it, and considered various options. We recognise that the old content sniffing approach was convenient, but we feel strongly that this change is a necessary one to help provide stable, reliable and predictable features in Elasticsearch.


1. https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy

2. https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS

3. https://fetch.spec.whatwg.org/#cors-safelisted-request-header

4. If you’re copying examples from our documentation, you’ll find that the COPY AS CURL button automatically includes this option.