All about Scripting

Starting with the 1.3 release, we are making some big changes to the scripting infrastructure in Elasticsearch. In this blog post we’ll cover the major changes we’re making now, some that are still to come, and some of the new ways to work with scripting in ES!

Dynamic scripting disabled

One of the first (and most noticeable) changes to scripting came in the Elasticsearch 1.2 release. We recently published a blog post titled “Scripting and Security” about its security implications. Please take a look for more information about how dynamic scripting changed in 1.2.

Groovy and the sandbox

Starting with version 1.3, we have decided to add a sandboxed version of Groovy to the Elasticsearch scripting languages, with plans to transition all scripts from MVEL to Groovy in the long run. The reason for this is threefold:

  1. Groovy has been shown to perform better than MVEL in our scripting tests, especially in cases such as running loops.
  2. Groovy has a more active development pace than MVEL, supporting Java 8 fully as well as taking advantage of newer JVM features.
  3. Groovy has an easier route for adding sandboxing to the language.

The last point leads back to dynamic scripting. In 1.2.0 we disabled dynamic scripting for non-sandboxed languages. However, since Groovy can be sandboxed, we can still allow dynamic scripts to be sent with each request.

What do we mean by sandboxing? First, sandboxing does NOT address or prevent DoS (Denial of Service) attacks. It is only intended to prevent scripts from accessing parts of the operating system or Elasticsearch internals that they should not access. A malicious script containing an infinite loop can still be sent many times, exhausting system resources. If you would like to disable the sandbox (so that scripts sent dynamically as strings with requests are denied), add script.groovy.sandbox.enabled: false to your configuration in Elasticsearch 1.3 or later.

Be sure to check out the different sandboxing configuration parameters in the scripting documentation, as well as more information about disabling the sandbox, or enabling dynamic scripting for Elasticsearch overall.

In the 1.3 release of Elasticsearch, MVEL is still the default language. We plan to make Groovy the default language in the 1.4 release. Until then, you can move a script from MVEL to Groovy by specifying "lang": "groovy" in the script, or change the default language for all scripts by adding script.default_lang: groovy to elasticsearch.yml. You can then transition each MVEL script individually. In Elasticsearch 1.4, MVEL will be removed entirely as a built-in scripting language, so if you want to continue using it you will need to install the elasticsearch-lang-mvel plugin. While testing, however, we found that Groovy and MVEL are so similar that MVEL scripts needed minimal changes, if any, to work with Groovy.
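
As a rough sketch of the per-script approach, the request below sets "lang": "groovy" on a single script_score. The imdb index, the popularity field, and the n parameter are borrowed from the example later in this post. The script body is a simplified placeholder chosen for illustration, not a recommended scoring formula.

```json
# Sketch only: index, field, and parameter names are illustrative assumptions
POST /imdb/_search?pretty
{
  "query": {
    "function_score": {
      "query": {"match_all": {}},
      "functions": [
        {
          "script_score": {
            "lang": "groovy",
            "script": "n * doc['popularity'].value",
            "params": {
              "n": 1.5
            }
          }
        }
      ]
    }
  }
}
```

Setting the language per script like this lets you migrate one script at a time while the cluster-wide default stays unchanged.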

Why not JavaScript?

One question that we anticipate getting is: why not use JavaScript as the scripting language? After all, JavaScript is becoming a very popular language. There are a few reasons we decided to go with Groovy instead. First, Groovy was faster than JavaScript running on Rhino in our tests, and Nashorn, the newer JavaScript engine for the JVM, has poor support for concurrent execution of scripts. Additionally, the syntactic difference between Groovy and JavaScript is very small for simple scripts, so JavaScript users should have little difficulty understanding Groovy scripts.

With the release of 1.3, however, we do have an additional scripting language available for use: Lucene Expressions.

Lucene Expressions

Lucene Expressions provide a mechanism for dynamically evaluating a single JavaScript numeric expression per document. Their primary purpose is to make scoring adjustments easy without writing custom Java code, but the framework can be used for any per-document use case. Each expression is compiled to Java bytecode to achieve “native code”-like performance.

Integrating expressions as a new scripting language was an easy fit. The new “expression” lang for scripts can be used for virtually all current uses of query scripts in ES: script_score, script_fields, sort scripts and numeric aggregation scripts. And did we mention they are fast? Initial benchmarks show speeds many times faster than Groovy scripts, and even slightly faster than native scripts!

As with any feature, gaining great performance comes with a cost. In this case it is the limitations that expression scripts impose:

  • No loops! These are single statements in JavaScript (i.e. an “expression”). No local variables or anything else, just the right-hand side of an assignment.
  • Only single-valued, numeric fields are accessible.
  • No access to stored fields, because loading them is slow!

You can read more about expressions (and what functions and operators are available) in the Lucene documentation, and how to use them in Elasticsearch in the scripting documentation.
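
To make this concrete, here is a minimal sketch of an expression used as a script_score. It assumes the same hypothetical imdb index used in the examples below, with popularity mapped as a single-valued numeric field. The whole script is one numeric expression, with no loops or local variables.

```json
# Sketch only: assumes popularity is a single-valued numeric field
POST /imdb/_search?pretty
{
  "query": {
    "function_score": {
      "query": {"match_all": {}},
      "functions": [
        {
          "script_score": {
            "lang": "expression",
            "script": "1.5 * doc['popularity'].value"
          }
        }
      ]
    }
  }
}
```

Anything beyond a single numeric expression, such as iterating over a multi-valued field, would still need Groovy or a native script.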

An alternative to scripting

Before signing off, we would like to talk a little bit about an alternative to scripting for one common use case. Sometimes you may want to influence the score of a document based on a field inside the document. Think of the popularity of a restaurant, or the star rating of a hotel. One way to do this is with a script, as in the first example below. However, there is an easier and faster way of influencing the score based on the value of a document’s field: the field_value_factor function in the function_score query.

Previously, you would specify:

POST /imdb/_search?pretty
{
  "query": {
    "function_score": {
      "query": {"match_all": {}},
      "functions": [
        {
          "script_score": {
            "script": "log(n * doc['popularity'].value)",
            "params": {
              "n": 1.5
            }
          }
        }
      ]
    }
  }
}

Instead, you can now use:

POST /imdb/_search?pretty
{
  "query": {
    "function_score": {
      "query": {"match_all": {}},
      "functions": [
        {
          "field_value_factor": {
            "field": "popularity",
            "factor": 1.5,
            "modifier": "log"
          }
        }
      ]
    }
  }
}

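This not only requires no scripting, but is also faster, because the execution can use native code paths instead of compiling and executing an MVEL or Groovy script. One caveat when converting: the log modifier applies a base-10 logarithm, whereas log in a script is the natural logarithm, so use the ln modifier if you need scores identical to the script above. Keep the field_value_factor function in mind as a faster alternative to scripting!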