September 16, 2014

Scripting with Elasticsearch

By Njal Karevoll

UPDATE: This article refers to our hosted Elasticsearch offering by an older name, Found. Please note that Found is now known as Elastic Cloud.

Scripting is an important part of the toolbox of any Elasticsearch user, and enables evaluating custom expressions that may be used to synthesize fields or provide customized scoring. In this post we take a brief look at the upcoming changes to the scripting module and the different scripting languages available to use today.

In a recent blog post about scripting, Elasticsearch outlined some coming changes to its scripting support. The two important takeaways from that post are that the sandboxing story of scripts will be improved and that the default scripting language will be changed from MVEL to Groovy.

Let’s take a look at the current official scripting languages.

The Well Known MVEL

MVEL (MVFLEX Expression Language) is a simple but powerful expression language with an intuitive syntax that most users of Elasticsearch are able to catch on to quickly. It was the default language used by scripts for Elasticsearch up to and including the 1.3.x branch, but it was removed in favor of Groovy from 1.4. In 1.4 and later, the elasticsearch-lang-mvel plugin will need to be installed in order to use MVEL as a scripting language.

When using MVEL, it’s worth keeping in mind that it has no sandboxing support at all, which was part of the reason it was replaced as the default. Any user able to execute scripts on your cluster must therefore be trusted not to have malicious intent.

It’s still the scripting language that most people using Elasticsearch are familiar with, but that may very well change soon. In either case, changing from MVEL to Groovy and back is a minor setting tweak (and in some cases, a small syntax change).
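As a rough illustration of how small that tweak can be, the Python sketch below builds the same search body twice and changes only the language. It assumes the 1.x-style request keys "script", "lang" and "params"; the helper name, the field "price", the output field "discounted_price" and the parameter "factor" are all hypothetical and not taken from any real index.

```python
import json


def script_field_body(lang):
    """Build a search body that synthesizes one field with a script.

    Assumptions: 1.x-style "script" / "lang" / "params" keys; the field
    "price" and the parameter "factor" are made up for illustration.
    """
    return {
        "query": {"match_all": {}},
        "script_fields": {
            "discounted_price": {
                # This particular expression reads the same in MVEL and Groovy.
                "script": "doc['price'].value * factor",
                "lang": lang,
                "params": {"factor": 0.9},
            }
        },
    }


# Only the "lang" value differs between the two request bodies.
mvel_body = script_field_body("mvel")
groovy_body = script_field_body("groovy")

print(json.dumps(groovy_body, indent=2))
```

More involved scripts may still need the small syntax changes mentioned above, so treat this as a starting point rather than a guarantee that every script ports unchanged.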

Getting Groovy

Groovy is a well-known dynamic language for the JVM (the Java Virtual Machine, which Elasticsearch runs on). Unlike MVEL, it’s actually a general purpose language. It became known with the release of Grails, which did for Groovy what Rails did for Ruby. Since its release, Groovy has been under constant development, and since version 2.0.0 it has supported static compilation as well.

The reasons for changing the default to Groovy were performance, active development and features, and an easier route to sandboxing. With the possibility of static compilation, a large community that continuously improves the language, and the sandboxing features it comes with, it is certainly not a bad choice for a default.

While Groovy supports sandboxing, we would still advise users to take extra precautions and avoid giving untrusted clients the ability to run arbitrary code, in case it turns out to be possible to break out of the sandbox. Even if the sandbox holds, it does not prevent simple denial of service attacks such as while(true) {} style loops, which exhaust resources and prevent Elasticsearch from operating properly.

The Ubiquitous JavaScript

No scripting support would be complete without JavaScript, and of course, Elasticsearch supports it. While it was not chosen as the default scripting language for 1.4 and forward, it’s possible to use it by installing the official elasticsearch-lang-javascript plugin. It was passed over as the default because of the performance of Rhino and the concurrent script execution support of Nashorn. Whether those differences matter is ultimately up to you to decide for your own use case, so if you enjoy using JavaScript, it still has official support.

Python

Rounding off the officially supported scripting languages is Python, which is supported through the Jython project. While development on that project has been slow for lack of resources, it’s considered stable and works well, so it may be a good choice for anyone who is most familiar with Python.

Custom Languages

If there’s a programming language missing from Elasticsearch scripting, add it yourself. Any scripting language supporting JSR-223 can be easily added using our Script Module primer as a starting point. Other languages can be added as well, but if no implementation of the language is available on the JVM, doing so is significantly more work.

Lucene Expressions

While Lucene Expressions, documented in the Lucene API, is an experimental feature of Elasticsearch, it’s certainly a very performant one.

It’s not really a full-blown scripting language either, but using an advanced feature of Lucene, which lies at the core of Elasticsearch, a JavaScript-like expression may be used as a script. The expression is compiled once before it’s used, and if you’re using nested expressions, the results are even cached to avoid recomputation. Under the hood, the expressions are compiled to .class files containing regular Java bytecode, which is about as fast as it gets.

Be aware of the limitations though: stored fields are not supported, and it’s only possible to work with numeric fields.
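To show what such an expression might look like in a request, here is a hedged Python sketch that builds a function_score query using an expression for custom scoring. It assumes the language is selected with "lang": "expression" and that the expression can read _score and a numeric doc value; the helper name and the fields "title" and "popularity" are hypothetical.

```python
import json


def expression_score_body(text):
    """Build a function_score query scored by a Lucene expression.

    Assumptions: "lang": "expression" selects Lucene Expressions;
    "title" and "popularity" are made-up fields, and "popularity"
    must be numeric because expressions only handle numeric fields.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},
                "script_score": {
                    # A single arithmetic expression, not a full script.
                    "script": "_score * doc['popularity'].value",
                    "lang": "expression",
                },
                # Use the expression result as the final score.
                "boost_mode": "replace",
            }
        }
    }


body = expression_score_body("scripting")
print(json.dumps(body, indent=2))
```

Because the expression is limited to arithmetic over numeric values, anything involving strings or stored fields would need one of the full scripting languages instead.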

sum()ing up

Elasticsearch is very flexible when it comes to scripting, allowing its users to use their preferred scripting language to accomplish a wide variety of tasks. In a later article we’ll take a closer look at various applications of scripts: where it makes sense to use them and where they should be avoided.