Running Groovy Scripts without Dynamic Scripting

With the release of Elasticsearch 1.3.8 and 1.4.3, we have disabled, by default, the ability to send Groovy scripts dynamically, whether inline with a request or as an indexed script. Groovy remains the default scripting language.

This blog post explains what that means and, more importantly, how to continue using scripts in unsandboxed languages to get important tasks done safely, albeit a little less dynamically.

What is a Dynamic Script?

For anyone not familiar with scripting in Elasticsearch, you can submit scripts as part of many different requests: searches, aggregations, updates, upserts, and deletes by query. In short, scripts let you extend the built-in behavior to fit your own use cases.

For example, this is a dynamic script that returns documents that have the same value for field1 and field2 + shift:

GET /_search
{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "doc['field1'].value == (doc['field2'].value + shift)",
          "lang": "groovy",
          "params": {
            "shift": 3
          }
        }
      }
    }
  }
}

You can switch to another scripting language by changing the lang parameter. The new language will naturally require its own syntax and may add limitations (e.g., Lucene Expressions instead of Groovy scripts).
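When a request like this is assembled in application code, it can help to keep the script text, the language, and the parameters as separate arguments. The following is a minimal Python sketch using only the standard library. The helper name build_script_filter_query is hypothetical; it only produces the request body shown earlier and does not contact Elasticsearch.

```python
import json


def build_script_filter_query(script, params, lang="groovy"):
    """Return a search body that filters documents with an inline script.

    Hypothetical helper: it mirrors the request body shown earlier and
    does not send anything to Elasticsearch.
    """
    return {
        "query": {
            "filtered": {
                "filter": {
                    "script": {
                        "script": script,
                        "lang": lang,
                        # Copy so later changes by the caller do not leak in.
                        "params": dict(params),
                    }
                }
            }
        }
    }


body = build_script_filter_query(
    "doc['field1'].value == (doc['field2'].value + shift)",
    {"shift": 3},
)
print(json.dumps(body, indent=2))
```

Passing values such as shift through params, rather than formatting them into the script text, keeps the script itself constant from one request to the next.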

Why is that Dynamic?

The above is an example of a dynamic script because the script itself must be interpreted and compiled on the server side at request time. A dynamic script is any script that is sent to the data nodes via an Elasticsearch API, which includes indexed scripts.

In other words, if the script is not stored with all of the data nodes, then it is treated as a dynamic script.

What to Expect with Dynamic Scripting Disabled

Because of the changes in these releases, dynamic scripting is now disabled by default for Groovy. If you submit the earlier example request, this is what you should expect to see (reduced for brevity and emphasis):

{
  "error": "SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[8FJ02MofSnqVvOQ10BXxhQ][test][0]: SearchParseException[[test][0]: from[-1],size[-1]: Parse Failure [Failed to parse source [{...}]]]; nested: ScriptException[dynamic scripting for [groovy] disabled]...",
  "status": 400
}

The key part of the error message is “ScriptException[dynamic scripting for [groovy] disabled]”.
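A client can look for that phrase to distinguish this failure from other parse errors. The sketch below is a rough illustration in Python, assuming the response has already been decoded into a dictionary. The function name disabled_script_lang is hypothetical, and the sample error is abbreviated from the response shown above.

```python
import re

# Pattern taken from the error text shown above; the language name may vary.
_DISABLED = re.compile(r"dynamic scripting for \[([^\]]+)\] disabled")


def disabled_script_lang(response):
    """Return the language whose dynamic scripting is disabled, or None."""
    error = response.get("error")
    if not isinstance(error, str):
        return None
    match = _DISABLED.search(error)
    return match.group(1) if match else None


# Abbreviated copy of the error response from this post.
sample = {
    "error": "SearchPhaseExecutionException[Failed to execute phase [query], "
             "all shards failed; ... nested: "
             "ScriptException[dynamic scripting for [groovy] disabled]...",
    "status": 400,
}
print(disabled_script_lang(sample))
```

Matching on message text is fragile across versions, so treat this as a diagnostic aid rather than a stable contract.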

How to Continue to Use Scripting

There are three ways to submit a script to Elasticsearch. The two dynamic ways are per-request scripting (shown above) and indexed scripts. An indexed script is a Groovy script stored in Elasticsearch itself and used on demand (this works really well, but it still allows untrusted users to add their own scripts given open access to your cluster!). Indexed scripts work in much the same way as stored procedures in a relational database: you prewrite your script and invoke it by name later as part of a request.

GET /_search
{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script_id": "your_custom_script",
          "lang": "groovy",
          "params": {
            "shift": 3
          }
        }
      }
    }
  }
}

Notice that not much changed, except that the inner script became script_id and its value changed to the name of the prewritten script.

The non-dynamic way to provide scripts to Elasticsearch is to write each script that you plan to use to a file on disk, instead of indexing it, and to store that file alongside the configuration of each node. This is how you can continue to use unsandboxed scripts with dynamic scripting disabled, for any scripting language.

In the first example, the Groovy script was literally doc['field1'].value == (doc['field2'].value + shift). You can write that script into a file with a “.groovy” extension:

doc['field1'].value == (doc['field2'].value + shift)

If this file is named your_custom_script.groovy and placed in the config/scripts directory of all of your Elasticsearch data nodes, then Elasticsearch will pick up the script within 60 seconds (configurable by changing watcher.interval in your elasticsearch.yml) and pre-compile it for use with future requests. You do need to ensure that the file is readable by the user running Elasticsearch! After writing this to disk, your configuration directory should look something like this:

config/
  elasticsearch.yml
  logging.yml
  scripts/
    your_custom_script.groovy

This is intentionally not as dynamic as submitting the script with each request or indexing the script, but it does still allow for dynamically adding scripts in a trusted environment.
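If you deploy scripts with tooling rather than by hand, writing the file and fixing its permissions can be done in one step. The following Python sketch assumes it runs on each data node with write access to that node's configuration directory. The function name install_script is hypothetical, and the demonstration writes to a temporary directory instead of a real node.

```python
import os
import stat
import tempfile
from pathlib import Path


def install_script(config_dir, name, source, extension="groovy"):
    """Write a script into <config_dir>/scripts and make it readable."""
    scripts_dir = Path(config_dir) / "scripts"
    scripts_dir.mkdir(parents=True, exist_ok=True)
    target = scripts_dir / f"{name}.{extension}"
    target.write_text(source + "\n", encoding="utf-8")
    # rw-r--r--: the user running Elasticsearch must be able to read the file.
    target.chmod(stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH)
    return target


# Demonstration against a throwaway directory rather than a real node.
with tempfile.TemporaryDirectory() as config_dir:
    path = install_script(
        config_dir,
        "your_custom_script",
        "doc['field1'].value == (doc['field2'].value + shift)",
    )
    written = path.read_text(encoding="utf-8")
    readable = os.access(path, os.R_OK)

print(path.name, readable)
```

The same step has to be repeated on every data node, since each node loads scripts only from its own config/scripts directory.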

Using a Script Written to Disk

Your script will not be usable until the script is loaded, at which point you will see something like this in your log file(s):

[2015-02-11 11:14:47,066][INFO ][script                   ] [Sergei Kravinoff] compiling script file [/path/to/elasticsearch-1.4.3/config/scripts/your_custom_script.groovy]
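Because the script has to be loaded on every data node, it can be useful to check each node's log for that message before switching requests over. The sketch below is a rough Python illustration that assumes the log format matches the line above and that paths use forward slashes. The function name compiled_script_names is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Matches the "compiling script file [<path>]" message shown above.
_COMPILED = re.compile(r"compiling script file \[(?P<path>[^\]]+)\]")


def compiled_script_names(log_lines):
    """Return the names (file name without extension) of compiled scripts."""
    names = set()
    for line in log_lines:
        match = _COMPILED.search(line)
        if match:
            names.add(PurePosixPath(match.group("path")).stem)
    return names


# The log line from this post, split across string literals for readability.
log = [
    "[2015-02-11 11:14:47,066][INFO ][script                   ] "
    "[Sergei Kravinoff] compiling script file "
    "[/path/to/elasticsearch-1.4.3/config/scripts/your_custom_script.groovy]",
]
print("your_custom_script" in compiled_script_names(log))
```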

Once your script has been loaded by all of your Elasticsearch data nodes, you can begin to use it. To do so, supply the name of your script, without the extension, as the file parameter (not script_id!):

GET /_search
{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "file": "your_custom_script",
          "lang": "groovy",
          "params": {
            "shift": 3
          }
        }
      }
    }
  }
}

Note: the language is optional because Groovy is the default language. If you use a different scripting language, or if you change the default scripting language (e.g., to Lucene Expressions), then the language must be supplied so that Elasticsearch can find the right script. As a best practice, we recommend that applications always include the lang parameter. This will insulate you from any future changes in the default scripting language.
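If an application already builds script clauses with an inline script or a script_id, moving to file-based scripts is a small mechanical change. The following is a minimal Python sketch of that rewrite. The function name use_file_script is hypothetical, and it assumes the script has already been written to config/scripts on every data node under the given name.

```python
import copy


def use_file_script(script_clause, file_name, default_lang="groovy"):
    """Return a copy of a script clause that references a script on disk.

    The inline "script" (or indexed "script_id") key is replaced with
    "file", and "lang" is always set explicitly.
    """
    clause = copy.deepcopy(script_clause)
    clause.pop("script", None)
    clause.pop("script_id", None)
    clause["file"] = file_name
    # Keep an existing lang; otherwise spell out the default.
    clause.setdefault("lang", default_lang)
    return clause


inline = {
    "script": "doc['field1'].value == (doc['field2'].value + shift)",
    "params": {"shift": 3},
}
on_disk = use_file_script(inline, "your_custom_script")
print(on_disk)
```

The params are left untouched, so the same values are passed to the script regardless of how it is supplied.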

Questions?

If you have any questions, then please do not hesitate to reach out to us on Twitter (@elasticsearch). You can report any problems on the GitHub issues page.