May 30, 2014

Extending the Scripts Module

By Konrad Beiske

UPDATE: This article refers to our hosted Elasticsearch offering by an older name, Found. Please note that Found is now known as Elastic Cloud.

How to Add Another Scripting Language

You can use this post as a starting point to make a language plugin for any scripting engine that implements the JSR-223 interfaces.

Introduction

Elasticsearch currently supports several scripting languages. The core distribution supports MVEL, which is also the default language, but you can add more by installing plugins. For Clojure, Groovy, JavaScript and Python there are official plugins maintained by Elasticsearch Inc. In Elasticsearch 1.3 the default is planned to change to Groovy. In this article we will look closer at what it takes to create a language plugin, and to demonstrate, we will create a plugin targeting the JSR-223 API.

The JSR-223 API, or the javax.script package, is not language specific; it is an API intended for use by all scripting engines implemented for the JVM. The emphasis of this article is on the extension points of the scripting module rather than on how to write a plugin.

The ScriptEngineService Interface

The ScriptEngineService interface is the key component to implement. The rest is just bindings to register our implementation with the ScriptService.

public interface ScriptEngineService {

    String[] types();

    String[] extensions();

    Object compile(String script);

    ExecutableScript executable(Object compiledScript, @Nullable Map vars);

    SearchScript search(Object compiledScript, SearchLookup lookup, @Nullable Map vars);

    Object execute(Object compiledScript, Map vars);

    Object unwrap(Object value);

    void close();
}

The types() and extensions() methods are where we declare the script types and file extensions supported by our engine. This information is used by the ScriptService to decide which engine to use for a given script, whether it be a script stored in a file or a dynamic one where you have declared a scripting language with the lang field.
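To make the lookup concrete, here is a minimal sketch of how a service could map the declared types and extensions to engines. The `EngineRegistry` class and its nested `Engine` interface are hypothetical stand-ins made up for this illustration; they are not the actual ScriptService code, which may well resolve engines differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: these names are not part of Elasticsearch.
class EngineRegistry {

    // Simplified stand-in for the two declarative methods of ScriptEngineService.
    interface Engine {
        String[] types();
        String[] extensions();
    }

    private final Map<String, Engine> byType = new HashMap<>();
    private final Map<String, Engine> byExtension = new HashMap<>();

    // Index the engine under every type and extension it declares.
    void register(Engine engine) {
        for (String type : engine.types()) {
            byType.put(type, engine);
        }
        for (String extension : engine.extensions()) {
            byExtension.put(extension, engine);
        }
    }

    // Dynamic script: the request names the language in its "lang" field.
    Engine forLang(String lang) {
        return byType.get(lang);
    }

    // File script: the engine is chosen from the file name's extension.
    Engine forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? null : byExtension.get(fileName.substring(dot + 1));
    }

    // Convenience factory for an engine with fixed declarations (used in the demo).
    static Engine engine(String[] types, String[] extensions) {
        return new Engine() {
            public String[] types() { return types; }
            public String[] extensions() { return extensions; }
        };
    }

    public static void main(String[] args) {
        EngineRegistry registry = new EngineRegistry();
        Engine lua = engine(new String[] {"lua"}, new String[] {"lua"});
        registry.register(lua);
        System.out.println(registry.forLang("lua") == lua);
        System.out.println(registry.forFile("my_script.lua") == lua);
        System.out.println(registry.forFile("my_script.py") == null);
    }
}
```

Both lookups return null for an unknown language or extension, so a real service would have to decide how to report that to the client.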

The compile() method should compile the given script, but simply returning the string would also work, which is a sensible choice if the script is for a non-compiled language like Bash. What is worth noting about the method signature is that it is an example of the memento pattern, as the actual type of the return value is hidden from the caller. The only thing the caller can assume about the return value is that it may be passed to the executable, search and execute methods of the engine that made it. Just like in the memento pattern, where an external service may persist an object's internal state without being able to mutate it, the ScriptService in Elasticsearch can cache compiled scripts. Hence the method should be implemented as a proper function, in the sense that a given input always produces the same output, and the returned objects should be reusable.
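The following sketch shows why an opaque, reusable return value makes caching straightforward. `CompiledScriptCache` is a hypothetical class written for this article, not the cache Elasticsearch actually uses, and for brevity it keys on the script source alone, whereas a real cache would presumably also take the language into account.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical illustration only: a cache that treats compiled scripts as opaque tokens.
class CompiledScriptCache {

    private final Map<String, Object> cache = new ConcurrentHashMap<>();
    private final Function<String, Object> compiler;

    CompiledScriptCache(Function<String, Object> compiler) {
        this.compiler = compiler;
    }

    // Compile on first use, then hand back the same token on later calls.
    // The cache never inspects the token; only the engine that made it can use it.
    Object compile(String script) {
        return cache.computeIfAbsent(script, compiler);
    }

    public static void main(String[] args) {
        AtomicInteger compilations = new AtomicInteger();
        CompiledScriptCache cache = new CompiledScriptCache(source -> {
            compilations.incrementAndGet();
            return source; // a non-compiled language can simply return the source
        });
        Object first = cache.compile("return 'Hello World!'");
        Object second = cache.compile("return 'Hello World!'");
        System.out.println(first == second);
        System.out.println(compilations.get());
    }
}
```

Because the cached object may be handed to several threads at once, this only works if the compile function is deterministic and the object it returns is safe to reuse.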

The ScriptService Gives You a Lot for Free

A key part of the scripting module is the ScriptService. It is the entry point for clients of the scripting module and acts as a facade that delegates to each ScriptEngineService instance. In addition to deciding which engine to invoke for a given script, it also caches compiled scripts for later reuse. Threading, however, is left to the caller, so make sure your ScriptEngineService is thread safe. This makes a lot of sense when you consider that Elasticsearch has dedicated thread pools for searches, indexing and so on.

Once you have an implementation of ScriptEngineService you hook it up to the scripting module by including a method like this in your implementation of AbstractPlugin.

public void onModule(ScriptModule module) {
    module.addScriptEngine(MyScriptEngineServiceImpl.class);
}

You can find more info on the AbstractPlugin in the writing a plugin article. For this to work you must also remember to annotate the constructor of your implementation with @Inject so Guice can do its magic and instantiate it.

The JSR-223 API

Getting started with a JSR-223 engine can be as simple as adding it to your class path and looking it up by its extension like this:

ScriptEngineManager manager = new ScriptEngineManager();
ScriptEngine engine = manager.getEngineByExtension("lua");

In the above example I used Luaj which you can get with this Maven dependency:


<dependency>
  <groupId>org.luaj</groupId>
  <artifactId>luaj-jse</artifactId>
  <version>3.0-beta2</version>
</dependency>
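If you are unsure which engines your class path actually provides, a small program like the sketch below can list them. The `EngineDiscovery` class name is made up for this example; the calls themselves are plain javax.script. Note that getEngineByExtension returns null when no matching engine is installed, so the lookup is worth guarding.

```java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

// Hypothetical helper for this article: lists the JSR-223 engines that can be found.
class EngineDiscovery {

    // Describe every engine factory the ScriptEngineManager discovers on the class path.
    static List<String> describeEngines() {
        List<String> descriptions = new ArrayList<>();
        for (ScriptEngineFactory factory : new ScriptEngineManager().getEngineFactories()) {
            descriptions.add(factory.getEngineName() + " " + factory.getEngineVersion()
                    + " language=" + factory.getLanguageName()
                    + " extensions=" + factory.getExtensions());
        }
        return descriptions;
    }

    public static void main(String[] args) {
        describeEngines().forEach(System.out::println);

        // The lookup yields null when no engine is registered for the extension.
        ScriptEngine engine = new ScriptEngineManager().getEngineByExtension("lua");
        System.out.println(engine == null ? "No Lua engine on the class path" : "Found " + engine);
    }
}
```

What gets printed depends entirely on the JDK and the libraries present, so the list may well be empty on a JDK that bundles no script engine.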

The javax.script.ScriptEngine interface exposes, among others, the following methods which are useful to us:

public interface ScriptEngine  {

    public Object eval(String script, Bindings n) throws ScriptException;

    public Bindings createBindings();

}

Using only those two methods we’re able to create a valid implementation of ScriptEngineService, but it would not be very performant for compiled languages. In order to make compilations cacheable we check if the ScriptEngine also implements the javax.script.Compilable interface. This interface lets us transform a string into a CompiledScript that may be executed multiple times with different parameters.
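As a rough sketch of that check, the helper below compiles when the engine supports it and otherwise falls back to keeping the source string and evaluating it on every call. `CompileOrInterpret` is a hypothetical class for illustration and is not the adapter from the plugin; wrapping ScriptException in unchecked exceptions is also just an assumption made here so the methods resemble the unchecked signatures of ScriptEngineService.

```java
import java.util.Map;
import javax.script.Bindings;
import javax.script.Compilable;
import javax.script.CompiledScript;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical helper, not the plugin's actual adapter class.
class CompileOrInterpret {

    private final ScriptEngine engine;

    CompileOrInterpret(ScriptEngine engine) {
        this.engine = engine;
    }

    // Returns a CompiledScript when the engine can compile, otherwise the source itself.
    Object compile(String script) {
        if (engine instanceof Compilable) {
            try {
                return ((Compilable) engine).compile(script);
            } catch (ScriptException e) {
                throw new IllegalArgumentException("Failed to compile script", e);
            }
        }
        return script;
    }

    // Runs whatever compile() returned, using fresh bindings filled with the given variables.
    Object execute(Object compiled, Map<String, Object> vars) {
        Bindings bindings = engine.createBindings();
        if (vars != null) {
            bindings.putAll(vars);
        }
        try {
            if (compiled instanceof CompiledScript) {
                return ((CompiledScript) compiled).eval(bindings);
            }
            return engine.eval((String) compiled, bindings);
        } catch (ScriptException e) {
            throw new IllegalStateException("Script execution failed", e);
        }
    }

    public static void main(String[] args) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByExtension("lua");
        if (engine == null) {
            System.out.println("No Lua engine on the class path");
            return;
        }
        CompileOrInterpret adapter = new CompileOrInterpret(engine);
        Object compiled = adapter.compile("return 'Hello World!'");
        System.out.println(adapter.execute(compiled, null));
    }
}
```

Creating new bindings per execution keeps variables from one invocation from leaking into the next, which matters once the compiled object is cached and shared.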

Registering Any Number of Engines

When it comes to registering the plugin, there is one key difference between a plugin adapting JSR-223 and one targeting a specific engine. The normal way of registering the implementation by the class name does not work when we need to register one instance of the class per engine found on the class path. The solution is to use the multibinder.addBinding().toInstance() feature in Guice.

public class JSR223Module extends AbstractModule {

    @Override
    protected void configure() {
        // One set binding that collects every ScriptEngineService instance we register.
        Multibinder<ScriptEngineService> multibinder = Multibinder.newSetBinder(binder(), ScriptEngineService.class);
        ScriptEngineManager manager = new ScriptEngineManager();
        ESLogger logger = Loggers.getLogger(getClass());
        for (ScriptEngineFactory factory : manager.getEngineFactories()) {
            logger.info("Registering JSR223 engine: [{}-{}] for type: [{}] and extensions: [{}]",
                    factory.getEngineName(), factory.getEngineVersion(), factory.getLanguageName(),
                    factory.getExtensions());
            multibinder.addBinding().toInstance(new JSR223Adapter(factory));
        }
    }

}

Hello World!

The complete source is available on GitHub. To test it we build it with mvn package and use

bin/plugin --url file:///path/to/checkout/target/releases/elasticsearch-lang-jsr223-0.1.0-1.1.1-SNAPSHOT.zip --install elasticsearch-lang-jsr223

with the resulting zip file. Once Elasticsearch starts up, take a look at the log output to see which engines are detected. On my computer this was the result:

[2014-05-22 15:02:30,937][INFO ][node                     ] [Hazmat] version[1.1.1], pid[24820], build[f1585f0/2014-05-03T16:01:42Z]
[2014-05-22 15:02:30,938][INFO ][node                     ] [Hazmat] initializing ...
[2014-05-22 15:02:30,964][INFO ][plugins                  ] [Hazmat] loaded [river-twitter, JSR-223-plugin], sites []
ScriptEngineManager providers.next(): javax.script.ScriptEngineFactory: Provider scala.tools.nsc.interpreter.IMain$Factory could not be instantiated
[2014-05-22 15:02:31,527][INFO ][no.found.elasticsearch.plugin.jsr223.JSR223Module] Registering JSR223 engine: [Luaj-Luaj-jse 3.0-beta2] for type: [lua] and extensions: [[lua, .lua]]
[2014-05-22 15:02:31,534][INFO ][no.found.elasticsearch.plugin.jsr223.JSR223Module] Registering JSR223 engine: [AppleScriptEngine-1.1] for type: [AppleScript] and extensions: [[scpt, applescript, app]]
[2014-05-22 15:02:31,579][INFO ][no.found.elasticsearch.plugin.jsr223.JSR223Module] Registering JSR223 engine: [Mozilla Rhino-1.7 release 3 PRERELEASE] for type: [ECMAScript] and extensions: [[js]]
[2014-05-22 15:02:32,814][DEBUG][discovery.zen.ping.multicast] [Hazmat] using group [224.2.2.4], with port [54328], ttl [3], and address [null]

The AppleScript and Rhino engines are bundled with my JDK. Since I launched Elasticsearch from Eclipse, which also adds the test scope of the project to the class path (technically not correct, but very convenient), Luaj was added as well. The Scala engine was also detected, but it failed to instantiate for some reason I have yet to figure out.

To test whether the engines can be launched, we can do the traditional “Hello world”. Any index with at least one document will do; post the query below to its _search endpoint:

{
    "aggs": {
        "script": {
            "terms": {
                "lang": "lua",
                "script": "return 'Hello World!'"
            }
        }
    },
    "size": 0
}

And once more for Rhino:

{
    "aggs": {
        "script": {
            "terms": {
                "lang": "ECMAScript",
                "script": "\"Hello World!\""
            }
        }
    },
    "size": 0
}

Conclusion and Future Work

The implementation works, but it is just a proof of concept and will probably remain one, since most scripting engines require a little more adaptation and configuration to integrate well with Elasticsearch. That said, it should be a good starting point for making a plugin for any scripting engine that implements the JSR-223 interfaces. Feel free to check it out on GitHub.