July 9, 2014

Elasticsearch Plugin Types

UPDATE: This article refers to our hosted Elasticsearch offering by an older name, Found. Please note that Found is now known as Elastic Cloud.

To write a great plugin, we need an overview of the plugin types and scopes Elasticsearch makes available to us. This article gives a rundown of these, along with some tips on how to write a 'well behaved' plugin.

Introduction

Plugins make up an important part of the Elasticsearch ecosystem, and plugin authors can add, redefine and even remove Elasticsearch functionality. In Writing an Elasticsearch Plugin: Getting Started we showed how to start developing a plugin from scratch; this time around we’re going to look at the different scopes a plugin can add functionality to.

Initialization

Plugins are discovered by looking for plugin definition files within the Java classpath of the Elasticsearch process. These provide the exact class names of the Plugin subclasses that Elasticsearch then loads and initializes. This process is described here. After loading them, Elasticsearch calls certain methods on the Plugin instances, depending on the scope the plugin can affect at that point.
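To make the discovery step more concrete, here is a minimal sketch of how such a lookup could be written. It assumes the Elasticsearch 1.x convention of a classpath file named es-plugin.properties whose plugin key holds the class name; check both against the version you target. The PluginDescriptors class and its methods are hypothetical helpers for illustration, not part of Elasticsearch.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not an Elasticsearch class.
class PluginDescriptors {
    // Assumed file name and key, following the Elasticsearch 1.x convention.
    static final String DESCRIPTOR = "es-plugin.properties";
    static final String CLASS_KEY = "plugin";

    // Reads one descriptor and returns the plugin class name, or null if the key is absent.
    static String pluginClassName(Reader descriptor) {
        Properties props = new Properties();
        try {
            props.load(descriptor);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String name = props.getProperty(CLASS_KEY);
        return name == null ? null : name.trim();
    }

    // Collects the class names from every descriptor visible to the given class loader.
    static List<String> discover(ClassLoader loader) {
        List<String> classNames = new ArrayList<>();
        try {
            Enumeration<URL> urls = loader.getResources(DESCRIPTOR);
            while (urls.hasMoreElements()) {
                try (Reader reader = new InputStreamReader(
                        urls.nextElement().openStream(), StandardCharsets.UTF_8)) {
                    String name = pluginClassName(reader);
                    if (name != null && !name.isEmpty()) {
                        classNames.add(name);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return classNames;
    }
}
```

Calling PluginDescriptors.discover(Thread.currentThread().getContextClassLoader()) would then list the class names that a loader could go on to instantiate.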

Plugin Component Scopes

In essence, there are three levels, or scopes, of components that plugins may provide. These scopes are:

Global
Index
Shard

The scopes determine when the components they define are used. Global scope is used whenever the associated Elasticsearch instance is initializing, index scope whenever an index is created (or opened after being closed), and shard scope for every shard started on the given node. This feature allows different functionality to be attached to different indexes (or even shards) within the same Elasticsearch cluster.

Plugin Component Types

Each of the scopes has two kinds of components:

Module
Service

Modules are actually Guice modules that usually “bind” classes to implementations. Often, this simply implies that one or more instances of the class will be instantiated at a later time, when it’s required. An introduction to Guice is outside the scope of this article, but interested developers can find multiple tutorials and great introductions to the Guice framework here.

Services are anything that starts and may be stopped at a later time. Services in the global scope are implementations of LifecycleComponent, which has two important methods: start and stop, which Elasticsearch invokes when it starts and stops, respectively.
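The start/stop contract can be illustrated with a deliberately simplified stand-in. The HeartbeatService class and its State enum below are hypothetical; the real LifecycleComponent has more methods and states, so treat this as a sketch of the idea rather than the actual interface.

```java
// Simplified stand-in for a global-scope service; not the real LifecycleComponent.
class HeartbeatService {
    enum State { INITIALIZED, STARTED, STOPPED }

    private State state = State.INITIALIZED;

    // Invoked when the node starts; calling it on a started service is a no-op.
    synchronized void start() {
        if (state == State.STARTED) {
            return;
        }
        state = State.STARTED;
    }

    // Invoked when the node stops; only a started service can be stopped.
    synchronized void stop() {
        if (state != State.STARTED) {
            return;
        }
        state = State.STOPPED;
    }

    synchronized State state() {
        return state;
    }
}
```

Keeping the transitions idempotent like this means the service tolerates being asked to stop before it was ever started.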

Plugins also get notified when Modules are created, which enables calling public methods on the modules before they’re used. The most common use case of this is registering additional analyzers, tokenizers or token filters on the AnalysisModule.
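As an illustration of this notification step, the sketch below dispatches a module to any public onModule method whose parameter type accepts it. Everything here is a toy model: ToyAnalysisModule, MyAnalyzerProvider, ToyPlugin and ModuleNotifier are hypothetical names, and the reflection-based lookup is our assumption about how such a hook can be wired, not a copy of the Elasticsearch implementation.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for AnalysisModule: it only records analyzer names and provider classes.
class ToyAnalysisModule {
    final Map<String, Class<?>> analyzers = new LinkedHashMap<>();

    public void addAnalyzer(String name, Class<?> provider) {
        analyzers.put(name, provider);
    }
}

// Hypothetical placeholder for an analyzer provider implementation.
class MyAnalyzerProvider {
}

// Hypothetical plugin: the parameter type of onModule selects which module it is notified about.
class ToyPlugin {
    public void onModule(ToyAnalysisModule module) {
        module.addAnalyzer("my_analyzer", MyAnalyzerProvider.class);
    }
}

// Hypothetical dispatcher that mimics the notification step using reflection.
class ModuleNotifier {
    // Invokes every single-argument onModule method that accepts the given module.
    // Returns the number of methods that were called.
    static int notifyPlugin(Object plugin, Object module) {
        int calls = 0;
        for (Method method : plugin.getClass().getMethods()) {
            if (!method.getName().equals("onModule") || method.getParameterCount() != 1) {
                continue;
            }
            if (!method.getParameterTypes()[0].isInstance(module)) {
                continue;
            }
            try {
                method.setAccessible(true); // the declaring class is not public
                method.invoke(plugin, module);
                calls++;
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IllegalStateException("onModule failed", e);
            }
        }
        return calls;
    }
}
```

The point of the design is that the plugin never has to look the module up itself: it simply declares which module type it is interested in, and it gets a chance to register its analyzers before the module is used.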

For an overview of which modules are available, have a look at Elasticsearch Internals: An Overview.

Plugins may also provide additional settings that can be used by both Elasticsearch core and other plugins/modules.

Creating and Freeing Resources

Classes instantiated via modules may of course contain start/stop logic as well, but they have to perform their own bookkeeping, for example by registering a LifecycleListener on an external LifecycleComponent. It’s generally considered good practice to avoid doing heavy lifting or spawning threads during initialization, and rather use LifecycleComponent callbacks to actually create and - more importantly - stop and clean up resources when they’re no longer required. Not freeing resources when a module is no longer required, for example for shard scoped modules, may lead to memory leaks.
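The bookkeeping described above could look roughly like the following sketch. ToyLifecycleListener and ToyNodeService are simplified stand-ins for LifecycleListener and an external LifecycleComponent (the real types have more callbacks), and BackgroundIndexer is a hypothetical class of the kind a module might instantiate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Simplified stand-in for LifecycleListener; the real class has more callbacks.
interface ToyLifecycleListener {
    void afterStart();
    void beforeStop();
}

// Simplified stand-in for an external LifecycleComponent that notifies its listeners.
class ToyNodeService {
    private final List<ToyLifecycleListener> listeners = new ArrayList<>();

    void addLifecycleListener(ToyLifecycleListener listener) {
        listeners.add(listener);
    }

    void start() {
        for (ToyLifecycleListener listener : listeners) {
            listener.afterStart();
        }
    }

    void stop() {
        for (ToyLifecycleListener listener : listeners) {
            listener.beforeStop();
        }
    }
}

// Hypothetical class instantiated via a module. Construction is cheap;
// the executor exists only between afterStart and beforeStop.
class BackgroundIndexer implements ToyLifecycleListener {
    private ExecutorService executor; // null whenever no resources are held

    // Creates the indexer and registers it on the external component.
    static BackgroundIndexer attachTo(ToyNodeService node) {
        BackgroundIndexer indexer = new BackgroundIndexer();
        node.addLifecycleListener(indexer);
        return indexer;
    }

    @Override
    public synchronized void afterStart() {
        if (executor == null) {
            executor = Executors.newSingleThreadExecutor();
        }
    }

    @Override
    public synchronized void beforeStop() {
        if (executor != null) {
            executor.shutdownNow(); // release the worker so nothing leaks
            executor = null;
        }
    }

    synchronized boolean holdsResources() {
        return executor != null;
    }
}
```

Nothing is allocated in the constructor, so creating the object during initialization stays cheap, and the cleanup in beforeStop mirrors the allocation in afterStart.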

Overview of `Plugin` methods and the kind of component each one defines.
Method	Kind
modules()	Module
services()	LifecycleComponent
indexModules()	Module
indexServices()	CloseableIndexComponent
shardModules()	Module
shardServices()	CloseableIndexComponent

In the above table, notice that index and shard services are not LifecycleComponents but CloseableIndexComponents. The difference between the two is that index and shard services are implicitly started when they are created, due to their nature (they’re only created when required).
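A toy model may help show what "implicitly started" means for shard-scoped services. ToyCloseableComponent is a simplified stand-in for CloseableIndexComponent (whose exact signature may differ between versions), while ShardStatsService and ToyShardRegistry are hypothetical names used only for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for CloseableIndexComponent; there is no start method, only close.
interface ToyCloseableComponent {
    void close();
}

// Hypothetical shard-scoped service: it is live as soon as it has been constructed.
class ShardStatsService implements ToyCloseableComponent {
    final int shardId;
    private boolean closed;

    ShardStatsService(int shardId) {
        this.shardId = shardId; // "started" implicitly on creation
    }

    @Override
    public void close() {
        closed = true; // free whatever the service holds for this shard
    }

    boolean isClosed() {
        return closed;
    }
}

// Hypothetical holder that mimics a node creating and closing shard services.
class ToyShardRegistry {
    private final Map<Integer, ShardStatsService> services = new HashMap<>();

    // Creates a service for the shard, closing any previous one for the same id.
    ShardStatsService shardStarted(int shardId) {
        ShardStatsService service = new ShardStatsService(shardId);
        ShardStatsService previous = services.put(shardId, service);
        if (previous != null) {
            previous.close();
        }
        return service;
    }

    // Closes and forgets the service when the shard goes away.
    void shardClosed(int shardId) {
        ShardStatsService service = services.remove(shardId);
        if (service != null) {
            service.close();
        }
    }

    int liveServices() {
        return services.size();
    }
}
```

Because one instance exists per shard on the node, forgetting the close step here is exactly the kind of mistake that accumulates into a leak as shards come and go.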

Limitations

Not all functionality in Elasticsearch is currently intended (or even possible) to be extended or replaced. Such pieces of functionality are typically either named with the Internal prefix, such as InternalNode and InternalTransportClient, hard-coded via Guice bindings in a module (and not configurable), or not separated into the usual interface/implementation pair.

One example of a commonly used service that cannot be replaced or extended by a plugin is the ClusterService. This service is responsible for maintaining and handling updates to the cluster state, as well as notifying cluster state listeners about any changes. The binding of this service is hard-coded into Elasticsearch and is not configurable.

Another example is the TransportService. The TransportService is responsible for handling the connections to the other instances in the cluster, sending requests to the other nodes and providing the caller with the response via a callback. It is possible to configure Elasticsearch to bind a different implementation by using the transport.service.type configuration key. However, the TransportService is a class, not an interface, and it is used extensively in different parts of the Elasticsearch code base, so a significantly different implementation would be difficult to write given how much compatibility has to be kept.

Conclusion

At Found we see a lot of different use cases for plugins, ranging from simply providing word lists to custom rivers to adding support for new and exciting analyzers to Elasticsearch.

Found has a set of officially supported plugins, and the list of known Elasticsearch plugins keeps growing.

Plugins are a source of almost endless possibilities for extending Elasticsearch and are a great way to provide additional functionality without having to run a custom fork. We can’t wait to see what plugins the community comes up with next!