25 April 2016 Engineering

Effective Elasticsearch Plugin Management with Docker

By Tyler Langlois

If you're running Elasticsearch within Docker containers, there are some important operational considerations to bear in mind. This is especially true for stateful services and daemons, where persisting data outside of an ephemeral container becomes important.

Using Elasticsearch plugins within containers is an example of this, both in terms of installing them in a repeatable, trackable manner and managing plugin configuration and data.

In this post, we'll explore some of the options to achieve sane plugin management within the context of Docker and Elasticsearch.

Note: In this guide we will reference the Elasticsearch image found on the Docker Hub. The development and production of this Docker image is not affiliated with Elastic, Inc.

A Docker Persistence Primer

Like a file change in a version control system, changing the filesystem in a running container makes the container diverge from the image it was started from. Those changes disappear with the container, so steps need to be taken if data and changes must persist beyond the lifetime of an impermanent container.

Although one could leverage some more complex storage schemes to achieve container persistence, the basic Docker mechanism of bind mounts is the most illustrative. This is achieved by, for example, keeping Elasticsearch indices long-term in /data by passing an option like -v /data:/usr/share/elasticsearch/data to a docker run command, which effectively stores your Elasticsearch data in the host's (not container's) /data directory.
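
As a rough sketch of the bind-mount approach, the helper below wraps the docker run invocation described above. The function name run_es_with_data and the ES_IMAGE variable are hypothetical, and /data is simply the example host path used in this post:

```shell
# Hypothetical helper: start Elasticsearch with index data bind-mounted from the host.
# Usage: run_es_with_data [host_data_dir]
ES_IMAGE="${ES_IMAGE:-elasticsearch:2}"   # assumption: the Docker Hub image referenced in this post

run_es_with_data() {
  host_data_dir="${1:-/data}"
  # -v HOST:CONTAINER keeps index data on the host, so it outlives the container.
  docker run -d -p 9200:9200 \
    -v "${host_data_dir}:/usr/share/elasticsearch/data" \
    "$ES_IMAGE"
}
```

Calling run_es_with_data with no arguments uses /data on the host; removing and recreating the container should then leave the indices in place.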

If the changes are more permanent, that is, they are expected to stay in place without changing over time, codifying them in a Dockerfile may make more sense. Both approaches are useful in different cases.

Managing Basic Plugins

In the following examples, I refer to "basic" plugins as those that do not require licenses or any other special components. Straightforward plugins like cloud-aws just need to place some files on the system to work.

In a case like this, the simplest way to manage the presence of the plugin is to extend the image using a Dockerfile. For example, consider the following Dockerfile:

FROM elasticsearch:2

RUN /usr/share/elasticsearch/bin/plugin install --batch cloud-aws

This Dockerfile starts with the Elasticsearch image provided by maintainers at the Docker Hub and runs a simple plugin install command. This image can then be built and referenced by a tag:

$ docker build -t elasticsearch-aws .

When run in the same directory as the Dockerfile, this command builds an image tagged elasticsearch-aws. That tag can then be referenced in future Docker commands to start new containers whose behavior is inherited FROM the original image.
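
One way to confirm that the plugin actually made it into the derived image is to list the installed plugins from a throwaway container. This is only a sketch: the function name image_has_plugin is hypothetical, and it assumes the image's entrypoint passes arbitrary commands through to the container:

```shell
# Hypothetical check: succeed if the named plugin is listed inside the image.
# Usage: image_has_plugin <image> <plugin>
image_has_plugin() {
  image="$1"
  plugin="$2"
  # "plugin list" prints the installed plugins (Elasticsearch 2.x plugin CLI);
  # --rm removes the temporary container once the command exits.
  docker run --rm "$image" /usr/share/elasticsearch/bin/plugin list \
    | grep -q -- "$plugin"
}

# Example:
#   image_has_plugin elasticsearch-aws cloud-aws && echo "cloud-aws is installed"
```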

More Complex Plugins

Some plugins may require the presence of additional files (such as certificates when using Shield) for certain features. The aforementioned technique of building a custom image can handle the installation of these plugins, but managing configuration is a task better left to a different approach. This helps keep images generic for deployment re-use and maintains tighter control over secrets.

Note: Some commercial plugins require the presence of a license. In the following examples, we simply rely on the temporary trial license present by default. When deploying in production, license management is performed through the Elasticsearch REST API, which stores the license in Elasticsearch's data path. As long as your data is persisted appropriately through a volume mount or otherwise, your license will be saved within your cluster.

Example: Shield

As outlined in the Shield installation documentation, installing the license and shield plugins is a prerequisite, which we can achieve by using the previous strategy to build a derived image:

FROM elasticsearch:2

RUN /usr/share/elasticsearch/bin/plugin install --batch license
RUN /usr/share/elasticsearch/bin/plugin install --batch shield

Then build the image to use for future steps:

$ docker build -t elasticsearch-shield .

At this point, if we volume mount a config directory into the container, Shield will pick up our settings. As an example configuration, consider the following directory structure:

$ tree config
config
├── elasticsearch.yml
├── logging.yml
├── scripts
└── shield
    ├── roles.yml
    ├── users
    └── users_roles

2 directories, 5 files

The logging.yml contains a default logging configuration. In elasticsearch.yml, binding to the wildcard 0.0.0.0 ensures we can reach the container:

$ cat elasticsearch.yml
network.host: 0.0.0.0

For the Shield configuration, we have defined a single role, admin, along with a user called "example" with a password of "password" and added it to the admin role (you'll obviously want a more secure configuration than this!):

$ cat shield/roles.yml
admin:
  cluster: all
  indices:
    '*':
      privileges: all
$ cat shield/users
example:$2a$10$ppZqjFEXgVE3yT/yQPsp4etGMdF4.RFCS9OOGwZGAp0l3lPh4/ALC
$ cat shield/users_roles
admin:example

Note: In this example we are using a password hash generated using the esusers Shield utility.
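
The layout above can be scaffolded with a short script. This sketch mirrors the example files from this post, with privileges nested under the index pattern in roles.yml. The CONFIG_DIR variable is hypothetical, and logging.yml is left as an empty placeholder to be replaced with the default file shipped with Elasticsearch:

```shell
# Sketch: scaffold the example configuration directory.
# CONFIG_DIR is a hypothetical variable; it defaults to ./config.
CONFIG_DIR="${CONFIG_DIR:-./config}"

mkdir -p "$CONFIG_DIR/scripts" "$CONFIG_DIR/shield"

# Bind to all interfaces so the published port is reachable from the host.
echo 'network.host: 0.0.0.0' > "$CONFIG_DIR/elasticsearch.yml"

# Placeholder only: replace with the default logging.yml from Elasticsearch.
: > "$CONFIG_DIR/logging.yml"

# One role, "admin", with full cluster and index privileges.
cat > "$CONFIG_DIR/shield/roles.yml" <<'EOF'
admin:
  cluster: all
  indices:
    '*':
      privileges: all
EOF

# Quoted heredoc so the "$" characters in the hash are not expanded by the shell.
cat > "$CONFIG_DIR/shield/users" <<'EOF'
example:$2a$10$ppZqjFEXgVE3yT/yQPsp4etGMdF4.RFCS9OOGwZGAp0l3lPh4/ALC
EOF

# Map the "example" user to the "admin" role.
echo 'admin:example' > "$CONFIG_DIR/shield/users_roles"
```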

We then start the container, passing in the volume for our configuration and exposing the REST port:

$ docker run -d -p 9200:9200 -v "$PWD/config":/usr/share/elasticsearch/config elasticsearch-shield

Elasticsearch should deny unauthenticated requests and permit requests that use the credentials defined earlier (in this example it is assumed that Docker is exposing ports on the localhost):

$ curl -I -XGET http://localhost:9200/_cluster/health
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="shield"
Content-Type: application/json; charset=UTF-8
Content-Length: 389
$ curl -I -XGET -u example:password http://localhost:9200/_cluster/health
HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 389

Adding SSL/TLS

SSL and TLS certificates can be managed in the same way as the mounted Shield configuration files. Most of the steps outlined in the Shield SSL/TLS guide can be followed, bearing in mind that the CONFIG_DIR directory is the path that we will be mounting into the container at runtime. The full extent of CA and certificate management is outside the scope of this tutorial, so we will assume here that you are using a correctly configured Java keystore file, referred to here as node01.jks.

Exposing the keystore file is simply a matter of including it within the configuration directory that is mounted into the container.

$ tree config
config
├── elasticsearch.yml
├── logging.yml
├── node01.jks
├── scripts
└── shield
    ├── roles.yml
    ├── users
    └── users_roles
$ file config/node01.jks
config/node01.jks: Java KeyStore

Following the Shield user guide, we enable encryption for both the transport and HTTP layers:

$ cat config/elasticsearch.yml
network.host: 0.0.0.0
shield.ssl.keystore.path: /usr/share/elasticsearch/config/node01.jks
shield.ssl.keystore.password: password
shield.transport.ssl: true
shield.http.ssl: true

Note: In production, you may want tighter control over your keystore; in this example, we only locked the keystore with a generic password.

With the keystore file in place and SSL enabled, we can start the node and issue requests over HTTPS (in this case, passing -k to curl to bypass a self-signed certificate):

$ docker run -d -p 9200:9200 -v "$PWD/config":/usr/share/elasticsearch/config elasticsearch-shield
87e51d000cc11d63fbedb8a61d58ab1723f4a598b13614272a3b9d7f36a7b223
$ curl -I -XGET -k -u example:password https://localhost:9200/_cluster/health
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
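
When scripting these checks, it can be handier to compare status codes than to read headers. The following is a minimal sketch; the es_status function name is hypothetical, and -k is included only because this example uses a self-signed certificate:

```shell
# Hypothetical helper: print only the HTTP status code for a request.
# Usage: es_status <url> [extra curl options, e.g. -u user:pass]
es_status() {
  url="$1"
  shift
  # -s silences progress output, -k skips verification of the self-signed
  # certificate, -o /dev/null discards the body, -w prints the status code.
  curl -s -k -o /dev/null -w '%{http_code}\n' "$@" "$url"
}

# Example (assumes the container from this post is listening on localhost):
#   es_status https://localhost:9200/_cluster/health
#   es_status https://localhost:9200/_cluster/health -u example:password
```

Based on the responses shown in this post, the first call should print 401 and the second 200.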

When it comes time to distribute keystore files across the cluster, they could be managed by a configuration management tool or a similar approach.

Note: Pay close attention to the values you pass to keytool's -ext option when generating certificates for nodes that will run within Docker. DNS names and IP addresses should correctly reflect the hostname or IP address that nodes will use to communicate with one another.
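
To illustrate the -ext option, here is a sketch that assembles a subject alternative name value from a list of hostnames and IPv4 addresses and passes it to keytool. The function names, the node01 alias, the validity period, the distinguished name, and the example hostname and address are all placeholders. It produces a self-signed key pair and skips the CA signing steps covered in the Shield SSL/TLS guide:

```shell
# Build the value for keytool's -ext option from a list of names and addresses.
# Entries made up only of digits and dots become ip: entries (IPv4 only);
# everything else is treated as a DNS name.
build_san() {
  san=""
  for entry in "$@"; do
    case "$entry" in
      *[!0-9.]*) kind="dns" ;;
      *)         kind="ip" ;;
    esac
    # Append with a comma separator once the list is non-empty.
    san="${san:+$san,}${kind}:${entry}"
  done
  printf 'san=%s\n' "$san"
}

# Hypothetical wrapper: generate a self-signed key pair for node01.
# Alias, validity, distinguished name and password are placeholder values.
generate_node_keystore() {
  keytool -genkeypair -alias node01 -keystore config/node01.jks \
    -keyalg RSA -keysize 2048 -validity 365 \
    -storepass password -keypass password \
    -dname "CN=node01" \
    -ext "$(build_san "$@")"
}

# Example:
#   generate_node_keystore node01.example.com 172.17.0.2
```

With the example arguments, build_san yields san=dns:node01.example.com,ip:172.17.0.2. The names and addresses should be whatever the other nodes actually use to reach this container.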

Summary

Although we've given a few concrete examples in this blog post, every deployment is different, and you should tailor your setup according to whatever promotes reliability, repeatability, and security in your environment. Generally speaking, following these guidelines should aid in a good plugin management scheme:

  • Define your most generic steps in Docker images. By installing plugins at image build time and then running containers from those images, you can avoid re-running the same installation commands over and over.
  • Maintain persistent data within volume mounts. Storing your index data and plugin configuration separately from the container image ensures that your data is controlled and containers remain ephemeral.
  • Test! Before deploying any production infrastructure, ensure that Elasticsearch behaves as expected within Docker, especially with regard to network communication (including unicast and IP address binding behavior), JVM resource allocation, and plugin functionality.