03 November 2015 Engineering

Performance Considerations for Elasticsearch 2.0 Indexing

By Michael McCandless

A little over a year ago I described how to maximize indexing throughput with Elasticsearch, but a lot has changed since then so here I explain the changes that affect indexing performance in Elasticsearch 2.0.0.


Store throttling is now automatic

Prior to 2.0.0, Elasticsearch throttled ongoing merges at a fixed rate (20 MB/sec by default), but this was often far too low, easily causing merges to fall behind and trigger subsequent index throttling.

As of 2.0.0, Elasticsearch uses adaptive merge IO throttling: when merges start to fall behind, the allowed IO rate is increased, and when they keep up, it is decreased. This means a sudden but rare large merge in a low-throughput indexing use case should not saturate all available IO on the node, minimizing the impact on ongoing searches and indexing.

But for heavy indexing, the allowed merge IO will increase to match the ongoing indexing.

This is good news!  It means you shouldn't need to touch any throttling settings: just use the defaults.
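The feedback idea behind the adaptive throttling described above can be illustrated with a minimal sketch (this is not Elasticsearch's actual implementation; the rates, multiplier, and backlog threshold below are all hypothetical):

```python
# Illustrative sketch of adaptive merge IO throttling: raise the
# allowed IO rate when merges fall behind, lower it when they keep up.
# All constants here are made up for illustration.

MIN_RATE_MB_SEC = 5.0
MAX_RATE_MB_SEC = 10240.0

def adjust_rate(current_rate, backlog_merges, threshold=2):
    """Return a new allowed merge IO rate (MB/sec) based on backlog size."""
    if backlog_merges > threshold:
        # Merges are falling behind: allow more IO, up to a cap.
        return min(current_rate * 1.3, MAX_RATE_MB_SEC)
    else:
        # Merges are keeping up: throttle back down, to a floor.
        return max(current_rate / 1.3, MIN_RATE_MB_SEC)
```

With heavy indexing the backlog stays non-empty, so the allowed rate climbs toward the cap; with light indexing it decays toward the floor, which is the behavior the section above describes.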

Multiple path.data

Using multiple IO devices (by specifying multiple path.data paths) to hold the shards on your node is useful for increasing total storage space and improving IO performance, if IO is a bottleneck for your Elasticsearch usage.

With 2.0, there is an important change in how Elasticsearch spreads the IO load across multiple paths: previously, each low-level index file was sent to the best (default: most empty) path, but that choice is now made per shard instead. When a shard is allocated to the node, the node picks which path will hold all files for that shard.
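For example, an elasticsearch.yml fragment listing one path per IO device might look like this (the mount points here are hypothetical):

```yaml
# One entry per IO device; with 2.0, each shard's files all live on
# whichever one of these paths the node picks for that shard.
path.data:
  - /mnt/disk1/elasticsearch/data
  - /mnt/disk2/elasticsearch/data
```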

This improves resiliency to IO device failures: if one of your IO devices crashes, you'll now lose only the shards that were on it, whereas before 2.0 you would lose any shard that had at least one file on the affected device (typically, nearly all shards on that node).

Note that an OS-level RAID 0 device is a poor choice for the same reason: since files are striped at the block level, you'll lose all shards on the node when any single device fails. Multiple path.data paths are the recommended approach instead.

You should still have at least one replica for your indices so you can recover any lost shards from other nodes without data loss.

The Auto-ID optimization is removed

Previously, Elasticsearch optimized the auto-id case (when you don't provide your own id for each indexed document) to use append-only Lucene APIs under the hood, but this proved problematic in error cases, so we removed it.

We also greatly improved ID lookup performance, so this optimization was no longer as important, and that allowed us to remove bloom filters entirely: they consumed non-trivial heap but offered little ID lookup performance gain.

Finally, as of 1.4.0, Elasticsearch now uses Flake IDs to generate its IDs, for better lookup performance over the previous UUIDs.
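To see why time-ordered IDs help lookups, here is a hypothetical Flake-style generator sketch (not Elasticsearch's actual implementation, which also mixes in a per-node component): consecutive IDs share a long common timestamp prefix, so segments whose ID ranges don't cover that prefix can be ruled out cheaply during lookup.

```python
# Hypothetical Flake-style ID sketch: a millisecond timestamp prefix
# plus a per-process sequence number. Real Flake IDs also embed a
# worker/node component; this is only to illustrate time-ordering.
import time
import itertools

_sequence = itertools.count()

def flake_like_id():
    millis = int(time.time() * 1000)
    seq = next(_sequence) & 0xFFFFFF  # 24-bit sequence counter
    # Zero-padded fixed-width hex keeps lexicographic order equal to
    # numeric order, so newer IDs always sort after older ones.
    return "%012x%06x" % (millis, seq)
```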

All of this means you should feel free to provide your own id field, with no performance penalty now that the auto-id handling is removed; just remember that your choice of id values does affect indexing performance.

More indexing changes...

Beyond these changes, there have been many additional 2.0 indexing changes, such as: enabling doc values by default (instead of using CPU- and heap-costly field data at search time); moving the dangerous delete-by-query API from a core API to a safe plugin built on top of the bulk indexing API; exposing control over increased stored fields compression while indexing; better compression of sparse doc values and norms; reduced heap required when merging; and a number of resiliency improvements, such as detecting pre-existing index corruption when merging.
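For example, the stored fields compression control is exposed as an index-level setting; a sketch, assuming the index.codec setting with its best_compression option (applied at index creation or to a closed index):

```yaml
# Trade some indexing CPU for smaller stored fields on disk.
index.codec: best_compression
```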

If you are curious about the low-level Lucene actions while indexing, add index.engine.lucene.iw: TRACE to your logging.yml, but be warned: this generates impressively copious output!
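In logging.yml, that setting goes under the logger section; a sketch, assuming the default logging.yml layout:

```yaml
logger:
  # Trace Lucene's IndexWriter activity while indexing (very verbose!)
  index.engine.lucene.iw: TRACE
```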

Our nightly indexing performance charts show that we are generally moving in the right direction with our defaults over time: the docs/sec and segment counts for the default (4 GB) heap have caught up to the fast performance line.