25 Juni 2012

0.19.5 Released

Von Shay Banon

elasticsearch version 0.19.5 is out. You can download it here. The release includes critical bug fixes in elasticsearch cluster management and peer recovery process, and it is highly recommended for all users.

[Update]: Some users reported a problem when starting elasticsearch with 0.19.5 due to verification error. This problem has been fixed in 0.19.6.
p. [Update 2]: The deb package seems to be broken in 0.19.5 and 0.19.6, a fix will be issued later.
p. [Update 3]: 0.19.7 has been released fixing the debian packaging problem and a bug in the new compression support (on reads).

The release also includes several features, including:

Store Compression

elasticsearch has had the ability to compress the _source document for a long time. The problem with it is the fact that small documents don’t end up compressing well, as several documents compressed in a single compression “block” (we use LZF, but Snappy, whihc will be supported also in the future, works the same) will provide a considerable better compression ratio. This version introduces the ability to compress stored fieds using the index.store.compress.stored setting, as well as term vector using the index.store.compress.tv setting.

The settings can be set on the index level, and are dynamic, allowing to change them using the index update settings API. elasticsearch can handle mixed stored / non stored cases. This allows, for example, to enable compression at a later stage in the index lifecycle, and optimize the index to make use of it (generating new segmetns that use compression).

Best compression, comprared to _source level compression, will mainly happen when indexing smaller documents (less than 64k). The price on the other hand is the fact that for each doc returned, a block will need to be decompressed (its fast though) in order to extract the document data.

Store Throttling

The way Lucene, the IR library elasticsearh uses under the covers, works is by creating immutable segments (up to deletes) and constantly merging them (the merge policy settings allow to control how those merges happen). The merge proces happen in an asyncronous manner without affecting the indexing / search speed. The problem though, especially on systems with low IO, is that the merge process can be expensive and affect search / index operation simply by the fact that the box is now taxed with more IO happening.

The store module now allows to have throttling configured for merges (or all) either on the node level, or on the index level. The node level throttling will make sure that out of all the shards allocated on that node, the merge process won’t pass the specific setting bytes per second. It can be set by setting indices.store.throttle.type to merge, and setting indices.store.throttle.max_bytes_per_sec to something like 5mb. The node level settings can be changed dynamically using the cluster update settings API.

If specific index level configuration is needed, regardless of the node level settings, it can be set as well using the index.store.throttle.type, and index.store.throttle.max_bytes_per_sec. The default value for the type is node, meaning it will throttle based on the node level settings and participate in the global throttling happening. Both settings can be set using the index udpate settings API dynamically.