This Week in Elasticsearch and Apache Lucene - 2017-07-10
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Defence against full disks
Elasticsearch has a first line of defence in place to prevent nodes from running out of disk space. Once a node has reached a certain disk utilization, the cluster tries to move shards away from nodes with little disk space to prevent catastrophic situations where nodes fail with no space left on their devices. However, moving shards around is not always possible, and even if it is, indexing might fill up disks faster than shards can be relocated.
Starting with Elasticsearch 6.0, nodes will stop accepting write requests to indices that have one or more shards allocated on a node that is running low on disk space. This should provide a safer failure mode for clusters than the current behavior, where resources may be exhausted before the problem is detected. Users will need to manually set indices back to read/write once they have provisioned more disk space.
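As a rough sketch of that manual step: the text above does not name the setting involved, so this assumes the block is exposed as the index setting index.blocks.read_only_allow_delete and is cleared by resetting it to null through the update-settings API. The helper name and the index name are hypothetical, and nothing is sent over the network; the function only builds the request.

```python
import json


def build_unblock_request(indices):
    """Return (method, path, body) for clearing the read-only block.

    Assumption: the block is the index setting
    `index.blocks.read_only_allow_delete`; setting it to null resets it.
    """
    if not indices:
        raise ValueError("at least one index name is required")
    path = "/" + ",".join(indices) + "/_settings"
    body = json.dumps({"index.blocks.read_only_allow_delete": None})
    return "PUT", path, body


# Hypothetical index name, for illustration only.
method, path, body = build_unblock_request(["logs-2017.07.10"])
print(method, path, body)
```

The resulting request would only make sense once disk space has actually been freed or added; otherwise the node would presumably hit the same limit again.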
The last 6.0 blockers for sequence numbers are closed! All Sequence Numbers related PRs slated for 6.0 are merged. We now have:
- fast operation based recovery,
- a custom translog retention policy to make fast recovery more likely,
- cleanup of old transaction logs on idle indices, and
- a primary/replica sync on primary promotions.
We also have the infrastructure we need to start developing the cross data centre replication (xDCR) X-Pack feature. We will continue to use the new infrastructure to tackle more complex correctness problems: the roll back of unneeded operations in replicas and usage of sequence numbers for optimistic locking.
Aggregations now have a rewrite phase which is similar to the query rewrite phase. This gives any aggregation the opportunity to rewrite itself into a simpler or more generic form, which increases the chance of the aggregation being cached. The first implementation of this rewrite is in the filter/filters aggregations: we now rewrite the filters, which means we can cache requests that use the filter/filters aggregation (as long as the underlying filter is cacheable).
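To illustrate why rewriting helps caching, here is a toy model in Python. It is not the actual Elasticsearch rewrite logic: the single rewrite rule (a bool query with exactly one must clause collapses to that clause) and all function names are assumptions made for this sketch. The point is only that two syntactically different requests can end up with the same cache key once they are rewritten.

```python
import json


def rewrite_query(query):
    """Rewrite a toy query dict into a simpler equivalent form.

    Hypothetical rule: in filter context, a bool query that contains only
    a single must clause matches the same documents as that clause alone.
    """
    if set(query) == {"bool"}:
        clauses = query["bool"]
        if set(clauses) == {"must"} and len(clauses["must"]) == 1:
            return rewrite_query(clauses["must"][0])
    return query


def rewrite_filter_agg(agg):
    """Rewrite the filter held by a toy filter aggregation."""
    return {"filter": rewrite_query(agg["filter"])}


def cache_key(agg):
    """Derive the cache key from the rewritten form, not the raw request."""
    return json.dumps(rewrite_filter_agg(agg), sort_keys=True)


plain = {"filter": {"term": {"status": "error"}}}
wrapped = {"filter": {"bool": {"must": [{"term": {"status": "error"}}]}}}

# Both requests map to the same key after the rewrite.
print(cache_key(plain) == cache_key(wrapped))
```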
Java High Level REST Client docs
The docs for the upcoming Java High Level REST Client are in the works. You can get a preview here: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/5.x . The plan is to add the missing search docs and a dedicated page with examples of how to migrate from the Java API to the REST client.
Cluster alerts in monitoring now cacheable
Cluster alerts (which use Watcher internally) were problematic for users with many clusters, as they would run into the soft limit on the number of script compilations per minute. (Mustache templates, used by Watcher, are treated as scripts internally.) Now that we have removed the custom search inputs per cluster, all the watches in cluster monitoring use the same searches, so it no longer matters how many different clusters are being monitored. This means that no tweaking of the compilations-per-minute setting is necessary. This was uncovered by an SDH ticket from a customer who monitored several clusters.
We had a longstanding issue about storing ids in binary form in the index. Elasticsearch accepts arbitrary strings as document identifiers, and up to now we used UTF-8 encoding when indexing/storing them in the index. However, it is very common to have ids that are strings representing base64-encoded bytes (e.g. autogenerated ids) or a number (e.g. auto-increment ids coming from an external database whose content is replicated to Elasticsearch). In those cases, the UTF-8 representation is respectively 33% and 2.4x larger than the original binary representation, so we had room for improvement.
We just changed the internal representation of ids to try to detect when the id might be base64-encoded bytes or a stringified number, and to use a more efficient representation in those cases. This encoding makes base64-encoded ids about 32% smaller and numeric ids about 2x smaller compared to today, very close to the size of the original binary representation of those ids.
Beware that these savings will not automatically translate to significant reductions of the size of the index, given that Lucene performs prefix-compression of those ids in the terms dictionary, and LZ4 (or DEFLATE if using index.codec: best_compression) compression of those ids in stored fields. However, more compact ids mean that Lucene has fewer bytes to compare when sorting values at flush time, when merging terms dictionaries at merge time, and when compressing data in general. This change will also allow us to work on interesting follow-ups such as reordering bytes of autogenerated ids and dropping the sortability of ids (which we do not leverage) in favour of an order of bytes that makes prefix compression and indexing more efficient.
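The idea behind the more compact id representation can be sketched in a few lines of Python. This is an illustration only, not the encoding Elasticsearch actually uses: the one-byte type markers (N, B, U), the two-digits-per-byte packing and the function names are all assumptions made for this example.

```python
import base64


def pack_digits(digits: str) -> bytes:
    """Pack decimal digits two per byte; 0xF pads an odd-length id."""
    nibbles = [int(c) for c in digits]
    if len(nibbles) % 2:
        nibbles.append(0xF)
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def compact_id(doc_id: str) -> bytes:
    """Pick the smallest representation this sketch knows about."""
    # Stringified number: 4 bits per digit instead of 8.
    if doc_id.isascii() and doc_id.isdigit():
        return b"N" + pack_digits(doc_id)
    # Unpadded URL-safe base64: store the decoded bytes, but only if
    # re-encoding gives back exactly the same string.
    try:
        padded = doc_id + "=" * (-len(doc_id) % 4)
        raw = base64.urlsafe_b64decode(padded)
        if base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii") == doc_id:
            return b"B" + raw
    except ValueError:
        pass
    # Anything else falls back to plain UTF-8.
    return b"U" + doc_id.encode("utf-8")


# A 20-character id built from 15 arbitrary bytes, standing in for an
# autogenerated id.
auto_id = base64.urlsafe_b64encode(bytes(range(15))).decode("ascii")
for doc_id in (auto_id, "1234567890", "user:42"):
    print(doc_id, len(doc_id.encode("utf-8")), "->", len(compact_id(doc_id)))
```

The leading marker byte is what lets a reader of the index tell the three cases apart, at the cost of one extra byte per id.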
Changes in 5.x:
- date fields is no longer updatable as it could prevent already indexed docs from being reindexed.
- Not-analyzed string fields upgraded from 2.x will no longer return fielddata:false in their mapping as this interfered with reindexing in 5.x.
- join field (replacement for parent-child) should not add the parent type and id to each search hit as this information is already available in the source and these fields interfere with reindexing.
- Cross-cluster search now validates the cluster name whenever it updates its list of seed names, to be sure that the cluster name hasn't changed.
- Upgraded to Netty 4.1.13.Final.
Changes in master:
- BREAKING: A request to a valid REST endpoint with an unsupported HTTP method will now return a 405 METHOD NOT ALLOWED status, and an OPTIONS request to any REST endpoint will respond with the list of allowed HTTP methods. This required refactoring RestController to use a single PathTrie for all endpoints.
- BREAKING: The deprecated found response keys in index, delete, and bulk have been removed in favour of the
- BREAKING: Removed deprecated fielddata_fields from the search request.
- QueryParseContext has been removed as it had become a simple wrapper around
- BREAKING: Removed the deprecated IdsQueryBuilder constructor which accepted
- index.mapping.single_type setting now defaults to true, and can no longer be set in 6.0.
- _analyze API now supports normalizers.
- Snapshots to S3 sometimes failed with a security exception when a stream was closed during snapshotting.
- The search API will no longer silently ignore negative
- transport.profiles.* settings have had a big refactoring, including added validation for these settings.
- Added a framework for cross-validating mutually dependent settings. This allows validating that disk threshold settings are correctly set.
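The 405 and OPTIONS behaviour from the list above can be pictured with a small toy dispatcher in Python. This is only a sketch of the idea of keeping all endpoints in one lookup structure: it uses a plain dictionary rather than a path trie, the class and method names are hypothetical, and the 404 for unknown paths is a choice made for this example rather than something the text above describes.

```python
class Router:
    """Toy REST dispatcher: one table maps each path to its methods."""

    def __init__(self):
        self._routes = {}  # path -> {HTTP method: handler}

    def register(self, method, path, handler):
        self._routes.setdefault(path, {})[method.upper()] = handler

    def dispatch(self, method, path):
        """Return (status, headers, body) for a request."""
        handlers = self._routes.get(path)
        if handlers is None:
            # Unknown path: outside the scope of the change sketched here.
            return 404, {}, "no handler for " + path
        allowed = ", ".join(sorted(handlers))
        method = method.upper()
        if method == "OPTIONS":
            # Any known endpoint answers OPTIONS with its allowed methods.
            return 200, {"Allow": allowed}, ""
        if method not in handlers:
            # Valid endpoint, unsupported method.
            return 405, {"Allow": allowed}, "method not allowed"
        return 200, {}, handlers[method]()


router = Router()
router.register("GET", "/_cat/indices", lambda: "index list")
print(router.dispatch("DELETE", "/_cat/indices"))
print(router.dispatch("OPTIONS", "/_cat/indices"))
```

Because every method for a path lives in the same entry, the dispatcher can tell "unknown endpoint" apart from "known endpoint, wrong method", which is what makes the 405 response possible.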
Better version compatibility checks
Now that Lucene stores the version that was used to first create the index, it will use this version to check that you are not attempting to read an index that is too old. This was impossible before, since Lucene only recorded the version that was used to write segments and commit points, so it was possible to use merging to make Lucene think an index was recent when in fact it was not. As of Lucene 8.0, this leniency will be gone.
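The difference between the two checks can be modelled with a short Python sketch. It assumes the usual rule that a given major version reads indices from the previous major version at the oldest; the class, field and function names are hypothetical and do not mirror Lucene's API.

```python
from dataclasses import dataclass


@dataclass
class IndexMetadata:
    created_major: int    # major version that first created the index
    segment_majors: list  # major version that wrote each current segment


def lenient_check(meta, current_major):
    """Old behaviour: only the versions that wrote the segments are known."""
    return min(meta.segment_majors) >= current_major - 1


def strict_check(meta, current_major):
    """New behaviour: the creation version is recorded and checked."""
    return meta.created_major >= current_major - 1


# An index created with major version 6 whose segments were all rewritten
# by merges under major version 7.
merged = IndexMetadata(created_major=6, segment_majors=[7, 7])

# Opened with major version 8: the lenient check is fooled by the merges,
# while the strict check still sees the original creation version.
print(lenient_check(merged, 8), strict_check(merged, 8))
```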
- The new concurrent delete/update improvements had a race condition.
- The addition of hooks to run wildcard terms in phrases proves controversial since it would encourage usage of the dangerous SpanMultiTermQueryWrapper.
- SpanMultiTermQueryWrapper is a dangerous query which might expand to an arbitrary number of terms. We are looking into adding protection against this.
- Can we improve IndexOrDocValuesQuery to be more aware of the difference in cost between points and doc values?
- We keep forgetting that we already have a query that matches documents that have a value for a given doc-value field, so we decided to rename it so that it better reflects what it does.
- The fact that compiling scripts gets exponentially slower over time triggered a discussion about whether we should cache the result of compilations.
- Factory methods of doc values queries have been renamed to make sure their name includes "slow".
- Can we add a sort field that allows sorting child docs by the value of their parent document?
- Should we use a common interface when multiple analysis factories set the same option?
- You can now group using the new ValuesSources API.
- IndexWriter sometimes gets the total number of docs in the index wrong.
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!