This Week in Elasticsearch and Apache Lucene - 2018-06-08
Cross Cluster Search now preserves remote status code
Until now, if an exception was thrown on the remote cluster during a cross-cluster search, the local cluster returned a 500 error, which hid the real problem with the request from clients (e.g., searching an index that does not exist). We have merged a PR that returns the remote cluster's actual status code to the client.
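The idea can be illustrated with a minimal sketch. It assumes a hypothetical exception type that carries the status reported by the remote cluster; `RemoteClusterException` and `StatusResolver` are illustrative names, not Elasticsearch classes, and the merged PR may work differently.

```java
/** Hypothetical stand-in for an exception carrying the HTTP status reported by a remote cluster. */
final class RemoteClusterException extends RuntimeException {
    final int status;

    RemoteClusterException(int status, String message) {
        super(message);
        this.status = status;
    }
}

final class StatusResolver {
    /** Walks the cause chain and returns the first remote status found, falling back to 500. */
    static int resolveStatus(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof RemoteClusterException) {
                return ((RemoteClusterException) cur).status;
            }
        }
        return 500; // nothing more specific is known
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException("search failed",
            new RemoteClusterException(404, "no such index"));
        System.out.println(resolveStatus(wrapped));                           // remote status is preserved
        System.out.println(resolveStatus(new IllegalStateException("boom"))); // generic failure
    }
}
```

With this shape, a missing remote index surfaces to the client as a 404 instead of a generic 500.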
Faster indexing with lots of fields
An engineer investigated an indexing performance regression with a user at ElasticON London, which boiled down to the fact that shards could never get more than 256MB of memory for the indexing buffer. While increasing this buffer will not yield indexing speed improvements for users with a typical number of fields, users with many fields (20k fields in this case) will benefit from an increased buffer. We made a change to shift away from this fixed 256MB buffer and instead give the shard access to the entire node’s buffer. The IndexingMemoryController will now handle flushing shards to disk when the node’s buffer is at capacity.
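A simplified sketch of the node-wide budget idea follows. It assumes a single shared byte budget and a policy of flushing the shard with the largest buffer first; `SharedIndexingBuffer` and its methods are hypothetical names for illustration and do not mirror the actual IndexingMemoryController code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model: shards share one node-wide indexing budget instead of a fixed per-shard cap. */
final class SharedIndexingBuffer {
    private final long nodeBudgetBytes;
    private final Map<String, Long> bytesByShard = new LinkedHashMap<>();
    private final List<String> flushed = new ArrayList<>();

    SharedIndexingBuffer(long nodeBudgetBytes) {
        if (nodeBudgetBytes < 0) throw new IllegalArgumentException("budget must be >= 0");
        this.nodeBudgetBytes = nodeBudgetBytes;
    }

    /** Records bytes buffered by a shard, then flushes the largest shards until the node is within budget. */
    void add(String shard, long bytes) {
        if (bytes < 0) throw new IllegalArgumentException("bytes must be >= 0");
        bytesByShard.merge(shard, bytes, Long::sum);
        while (totalBytes() > nodeBudgetBytes) {
            String largest = largestShard();
            flushed.add(largest);          // stands in for writing the buffer to disk
            bytesByShard.put(largest, 0L); // the flushed shard starts again from empty
        }
    }

    long totalBytes() {
        long total = 0;
        for (long b : bytesByShard.values()) total += b;
        return total;
    }

    long bytesFor(String shard) {
        return bytesByShard.getOrDefault(shard, 0L);
    }

    List<String> flushedShards() {
        return flushed;
    }

    private String largestShard() {
        String best = null;
        long bestBytes = -1;
        for (Map.Entry<String, Long> e : bytesByShard.entrySet()) {
            if (e.getValue() > bestBytes) {
                best = e.getKey();
                bestBytes = e.getValue();
            }
        }
        return best;
    }
}
```

In this model a single busy shard can use nearly the whole budget while the other shards are idle, which is the property that helps indices with very many fields.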
Our team has made good progress on Zen2: the implementation is now thread-safe, has serialization support for our request objects, and has a real transport implementation. This allowed us to write our first integration test case for Zen2, which performs a simple cluster formation.
CCR & Benchmarking
We’ve extended the existing http logs benchmarks to cover both smaller documents (the geo_points track) and larger ones (the pmc track, a full-text benchmark that measures indexing performance and runs queries and aggregations on a corpus of scientific papers). The benchmarks all completed successfully. Future benchmarking work will extend to clusters that are spread more widely geographically (e.g., US/Japan) and across providers (e.g., AWS and GCP).
We’ve merged work on application privileges into a feature branch. This feature introduces custom privileges to the X-Pack security model, which lets applications represent and store their own privilege model within Elasticsearch roles.
Changes in 5.6:
- Give the engine the whole index buffer size on init. #31105
- Add an escape hatch to increase the maximum amount of memory that IndexWriter gets. #31133
Changes in 6.3:
- Add an escape hatch to increase the maximum amount of memory that IndexWriter gets. #31132
- Fix audit index template upgrade loop #30779
- Make Persistent Tasks implementations version and feature aware #31045
Changes in 6.4:
- High-Level REST api: cancel task #30745
- Move RestGetSettingsAction to RestToXContentListener #31101
- Add high-level client methods that accept RequestOptions #31069
- Remove RestGetAllMappingsAction #31129
- Enable engine factory to be pluggable #31183
- Add support for ignore_unmapped to geo sort #31153
- Index phrases #30450
- Added max_expansion param to span_multi #30913
- Reject long regex in query_string #31136
- Deprecate unindexed phrases #31072
- Remove extra checks from HdfsBlobContainer #31126
- Do not check for S3 blob to exist before writing #31128
- Security: make native realm usage stats accurate #30824
- [Rollup] Disallow index patterns that match rollup indices #30491
- Add get mappings support to high-level rest client #30889
- Allow terms query in _rollup_search #30973
- Fix index prefixes to work with span_multi #31066
- Only auto-update license signature if all nodes ready #30859
- Add a doc value format to binary fields. #30860
- Move caching of the size of a directory to StoreDirectory. #30581
- BREAKING: Match phrase queries against non-indexed fields should throw an exception #31060
- Transport client: Don’t validate node in handshake (#30737) #31080
- Add TRACE, CONNECT, and PATCH http methods #31079
- Make sure KeywordFieldMapper#clone preserves split_queries_on_whitespace. #31049
- Change ObjectParser exception #31030
- [Rollup] Specialize validation exception for easier management #30339
- Moved pipeline APIs to ingest namespace #31027
- Reuse expiration date of trial licenses #31033
Changes in 7.0:
- Add a feature_vector field. #31102
- Add cors support to NioHttpServerTransport #30827
- Netty4SizeHeaderFrameDecoder error #31057
We started the release process for Lucene 7.4.0. This is the Lucene version that Elasticsearch 6.4.0 will be based on.
Arrays.copyOf and copyOfRange made forbidden APIs
A Lucene developer found that a failure to reject invalid payloads at index time boiled down to two facts: Lucene doesn't check the offset and length of payloads, and Arrays.copyOf and Arrays.copyOfRange pad the result with zeros when the end index is greater than the length of the provided array. While this is the documented behavior, it makes these APIs trappy whenever you want to copy a slice of an array, because you must also remember to check bounds yourself.
As a consequence, we made them forbidden APIs and replaced them with ArrayUtil.grow and ArrayUtil.copyOfSubArray to force callers to be explicit about whether padding with zeros is the expected behavior.
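The trap is easy to reproduce with the JDK alone. The sketch below shows the zero padding and contrasts it with a bounds-checked slice copy; `copyExact` is a hypothetical helper written for this illustration, not Lucene's ArrayUtil implementation.

```java
import java.util.Arrays;

final class SliceCopy {
    /** Copies array[from, to) and rejects out-of-range bounds instead of padding with zeros. */
    static byte[] copyExact(byte[] array, int from, int to) {
        if (from < 0 || to > array.length || from > to) {
            throw new IndexOutOfBoundsException(
                "from=" + from + " to=" + to + " length=" + array.length);
        }
        byte[] copy = new byte[to - from];
        System.arraycopy(array, from, copy, 0, copy.length);
        return copy;
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3};
        // to=5 is past the end of the array: the JDK call silently pads with zeros.
        System.out.println(Arrays.toString(Arrays.copyOfRange(payload, 1, 5))); // [2, 3, 0, 0]
        // The checked variant copies only valid slices.
        System.out.println(Arrays.toString(copyExact(payload, 1, 3)));          // [2, 3]
    }
}
```

Calling `copyExact(payload, 1, 5)` throws an IndexOutOfBoundsException, so an invalid payload length fails loudly instead of producing trailing zero bytes.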
- Can we improve readability of MoreLikeThis?
- We are discussing how TokenStreamToAutomaton should handle trailing position gaps.
- We started a discussion about whether we could make Lucene enforce more constraints on documents, like requiring that a field is unique or that you can't change the type of an existing field (which is only partially enforced today).