Elasticsearch Dev Highlights
Bitset Level Cache for Document Security
Document Level Security currently uses the shared BitSet cache. However, that cache (by design) never expires its entries. The problem is that document level security queries support templates that can make use of user metadata (like usernames and roles). It is therefore possible for every user to have their own concrete query, which causes us to construct and cache a separate bitset for each user, for every Lucene segment. In some cases this has caused nodes to hold more than 10 GB worth of bitsets, all due to DLS. We opened a PR to switch DLS to use its own cache, with a configurable memory limit, so that we can avoid such problems.
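To make the idea of a memory-limited cache concrete, here is a minimal sketch of a least-recently-used cache that evicts entries once a byte budget is exceeded. It is not the code from the PR: the class name `BoundedBitsetCache`, the `max_bytes` parameter and the use of `bytes` objects as stand-ins for bitsets are all assumptions made for illustration.

```python
from collections import OrderedDict


class BoundedBitsetCache:
    """LRU cache that evicts entries once a byte budget is exceeded."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._entries = OrderedDict()  # key -> bitset bytes, oldest first

    def get(self, key):
        bitset = self._entries.get(key)
        if bitset is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return bitset

    def put(self, key, bitset):
        old = self._entries.pop(key, None)
        if old is not None:
            self.used_bytes -= len(old)
        self._entries[key] = bitset
        self.used_bytes += len(bitset)
        # Evict least recently used entries until we fit the budget again.
        while self.used_bytes > self.max_bytes and self._entries:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)


# One entry per (segment, concrete per-user query), as in the templated case.
cache = BoundedBitsetCache(max_bytes=3 * 16)
for user in ["alice", "bob", "carol", "dave"]:
    cache.put(("segment_0", "owner:" + user), bytes(16))
```

With a budget of three 16-byte bitsets, adding a fourth user pushes out the oldest entry instead of growing without bound, which is the behavior the shared cache lacks.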
We merged the cleanup logic for orphaned indices, which will run automatically on snapshot deletes in ES versions 7.3 and above. We are also looking at adding a dedicated ES endpoint to trigger this clean-up.
We are making reindex resilient against coordinator crashes by moving parts of it to the persistent tasks framework. We are currently looking at ways to make the API backward compatible. The reindex API, when run with wait_for_completion=false, currently returns a task id which can be used with the Tasks APIs to cancel or get the status of the task. The task IDs returned by the Tasks APIs are local to a node, however. By making reindex resilient to node failures, tasks will be respawned on another node where they will have a fresh task ID. These task IDs provided by the tasks API can therefore not be used for cancelling resilient reindex jobs in the future. We will be looking at creating a proper API for tracking status of and cancelling reindex jobs. We have also added support to serialize reindex requests to non-JSON content-types, which will be relevant for failovers as well.
We are making reindex resilient against data node crashes. While iterating on the prototype, we are factoring out code that does not need to stay on a feature branch for now but can go directly to master, to simplify reindex testing there as well.
We merged reloadable search analyzers in x-pack under a Basic license. Starting in 7.3, users will be able to mark their synonym filters as updatable (with a simple flag). The flag will ensure that these filters are only used in search analyzers, for query-time expansion. Updating the synonyms will no longer require closing and reopening the index. Instead, users can send the new synonym file to all nodes, and a call to POST /my_index/_reload_search_analyzers will automatically update all search analyzers with the latest version of the synonyms. We are now working on adding the new API to the high level rest client.
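As a rough illustration, the requests involved might look like the sketch below. It only builds the request bodies and does not contact a cluster. The filter name `my_synonyms`, the analyzer name `my_search_analyzer`, the `title` field and the synonyms path are hypothetical, and the spelling of the flag as `updateable` is an assumption to check against the documentation of the release you are running.

```python
import json

# Index settings with a synonym filter marked as updateable (assumed spelling).
# The filter is referenced only from a search analyzer, never at index time.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_synonyms": {
                    "type": "synonym_graph",
                    "synonyms_path": "analysis/synonyms.txt",
                    "updateable": True,
                }
            },
            "analyzer": {
                "my_search_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_synonyms"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "standard",
                "search_analyzer": "my_search_analyzer",
            }
        }
    },
}

# (method, path, body) triples; send them with the HTTP client of your choice.
create_request = ("PUT", "/my_index", json.dumps(index_body))
reload_request = ("POST", "/my_index/_reload_search_analyzers", None)
```

After replacing the synonyms file on every node, issuing `reload_request` is what picks up the new version without closing the index.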
Prefix and Wildcard intervals
We exposed prefix and wildcard sources in the Intervals query. Both sources will throw an error if the prefix/wildcard matches more than 128 terms, but the prefix source can take advantage of the index_prefixes option of the text field to limit the number of inverted lists that must be visited.
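The expansion limit can be pictured with a small sketch. This is not the Lucene implementation: it models the terms dictionary as a sorted Python list, and the function name `expand_prefix` is hypothetical. Only the 128-term limit comes from the text above.

```python
from bisect import bisect_left

MAX_EXPANSIONS = 128  # limit quoted above


def expand_prefix(sorted_terms, prefix, max_expansions=MAX_EXPANSIONS):
    """Return all terms starting with `prefix`, failing past the limit."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # terms are sorted, so no later term can match
        matches.append(term)
        if len(matches) > max_expansions:
            raise ValueError(
                "prefix [%s] expands to more than %d terms" % (prefix, max_expansions)
            )
    return matches


terms = sorted(["outage", "outcome", "output", "over", "open"])
few = expand_prefix(terms, "out")

many_terms = sorted("term%04d" % i for i in range(200))
try:
    expand_prefix(many_terms, "term")
    too_many = False
except ValueError:
    too_many = True
```

Each matching term means one more inverted list to visit, which is why an unbounded expansion is rejected rather than executed.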
Ever wish you could write an Ingest node processor that did some expensive work and didn't block ingestion? We have been working on giving developers the ability to fork inside of an ingest node processor and callback on success/failure. This frees up the write thread to do other work while the processor is doing something expensive. This will be used by the enrich project, but has other potential uses. (#43361).
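The fork-and-callback pattern described above can be sketched as follows. This is an analogy using Python's standard thread pool, not the ingest node API: `slow_lookup`, `execute_async` and the `handler(result, error)` signature are all hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def slow_lookup(doc):
    """Stand-in for expensive work such as a remote enrichment lookup."""
    return dict(doc, enriched=True)


def execute_async(executor, doc, handler):
    """Fork the expensive work and report back via handler(result, error)."""
    future = executor.submit(slow_lookup, doc)

    def on_done(fut):
        error = fut.exception()
        if error is not None:
            handler(None, error)
        else:
            handler(fut.result(), None)

    future.add_done_callback(on_done)
    return future  # the calling (write) thread is free to continue


results = []
with ThreadPoolExecutor(max_workers=2) as pool:
    execute_async(pool, {"id": 1}, lambda doc, err: results.append((doc, err)))
# Leaving the `with` block waits for the forked work to finish.
```

The caller returns as soon as the work is submitted, and the success or failure is delivered later through the handler.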
Snapshot and Restore UI
Snapshot Lifecycle Management (SLM)
We started working on SLM retention. Retention is the next phase for snapshot lifecycle management and allows a user to specify how many snapshots to keep, and for how long. This work will also eventually allow users to replace any periodic snapshotting with something built into ES, and it will provide the same functionality for the k8s operator. There’s now an SLM retention meta issue (#43663) as well as a feature branch (slm-retention). The first PR sets up the framework for future work.
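Since the retention work is only starting, the following is just one way a "how many and for how long" rule could be evaluated. The function name `snapshots_to_delete` and the parameters `max_age` and `max_count` are hypothetical and do not describe the eventual SLM settings.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, now, max_age, max_count):
    """Pick snapshot names to delete.

    `snapshots` is a list of (name, created_at) pairs. A snapshot is deleted
    if it is older than `max_age` or if `max_count` newer ones already exist.
    """
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    doomed = []
    for rank, (name, created_at) in enumerate(newest_first):
        too_old = now - created_at > max_age
        too_many = rank >= max_count
        if too_old or too_many:
            doomed.append(name)
    return doomed


now = datetime(2019, 7, 1)
snaps = [("snap-%d" % d, now - timedelta(days=d)) for d in (1, 2, 3, 10, 40)]
expired = snapshots_to_delete(snaps, now, max_age=timedelta(days=30), max_count=3)
```

Here the 10-day-old snapshot is removed because three newer ones exist, and the 40-day-old one is removed on both counts.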
Password Protected Keystore
We extracted coordinate validation from libs/geo classes to make them compatible with XYShape. Work has also begun on refactoring ShapeBuilder, which we think is a prerequisite for XYShape: we want to get the ShapeBuilder story sorted out before we introduce more complexity on top of it with XYShape.
We created a meta issue to track development on how the XYShape feature will be integrated into Elasticsearch. It will introduce a new geometry field type, used for indexing and searching cartesian geometries. Work is progressing quickly on the GeometryFieldMapper and it's looking like we'll have the initial feature (general cartesian geometry support) ready for a 7.x release. That will move us to working on the next big Geo feature: adding support for multiple projections.
Inspired by a feature request, we opened a PR to add a cumulative cardinality pipeline agg. While a regular cardinality agg can tell you the number of distinct items each day, a cumulative cardinality agg can tell you how many "new" distinct items you've seen each day (as opposed to "repeat" items you might have seen before). This is useful, for example, for tracking entirely new visitors to your site. It leverages the HyperLogLog sketch that we already serialize to the coordinating node, iteratively merging each bucket's sketch into a cumulative sketch. The agg is housed inside a new "data science" xpack plugin and will be Basic+.
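The difference between daily and cumulative cardinality is easy to see on a toy example. In the sketch below, exact Python sets stand in for the HyperLogLog sketches and set union stands in for the sketch merge, so the numbers are exact rather than approximate. The function name and the visitor data are made up for illustration.

```python
def cumulative_cardinality(buckets):
    """For each bucket return (distinct, cumulative_distinct, new).

    `buckets` is a list of iterables of visitor ids, one per day.
    """
    seen = set()
    rows = []
    for values in buckets:
        daily = set(values)
        before = len(seen)
        seen |= daily  # merge this bucket into the running "sketch"
        rows.append((len(daily), len(seen), len(seen) - before))
    return rows


days = [
    ["ann", "bob"],         # day 1
    ["bob", "cat"],         # day 2
    ["ann", "bob", "cat"],  # day 3
]
rows = cumulative_cardinality(days)
```

Day 3 has the highest daily cardinality but contributes no new visitors, which is exactly the signal a plain cardinality agg cannot show.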
We spent some time adding support for mocking out scripts, to support the refactoring of integration tests, and better support for unmapped field types. We also opened a PR to remove the recursive generic on AggregatorFactory to help simplify type signatures.
Maximum clause count check
You might be familiar with the fact that Lucene fails boolean queries that have more than 1024 clauses. As of Lucene 9, this limit won't be applied on a per-BooleanQuery basis but on the entire query tree. Before running a query, IndexSearcher now uses the query visitor API in order to visit all clauses of the query and count them.
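A simplified model shows why the two ways of counting differ. The sketch below represents a boolean query as a Python list of sub-queries and counts leaf clauses recursively. This is an assumption-laden stand-in for the query visitor, not Lucene's code, and the names `count_clauses` and `check_query` are hypothetical.

```python
MAX_CLAUSE_COUNT = 1024


def count_clauses(query):
    """Count leaf clauses in a nested query tree.

    A query is either a leaf (any non-list value) or a list of sub-queries
    standing in for a boolean query.
    """
    if isinstance(query, list):
        return sum(count_clauses(sub) for sub in query)
    return 1


def check_query(query, limit=MAX_CLAUSE_COUNT):
    """Raise if the whole tree holds more leaf clauses than the limit."""
    total = count_clauses(query)
    if total > limit:
        raise ValueError("too many clauses: %d > %d" % (total, limit))
    return total


# 4 boolean queries of 300 terms each: every one is under the limit on its
# own, but the tree as a whole holds 1200 leaf clauses.
nested = [["term%d_%d" % (i, j) for j in range(300)] for i in range(4)]
```

A per-BooleanQuery check would accept `nested`, since no single level has more than 300 clauses, while the tree-wide check rejects it.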
Better integration of Block-Max WAND
After improving Lucene's block-max AND to more efficiently run sub-clauses that have two-phase iterators such as phrase queries, we are now tackling a similar issue with Lucene's block-max WAND scorer, the scorer that is used for disjunctions. A first patch was proposed that made sure to run the verification phase of the two-phase iterator lazily. However, contrary to our intuition, benchmarks reported that this made disjunctions that include phrase queries slower. This is because the patch assumes that it is cheaper to advance a term query than to verify a phrase match, which is not always true. The dataset that the benchmark runs on makes this problem very obvious, since it contains documents that are artificially truncated, making phrase queries always fast to check. We are now iterating on a second version of the patch that will hopefully address this issue.
Interestingly, the cost of advancing term queries is an issue we had already encountered with conjunctions, so we started looking into what could be done to make it cheaper. The main reason why it is so expensive is that advancing a clause usually involves moving to a new block, and therefore requires decoding doc IDs and term frequencies for this new block - 128 entries. While doc IDs are always needed, there is a chance that none of the term frequencies will be needed. We are working on a change that decodes frequencies lazily, which could decrease the cost of advancing a clause by up to 2x in some cases. We already had evidence that it would help with conjunctions and will now do testing with disjunctions as well.
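The idea of decoding frequencies lazily can be sketched as below. This is a toy model and not Lucene's postings format: the class `PostingsBlock`, the delta encoding of doc IDs and the `freq_decodes` counter are assumptions made so the effect is observable.

```python
class PostingsBlock:
    """One block of postings: doc IDs decoded eagerly, frequencies lazily."""

    def __init__(self, encoded_doc_ids, encoded_freqs):
        self.doc_ids = self._decode(encoded_doc_ids)  # always needed
        self._encoded_freqs = encoded_freqs
        self._freqs = None
        self.freq_decodes = 0  # how many times the freq block was decoded

    @staticmethod
    def _decode(encoded):
        # Stand-in for real block decoding: delta-decode a list of gaps.
        out, current = [], 0
        for gap in encoded:
            current += gap
            out.append(current)
        return out

    def freq(self, index):
        if self._freqs is None:  # decode the whole block on first use only
            self._freqs = list(self._encoded_freqs)
            self.freq_decodes += 1
        return self._freqs[index]


block = PostingsBlock(encoded_doc_ids=[3, 2, 5], encoded_freqs=[1, 4, 2])
decodes_before_scoring = block.freq_decodes
second_freq = block.freq(1)
```

A clause that is advanced past this block without scoring any document in it never pays for the frequency decode at all.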
Fast range queries on doc values
We contributed a special range query which uses binary search on doc values in order to find matches. This only works when the index is sorted by the field that is being queried, but this opens interesting doors as it performs very well yet doesn't require the field to be indexed, which could be interesting for our space-savvy users.
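The principle can be shown with the standard `bisect` module. The sketch assumes doc IDs are assigned in index sort order, so the list of values is non-decreasing by doc ID. The function name `range_query` and the sample timestamps are made up, and this is not the contributed Lucene query.

```python
from bisect import bisect_left, bisect_right


def range_query(sorted_values, lower, upper):
    """Return the doc ID range whose value lies in [lower, upper].

    Assumes `sorted_values[doc]` is non-decreasing in `doc`.
    """
    first = bisect_left(sorted_values, lower)
    last = bisect_right(sorted_values, upper)
    return range(first, last)


timestamps = [10, 20, 20, 30, 40, 50, 50, 60]
hits = list(range_query(timestamps, 20, 50))
```

Two binary searches find both ends of the matching run, so the cost grows with the logarithm of the number of documents rather than with the number of matches.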
Better handling of duplicates in BKD trees
We noticed some inefficiencies when storing duplicate values in BKD trees. Storage efficiency is not good since we don't even try to leverage these duplicates for compression unless they fill entire blocks. And query efficiency is not good either since we check whether a given value matches again and again, while we could test it only once. As a consequence Ignacio is contributing 3 changes that will make this better:
1. Using data dimensions as a tie-breaker when partitioning to make sure that documents that have the same data-only dimensions will be next to each other in the tree as well.
2. Using run-length compression to store duplicate values on leaves of the BKD tree.
3. Changing our Points API to allow queries to leverage the fact that there are duplicates on a leaf in order not to do the same computation over and over again.
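The storage and query-side changes listed above can be illustrated together on a single leaf. This is a sketch of the general technique, not the BKD implementation: the leaf is a plain list of already-sorted values, and `run_length_encode` and `count_matches` are hypothetical helpers.

```python
from itertools import groupby


def run_length_encode(leaf_values):
    """Collapse consecutive duplicates into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(leaf_values)]


def count_matches(runs, predicate):
    """Evaluate the predicate once per distinct run, not once per document."""
    evaluations = 0
    matches = 0
    for value, length in runs:
        evaluations += 1
        if predicate(value):
            matches += length
    return matches, evaluations


leaf = [7, 7, 7, 7, 9, 9, 12]
runs = run_length_encode(leaf)
result = count_matches(runs, lambda v: v >= 9)
```

Seven stored values shrink to three runs, and the query tests three values instead of seven. Keeping equal values adjacent, as the tie-breaking change aims to do, is what makes the runs long enough to matter.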
- Lucene now has utility methods to get a rough estimate of the memory usage of a query. This can be useful when using queries as keys of a cache which is memory-bound.
- A community member is contributing a stemmer for Estonian.
- We are refactoring TopDocs#merge to make it easier to control tie-breaking of documents that have the same sort values.
- We are adding a framework to track memory usage of collectors.
- We could add ways to track per-query I/O usage via a directory wrapper.
- We merged a change that allows completion suggesters to be loaded offheap.
- Can we simplify the FieldComparator API?
- We noticed that there is ambiguity in the paper that describes the minimal english stemmer.
- Luke should show SPI names instead of class names, since this is how users look up analysis factories.
- There is a contribution to see intermediate analysis results with Luke.
Changes in Elasticsearch
Changes in 8.0:
- BREAKING: Remove deprecated sort options: nested_path and nested_filter #42809
- BREAKING: Remove preconfigured delimited_payload_filter #43686
- Remove compile-time dependency on test fixtures #43651
- Reindex remove outer level size #43373
- Fix GET /_snapshot/_all/_all if there are no repos #43558
Changes in 7.3:
- Remove nodeId from BaseNodeRequest #43658
- Add version and create_time to data frame analytics config #43683
- Allow reloading of search time analyzers #43313
- Use preconfigured filters correctly in Analyze API #43568
- Require [articles] setting in elision filter #43083
- [ML] Tag destination index with data frame metadata #43567
- Add painless method getByPath, get value from nested collections with dotted path (#43170) #43606
- Add prefix intervals source #43635
- [ML][Data Frame] Add support for allow_no_match for endpoints #43490
- [ML][Data Frame] improve pivot nested field validations #43548
- Ensure relocation target still tracked when start handoff #42201
- Disable testing conventions on Windows #43532
- Fix failing LicensingDocumentationIT test #43533
- Add voting-only master node #43410
- Revert "Test clusters: convert x-pack qa tests (#43283)" #43549
- Properly serialize remote query in ReindexRequest #43457
- Fix CreateRepository Request in HLRC #43522
- Fix score mode of the MinimumScoreCollector #43527
- Set document on script when using Bytes.WithScript #43390
- Add annotations to Painless whitelist #43239
- Do not hang on unsupported HTTP methods #43362
Changes in 7.2:
- Fix propagation of enablePositionIncrements in QueryStringQueryBuilder #43578
- Fix UOE on search requests that match a sparse role query #43668
- Issue deprecation warnings for preconfigured delimited_payload_filter #43684
- [Ml Data Frame] Size the GET stats search by number of Ids requested #43206
- [ML][Data Frame] re-ordering format priorities in dest mapping #43602
- Fix dockerfile for non-local builds #43591
- Fix indices shown in _cat/indices #43286
- [ML][Data Frame] Adjusting error message #43455
Changes in 7.1:
- Fix the bundled jdk flag to be passed through windows startup #43502