02 April 2018

This Week in Elasticsearch and Apache Lucene - 2018-04-02


Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Highlights

Percolator Bugs

Adrien fixed a few bugs in the percolator that caused it to return queries that did not actually match the document specified in the request.

Search Optimizations

Jim added an optimization that will speed up composite aggregations in certain cases with range and match_all queries.

Secure LDAP Settings

Tim has changed the LDAP secure_bind_password field to be a secure setting. As LDAP bind users tend to be long-lived service accounts with broad access to search the directory, it is great that these credentials are now more secure.

SAML Improvements

Ioannis improved our SAML support with two recent contributions: one allows us to sign generated SAML metadata, and the other adds a few missing SAML realm settings.

XContent Refactoring

Lee has moved a large amount of XContent code out of the Elasticsearch core server code. This work is important because it lays the foundation for a future in which the Java high-level REST client does not depend on the Elasticsearch server code itself.

Formal Modeling

David continues to use formal modeling to find bugs in Elasticsearch, most recently addressing the concurrent execution model of Elasticsearch replicas.

Snapshot Restore

Tanguy addressed some bugs in the snapshot restore functionality. Previously, the restore process examined the snapshot's stored state of the entire cluster even when restoring a single index. In certain cases this global state may not be readable, which caused the restore to fail. Now, index restores only examine the index data in the snapshot, which speeds up the restore and avoids these issues.

Changes in 6.2:

BulkProcessor#awaitClose to close scheduler #29263
Propagate ignore_unmapped to inner_hits #29261
REST client: hosts marked dead for the first time should not be immediately retried #29230
X-Pack:

Adds missing SAML Realm Settings #4221

Changes in 6.3:

REST high-level client: add support for Indices Update Settings API #28892
Fold EngineDiskUtils into Store, for better lock semantics #29156
Move trimming unsafe commits from the Engine constructor to Store #29260
Search: Validate script query is run with a single script #29304
Fix incorrect geohash for lat 90, lon 180 #29256
Fix handling of bad requests #29249
Prune only gc deletes below the local checkpoint #28790
Do not optimize append-only operation if normal operation with higher seq# was seen #28787
Allow _update and upsert to read from the transaction log #29264
Fix a type check that is always false #27726
Optimize the composite aggregation for match_all and range queries #28745
X-Pack:

Improve error if Indices Permission is too complex
Add secure_bind_password to LDAP realm
Replace ThrottlerField → Field in comments and string constants
[Rollup] Make Rollup a Basic license feature
[Rollup] Delegate GetJobs to master
Set order of audit log template to 1000
All logging audit settings updateable
Saml metadata signing
LdapUserSearch rebind with bind DN after bind user
[Rollup] Select best jobs then execute msearch-per-job

Changes in 7.0:

Do not load global state when deleting a snapshot #29278
Remove IndicesOptions bwc serialization layer #29281
Don’t load global state when only restoring indices #29239

Apache Lucene 7.3

The vote is closing today and no issues have been detected, so there is a good chance that the 7.3 bits will be released later this week. In the meantime, we have upgraded Elasticsearch to a recent 7.3 snapshot to make sure Lucene 7.3 doesn't introduce regressions in Elasticsearch.

Interval queries

Alan and Jim worked on a prototype of interval queries based on "Efficient optimally lazy algorithms for minimal-interval semantics" (Paolo Boldi and Sebastiano Vigna, Theoretical Computer Science, 2016). A first iteration has been pushed to the Lucene sandbox. These positional queries are similar to span queries, but we hope to take advantage of starting from scratch to fix some issues with span queries, and eventually either replace span queries or merge this work into them.
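To give a feel for what "minimal-interval semantics" means, here is a small illustrative sketch in Python. It is not the Lucene sandbox code and not the lazy algorithm from the paper: it is a plain sliding-window pass over hypothetical per-term position lists, and the function name `minimal_intervals` is made up for this example. An interval is minimal when it contains every term at least once and no shorter interval inside it does.

```python
from collections import Counter


def minimal_intervals(positions):
    """Return the minimal (start, end) windows that contain every term.

    `positions` maps each term to the sorted token positions where it
    occurs in one document (hypothetical input format, one term per position).
    """
    # Flatten to (position, term) pairs ordered by position.
    events = sorted((p, t) for t, ps in positions.items() for p in ps)
    need = len(positions)
    counts = Counter()
    covered = 0
    result = []
    left = 0
    for pos_r, term_r in events:
        counts[term_r] += 1
        if counts[term_r] == 1:
            covered += 1
        if covered < need:
            continue
        # Shrink from the left while the leftmost term is still duplicated.
        while counts[events[left][1]] > 1:
            counts[events[left][1]] -= 1
            left += 1
        result.append((events[left][0], pos_r))
        # Drop the leftmost event so the next window starts strictly later.
        counts[events[left][1]] -= 1
        covered -= 1
        left += 1
    return result


print(minimal_intervals({"quick": [1, 7], "fox": [3, 9]}))
```

Each reported window ends on the only occurrence of its last term and starts on the only occurrence of its first term, so neither end can be moved inward.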

Soft-delete-aware IndexWriter

At the moment, soft deletes are managed on top of the IndexWriter. Simon is proposing changes to make the IndexWriter aware of soft deletes so that soft deletes can be expunged through merges just like hard deletes, e.g. with a 7-day retention policy.
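The retention idea can be pictured with a short sketch. This is only a toy model in Python, not the IndexWriter API: the document dictionaries, the `soft_deleted_at` field, and the `docs_to_keep_on_merge` function are all assumptions made for illustration. The point is that a merge would carry over live documents and recently soft-deleted ones, and drop soft-deleted documents older than the retention window.

```python
from datetime import datetime, timedelta

# Hypothetical retention window, matching the 7-day example in the text.
RETENTION = timedelta(days=7)


def docs_to_keep_on_merge(docs, now, retention=RETENTION):
    """Return the documents a merge would carry into the new segment.

    Each doc is a dict; `soft_deleted_at` is None for live documents,
    or the datetime at which the document was soft-deleted.
    """
    kept = []
    for doc in docs:
        deleted_at = doc.get("soft_deleted_at")
        # Live docs always survive; soft-deleted docs survive only
        # while they are still inside the retention window.
        if deleted_at is None or now - deleted_at <= retention:
            kept.append(doc)
    return kept


now = datetime(2018, 4, 2)
segment = [
    {"id": 1, "soft_deleted_at": None},
    {"id": 2, "soft_deleted_at": datetime(2018, 3, 30)},
    {"id": 3, "soft_deleted_at": datetime(2018, 3, 20)},
]
print([d["id"] for d in docs_to_keep_on_merge(segment, now)])
```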

Nori, a new Korean analyzer

Jim has been working hard with members of the Korea team at Elastic to build a morphological analyzer for Korean using the same underlying ideas and data structures as Kuromoji (the Japanese analyzer), even though the implementation is very different due to differences between these languages. The initial prototype received good feedback, and Jim and Robert are now iterating on improving dictionary compression, memory usage and decompounding.

Give queries an API to iterate positions of matches

Simon and then Alan had been working for a long time on a new API that would allow scorers to expose positions/offsets. This would have several benefits, including making highlighters easier to implement, since they could directly ask a query for an iterator over the matching positions. Unfortunately, this also introduced considerable complexity and never got merged. Alan recently came back with a proposal of much smaller scope, which only consists of returning the matching positions/offsets for a single document.
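As a rough picture of why such an API helps highlighting, the sketch below uses plain Python rather than Lucene. The names `iter_matches` and `highlight` are hypothetical, the "query" is just a set of terms, and tokenization is a simple regular expression. The highlighter never re-analyzes the query; it only consumes an iterator of (position, start offset, end offset) for one document.

```python
import re


def iter_matches(text, terms):
    """Yield (position, start_offset, end_offset) for each matching token.

    Stand-in for a query exposing its matches on a single document.
    """
    wanted = {t.lower() for t in terms}
    for position, m in enumerate(re.finditer(r"\w+", text)):
        if m.group().lower() in wanted:
            yield position, m.start(), m.end()


def highlight(text, terms):
    """Wrap every match in <em> tags using only the match iterator."""
    out, last = [], 0
    for _, start, end in iter_matches(text, terms):
        out.append(text[last:start])
        out.append("<em>" + text[start:end] + "</em>")
        last = end
    out.append(text[last:])
    return "".join(out)


print(highlight("The quick brown fox", ["quick", "fox"]))
```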

Other

- TestIndexSorting uses large indexes which occasionally cause OOMEs with in-memory codecs. We should make these tests simpler or only use efficient codecs for this test.

- Can we populate the filter cache asynchronously?
