29 January 2018

This Week in Elasticsearch and Apache Lucene - 2018-01-29

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Faster prefix queries

Text fields will soon have an option to index prefixes so that prefix queries can run as term queries under the hood, which are much faster. In general, the cost of a prefix query depends on the number of terms that match the prefix, which makes queries on short prefixes more expensive. When enabled, this option will index all prefixes whose length is between 2 and 5 characters (both included). We think this is a good trade-off between speed and space: prefix queries on 6 or more characters should not perform too badly in general, and the option indexes an additional field with edge ngrams under the hood, which costs extra space.
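To make the idea concrete, here is a minimal sketch in Python of how indexing prefixes of length 2 to 5 turns a short prefix query into a single lookup. It is a toy in-memory model, not how Elasticsearch or Lucene implement the option; `ToyIndex`, `edge_ngrams` and `prefix_query` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

MIN_CHARS, MAX_CHARS = 2, 5  # prefix lengths from the post, both included


def edge_ngrams(term, min_chars=MIN_CHARS, max_chars=MAX_CHARS):
    """Return the prefixes of `term` whose length is in [min_chars, max_chars]."""
    longest = min(len(term), max_chars)
    return [term[:n] for n in range(min_chars, longest + 1)]


class ToyIndex:
    """Hypothetical in-memory index: a main field plus an extra prefix field."""

    def __init__(self):
        self.terms = defaultdict(set)     # term -> doc ids
        self.prefixes = defaultdict(set)  # indexed prefix -> doc ids

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.terms[term].add(doc_id)
            for gram in edge_ngrams(term):
                self.prefixes[gram].add(doc_id)

    def prefix_query(self, prefix):
        if MIN_CHARS <= len(prefix) <= MAX_CHARS:
            # Fast path: one dictionary lookup, comparable to a term query.
            return set(self.prefixes.get(prefix, set()))
        # Slow path: visit every term and keep those that start with the prefix.
        hits = set()
        for term, docs in self.terms.items():
            if term.startswith(prefix):
                hits |= docs
        return hits
```

In this sketch, a query for `el` hits the prefix field directly, while a query for `elastics` (8 characters) falls back to scanning terms. The fallback is acceptable because long prefixes usually match few terms.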

We are thinking of doing something similar with shingles to speed up phrase queries in the future.
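As a rough illustration of the shingle idea, the sketch below indexes every pair of adjacent words (2-shingles) so that a two-word phrase query becomes a single lookup instead of a position check. This is a simplified model under our own assumptions: it ignores synonyms, stop words and longer phrases, and the function names are hypothetical.

```python
def shingles(tokens, size=2):
    """Return all runs of `size` adjacent tokens, joined with a space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def index_shingles(docs, size=2):
    """Build a shingle -> doc ids mapping from a {doc_id: text} dictionary."""
    index = {}
    for doc_id, text in docs.items():
        for shingle in shingles(text.lower().split(), size):
            index.setdefault(shingle, set()).add(doc_id)
    return index


def two_word_phrase(index, phrase):
    """Answer a two-word phrase query with one lookup in the shingle index."""
    key = " ".join(phrase.lower().split())
    return set(index.get(key, set()))
```

With documents `{1: "quick brown fox", 2: "brown quick fox"}`, the phrase `quick brown` matches only document 1, even though both documents contain both words.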

Meltdown Blogpost

After intensive benchmarking we have published a blog post explaining the performance impact of the Meltdown patches on Elasticsearch: https://www.elastic.co/blog/performance-impact-of-meltdown-on-elasticsearch.

Rally 0.9.0

Rally 0.9.0 has been released. It allows users to configure Elasticsearch plugin parameters on the command line. There are also several changes to the "track" file format to allow a more flexible definition of benchmarks. See the migration guide at http://esrally.readthedocs.io/en/stable/migrate.html for a walkthrough of these changes.

Changes in 5.6:

StringTerms.Bucket.getKeyAsNumber detection type #28118
Ensure we protect Collections obtained from scripts from self-referencing #28335
X-Pack:
- Ensure we protect Collections obtained from scripts from self-referencing #3681
- fix trailing backslash in datapath deprecation check #3642

Changes in 6.1:

Fix setting notification for complex settings (affixMap settings) that could cause transient settings to be ignored #28317
Fix peer recovery flushing loop #28350

Changes in 6.2:

Plugins: Use one confirmation of all meta plugin permissions #28366
Update Netty to 4.1.16.Final #28345
Settings: Introduce settings updater for a list of settings #28338
Add information when master node left to DiscoveryNodes' shortSummary() #28197
X-Pack:
- [SAML] add security permission to get the classloader #3720
- Remove production from the message about license installation without TLS #3666
- [SAML] Find all tokens for a realm, not just the first 10 #3689
- Elevate privileges fetching metadata for SAML #3671
- Simplify security manager permissions #3651

Changes in 6.3:

Settings: Reimplement keystore format to use FIPS compliant algorithms #28255
Replace jvm-example by two plugin examples #28339
High level rest client : code clean up #28386
REST high-level client: add support for exists alias #28332
REST high-level client: move to POST when calling retrieval APIs that support a request body #28342
Add Indices Aliases API to the high level REST client #27876
Always return the after_key in composite aggregation response #28358
Remove Painless Type from MethodWriter in favor of Java Class. #28346
Deprecate the update_all_types option. #28284
[Plugin] Remove redundant argument for buildConfiguration of s3 plugin #28281
Completely Remove Painless Type from AnalyzerCaster in favor of Java Class #28329
Added Put Mapping API to high-level Rest client (#27205) #27869
Adds the ability to specify a format on composite date_histogram source #28310
Provide a better error message for the case when all shards failed #28333
Painless: Replace Painless Type with Java Class during Casts #27847
Trim down usages of ShardOperationFailedException interface #28312
Calculate sum in Kahan summation algorithm in aggregations (#27807) #27848
X-Pack:
- BREAKING: Remove XPackExtension in favor of SecurityExtensions #3734
- Remove the gradle cheatsheet #3708
- Remove legacy files from xpack split #3707
- Expose XPackExtensions via SPI #3530
- Trim down usages of ShardOperationFailedException interface #3662
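
One of the 6.3 changes above switches sum calculation in aggregations to Kahan summation (#27848). For readers unfamiliar with the technique, here is a generic textbook sketch in Python, not the code from that pull request. It keeps a running compensation term that recovers the low-order bits lost when a small value is added to a large total.

```python
def naive_sum(values):
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Kahan (compensated) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # rounding error accumulated so far
    for value in values:
        y = value - compensation        # apply the correction to the next value
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total
```

For example, adding many values of `1e-16` to `1.0` leaves the naive total stuck at `1.0`, because each addend is below half the spacing between adjacent doubles near 1.0, whereas the compensated sum tracks the true total closely.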

Changes in 7.0:

BREAKING: Java api clean up: remove deprecated isShardsAcked #28311
BREAKING: Remove the update_all_types option. #28288
X-Pack:
- Fix XPackExtension javadoc #3711

Apache Lucene

Analysis

ShingleFilter doesn't work with synonyms. We would like to be able to speed up phrase queries with a simple mapping setting in the same way that we are doing for prefix queries. However, this will require fixing ShingleFilter so that it works on arbitrary token streams and makes it possible to return the same results as a phrase query would.
HyphenationDecompoundTokenFilter should really be thought of as an extension of a tokenizer rather than as a token filter.

Index

CheckIndex should check doc-value iterators more thoroughly. This should help find bugs such as LUCENE-8117 (bug in advanceExact on old codecs) earlier in the future.
The ability for codecs to index impacts is getting positive feedback.
We need to disallow downgrading index options on the fly to fix a relevancy bug.

Search

Query parsing is unhappy when a stop filter removes a token of a multi-word synonym.
Block-max WAND uses indexed impacts (LUCENE-4198) in order to speed up selection of the top-k hits of OR queries. A first implementation gives interesting results.

Geo

Coplanarity checks might be wrong with points that are very close to each other due to floating-point errors.
The way that GeoPolygonFactory finds points inside a polygon might sometimes fail on small polygons.
Floating-point errors (them again) can also make plane construction wrong.
Should we have native support for R-trees in order to have better support for shape search?
