Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Faster prefix queries
Text fields will soon have an option to index prefixes so that prefix queries can run as term queries under the hood, which are much faster. In general, the performance of prefix queries depends on the number of terms that match the prefix, which makes queries on short prefixes more expensive. When enabled, this option will index all prefixes whose length is between 2 and 5 characters (inclusive), which we think is a good trade-off between speed (prefix queries on 6 characters or more should generally perform reasonably well) and space, since this option indexes an additional field with edge ngrams under the hood.
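To make the idea concrete, here is a minimal sketch of prefix indexing in plain Python. It is a toy model, not the Elasticsearch or Lucene implementation: the `ToyIndex` class, its method names, and the in-memory dictionaries are all hypothetical. Only the 2 to 5 character range comes from the description above.

```python
from collections import defaultdict

MIN_PREFIX, MAX_PREFIX = 2, 5  # prefix lengths quoted in the text above


def edge_ngrams(term, min_len=MIN_PREFIX, max_len=MAX_PREFIX):
    """Return the leading substrings of `term` with length in [min_len, max_len]."""
    return [term[:n] for n in range(min_len, min(len(term), max_len) + 1)]


class ToyIndex:
    """Hypothetical in-memory index; Lucene stores postings very differently."""

    def __init__(self):
        self.terms = defaultdict(set)     # full term -> doc ids
        self.prefixes = defaultdict(set)  # indexed prefix -> doc ids

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.terms[term].add(doc_id)
            # The extra "field": every edge ngram of the term.
            for prefix in edge_ngrams(term):
                self.prefixes[prefix].add(doc_id)

    def prefix_query(self, prefix):
        prefix = prefix.lower()
        if MIN_PREFIX <= len(prefix) <= MAX_PREFIX:
            # Fast path: a single dictionary lookup, like a term query.
            return set(self.prefixes.get(prefix, ()))
        # Slow path: visit every term that starts with the prefix.
        hits = set()
        for term, docs in self.terms.items():
            if term.startswith(prefix):
                hits |= docs
        return hits


index = ToyIndex()
index.add(1, "Elasticsearch prefix queries")
index.add(2, "elastic search")
print(index.prefix_query("ela"))       # served from the prefix dictionary
print(index.prefix_query("elastics"))  # longer than 5 characters: term scan
```

The cost of the fast path does not depend on how many terms share the prefix, which is why short prefixes benefit most. The price is the extra dictionary, which mirrors the space trade-off mentioned above.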
We are thinking of doing something similar with shingles to speed up phrase queries in the future.
Meltdown Blogpost
After intensive benchmarking we have published a blog post explaining the performance impact of the Meltdown patches on Elasticsearch: https://www.elastic.co/blog/performance-impact-of-meltdown-on-elasticsearch.
Rally 0.9.0
Rally 0.9.0 has been released. It allows users to configure Elasticsearch plugin parameters on the command line. There are also several changes to the "track" file format to allow a more flexible definition of benchmarks. See the migration guide at http://esrally.readthedocs.io/en/stable/migrate.html for a walkthrough of these changes.
Changes in 5.6:
- StringTerms.Bucket.getKeyAsNumber detection type #28118
- Ensure we protect Collections obtained from scripts from self-referencing #28335
- X-Pack:
Changes in 6.1:
- Fix setting notification for complex settings (affixMap settings) that could cause transient settings to be ignored #28317
- Fix peer recovery flushing loop #28350
Changes in 6.2:
- Plugins: Use one confirmation of all meta plugin permissions #28366
- Update Netty to 4.1.16.Final #28345
- Settings: Introduce settings updater for a list of settings #28338
- Add information when master node left to DiscoveryNodes' shortSummary() #28197
- X-Pack:
Changes in 6.3:
- Settings: Reimplement keystore format to use FIPS compliant algorithms #28255
- Replace jvm-example by two plugin examples #28339
- High-level REST client: code clean up #28386
- REST high-level client: add support for exists alias #28332
- REST high-level client: move to POST when calling retrieval APIs that support a request body #28342
- Add Indices Aliases API to the high level REST client #27876
- Always return the after_key in composite aggregation response #28358
- Remove Painless Type from MethodWriter in favor of Java Class. #28346
- Deprecate the update_all_types option. #28284
- [Plugin] Remove redundant argument for buildConfiguration of s3 plugin #28281
- Completely Remove Painless Type from AnalyzerCaster in favor of Java Class #28329
- Added Put Mapping API to high-level Rest client (#27205) #27869
- Adds the ability to specify a format on composite date_histogram source #28310
- Provide a better error message for the case when all shards failed #28333
- Painless: Replace Painless Type with Java Class during Casts #27847
- Trim down usages of ShardOperationFailedException interface #28312
- Calculate sum in Kahan summation algorithm in aggregations (#27807) #27848
- X-Pack:
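
For readers unfamiliar with the Kahan summation mentioned in the list above (#27848), here is a minimal sketch of the general algorithm in Python. It is not the Elasticsearch code, which is written in Java; the function names are hypothetical and the input values are chosen only to make the rounding effect visible.

```python
def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Compensated summation: carry the rounding error of each addition forward."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for value in values:
        y = value - compensation        # apply the correction to the next input
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


# One large value followed by many values too small to change it individually.
values = [1.0] + [1e-16] * 100_000
print(naive_sum(values))
print(kahan_sum(values))
```

With this input the naive loop never moves away from 1.0, because each addend is below half the spacing between adjacent doubles near 1.0, while the compensated version ends up close to 1 + 1e-11. An explicit loop is used for the naive case because Python's built-in `sum` may itself apply compensated summation in recent versions.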
Changes in 7.0:
- BREAKING: Java api clean up: remove deprecated isShardsAcked #28311
- BREAKING: Remove the update_all_types option. #28288
- X-Pack:
- Fix XPackExtension javadoc #3711
Apache Lucene
Analysis
- ShingleFilter doesn't work with synonyms. We would like to be able to speed up phrase queries with a simple mapping setting in the same way that we are doing for prefix queries. However, this will require fixing ShingleFilter so that it works on arbitrary token streams and returns the same results as a phrase query would.
- HyphenationDecompoundTokenFilter should really be thought of as an extension of a tokenizer rather than as a token filter.
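
As a rough illustration of why shingles could speed up phrase queries, the sketch below indexes two-word shingles in a toy Python index so that a two-word phrase becomes a single lookup. Everything here is hypothetical (the `ToyShingleIndex` class, whitespace tokenization, the shingle size of 2) and it ignores synonyms entirely, which is exactly the hard part discussed above.

```python
from collections import defaultdict


def tokenize(text):
    """Hypothetical analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def shingles(tokens, size=2):
    """Return every run of `size` adjacent tokens, joined by a space."""
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


class ToyShingleIndex:
    """Toy model only; it handles a linear token stream, not a token graph."""

    def __init__(self):
        self.postings = defaultdict(set)  # shingle -> doc ids

    def add(self, doc_id, text):
        for shingle in shingles(tokenize(text)):
            self.postings[shingle].add(doc_id)

    def two_word_phrase_query(self, phrase):
        tokens = tokenize(phrase)
        if len(tokens) != 2:
            # Longer phrases would still need position checks to match
            # what a real phrase query returns.
            raise ValueError("this sketch only supports two-word phrases")
        # The phrase is itself a shingle, so one lookup is enough.
        return set(self.postings.get(" ".join(tokens), ()))


index = ToyShingleIndex()
index.add(1, "quick brown fox")
index.add(2, "brown quick fox")
print(index.two_word_phrase_query("quick brown"))
```

The lookup replaces the position comparison that a phrase query normally performs, at the cost of indexing an extra term per pair of adjacent tokens.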
Index
- CheckIndex should better check doc-value iterators. This should help find bugs such as LUCENE-8117 (bug in advanceExact on old codecs) earlier in the future.
- The ability for codecs to index impacts is getting positive feedback.
- We need to disallow downgrading index options on the fly to fix a relevancy bug.
Search
- Query parsing is unhappy when a stop filter removes a token of a multi-word synonym.
- Block-max WAND uses indexed impacts (LUCENE-4198) in order to speed up selection of the top-k hits of OR queries. A first implementation gives interesting results.
Geo
- Coplanarity checks might be wrong with points that are very close to each other due to floating-point errors.
- The way that GeoPolygonFactory finds points inside a polygon might sometimes fail on small polygons.
- Floating-point errors (them again) can also make plane construction wrong.
- Should we have native support for R-trees in order to have better support for shape search?