Elasticsearch

Rollup Refactor and GA

The rollup feature is going through some changes with the aim of improving the user experience, so that rollup indexes can be used much more like regular indexes. The principal changes we are planning to make are:

Remove rollup jobs in favour of an ILM action that rolls up an index. Rolling up an index will then work much like shrinking one: the rollup runs once indexing is complete, and the action rolls up the entire index in one go.
Add new field types specifically for rollup data. There will be two new field types:
1. Grouping tuple - The current thinking is that this would be analogous to the groups in the current rollups and would define the dimensions you want to use in your rollup documents.
2. Rollup metric - This field type will be used for the metrics you want to roll up within each group, i.e. the metrics used to calculate the avg, min, max, etc.
Remove the _rollup_search endpoint in favour of supporting searches on rollup indexes directly through the _search API. This means aggregations will need to be modified to work on rollup fields as well as on the existing field types.
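Because the new field types were still being designed at the time of writing, the following is only a hypothetical Python sketch of the underlying idea, not the planned mapping or implementation. It groups raw documents by a grouping tuple and stores a rollup metric (min, max, sum, value_count) for each group; the function names and the summary document layout are assumptions made for illustration.

```python
from collections import defaultdict


def rollup(docs, group_fields, metric_field):
    """Collapse raw docs into one summary doc per grouping tuple (hypothetical)."""
    groups = defaultdict(list)
    for doc in docs:
        # The grouping tuple: the values of the dimension fields for this doc.
        key = tuple(doc[field] for field in group_fields)
        groups[key].append(doc[metric_field])

    summaries = []
    for key, values in groups.items():
        summary = dict(zip(group_fields, key))
        # The rollup metric: enough statistics to answer min/max/sum/avg later.
        summary[metric_field] = {
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "value_count": len(values),
        }
        summaries.append(summary)
    return summaries


def rollup_avg(summaries, metric_field):
    """Average across rolled-up docs: total sum / total count, not a mean of means."""
    total = sum(s[metric_field]["sum"] for s in summaries)
    count = sum(s[metric_field]["value_count"] for s in summaries)
    return total / count


raw = [
    {"host": "a", "hour": 1, "cpu": 10},
    {"host": "a", "hour": 1, "cpu": 30},
    {"host": "b", "hour": 1, "cpu": 50},
]
summaries = rollup(raw, ["host", "hour"], "cpu")
print(len(summaries), rollup_avg(summaries, "cpu"))  # 2 30.0
```

Storing sum and value_count rather than a precomputed average is what lets an avg aggregation stay correct across several rollup documents: here the result is 30.0, whereas averaging the two per-group averages would give 35.0.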

Scripting Languages and Contexts API

We have opened a PR adding a new API that exposes the allowed script types (inline/stored), the available languages, and the contexts in which each language may be used. This API will allow Kibana to stop hard-coding scripting languages when it provides a language selector for creating scripted fields.
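Here is a hypothetical Python sketch of how a client such as Kibana might consume such an API to build its selector. The PR was still open at the time, so the response layout, the field names (`types_allowed`, `language_contexts`) and the context names below are assumptions for illustration, not the final API.

```python
# Assumed response shape, for illustration only: the real API's field names
# and the exact context lists may differ.
response = {
    "types_allowed": ["inline", "stored"],
    "language_contexts": [
        {"language": "expression", "contexts": ["aggs", "field", "score"]},
        {"language": "painless", "contexts": ["aggs", "field", "ingest", "score"]},
    ],
}


def languages_for_context(resp, context):
    """Return, sorted, the languages the cluster allows in the given context."""
    return sorted(
        entry["language"]
        for entry in resp["language_contexts"]
        if context in entry["contexts"]
    )


# A scripted-field editor could populate its language selector from this
# instead of a hard-coded list ("field" is an illustrative context name).
print(languages_for_context(response, "field"))   # ['expression', 'painless']
print(languages_for_context(response, "ingest"))  # ['painless']
print("stored" in response["types_allowed"])      # True
```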

Reindex

Resilient reindex sorts by _seq_no in order to be able to resume on failure. We are adding a Rally challenge for reindex to determine the performance impact of the sort. Current results show that resilient reindex is slower than non-resilient reindex, and we are looking into the root causes.
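Sorting is what makes resumption possible: if documents are always processed in increasing _seq_no order, the highest _seq_no that was fully written is a complete checkpoint, and a restarted task can simply continue after it. The following minimal Python sketch illustrates that idea with in-memory stand-ins for the source and destination indices; `copy_batches` and its parameters are hypothetical and do not reflect how reindex is actually implemented.

```python
def copy_batches(source, dest, checkpoint, batch_size=2, max_batches=None):
    """Copy docs with _seq_no > checkpoint into dest, in _seq_no order.

    source: list of dicts with "_id" and "_seq_no" (stand-in for a sorted scan).
    dest: dict keyed by "_id" (stand-in for the destination index).
    max_batches: stop early to simulate a failure; None means run to the end.
    Returns the new checkpoint (highest _seq_no fully written).
    """
    pending = sorted(
        (doc for doc in source if doc["_seq_no"] > checkpoint),
        key=lambda doc: doc["_seq_no"],
    )
    for batches_done, start in enumerate(range(0, len(pending), batch_size)):
        if max_batches is not None and batches_done == max_batches:
            break  # simulated interruption: later docs are left for a resume
        batch = pending[start:start + batch_size]
        for doc in batch:
            dest[doc["_id"]] = doc
        # Advance the checkpoint only after the whole batch has been written.
        checkpoint = batch[-1]["_seq_no"]
    return checkpoint


source = [{"_id": f"doc{n}", "_seq_no": n} for n in range(5)]
dest = {}
checkpoint = copy_batches(source, dest, checkpoint=-1, max_batches=1)
print(checkpoint, len(dest))  # 1 2
checkpoint = copy_batches(source, dest, checkpoint)  # resume after the "failure"
print(checkpoint, len(dest))  # 4 5
```

The price of this guarantee is the sort itself, which is the cost the Rally challenge is meant to measure.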

We amended the reindex documentation to clarify that source types are disregarded in reindex in 7.0+.

Also, there is a new setting that allows X-Pack to override the security headers required for reindexing, even though the reindex code itself is OSS.

Faster sorted queries

In Elasticsearch 7.0 we introduced an optimization that skips non-competitive documents when sorting by relevance. This optimization is enabled by default and doesn't require any configuration. We've also added new queries that take advantage of it to give more weight to documents closer to a given date or location. However, the optimization only works when documents are sorted by score (relevance), so if you sort by date or by another numeric field, Elasticsearch has to fall back to the slow execution path, which visits every matching document even if you don't need the total hit count.

Today, we are happy to announce that we have merged a change that extends this optimization to queries sorted by an indexed numeric field. The idea is to automatically translate the numeric sort into a distance feature query that can efficiently prune documents that are too far from the current top N. Unlike index sorting, this optimization works for both ascending and descending sorts, and it can be very efficient: in our nightly benchmarks, sorting by timestamp is now up to 35x faster than before.
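The translation works because, once the origin sits at the end of the range the sort starts from, distance from the origin grows with sort position, so ranking by any decreasing function of that distance reproduces the numeric sort. Here is a minimal Python sketch, assuming the `pivot / (pivot + distance)` scoring shape of Lucene's distance feature queries (with a weight of 1) and half of the field's range as the pivot; the function names are hypothetical.

```python
def distance_feature_score(value, origin, pivot):
    """Decreasing function of |value - origin|: 1.0 at the origin, 0.5 at pivot."""
    distance = abs(value - origin)
    return pivot / (pivot + distance)


def sort_via_scores(values, descending=True):
    """Reproduce a numeric sort by ranking on distance-feature scores."""
    lo, hi = min(values), max(values)
    # Origin at the end the sort starts from: max for descending, min for ascending.
    origin = hi if descending else lo
    # Pivot = half the field's range; fall back to 1 so it is never zero.
    pivot = (hi - lo) / 2 or 1
    return sorted(
        values,
        key=lambda v: distance_feature_score(v, origin, pivot),
        reverse=True,
    )


values = [5, 42, 17, 8, 99]
print(sort_via_scores(values, descending=True))   # [99, 42, 17, 8, 5]
print(sort_via_scores(values, descending=False))  # [5, 8, 17, 42, 99]
```

This sketch uses exact Python floats. In Lucene the scores are floats derived from long values, so they only approximate the real distance, which is why the query also looks at the indexed values when pruning.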

We're now working on applying this optimization when using search_after or scroll. We hope this will enable more use cases that need to retrieve a stream of documents from Elasticsearch in sorted order.

We also want to work on a related change that takes into account the min and max values of the time range on each shard and compares them to the sort value of the last competitive hit. If a shard cannot possibly contain a competitive hit, it can be skipped, further improving the performance of searches sorted by date fields. This should mean that even when frozen indices are included in a search request, we can efficiently discount them, both speeding up the search and allowing Elasticsearch to release resources, since we know those shards are unnecessary for the search.
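Since this change was still planned, the shard-level check can only be sketched. The following Python sketch shows the comparison for a sort on a timestamp field; `ShardRange` and `shards_to_search` are hypothetical names, and the sketch assumes the sort value of the last hit in the current top N is already known from shards searched earlier. Shards whose range merely touches that value are kept, so ties are never dropped.

```python
from dataclasses import dataclass


@dataclass
class ShardRange:
    name: str
    min_ts: int  # smallest timestamp indexed on the shard
    max_ts: int  # largest timestamp indexed on the shard


def shards_to_search(shards, bottom_value, descending=True):
    """Names of shards that could still hold a competitive hit.

    bottom_value: sort value of the last hit in the current top N, or None
    while the top N is not yet full (then every shard must be searched).
    """
    if bottom_value is None:
        return [s.name for s in shards]
    if descending:
        # Newest first: skip shards whose newest doc is older than the bottom hit.
        return [s.name for s in shards if s.max_ts >= bottom_value]
    # Oldest first: skip shards whose oldest doc is newer than the bottom hit.
    return [s.name for s in shards if s.min_ts <= bottom_value]


shards = [
    ShardRange("hot", 900, 1000),
    ShardRange("warm", 500, 899),
    ShardRange("frozen", 0, 499),
]
print(shards_to_search(shards, bottom_value=950))  # ['hot']
print(shards_to_search(shards, bottom_value=600))  # ['hot', 'warm']
```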

More detail for the curious

The origin of the query is the minimum indexed value in the field for an ascending sort and the maximum for a descending sort. We then use half of the field's range, (max - min) / 2, as the pivot to compute a score based on how close each document is to the origin. We cannot use the numeric values directly because the framework that allows skipping documents during a query only works with float scores, so the score is an approximation of the real distance to the origin, with the pivot value controlling the decay. When the queue of top documents is full, the distance feature query checks how many documents can be pruned by looking at the indexed values directly, eliminating all documents that sort after the current worst top document.
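The queue-based pruning can be illustrated with a simplified Python sketch for a descending sort. Unlike Lucene, which works with float scores and prunes whole ranges of documents by inspecting the indexed values, this version (the name `top_n_desc` is hypothetical) checks exact distances one document at a time, but the competitiveness test is the same: once the queue is full, a document that is no closer to the origin than the current worst top document can be skipped.

```python
import heapq


def top_n_desc(values, n):
    """Top n values in descending order, skipping non-competitive docs.

    Returns (top values, number of docs skipped without entering the queue).
    """
    # In Lucene the origin comes from the field's indexed max; here we compute it.
    origin = max(values)
    heap = []  # max-heap on distance, stored as negatives in heapq's min-heap
    skipped = 0
    for value in values:
        distance = origin - value  # origin is the max, so distance >= 0
        if len(heap) == n and distance >= -heap[0][0]:
            skipped += 1  # cannot beat the current worst top doc
            continue
        heapq.heappush(heap, (-distance, value))
        if len(heap) > n:
            heapq.heappop(heap)  # evict the current worst top doc
    top = sorted((value for _, value in heap), reverse=True)
    return top, skipped


print(top_n_desc([5, 42, 17, 8, 99, 3, 60], 3))  # ([99, 60, 42], 1)
```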

Lucene

Interval queries will now include the field in their toString representation.
An UnsupportedOperationException with the unified highlighter and interval queries has been addressed.
Interval queries are now capable of telling which terms matched.
Can we speed up the way BM25 scores are computed?
Should concurrent search take the size of the queue into account to decide on the number of slices?
A concurrency bug in polygon queries was fixed.

Changes

Changes in Elasticsearch

Changes in 8.0:

Update randomizedrunner to 2.7.4 #49345
IDs for doc snippets #49008
Adjustments for FIPS 140 testing #49319

Changes in 7.6:

Add a listener to track the progress of a search request locally #49471
Slash missed in indices.put_mapping url #49468
Enable LicenceServiceTests for all jdks #49440
New Histogram field mapper that supports percentiles aggregations. #48580
#48475 Pure disjunctions should rewrite to a MatchNoneQueryBuilder #48557
Make docker build task incremental #49613
Fix typo when assigning null_value in GeoPointFieldMapper #49645
Add templating support to pipeline processor. #49030
BREAKING: Add a cluster setting to disallow loading fielddata on _id field #49166
Add templating support to enrich processor #49093
Add Debug/Trace logging for authentication #49575
Annotated text type should extend TextFieldType #49555
Optimize sort on long field #48804
Introduce on_failure_pipeline ingest metadata inside on_failure block #49076
SQL: Add TRUNC alias for TRUNCATE #49571
Return 400 when handling invalid JSON #49552
print id detail when id is too long. #49433
Fix HLRC parsing of CancelTasks response #47017
Add the simple strategy to cluster settings #49414
Flush instead of synced-flush inactive shards #49126
Deprecate misconfigured SSL server config #49280

Changes in 7.5:

Do not mutate request on scripted upsert #49578
SQL: Fix issue with GROUP BY YEAR() #49559
Fix extraction of notarized Elasticsearch release distribution #49511
Replace required pipeline with final pipeline #49470
Netty4: switch to composite cumulator #49478

Changes in 7.4:

SQL: Fix issue with CASE/IIF pre-calculating results #49553
SQL: Fix issue with folding of CASE/IIF #49449

Changes in 6.8:

Fix iterate-from-1 bug in smart realm order #49473
[Java.time] Retain prefixed date pattern in formatter #48703

Changes in Elasticsearch Hadoop Plugin

Changes in 7.6:

Do not build scala docs as javadocs for scala 2.10 #1394

Changes in Elasticsearch SQL ODBC Driver

Changes in 6.8:

Fix buffer underrun for wrong received buffer size #199
Sign DLL files to be added into the MSI #198
Fix signing function signature #197


This Week in Elasticsearch and Apache Lucene - 2019-12-02
