Here are the highlights of what’s new and improved in Elasticsearch 8.7!
8.6 | 8.5 | 8.4 | 8.3 | 8.2 | 8.1 | 8.0
Time series (TSDS) GAedit
Time Series Data Stream (TSDS) is a feature for optimizing Elasticsearch indices for time series data. This involves sorting the indices to achieve better compression and using synthetic _source to reduce index size. As a result, TSDS indices are significantly smaller than non-time_series indices that contain the same data. TSDS is particularly useful for managing time series data with high volume.
Downsampling is a feature that reduces the number of stored documents in Elasticsearch time series indices, resulting in smaller indices and improved query latency. This optimization is achieved by pre-aggregating time series indices, using the time_series index schema to identify the time series. Downsampling is configured as an action in ILM, making it a useful tool for managing large volumes of time series data in Elasticsearch.
Geohex aggregations on both
Previously Elasticsearch 8.1.0 expanded
geo_grid aggregation support from rectangular tiles (geotile and geohash)
to include hexagonal tiles, but for
geo_point only. Now Elasticsearch 8.7.0 will support
Geohex aggregations over
geo_shape as well,
which completes the long desired need to perform hexagonal aggregations on spatial data.
In 2018 Uber announced they had open sourced their H3 library, enabling hexagonal tiling of the planet for much better analytics of their traffic and regional pricing models. The use of hexagonal tiles for analytics has become increasingly popular, due to the fact that each tile represents a very similar geographic area on the planet, as well as the fact that the distance between tile centers is very similar in all directions, and consistent across the map. These benefits are now available to all Elasticsearch users.
Allow more than one KNN search clauseedit
Some vector search scenarios require relevance ranking using a few kNN clauses, e.g. when ranking based on several fields, each with its own vector, or when a document includes a vector for the image and another vector for the text. The user may want to obtain relevance ranking based on a combination of all of these kNN clauses.
Make natural language processing GAedit
From 8.7, NLP model management, model allocation, and support for inference against third party models are generally available. (The new
text_embedding extension to
knn search is still in technical preview.)
Speed up ingest geoip processorsedit
geoip ingest processor is significantly faster.
Previous versions of the geoip library needed special permission to execute
databinding code, requiring an expensive permissions check and
AccessController.doPrivileged call. The current version of the geoip
library no longer requires that, however, so the expensive code has been
removed, resulting in better performance for the ingest geoip processor.
Speed up ingest set and append processorsedit
append ingest processors that use mustache templates are
Improved downsampling performanceedit
Several improvements were made to the performance of downsampling. All hashmap lookups were removed. Also metrics/label producers were modified so that they extract the doc_values directly from the leaves. This allows for extra optimizations for cases such as labels/counters that do not extract doc_values unless they are consumed. Those changes yielded a 3x-4x performance improvement of the downsampling operation, as measured by our benchmarks.
The Health API is now generally availableedit
Elasticsearch introduces a new Health API designed to report the health of the cluster. The new API provides both a high level overview of the cluster health, and a very detailed report that can include a precise diagnosis and a resolution.
Improved performance for get, mget and indexing with explicit `_id`sedit
The false positive rate for the bloom filter on the
_id field was reduced from ~10% to ~1%,
reducing the I/O load if a term is not present in a segment.
This improves performance when retrieving documents by
_id, which happens when performing
get or mget requests, or when issuing
_bulk requests that provide explicit `_id`s.
Speed up ingest processing with multiple pipelinesedit
Processing documents with both a request/default and a final pipeline is significantly faster.
Rather than marshalling a document from and to json once per pipeline, a document is now marshalled from json before any pipelines execute and then back to json after all pipelines have executed.
Support geo_grid ingest processoredit
geo_grid ingest processor supports creating indexable geometries
from geohash, geotile and H3 cells.
There already exists a
circle ingest processor that creates a polygon from a point and radius definition.
This concept is useful when there is need to use spatial operations that work with indexable geometries on
geometric objects that are not defined spatially (or at least not indexable by lucene).
In this case, the string
4/8/5 does not have spatial meaning, until we interpret it as the address
of a rectangular
geotile, and save the bounding box defining its border for further use.
Likewise we can interpret
geohash strings like
u0 as a tile, and H3 strings like
as an hexagonal cell, saving the cell border as a polygon.
frequent_item_sets aggregation GAedit
frequent_item_sets aggregation has been moved from technical preview to general availability.
Release time_series and rate (on counter fields) aggegations as tech previewedit
time_series aggregation and
rate aggregation (on counter
fields) available without using the time series feature flag. This
change makes these aggregations available as tech preview.
Currently there is no documentation about the
This will be added in a followup change.
Intro to Kibana
ELK for Logs & Metrics