Elasticsearch highlights

This list summarizes the most important enhancements in Elasticsearch 7.9. For the complete list, go to Elasticsearch release highlights.

Fixed retries for cross-cluster replication

Cross-cluster replication now retries operations that failed due to a circuit breaker or a lost remote cluster connection.

Fixed index throttling

When indexing data, Elasticsearch and Lucene use heap memory for buffering. To control memory usage, Elasticsearch moves data from the buffer to disk based on your indexing buffer settings. If ongoing indexing outpaces the relocation of data to disk, Elasticsearch will now throttle indexing. In previous Elasticsearch versions, this feature was broken and throttling was not activated.
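
The size of this indexing buffer is controlled by the `indices.memory.index_buffer_size` node setting, which defaults to 10% of the heap. As an illustrative sketch (the 15% value below is an example, not a recommendation), it can be adjusted in `elasticsearch.yml`:

indices.memory.index_buffer_size: 15%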

EQL

EQL (Event Query Language) is a declarative language designed to identify patterns and relationships between events.

Consider using EQL if you:

  • Use Elasticsearch for threat hunting or other security use cases
  • Search time series data or logs, such as network or system logs
  • Want an easy way to explore relationships between events

A good introduction to EQL and its purpose is available in this blog post. See the EQL in Elasticsearch documentation for an in-depth explanation, as well as the language reference.

This release includes the following features:

  • Event queries
  • Sequences
  • Pipes

An in-depth discussion of the scope of EQL in Elasticsearch can be found in #49581.
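
As an illustrative sketch (the index name and field values below are hypothetical), a single EQL search request can combine all three features: a `sequence by` clause correlates events that share a field value, each bracketed event query filters by event category and conditions, and a pipe such as `head` post-processes the results:

GET /my-index/_eql/search
{
  "query": """
    sequence by host.id
      [ process where process.name == "cmd.exe" ]
      [ network where destination.port == 443 ]
    | head 5
  """
}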

Data streams

A data stream is a convenient, scalable way to ingest, search, and manage continuously generated time series data. Data streams provide a simpler way to split data across multiple indices while still querying it through a single named resource.

See the Data streams documentation to get started.
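
For example (the template and stream names below are illustrative), a data stream requires a matching index template with the `data_stream` object enabled; documents are then appended through the stream name and must include a `@timestamp` field:

PUT _index_template/my-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": {}
}

POST my-data-stream/_doc
{
  "@timestamp": "2020-08-18T12:00:00.000Z",
  "message": "login attempt failed"
}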

Enable fully concurrent snapshot operations

Snapshot operations can now execute in a fully concurrent manner.

  • Create and delete operations can be started in any order
  • Delete operations wait for snapshot finalization to finish and are batched as much as possible to improve efficiency; once enqueued in the cluster state, they prevent new snapshots from starting on data nodes until they have executed
  • Snapshot creation is completely concurrent across shards, but per-shard snapshots are linearized for each repository, as are snapshot finalizations
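
For example (the repository and snapshot names below are hypothetical), several snapshot operations can now be submitted without waiting for earlier ones to complete:

PUT _snapshot/my_repository/snapshot_1?wait_for_completion=false

PUT _snapshot/my_repository/snapshot_2?wait_for_completion=false

DELETE _snapshot/my_repository/old_snapshot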

Improve speed and memory usage of multi-bucket aggregations

Before 7.9, many of the more complex aggregations made a simplifying assumption that required them to duplicate many data structures once per bucket that contained them. The most expensive of these weighed in at a couple of kilobytes each. So for an aggregation like:

POST _search
{
  "aggs": {
    "date": {
      "date_histogram": { "field": "timestamp", "calendar_interval": "day" },
      "aggs": {
        "ips": {
          "terms": { "field": "ip" }
        }
      }
    }
  }
}

When run over three years of data, this aggregation spends a couple of megabytes on bucket accounting alone. More deeply nested aggregations spend even more on this overhead. Elasticsearch 7.9 removes all of this overhead, which should allow aggregations to run better in low-memory environments.

As a bonus, we wrote quite a few Rally benchmarks for aggregations to make sure that these changes didn’t slow down aggregations, so now we can think much more scientifically about aggregation performance. The benchmarks suggest that these changes don’t affect simple aggregation trees and speed up complex aggregation trees of similar or greater depth than the example above. Your actual performance changes will vary, but this optimization should help!

Optimize date_histograms across daylight savings time

Previously, rounding dates on a shard that contained a daylight savings time (DST) transition was drastically slower than on a shard whose dates all fell on one side of the transition, and it also generated a large number of short-lived objects in memory. Elasticsearch 7.9 has a revised and far more efficient implementation that adds only a comparatively small overhead to requests.
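
DST transitions only come into play when a time_zone is specified on the aggregation. As an illustrative sketch (the field name and time zone are examples), a request that triggers this code path looks like:

POST _search
{
  "aggs": {
    "daily": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day",
        "time_zone": "Europe/Paris"
      }
    }
  }
}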

Improved resilience to network disruption

Elasticsearch now has mechanisms to safely resume peer recoveries after a network disruption, which would previously have failed any in-progress peer recoveries.

Wildcard field optimised for wildcard queries

Elasticsearch now supports a wildcard field type, which stores values optimised for wildcard grep-like queries. While such queries are possible with other field types, they suffer from constraints that limit their usefulness.

This field type is especially well suited for running grep-like queries on log lines. See the wildcard datatype documentation for more information.
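
For example (the index and field names below are illustrative), a field is mapped as `wildcard` and then searched with a wildcard query:

PUT my-index
{
  "mappings": {
    "properties": {
      "message": { "type": "wildcard" }
    }
  }
}

GET my-index/_search
{
  "query": {
    "wildcard": {
      "message": { "value": "*failed*shard*" }
    }
  }
}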

Indexing metrics and back pressure

Elasticsearch 7.9 tracks metrics about the number of indexing request bytes that are outstanding at each stage of the indexing process (coordinating, primary, and replication). These metrics are exposed in the node stats API. Additionally, the new indexing_pressure.memory.limit setting controls the maximum number of outstanding bytes, which defaults to 10% of the heap. Once this many bytes of a node’s heap are consumed by outstanding indexing bytes, Elasticsearch starts rejecting new coordinating and primary requests.

Additionally, since a failed replication operation can fail a replica, Elasticsearch allows outstanding replication bytes to grow to 1.5 times the limit, and only replication bytes can trigger this higher limit. If replication bytes reach that level, the node stops accepting new coordinating and primary operations until the replication workload has dropped.
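
As an illustrative sketch (the 15% value is an example), the limit can be adjusted in `elasticsearch.yml`, and the per-node metrics can be retrieved from the node stats API:

indexing_pressure.memory.limit: 15%

GET _nodes/stats?filter_path=nodes.*.indexing_pressure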

Inference in pipeline aggregations

In 7.6, we introduced inference, which enables you to make predictions on new data with your regression or classification models via a processor in an ingest pipeline. Now, in 7.9, inference is even more flexible! You can reference a pre-trained data frame analytics model in an aggregation to run inference on the results of the parent bucket aggregation. The aggregation uses the model on those results to provide a prediction. This addition enables you to run classification or regression analysis at search time. If you want to perform analysis on a small set of data, you can generate predictions without setting up a processor in an ingest pipeline.
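
As a hedged sketch (the model ID, field names, and aggregation names below are hypothetical), the inference aggregation references a trained model by its model_id and maps sibling aggregation results to the model’s input fields via buckets_path:

POST _search
{
  "size": 0,
  "aggs": {
    "client": {
      "terms": { "field": "client_ip" },
      "aggs": {
        "total_bytes": { "sum": { "field": "bytes" } },
        "malicious_prediction": {
          "inference": {
            "model_id": "my_classification_model",
            "buckets_path": {
              "total_bytes": "total_bytes"
            }
          }
        }
      }
    }
  }
}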