6.6.0 has arrived!
This release delivers new features across the stack: simpler ways to manage and scale your cluster, faster geoshape indexing and querying with more efficient storage, and key improvements to Elasticsearch SQL, machine learning, Auditbeat, and more!
Manage Data Lifecycle at Scale with Index Lifecycle Management
Users with time series use cases like logging, metrics, and APM typically store data in time-based indices. As this data ages, there are a number of ways to ensure it is stored in the most cost-effective way. For example, as an index ages, you might want to shrink its number of shards, reduce the number of replicas used to store it, or move it to nodes deployed on cheaper hardware. Or you might want to delete indices that are older than a certain age. Existing methods for defining policies to manage the lifecycle of an index live outside the cluster (for example, Curator or custom automation scripts), are limited in what they can do, and introduce management overhead to configure and monitor. The new index lifecycle management feature provides a more integrated and streamlined way to manage this data, making it easier to follow best practices.
The index lifecycle management feature breaks the lifecycle of an index into four phases: hot, warm, cold, and delete. You can define an index lifecycle policy that allows you to:
- Have one primary shard on each hot node to maximize indexing throughput.
- Replace the hot index with a new empty index as soon as the existing index is “full” or after a time period.
- Move the old index to warm nodes, where it can be shrunk to a single shard and force-merged down to a single segment for optimized storage and querying.
- Later, move the index to cold nodes for cheaper storage.
In a future release, you’ll be able to “freeze” the index, putting it in a state that accepts higher search latency in exchange for greater storage density. And finally, delete the index once it is no longer useful to you.
All of this is handled for you automatically by index lifecycle management.
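As a rough sketch, a lifecycle policy covering these phases might look like the following, registered via the `_ilm/policy` API (the policy name, thresholds, and node attribute names here are illustrative, not prescriptive):

```
PUT _ilm/policy/timeseries_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "30d" }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": { "require": { "data": "cold" } }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

Once attached to an index (or index template), the policy drives rollover, shrink, force merge, reallocation, and deletion without any external tooling.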
Frozen Indices Enable Higher Storage to Memory Ratios
Elasticsearch is highly optimized to perform searches as quickly and efficiently as possible. So historically, each open (read: searchable) index used a small amount of memory to ensure that any query hitting that index would execute quickly. The bigger the size and number of indices on a given node, the more memory is required to keep them in this open state. This effectively means there are practical limits to the amount of storage a given node can address with a single JVM. For most users and use cases, this is not an issue. However, in some cases, such as those requiring long-term archival of multiple years of data for regulatory reasons, there is a desire to keep the data online and searchable, but less of a need for peak performance on requests over the older data.
Frozen indices allow for a much higher ratio of disk storage to heap, at the expense of search latency. When an index is frozen, it takes up no heap, allowing a single node to easily manage thousands of indices with very low overhead. When a search targets frozen indices, the query fully opens, searches, and then closes each index sequentially. Unlike closed indices, frozen indices remain replicated and searchable. Frozen indices provide a new set of choices for optimizing your cluster’s cost and performance around your needs.
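A minimal sketch of the workflow, assuming an index named `logs-2018.01` (by default, searches skip throttled indices, which is why the search request must opt in with `ignore_throttled=false`):

```
POST /logs-2018.01/_freeze

GET /logs-2018.01/_search?ignore_throttled=false
{
  "query": { "match": { "message": "error" } }
}
```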
Faster and Smaller Bkd-Backed Geoshapes
The Bkd tree data structure keeps delivering. Back in 5.0, we introduced Bkd-backed geopoints, which brought significant storage, memory, and performance improvements to geopoint queries. With 6.6.0, we bring the same Bkd-based benefits to geoshapes! We’ve achieved the triple crown of search: indexing is faster, the index takes up less space on disk, and it uses less memory.
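Nothing changes in how you map the field: newly created `geo_shape` fields pick up the Bkd-backed encoding by default in 6.6. A sketch of such a mapping, using an illustrative index name (note the 6.x single-type mapping under `_doc`):

```
PUT /my_locations
{
  "mappings": {
    "_doc": {
      "properties": {
        "location": { "type": "geo_shape" }
      }
    }
  }
}
```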
Elasticsearch SQL Adds Support for Date Histograms
Elasticsearch SQL continues to march toward GA with a slew of improvements addressing time queries, including native support for date histograms using SQL syntax. These improvements are great for all users of Elasticsearch SQL, but we expect that date histogram support will be most impactful for Canvas users, making it easier to build time series charts in Kibana.
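For example, bucketing events by day can now be expressed directly in SQL with the new histogram grouping function (the index and field names here are illustrative, and the query would be sent to the 6.x SQL REST endpoint, `POST /_xpack/sql?format=txt`):

```sql
SELECT HISTOGRAM("@timestamp", INTERVAL 1 DAY) AS day, COUNT(*) AS events
FROM logs
GROUP BY day
```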
Machine Learning Introduces Annotations
When investigating a potential system or security issue, it’s natural to want to record your findings and progress - recording the root cause of a system issue and steps taken to resolve, etc. Now, directly inside the machine learning UI, you can create annotations that all users can see. This simplifies collaboration and allows you to keep a record of actions taken, without leaving Kibana.
Elastic APM Adds New Agent Metrics
APM is introducing agent metrics in the 6.6 release. The latest version of our agents will now automatically report system and process-level CPU and memory metrics, along with traces and errors.
In other news, Distributed Tracing is now generally available, and all agents are OpenTracing compliant. Lastly, the APM UI is making it effortless to jump from APM to relevant Logging or Infrastructure views, and the Java agent has introduced two new great features. Read the APM blog post for all the details.
But wait, there’s more …
In addition to all these, we also added several new features and improvements in Beats, Logstash, and Kibana.
Auditbeat added a new system module to collect various security related information from the system. This includes data about the operating system, processes, sockets and users existing on a particular host. You can read more about the Auditbeat system module in the dedicated blog post.
Machine learning now comes prepackaged with machine learning jobs for Auditbeat data, giving users a jump start on detecting common anomalies in their audit data.
Filebeat adds a new NetFlow input, which can be used to receive NetFlow and IPFIX records over UDP. It supports NetFlow v1, v5, v6, v7, v8, and v9, as well as IPFIX.
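Enabling it is a small addition to `filebeat.yml`; a sketch listening on the conventional NetFlow port 2055 (the bind address, port, and protocol list are illustrative choices, not defaults you must use):

```yaml
filebeat.inputs:
  - type: netflow
    host: "0.0.0.0:2055"
    protocols: [ v5, v9, ipfix ]
```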
When using Beats Central Management, you can now configure Metricbeat and Filebeat to reject modifications to some parts of their configuration. This allows effective enforcement, at the level of the running Beat, of what can be modified by remote configuration. To enhance secure operation, we now block modifications to the console output and file output sections by default.
On the Logstash side, the more performant Java execution engine, introduced as a beta in 6.1, is now generally available.
On the Kibana side, we introduced a highly requested feature that allows a single Kibana instance to connect to multiple Elasticsearch nodes, removing the single point of failure that previously existed in the Kibana <> Elasticsearch connection. On the visualization front, Kibana dashboards can now be exported as PNGs. You can read the Kibana 6.6 release highlights for details on these and other changes.
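In `kibana.yml`, this is configured with the `elasticsearch.hosts` setting, which accepts a list where the older `elasticsearch.url` took a single address (the hostnames below are placeholders):

```yaml
elasticsearch.hosts:
  - "http://es-node-1:9200"
  - "http://es-node-2:9200"
  - "http://es-node-3:9200"
```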