Today we are pleased to announce the release of Elasticsearch 7.0.0-beta1, based on Lucene 8.0.0-SNAPSHOT. This is the third in a series of pre-7.0.0 releases designed to let you test out your application with the features and changes coming in 7.0.0, and to give us feedback about any problems that you encounter. Open a bug report today and become an Elastic Pioneer!
IMPORTANT: This is a beta release and is intended for testing purposes only. Indices created in this version may not be compatible with Elasticsearch 7.0.0 GA. Upgrading 7.0.0-beta1 to any other version is not supported.
DO NOT DEPLOY IN PRODUCTION
- Download Elasticsearch 7.0.0-beta1
- Elasticsearch 7.0.0-beta1 release notes
- Elasticsearch 7.0 breaking changes
You can read about all the changes in the release notes linked above, but there are a few changes which are worth highlighting:
Faster Queries for Top K Hits
When it comes to search, query performance is a key feature; some would say the key feature. We have achieved a significant improvement to search performance in Elasticsearch 7.0 for situations in which the exact hit count is not needed and it is sufficient to report a lower bound on the number of results. For example, if your users typically just look at the first page of results on your site and don’t care about exactly how many documents matched, you may be able to show them “more than 10,000 hits” and then provide them with paginated results. It’s quite common for users to enter frequently occurring terms like “the” and “a” in their queries, which has historically forced Elasticsearch to score a lot of documents even when those frequent terms couldn’t possibly add much to the score.
In these conditions, which are typical of many search use cases, Elasticsearch can now skip calculating scores for documents that it identifies at an early stage as unable to rank at the top of the result set. This can significantly improve query speed. The threshold is configurable, and the default is 10,000. The behavior of queries whose result set is smaller than this threshold does not change: the hit count is accurate, but there is no performance improvement for queries that match a small number of documents. Because the improvement is based on skipping low-ranking documents, it does not apply to aggregations. You can read more about this powerful algorithmic development in our blog post Magic WAND: Faster Retrieval of Top Hits in Elasticsearch, or better still, download 7.0.0-beta1 and try it on your data and queries to see how much your query performance improves!
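As a rough sketch of how an application might use this, the Python snippet below builds a search body that caps exact hit counting and turns the total reported in a response into a display label. It assumes the track_total_hits request parameter and a hits.total object with value and relation fields, as found in 7.0. The field name title, the helper names, and the sample totals are invented for illustration.

```python
import json


def build_search_body(text, threshold=10000):
    """Build a search body that only counts hits exactly up to `threshold`."""
    return {
        # Beyond this many hits, a lower bound on the total is good enough.
        "track_total_hits": threshold,
        # "title" is a hypothetical field name.
        "query": {"match": {"title": text}},
    }


def hit_count_label(total):
    """Turn a hits.total object into a label for a results page."""
    count = "{:,}".format(total["value"])
    if total["relation"] == "gte":
        # The reported value is only a lower bound on the real total.
        return "more than " + count + " hits"
    return count + " hits"


if __name__ == "__main__":
    print(json.dumps(build_search_body("the quick brown fox"), indent=2))
    # Hand-written samples of a response's hits.total, not real output.
    print(hit_count_label({"value": 10000, "relation": "gte"}))
    print(hit_count_label({"value": 42, "relation": "eq"}))
```

Checking the relation field before printing the number keeps the page from presenting a lower bound as an exact count.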
Intervals Query - The Next-Gen Span Query for Legal and Patent Search
Sometimes our users want to find records in which words or phrases are within a certain distance from each other. In areas like patent and legal search, this is the main way in which experts find documents. It used to be that the only way to do that was span queries, but now we are introducing a brand new way to construct such queries: interval queries. While span queries are a good tool, they are not always easy to use. Span queries do not use the analyzer, so the person performing the query has to be aware of the analyzer’s logic and perform actions like stemming. Since analyzers can be sophisticated, writing span query logic can be just as sophisticated (and complicated).
Intervals queries are not just easier to define; they also use the analyzer, so the person writing them does not have to be familiar with the transformations the analyzer performs. In addition, intervals queries are based on sound mathematical research, published in the article Efficient Optimally Lazy Algorithms for Minimal-Interval Semantics. This allowed us to deal accurately with a number of edge cases that span queries did not handle correctly. If you are writing queries that depend on the distance between words or phrases, we believe a quick look at the simplicity of the intervals query DSL will have you downloading 7.0.0-beta1 and trying it out.
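To give a feel for the DSL, here is a minimal Python sketch that builds the body of a proximity search: one phrase followed by another, with a limited number of tokens in between. It assumes the match and all_of rules with their ordered and max_gaps parameters as described in the intervals query documentation. The field name body and the example phrases are made up.

```python
import json


def near_query(field, first, second, max_gaps):
    """Match `first` followed by `second`, at most `max_gaps` tokens apart."""

    def phrase(text):
        # Terms must appear in order with nothing between them. The text is
        # analyzed by Elasticsearch, so no manual stemming is needed here.
        return {"match": {"query": text, "ordered": True, "max_gaps": 0}}

    return {
        "query": {
            "intervals": {
                field: {
                    "all_of": {
                        "ordered": True,
                        "max_gaps": max_gaps,
                        "intervals": [phrase(first), phrase(second)],
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # "body" is a hypothetical text field.
    body = near_query("body", "prior art", "semiconductor device", max_gaps=10)
    print(json.dumps(body, indent=2))
```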
Script Score Query (a.k.a. Function Score 2.0)
If you haven’t already gathered, 7.0 is loaded with features for search use cases, and there’s another to add to the list: with 7.0.0-beta1, we are introducing the next generation of our function score capability. This new script_score query provides a new, simpler, and more flexible way to generate a ranking score per record. The script_score query is constructed of a set of functions, including arithmetic and distance functions, which the user can mix and match to construct arbitrary function score calculations. The modular structure is simpler to use and will open this important functionality to additional users. Try it out and let us know what you think!
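A request using the new query might look roughly like the body built below. This is a sketch that assumes the script_score query wraps an inner query and a Painless script, and that the script can read the inner query's relevance through _score. The fields message and likes, the weight parameter, and the formula itself are illustrative choices, not recommendations.

```python
import json


def script_score_body(text, weight=1.0):
    """Rank matches by relevance, boosted by a numeric popularity field."""
    return {
        "query": {
            "script_score": {
                # Inner query: decides which documents match at all.
                "query": {"match": {"message": text}},
                "script": {
                    # Painless source. "likes" is a hypothetical numeric field;
                    # the logarithm dampens the effect of very large values.
                    "source": "_score * params.weight * Math.log(2 + doc['likes'].value)",
                    "params": {"weight": weight},
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(script_score_body("elasticsearch", weight=1.5), indent=2))
```

Passing weight through params rather than formatting it into the source string keeps the script text constant across requests.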
Smooth Zoom on Maps with Geotile Grid
A new aggregation has been introduced to handle (geo) map tiles in a way that allows a user to zoom in and out on the map without any change to the fringes of the shape. The new geotile_grid aggregation groups geo_points into buckets that represent cells in a grid, with each cell corresponding to a tile in a map. Prior to this change, the fringes of the shape could shift slightly with a change in resolution (a.k.a. the zoom level), because the rectangular tiles would change orientation at different zoom levels. You can try it with Kibana 7.0.0-beta1 or your own mapping application that uses Elasticsearch 7.0.0-beta1.
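The sketch below shows what a geotile_grid request could look like and how a point maps to a tile. It assumes the aggregation takes field and precision (the zoom level) and that buckets are keyed as zoom/x/y in the common Web Mercator tile scheme. The tile arithmetic is the standard map-tile formula, not code taken from Elasticsearch, so rounding at tile edges may differ. The field name location is invented.

```python
import json
import math


def geotile_grid_body(field, zoom):
    """Aggregation body that buckets geo_points into map tiles at `zoom`."""
    return {
        "size": 0,  # only the buckets are needed, not the hits
        "aggs": {"tiles": {"geotile_grid": {"field": field, "precision": zoom}}},
    }


def tile_key(lat, lon, zoom):
    """Return the "zoom/x/y" key of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges or near the poles stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return "{}/{}/{}".format(zoom, x, y)


if __name__ == "__main__":
    # "location" is a hypothetical geo_point field.
    print(json.dumps(geotile_grid_body("location", 8), indent=2))
    print(tile_key(0.0, 0.0, 1))
```

Because every zoom level splits each tile into four equal children, a point's tile at one zoom is always nested inside its tile at the previous zoom, which is what keeps shape edges stable while zooming.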
Nanosecond Precision Support
Up until now, Elasticsearch could only store timestamps with millisecond precision. If you want to process events that occur at a higher rate, for example if you want to store and analyze tracing or network packet data in Elasticsearch, you may want higher precision. Historically, we have used the Joda time library to handle dates and times, and Joda lacked support for such high-precision timestamps.
JDK 8 introduced an official Java time API that can also handle nanosecond-precision timestamps. Over the past year, we’ve been working to migrate our Joda time usage to the native Java time API while trying to maintain backwards compatibility. As of 7.0.0-beta1, you can make use of nanosecond timestamps via a dedicated date_nanos field mapper. Note that aggregations on this field still work at millisecond resolution, to avoid an explosion of buckets.
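As an illustration, the Python sketch below builds a mapping with a date_nanos field and shows what millisecond resolution means for aggregations: two events a few hundred nanoseconds apart land on the same millisecond. The mapping assumes the typeless mappings format. The field name, helper names, and sample timestamps are made up.

```python
import json

NANOS_PER_MILLI = 1000000


def date_nanos_mapping(field):
    """Index-creation body with one nanosecond-precision timestamp field."""
    return {"mappings": {"properties": {field: {"type": "date_nanos"}}}}


def to_millis(epoch_nanos):
    """Drop sub-millisecond digits, as a millisecond-resolution bucket would."""
    return epoch_nanos // NANOS_PER_MILLI


if __name__ == "__main__":
    # "@timestamp" is a hypothetical field name.
    print(json.dumps(date_nanos_mapping("@timestamp"), indent=2))
    first = 1546300800123456789   # made-up event times, in epoch nanoseconds
    second = 1546300800123456999
    print(first != second)                        # distinct when stored
    print(to_millis(first) == to_millis(second))  # same millisecond when bucketed
```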
Better Integration with Stack Features
We’ve been looking at a variety of Elasticsearch features to see where better defaults would make things more seamless across the stack. In 7.0.0-beta1, we’ve made four such improvements.
First, our GeoIP ingest lookup processor now ships by default with Elasticsearch. This popular processor takes an IP address and resolves the latitude, longitude, and other structured location information, and it is widely used on data shipped from our ingest products, Beats and Logstash. We’re likewise shipping the user_agent processor, another popular processor, by default, and the user_agent ingest processor now uses the Elastic Common Schema for its field names. All of this should make typical logging and metrics use cases on Elasticsearch much less error-prone and more compatible with the visualizations and dashboards users typically set up for them.
Second, for those of you that have used Watcher, our alerting functionality, you may be aware of the cleaner service, which can delete old watch history indices after a period of time. Following on the heels of the initial release of Index Lifecycle Management (ILM), we’ve now integrated that cleaner service with ILM. This way, you can define the lifecycle/retention of these history indices the same way you work with other time-based data.
Third, JSON logging is now enabled in Elasticsearch in addition to plaintext logs. Starting in 7.0.0-beta1, you will find new files with a .json extension in your log directory. This means you can now use filtering tools like jq to pretty-print and process your logs in a much more structured manner. You can also expect to find additional information like node.id, cluster.uuid, type (and more) in each log line. The “type” field in each JSON log line lets you distinguish log streams when running on Docker.
We’re working towards using these JSON logs to enable Beats to ingest Elasticsearch logs and provide more details to our Monitoring UI. For backwards compatibility reasons, Elasticsearch 7.0.0 will emit logs in both the old format and in JSON. You can read more on configuring your logging in our reference and migration guides.
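Beyond jq, structured logs are easy to process from a script. The sketch below parses JSON log lines in Python and selects one stream by its type field. The sample lines are invented: only type, node.id, and cluster.uuid are field names mentioned above, while level, message, and all of the values are placeholders.

```python
import json

# Invented sample lines, standing in for the contents of a .json log file.
SAMPLE_LINES = [
    '{"type": "server", "level": "INFO", "node.id": "node-a", '
    '"cluster.uuid": "uuid-1", "message": "started"}',
    '{"type": "deprecation", "level": "WARN", "node.id": "node-a", '
    '"cluster.uuid": "uuid-1", "message": "setting is deprecated"}',
    "this line is not JSON",
]


def parse_log_lines(lines):
    """Yield one dict per JSON line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except ValueError:
            continue


def records_of_type(records, log_type):
    """Keep only the records belonging to one log stream."""
    return [r for r in records if r.get("type") == log_type]


if __name__ == "__main__":
    records = list(parse_log_lines(SAMPLE_LINES))
    for record in records_of_type(records, "server"):
        print(record["node.id"], record["message"])
```

To read a real file, pass an open file object in place of SAMPLE_LINES, since iterating over a file yields its lines.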
Elasticsearch has supported encrypted communications for a long time, and we recently added support for JDK 11. JDK 11 supports TLSv1.3, so starting with 7.0 we support TLSv1.3 within Elasticsearch for those of you running JDK 11. To keep new users from inadvertently running with low security, we’ve also dropped TLSv1.0 from our defaults. For those running older versions of Java, the default options are TLSv1.2 and TLSv1.1. Have a look at our TLS setup instructions if you need help getting started.
High Level REST Client
If you’ve been following our blog or our GitHub repository, you may be aware of a task we’ve been working on for quite a while now: creating a next-generation Java client for accessing an Elasticsearch cluster. We started off by working on the most commonly-used features like search and aggregations, and have been working our way through administrative and monitoring APIs. Many of you that use Java are already using this new client, but for those that are still using the TransportClient, now is a great time to upgrade to our High Level REST Client, or HLRC.
As of 7.0.0-beta1, the HLRC now has all the API checkboxes checked to call it “complete” so those of you still using the TransportClient should be able to migrate. We’ll of course continue to develop our REST APIs and will add them to this client as we go. For a list of all of the APIs that are available, have a look at our HLRC documentation. To get started, have a look at the getting started with the HLRC section of our docs and if you need help migrating from the TransportClient, have a look at our migration guide.
Cluster Coordination Layer Improvements
In 7.0 Alpha 2, we released our new cluster coordination layer in Elasticsearch. With 7.0 Beta 1, the cluster coordination layer received several improvements, including enhancements to rolling upgrade capabilities and faster cluster state publishing. Most importantly, users should be aware of the changes in Elasticsearch 7.0 to how a cluster performs discovery and is initially bootstrapped. A complete overview of the new cluster coordination layer can be found in the Elasticsearch documentation. We love receiving feedback from our users, so please don’t hesitate to create Discuss topics or GitHub issues as you begin working with the new cluster coordination layer in Elasticsearch 7.0!
Soft-deletes by default: new indices automatically eligible for replication to other Elasticsearch clusters
In Elasticsearch 6.5, we released Cross Cluster Replication (CCR) as a beta feature. CCR requires any replicated index to maintain a history of document changes (when a document is updated or deleted), which is enabled through the soft_deletes index setting on the leader index at index creation time. By retaining these soft deletes, a history can be maintained on the leader shards and replayed to replicate index changes to other Elasticsearch clusters. Soft deletes will also be valuable for future Elasticsearch data replication improvements outside of CCR. Any newly created index now has the soft_deletes setting enabled by default.
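For clusters where the default is not yet in effect, or to make the choice explicit, the setting can be included when an index is created. The Python sketch below builds such a body, assuming the full setting name is index.soft_deletes.enabled. The helper name is invented.

```python
import json


def index_creation_body(soft_deletes=True):
    """Index-creation body that states the soft-deletes choice explicitly."""
    return {"settings": {"index": {"soft_deletes": {"enabled": soft_deletes}}}}


if __name__ == "__main__":
    print(json.dumps(index_creation_body(), indent=2))
```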