Today we are pleased to announce the release of Elasticsearch 6.4.0, based on Lucene 7.4.0. This is the latest stable release, and is already available for deployment via our Elasticsearch Service on Elastic Cloud.
You can read about all the changes in the release notes linked above, but there are a number of changes worth highlighting. The Elasticsearch team has been busy: this is a super feature-rich release!
Kerberos authentication

We’ve added the ability to authenticate against Elasticsearch via Kerberos. Elasticsearch already supports a variety of authentication mechanisms, and Kerberos is the latest addition. Because Elasticsearch exposes its REST interface over HTTP, SPNEGO is used to enable Kerberos support. The Kerberos realm only handles authentication; you should use the role management APIs to define the roles assigned to Kerberos users.
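As a concrete sketch of assigning roles, a role mapping can grant a role to Kerberos principals. The mapping name, role, and realm below are illustrative assumptions, not values from the release notes:

```python
import json

# Hypothetical role mapping: grant the built-in "kibana_user" role to
# every principal in an example Kerberos realm.
role_mapping = {
    "roles": ["kibana_user"],
    "enabled": True,
    "rules": {"field": {"username": "*@EXAMPLE.COM"}},
}

# This body would be sent to the security API, e.g.
# PUT /_xpack/security/role_mapping/kerberos-users
body = json.dumps(role_mapping)
```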
FIPS 140-2

While Elasticsearch has supported a number of encryption algorithms for many years, it was time to take this to the next level! Elasticsearch now has the ability to run with a FIPS 140-2 enabled JVM. Running your Elasticsearch cluster with FIPS 140-2 mode enabled isn’t something everyone needs, but if you’re operating in a regulated environment that requires it, we’ve just made it substantially easier to deploy Elasticsearch.
Reloadable secure settings
In previous versions of Elasticsearch, updating a secure setting stored in the Elasticsearch keystore required restarting each node to pick up the new value. Now Elasticsearch plugins can read updated settings from the keystore at runtime. Plugins must opt in to this functionality, so if you have custom plugins that need it, they’ll have to be updated. We have updated our S3, EC2, Azure, and GCS plugins to support it.
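For example, after updating a keystore value on each node, the reload can be triggered with a single API call. This is a minimal sketch using only the Python standard library; the host and port are assumptions:

```python
from urllib import request


def reload_secure_settings_url(host="localhost", port=9200):
    """Build the POST request that asks the cluster to reload
    reloadable secure settings; returns the URL it targets."""
    url = "http://%s:%d/_nodes/reload_secure_settings" % (host, port)
    req = request.Request(url, method="POST")
    # request.urlopen(req)  # uncomment to run against a live cluster
    return req.full_url
```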
Plugin Signature Verification
In the past, if an external plugin was installed into Elasticsearch, it was up to whoever downloaded it to verify that the plugin matched the correct checksums and signatures. As a manual process, this is difficult and error-prone. We now automatically check the signatures of Elastic-provided plugins: we sign our plugins with a GPG key that is verified during plugin installation, and a plugin that fails this integrity check will not install.
Security Token Service (STS) Support for EC2 & S3
AWS has the ability to issue temporary credentials using the AWS Security Token Service (STS); an STS credential is only valid for a limited timespan. Support for these temporary credentials is now available in our S3 and EC2 plugins, each of which gains a new setting to supply the session token.
Search and Aggregations
It’s not uncommon for your field names to change over time. If you want to change the source JSON of all the old documents, we’ve had you covered with a reindex API that lets you do exactly that. But sometimes you just want to be able to query on a new field name that you never indexed. Maybe you or someone on your team went a bit overboard on brevity and named a hostname-related field "h" at the beginning of the year, and it’s now in "myindex-2018.01.27". Now you have clients that want to use a more descriptive "hostname" field, but it’s not in the old data. You can index the field as "hostname" in "myindex-2018.09.01", but now you need to query two different fields. Wouldn’t it be nice to have a little bridge that aliases "hostname" back to "h" in the older data? In 6.4, now you can, with our new alias type! Just alias "hostname" to "h" in your old indices and your queries will match. Do be aware that this is just for queries (aliases don’t migrate data at index time) and that we don’t change the "_source" of the document in the results. If you’re just (or mostly) running aggregations on the data, you may not need anything else, but if you’re simultaneously trying to read the values of "hostname", your application will need to understand the alias part of the mapping as well.
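A minimal sketch of what such a mapping could look like (the index and mapping type names are illustrative; 6.x mappings still live under a mapping type):

```python
import json

# Map the descriptive name "hostname" as an alias for the short field "h"
# in an old index, so queries against "hostname" match the old data too.
mapping = {
    "mappings": {
        "doc": {  # the mapping type name "doc" is an assumption
            "properties": {
                "h": {"type": "keyword"},
                "hostname": {"type": "alias", "path": "h"},
            }
        }
    }
}

# Sent at index-creation time, e.g. PUT /myindex-2018.01.27
body = json.dumps(mapping)
```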
Super-fast Korean Analyzer
We’ve introduced an entirely new analyzer (“Nori”) for the Korean language. It uses the same mecab-ko-dic dictionary you’re used to, but uses binary compression that makes it much smaller (only 24MB on disk from a starting size of 219MB) and based on our benchmarks, it’s almost 30 times faster than the popular Seunjeon community plugin for Korean analysis! Oh, and we use an off-heap structure so it won't cause spurious garbage collection events.
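Once the analysis-nori plugin is installed, wiring its tokenizer into an index could look like this sketch (the index settings shown are an assumption of typical usage; the plugin also registers a ready-made analyzer):

```python
# Index settings defining a custom analyzer built on the plugin's
# Korean tokenizer.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "korean": {
                    "type": "custom",
                    "tokenizer": "nori_tokenizer",
                }
            }
        }
    }
}
```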
New Text Processing Options
As evidenced by Nori, at Elastic we’re always on a mission to make “fast” faster and with 6.4, we’ve worked on making phrase search faster as well. You can now use a new index_phrases option on text fields, which automatically indexes two-term word combinations (shingles) into a separate field so that phrase searches can be run more efficiently (at the cost of some disk space).
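Enabling it is a one-line mapping change; a sketch (the field and mapping type names are illustrative):

```python
# A text field that additionally indexes two-term shingles into a
# separate field to speed up phrase queries, at the cost of disk space.
phrase_mapping = {
    "mappings": {
        "doc": {
            "properties": {
                "body": {"type": "text", "index_phrases": True}
            }
        }
    }
}
```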
We’ve also introduced a new multiplexing token filter, which allows you to run tokens through multiple different token filters and stack the results. For example, you can now easily index the original form of a token, its lowercase form, and a stemmed form: all at the same position. This would allow you to search for stemmed and unstemmed tokens in the same field.
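A sketch of index settings using the multiplexer (the filter and analyzer names are made up, and "porter_stem" stands in for whatever stemmer you use):

```python
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "stem_variants": {
                    "type": "multiplexer",
                    # Each entry is a comma-separated chain of filters;
                    # the original token is also preserved by default.
                    "filters": ["lowercase", "lowercase, porter_stem"],
                }
            },
            "analyzer": {
                "stemmed_and_unstemmed": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["stem_variants"],
                }
            },
        }
    }
}
```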
A few others
If all of the above isn’t enough for you, we’ve got a few more new features to let you know about in 6.4:
- We’ve added a new weighted average aggregation, which lets you specify a weight for each document and calculates the average of the corresponding values based on those weights.
- We’ve added Expected Reciprocal Rank (ERR) to our rank evaluation API, which already supports DCG/nDCG, MRR, and P@k.
- The field collapsing feature of search can now collapse on a second level.
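For example, the weighted average aggregation takes a value and a weight per document; a sketch of a request body (the field names are illustrative):

```python
# Average of "grade", weighted per document by the "weight" field.
search_body = {
    "size": 0,
    "aggs": {
        "weighted_grade": {
            "weighted_avg": {
                "value": {"field": "grade"},
                "weight": {"field": "weight"},
            }
        }
    },
}
```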
Clients & SQL
In Elasticsearch 6.3.0, we released our first version of SQL functionality. While it’s still experimental functionality, we’ve received great feedback on additional requests and in 6.4.0, we’ve already added a number of them. Here’s a shortlist of SQL enhancements we’re releasing with 6.4.0:
- The addition of CHAR, UCASE, LCASE, SPACE, LENGTH, and several other text manipulating functions
- Multiple fields can now be used in GROUP BY
- Prepared statement set* methods have been added to the JDBC driver
- The JDBC driver is now a single jar to simplify installation
There are also a variety of bug fixes in the release. Please keep the feedback coming!
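As a sketch, a multi-field GROUP BY can now be sent to the SQL endpoint like this (the index and column names are made up):

```python
# Request body for POST /_xpack/sql?format=txt (the 6.x SQL endpoint).
sql_request = {
    "query": (
        "SELECT host, status, COUNT(*) AS hits "
        "FROM logs GROUP BY host, status"
    )
}
```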
Java High-level REST Client
We’ve continued to plug away at our Java high-level REST client, adding a slew of new APIs. You may be interested in keeping track of our progress as we continue to check off the remaining APIs, and a few outstanding community members have helped contribute to the effort. If you’re a Java developer and haven’t yet moved off the Transport client, now would be a good time to start making the switch!
Painless contexts

When we first released Painless, we were really excited to have a language built to handle the types of data and request patterns that search engines have. However, the different places where Painless executes expose different data: a script interacting with documents during a reindex sees different data than a script operating on a pipeline aggregation. As a result, Painless scripts run in different contexts, and we’ve started documenting the data available in, and the expected return value of, each of these contexts. With 6.4.0, we’re also adding more contexts to our execute endpoint, so you can test how your script will behave in the context it will run in.
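For example, a script can be tried out against the execute endpoint before wiring it into a real request. This sketch shows a bare request body (the script and params are made up):

```python
# Request body for POST /_scripts/painless/_execute; with no context
# specified, the script is simply evaluated and its result returned.
execute_request = {
    "script": {
        "source": "params.count / params.total",
        "params": {"count": 100.0, "total": 1000.0},
    }
}
```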
X-Opaque-Id in audit logs

In Elasticsearch 6.2, we added support for identifying and cancelling tasks programmatically via an X-Opaque-Id header. When set on an HTTP request, this header can carry any value, say "client-application-1234", and the resulting tasks can then be found and cancelled by searching for tasks with that value. This is an incredibly useful feature, and we’re now extending support for the X-Opaque-Id header by adding its value to the audit logs, so operations can be audited with the same X-Opaque-Id value.
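Matching tasks back to the header value can be done client-side; here is a minimal sketch against a trimmed, hypothetical tasks-API response:

```python
def find_task_ids(tasks_response, opaque_id):
    """Return ids of tasks whose X-Opaque-Id header matches opaque_id."""
    matches = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if task.get("headers", {}).get("X-Opaque-Id") == opaque_id:
                matches.append(task_id)
    return matches


# A trimmed, made-up GET /_tasks?detailed=true response:
tasks = {
    "nodes": {
        "node-1": {
            "tasks": {
                "node-1:123": {
                    "headers": {"X-Opaque-Id": "client-application-1234"}
                },
                "node-1:124": {"headers": {}},
            }
        }
    }
}
```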
Rollup improvements

In Elasticsearch 6.3, we added an experimental feature for "rolling up" data to store aggregate statistics. We’re continuing to add query capabilities to the _rollup_search API, which as of 6.4 supports the terms query. 6.4 also addresses a somewhat serious bug in our internal ID generation for rolled-up documents; if you’ve already created rollup jobs, we recommend re-running them.
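A sketch of a _rollup_search body that now works in 6.4 (the rollup index, field names, and values are illustrative):

```python
# Request body for e.g. POST /sensor_rollup/_rollup_search: filter the
# rolled-up data with a terms query, then aggregate as usual.
rollup_search = {
    "size": 0,
    "query": {"terms": {"node": ["a", "b"]}},
    "aggs": {"avg_temperature": {"avg": {"field": "temperature"}}},
}
```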