This Week in Elasticsearch and Apache Lucene - 2016-04-25
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Forbes: Amazing Big Data At NASA with Elasticsearch: Real Time Analytics 150 Million Miles From Earth
— dbaldassano (@dbaldassano) April 15, 2016
Elasticsearch Core
Changes in 2.x:
- The index name was missing from the search slowlog.
- CamelCase is deprecated (and has deprecation logging).
- MoreLikeThis now handles aliases correctly.
Changes in master:
- The .percolator type has been replaced with the percolator field datatype.
- Added a fingerprint token filter and fingerprint analyzer for duplicate detection.
- TransportReplicationAction has been significantly refactored in order to make it unit testable.
- RPM and Deb packages now set permissions explicitly, instead of relying on umasks.
- Indexed scripts and templates are now stored in the cluster state, and are called "stored" scripts/templates.
- Parameter names in ingest processors are now more consistent.
- IP fields support range queries again.
- readNamedWriteable and writeNamedWriteable are now public, and writable.readFrom is gone.
- UUID generators moved out of Strings, to avoid spooky action at a distance.
- The `action.realtime_get` setting has been removed.
- Support for unquoted JSON keys can be enabled via a system property, for backwards-compatibility purposes.
- Cross-type mapping updates were not working for boolean fields.
- Empty task IDs are now serialised in 1 byte, so that every task can have a parent ID.
- Reindex child tasks weren't being marked as such.
- Validation failures have been removed from the cluster health response.
- Object fields now inherit their dynamic setting from their parent object or type.
- Thread local leaks when running in web containers have finally been fixed.
- Added a safeguard to protect against too-large rescore windows.
- The elasticsearch-plugin script now prints the download URL of the plugin when in verbose mode, and has friendlier error messages.
- The startup script now fails with an error code if the elasticsearch binary is not found or is not executable.
- CamelCase support has been removed.
- The ICU analyzer now accepts custom rule files.
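The fingerprint token filter and analyzer listed above are aimed at duplicate detection: the tokens of a text are normalized, sorted and de-duplicated, then joined into a single token, so that texts differing only in case, punctuation or word order produce the same output. The following is a minimal Python sketch of that idea, not the Lucene implementation. The function name `fingerprint`, the regex tokenizer and the accent stripping are assumptions made for illustration.

```python
import re
import unicodedata


def fingerprint(text, separator=" "):
    """Illustrative fingerprint: normalize, tokenize, sort, de-duplicate, join."""
    # Lowercase and strip accents (a rough stand-in for ASCII folding).
    decomposed = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Tokenize on anything that is not an ASCII letter or digit (an assumption).
    tokens = re.findall(r"[a-z0-9]+", folded)
    # Sort and de-duplicate the tokens, then join them into one string.
    return separator.join(sorted(set(tokens)))


if __name__ == "__main__":
    print(fingerprint("The quick brown fox"))
    print(fingerprint("fox, quick BROWN the"))
```

Both calls print the same string, which is what makes the result usable as a key for grouping near-identical texts.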
Ongoing changes:
- Dots in field names are now supported, but so far only if the parent fields already exist. Tests are being added to make sure supporting dots fully doesn't break anything.
- Persistence of results of long-running tasks.
- A `minhash` token filter for estimating the Jaccard similarity coefficient between two docs.
- Pipeline aggs are only needed on the coordinating node.
- Adding searchable/aggregatable tags to fields in the field stats API.
- Inner hits will no longer support the top-level syntax as the inline syntax has been improved.
- It should be possible to pass include/exclude values to the terms aggs using the same format that was used to render bucket keys.
- Deleted index tombstones are close to being merged.
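For readers unfamiliar with the idea behind the `minhash` token filter mentioned above: the Jaccard similarity of two token sets is the size of their intersection divided by the size of their union, and MinHash estimates it by comparing, for many hash functions, the smallest hash value seen in each set. The sketch below is a textbook version in Python and makes no claim to match how the filter itself is implemented. All function names, the choice of BLAKE2b as the hash, and the default of 256 hash functions are assumptions for illustration.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity of two token collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)


def _hash(token, seed):
    # One 64-bit hash per (token, seed) pair; the seed selects the hash function.
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens, num_hashes=256):
    """Smallest hash value per hash function. Requires at least one token."""
    unique = set(tokens)
    return [min(_hash(t, seed) for t in unique) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two documents agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog".split()
    doc_b = "the quick brown cat jumps over the lazy dog".split()
    print("exact:   ", jaccard(doc_a, doc_b))
    print("estimate:", estimate_jaccard(minhash_signature(doc_a),
                                        minhash_signature(doc_b)))
```

The estimate gets closer to the exact value as the number of hash functions grows, while each signature stays a fixed size regardless of document length.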
Apache Lucene
News about Apache Lucene will be back next week.
Watch This Space
Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!