This Week in Elasticsearch and Apache Lucene - Core Changes
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
What would life be like without #Elasticsearch? #Elasticon attendees answer: https://t.co/46K48dul4Y pic.twitter.com/3dDINVzDec
— elastic (@elastic) February 19, 2016
Elasticsearch Core
Changes in 2.2:
- Snapshot/restore now verifies that the index being restored is compatible with the version of the node doing the restore.
- The bulk API no longer broadcasts deletes to all shards, and will fail if custom routing is enabled and no routing value is specified.
Changes in 2.3:
- Nodes will only accept transport requests once they are fully initialized.
- Groovy accepted our pull request, which means that the suppressAccessChecks permission is no longer required.
Changes in master:
- Document IDs now have a hard limit of 512 bytes.
- The HTTP address and port are now available in cat-nodes and cat-nodeattrs.
- The Painless scripting language is now a module, which means that it will ship by default.
- Log4J is now the only supported logger wrapper and may yet be removed in favour of java.util.logging.
- Using a custom network.host setting as a proxy for "production cluster" allows us to upgrade soft warnings (in dev mode) to hard exceptions. This change has proved controversial, as configuring the maximum number of open file handles on OS X is overly complex.
- Elasticsearch now checks on startup that all data paths are writable.
- G1GC on early versions of HotSpot v25 is buggy.
- Some hot methods have been refactored so that they can be inlined.
- Various unused/unneeded settings have been removed: es.max-open-files, es.netty.gathering, es.useLinkedTransferQueue, line.separator, action.search.optimize_single_shard
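The document-ID limit above is measured in bytes, not characters, so multi-byte UTF-8 IDs reach it sooner than their character count suggests. A minimal client-side check might look like the sketch below; the 512-byte constant comes from the change above, while the class and method names are ours for illustration.

```java
import java.nio.charset.StandardCharsets;

public class IdLengthCheck {
    // Hard limit on document ID length introduced in master, in UTF-8 bytes.
    static final int MAX_ID_BYTES = 512;

    // Returns true if the given document ID fits within the byte limit.
    static boolean isValidId(String id) {
        return id.getBytes(StandardCharsets.UTF_8).length <= MAX_ID_BYTES;
    }
}
```

Note that a 512-character ID made of two-byte characters occupies 1024 bytes and would be rejected, even though its character count equals the limit.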
Ongoing changes:
- Tasks now have timestamps, making it possible to see how long they have been running.
- Task IDs are now represented as single strings instead of tuples of node ID and task ID.
- Index names will no longer be tied to the name of the index folder on disk.
- Dangling indices will no longer be imported if the cluster UUID of the index is the same as the current cluster UUID (which indicates that the index was deleted while a node was incommunicado).
- The segments API will be able to return disk usage by Lucene file type.
- Work continues on trying to allow dots in field names.
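The single-string task IDs mentioned above combine the node ID and the per-node task ID into one value. A minimal sketch of how such a representation could round-trip is shown below; the colon-separated "nodeId:taskId" layout and all names here are our illustrative assumptions, not the actual Elasticsearch implementation.

```java
public class TaskIdExample {
    // Assumed components of the single-string task ID discussed above.
    final String nodeId;
    final long id;

    TaskIdExample(String nodeId, long id) {
        this.nodeId = nodeId;
        this.id = id;
    }

    // Render the (node ID, task ID) tuple as one string.
    @Override
    public String toString() {
        return nodeId + ":" + id;
    }

    // Parse the single-string form back into its two components.
    static TaskIdExample parse(String s) {
        int sep = s.lastIndexOf(':');
        if (sep < 0) {
            throw new IllegalArgumentException("malformed task id: " + s);
        }
        return new TaskIdExample(s.substring(0, sep),
                                 Long.parseLong(s.substring(sep + 1)));
    }
}
```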
Apache Lucene
- Lucene 5.5.0 was officially released on February 22nd, but whether a 5.6.0 release will happen even after we switch to 6.x stable releases has proven strangely contentious.
- The Lucene 6.0.0 release process will begin early this week, starting with cutting the 6.x branch.
- The CheckIndex tool would sometimes hit an exception-during-exception (an exception thrown while trying to throw another exception about index corruption), because BytesRefBuilder.toString is not allowed.
- Lots of scrutiny and many improvements to the new points queries in preparation for the 6.0.0 release:
  - support for BigInteger and InetAddress (v4 and v6!)
  - better validation of the incoming arguments
  - a new PointInSetQuery, matching any documents that have any of the values in the set of points
  - javadocs improvements
  - improvements to the geo3d APIs
  - removal of the sandbox's PointInRectQuery in favor of the faster core PointRangeQuery
  - moving all encode/decode methods onto the XXXPoint classes
  - a cleaner API, where the XXXPoint classes have static factory methods to generate their matching queries, and additional API improvements
- Even more verbosity for a non-reproducible test failure that only fails on OS X, rarely.
- Another fix in the long tail of our switch from Subversion to git.
- The silly things we must do to silence our overly naggy Java compiler.
- More improvements to MMapDirectory in preparation for Java 9, but we continue to uncover new Java 9 bugs, like this serious bug in method handles, though progress is being made towards a fix.
- The Java 9 bug Lucene's tests uncovered last week has been resolved as a duplicate of another (already fixed but not yet released) bug.
- Creating a hashCode that does not accidentally cause a high collision rate is not easy!
- 800+ new top-level domains have been created since we last fixed StandardTokenizer to detect them!
- Heavy delete-by-query use in Lucene is costly.
- Lucene's range faceting can't yet handle multi-valued fields (patches welcome!).
- The legacy spatial code will move to a new spatial-extras module soon.
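On the hashCode point above, a tiny illustration (not Lucene's actual code) shows how easy it is to get an accidental high collision rate: XOR-combining two fields makes every swapped pair collide and maps equal fields to zero, while the conventional 31-based polynomial combination breaks that symmetry.

```java
public class HashExample {
    // A naive hash that XORs the fields: any pair (x, y) collides with (y, x),
    // and (v, v) always hashes to 0 -- an accidental high collision rate.
    static int naiveHash(int x, int y) {
        return x ^ y;
    }

    // The conventional polynomial combination of the fields, which breaks
    // the symmetry and spreads values far better in practice.
    static int betterHash(int x, int y) {
        return 31 * (31 + x) + y;
    }
}
```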
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!