February 8, 2016

This Week in Elasticsearch and Apache Lucene - Query Profiler and Geopoint Fields

Clinton Gormley, Shaunak Kashyap, Michael McCandless

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

Elasticsearch 2.2.0 released with a query profiler and supercharged geopoint fields. https://t.co/CKjrpuLKYg Already available on Found
— elastic (@elastic) February 2, 2016

Elasticsearch Core

2.2:

The SmbDirectoryWrapper in the Azure plugin is now in an elasticsearch package to avoid hiding bugs like calling ensureOpen on the wrong directory.
The NotSerializableExceptionWrapper now includes the exception class name.

2.x:

Upgraded to Lucene 5.5.0-snapshot-4de5f1d.
RPM and deb signing is now tested during build.
Requests that shouldn't be allowed according to CORS settings are now rejected before they are executed.

master:

The plugin CLI has been refactored to reduce complexity and ambiguity, and to improve exception handling.
The ingest pipeline adds processor tags to the ingest metadata on failure.
Catch processor/pipeline exceptions and throw structured exceptions.
Added the foreach processor to ingest for dealing with arrays.
Prevent index/delete/flush requests from bouncing between two primary shard copies during relocation.
Shard failure requests for shards that no longer exist now generate an exception.
Clean handoff during primary relocation now ensures that no index/delete requests are lost. This fixes a long-standing issue where a delete might return `false` from `isFound()` while the primary is being relocated.
Tasks can report their status.
The settings filter to remove private settings is now immutable.
Pluggable custom gateways are no longer supported.
Shard version information is no longer used for shard routing now that we have allocation IDs.
The bin/plugin script is now called bin/elasticsearch-plugin.
The TermVector API no longer supports the DFS option as it was very heavy and added little value.
The cat API now respects the Accept header instead of the Content-Type header when choosing the response format.
The IndicesFieldDataCache has been simplified and no longer uses Guice.
MessageDigest instances are no longer cloned (as some platforms don't support it) but return thread local instances instead.
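The thread-local approach mentioned in the last item can be sketched roughly as follows. This is a minimal illustration and not the Elasticsearch code: the class name `ThreadLocalDigests` and the helpers `sha1()` and `sha1Hex()` are hypothetical, and SHA-1 is chosen only as an example algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: one MessageDigest per thread instead of cloning a shared one.
final class ThreadLocalDigests {

    // Each thread lazily creates its own digest the first time it asks for one.
    private static final ThreadLocal<MessageDigest> SHA1 = ThreadLocal.withInitial(() -> {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    });

    private ThreadLocalDigests() {}

    // Returns the calling thread's digest, reset so no earlier input leaks in.
    static MessageDigest sha1() {
        MessageDigest digest = SHA1.get();
        digest.reset();
        return digest;
    }

    // Convenience: hash a UTF-8 string and return lowercase hex.
    static String sha1Hex(String input) {
        byte[] hash = sha1().digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

Because the instance is never shared between threads, no cloning or locking is needed; the `reset()` call guards against a previous caller on the same thread having left partial input behind.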

Ongoing:

The reindex API can be run in the background by setting the wait_for_completion parameter, which defaults to `true`, to `false`. It also supports a progress indicator.
Unify plugin packaging structure across projects.
Index folders will now include the index UUID (and sanitise the index name to avoid problems with different file systems).
Work continues on the monumental search refactoring.

Apache Lucene

The current plan is to cut the 5.5.0 release branch in a few days and once the 5.5.0 release is done we'll get the 6.0.0 release process underway!
More progress on the challenging change to push retrying of file deletion down under the Directory abstraction, instead of making it the caller's job
The new postings-based geo point queries are graduating from the experimental sandbox module into the spatial module, and the previous spatial module classes (with optional spatial4j dependency) are moving to a new spatial-extras module, as a precursor to nice geo point performance gains added in a backwards compatible way
Our copyright headers now appear at the very top of all sources, and our IDE configs are now fixed to do so for new source files as well
Randomized tests uncovered a missing try-with-resources in the new SimpleTextPointWriter
We don't need to Files.deleteIfExists when creating a new index file, since we already pass the TRUNCATE_EXISTING option
IndexWriter now logs how long it took to flush each part of a new segment
Now it's possible to fully wrap another MergePolicy
More geo math tweaks to avoid exceeding the legal range for latitude and longitude
A new expectThrows utility uses lambda expressions to compactly expect a test to throw a specific exception and fail otherwise, but we still need to somehow cut over numerous tests
BaseMergePolicyTestCase is now used by more tests, but it caused a reproducible test failure fixed by this issue
TieredMergePolicy had an extra = in an exception message
The new TestSwappedIndexFiles, designed to ensure that copying the same file name from a different index is always detected as corruption, had a scary failure, but it was a simple test bug
Some more small fixes for the new (coming in 6.0.0) point values:
- Point fields failed to detect some misuse
- 2D point values are also exercised in a few more tests
- We now test addIndexes with point values when the field numbers changed, and across different codecs
- The new BasePointFormatTestCase shares common code and makes it easy to test new point formats in the future
- Tests that assert two readers are equal now also verify the point values are the same
Codec level encryption offers fine-grained control over which parts of the index need encryption
Another corner case geo point test failure
FastVectorHighlighter hits StringIndexOutOfBoundsException in some cases
We should standardize on TimeUnit for time conversions
The points based and postings based geo implementations use different encodings with different quantization errors
MultiCollector might throw NullPointerException when one of its sub-collectors throws CollectionTerminatedException
A new utility class runs a TokenFilter on a string and prints the results to help debugging
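To illustrate the expectThrows item above, here is a minimal standalone sketch of how such a lambda-based helper might look. It is not Lucene's implementation: the class name `ExpectThrowsSketch` and the `ThrowingRunnable` interface declared here are assumptions made for the example.

```java
// Hypothetical standalone sketch of a lambda-friendly "expect an exception" helper.
final class ExpectThrowsSketch {

    // A Runnable variant whose body is allowed to throw anything.
    @FunctionalInterface
    interface ThrowingRunnable {
        void run() throws Throwable;
    }

    private ExpectThrowsSketch() {}

    // Runs the lambda and returns the thrown exception if it has the expected type.
    // Fails with AssertionError if nothing is thrown or the type does not match.
    static <T extends Throwable> T expectThrows(Class<T> expectedType, ThrowingRunnable runnable) {
        try {
            runnable.run();
        } catch (Throwable t) {
            if (expectedType.isInstance(t)) {
                return expectedType.cast(t);
            }
            throw new AssertionError(
                "Expected " + expectedType.getSimpleName() + " but got " + t.getClass().getSimpleName(), t);
        }
        throw new AssertionError(
            "Expected " + expectedType.getSimpleName() + " but no exception was thrown");
    }
}
```

A test can then replace a try/fail/catch block with a single call, for example `IllegalArgumentException e = ExpectThrowsSketch.expectThrows(IllegalArgumentException.class, () -> Integer.parseInt("not a number"));`, and go on to assert on the returned exception's message.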

Watch This Space

Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!