This Week in Elasticsearch and Apache Lucene - Query Profiler and Geopoint Fields
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Elasticsearch 2.2.0 released with a query profiler and supercharged geopoint fields. https://t.co/CKjrpuLKYg Already available on Found
— elastic (@elastic) February 2, 2016
Elasticsearch Core
2.2:
- The SmbDirectoryWrapper in the Azure plugin is now in an elasticsearch package to avoid hiding bugs like calling ensureOpen on the wrong directory.
- The NotSerializableExceptionWrapper now includes the exception class name.
2.x:
- Upgraded to Lucene 5.5.0-snapshot-4de5f1d.
- RPM and deb signing is now tested during build.
- Requests that shouldn't be allowed according to CORS settings are now rejected before they are executed.
master:
- The plugin CLI has been refactored to reduce complexity and ambiguity, and to improve exception handling.
- The ingest pipeline adds processor tags to the ingest metadata on failure.
- Catch processor/pipeline exceptions and throw structured exceptions.
- Added the foreach processor to ingest for dealing with arrays.
- Prevent index/delete/flush requests from bouncing between two primary shard copies during relocation.
- Shard failure requests for shards that no longer exist now generate an exception.
- Clean handoff during primary relocation now ensures that no index/delete requests are lost. This fixes a long-standing issue in which a delete might return false for `isFound()` while the primary is relocating.
- Tasks can report their status.
- The settings filter to remove private settings is now immutable.
- Pluggable custom gateways are no longer supported.
- Shard version information is no longer used for shard routing now that we have allocation IDs.
- The bin/plugin script is now called bin/elasticsearch-plugin.
- The TermVector API no longer supports the DFS option as it was very heavy and added little value.
- The cat API now respects the Accept header instead of the Content-Type header when choosing the response format.
- The IndicesFieldDataCache has been simplified and no longer uses Guice.
- MessageDigest instances are no longer cloned (as some platforms don't support it) but return thread local instances instead.
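The thread-local approach to MessageDigest mentioned in the list above can be illustrated with a minimal sketch. This is not the Elasticsearch code: the class name `ThreadLocalDigests`, the `sha256()` and `hex()` helpers, and the choice of SHA-256 are all assumptions made for illustration. It relies only on the JDK's `ThreadLocal.withInitial` and `MessageDigest.getInstance`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration; not the actual Elasticsearch class. */
final class ThreadLocalDigests {

    // One SHA-256 instance per thread, created lazily on first access.
    // No clone() is involved, so platforms that cannot clone digests still work.
    private static final ThreadLocal<MessageDigest> SHA_256 =
        ThreadLocal.withInitial(() -> {
            try {
                return MessageDigest.getInstance("SHA-256");
            } catch (NoSuchAlgorithmException e) {
                // SHA-256 is one of the algorithms every Java platform must provide.
                throw new IllegalStateException(e);
            }
        });

    /** Returns the calling thread's digest, reset so no earlier state leaks in. */
    static MessageDigest sha256() {
        MessageDigest digest = SHA_256.get();
        digest.reset();
        return digest;
    }

    /** Lower-case hex encoding of a byte array. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] hash = sha256().digest("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(hex(hash));
    }
}
```

Resetting on every access is a design choice in this sketch: it trades a cheap call for the guarantee that a half-finished digest from earlier code on the same thread cannot corrupt the next one.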
Ongoing:
- The reindex API can be run in the background with the wait_for_completion parameter, which defaults to `true`. It also supports a progress indicator.
- Unify plugin packaging structure across projects.
- Index folders will now include the index UUID (and sanitise the index name to avoid problems with different file systems).
- Work continues on the monumental search refactoring.
Apache Lucene
- The current plan is to cut the 5.5.0 release branch in a few days, and once the 5.5.0 release is done we'll get the 6.0.0 release process underway!
- More progress on the challenging change to push retrying of file deletion down under the `Directory` abstraction, instead of making it the caller's job
- The new postings-based geo point queries are graduating from the experimental sandbox module into the spatial module, and the previous spatial module classes (with optional `spatial4j` dependency) are moving to a new `spatial-extras` module, as a precursor to nice geo point performance gains added in a backwards compatible way
- Our copyright headers now appear at the very top of all sources, and our IDE configs are now fixed to do so for new source files as well
- Randomized tests uncovered a missing try-with-resources in the new `SimpleTextPointWriter`
- We don't need to `Files.deleteIfExists` when creating a new index file, since we already pass the `TRUNCATE_EXISTING` option
- `IndexWriter` now logs how long it took to flush each part of a new segment
- Now it's possible to fully wrap another `MergePolicy`
- More geo math tweaks to avoid exceeding the legal range for latitude and longitude
- A new `expectThrows` utility uses lambda expressions to compactly expect a test to throw a specific exception and fail otherwise, but we still need to somehow cut over numerous tests
- `BaseMergePolicyTestCase` is now used by more tests, but it caused a reproducible test failure, which has now been fixed
- `TieredMergePolicy` had an extra `=` in an exception message
- The new `TestSwappedIndexFiles`, designed to ensure that copying the same file name from a different index is always detected as corruption, had a scary failure, but it was a simple test bug
- Some more small fixes for the new (coming in 6.0.0) point values:
  - Point fields failed to detect some misuse
  - 2D point values are also exercised in a few more tests
  - We now test `addIndexes` with point values when the field numbers changed, and across different codecs
  - The new `BasePointFormatTestCase` shares common code and makes it easy to test new point formats in the future
  - Tests that assert two readers are equal now also verify the point values are the same
- Codec level encryption offers fine-grained control over which parts of the index need encryption
- Another corner case geo point test failure
- `FastVectorHighlighter` hits `StringIndexOutOfBoundsException` in some cases
- We should standardize on `TimeUnit` for time conversions
- The points based and postings based geo implementations use different encodings with different quantization errors
- `MultiCollector` might throw `NullPointerException` when one of its sub-collectors throws `CollectTerminatedException`
- A new utility class runs a `TokenFilter` on a string and prints the results to help debugging
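To give a feel for the `expectThrows` idea mentioned in the list above, here is a minimal stand-alone sketch of how a lambda-based helper of that kind might look. It is not Lucene's implementation: the class name `ExpectThrowsSketch`, the `ThrowingRunnable` interface and the exact failure messages are assumptions made for illustration.

```java
/** Hypothetical stand-in for illustration; not Lucene's actual test utility. */
final class ExpectThrowsSketch {

    /** A block of code that is allowed to throw anything. */
    @FunctionalInterface
    interface ThrowingRunnable {
        void run() throws Throwable;
    }

    /**
     * Runs the given code and returns the exception it throws if it has the
     * expected type. Fails with an AssertionError if nothing is thrown or if
     * the thrown exception has a different type.
     */
    static <T extends Throwable> T expectThrows(Class<T> expectedType, ThrowingRunnable runnable) {
        try {
            runnable.run();
        } catch (Throwable t) {
            if (expectedType.isInstance(t)) {
                return expectedType.cast(t);
            }
            // Wrong exception type: fail, keeping the original as the cause.
            throw new AssertionError(
                "Expected " + expectedType.getSimpleName() + " but got " + t, t);
        }
        // The code completed normally, which is also a failure.
        throw new AssertionError(
            "Expected " + expectedType.getSimpleName() + " but nothing was thrown");
    }

    public static void main(String[] args) {
        NumberFormatException e = expectThrows(
            NumberFormatException.class,
            () -> Integer.parseInt("not a number"));
        System.out.println("Caught as expected: " + e.getClass().getSimpleName());
    }
}
```

Returning the caught exception lets the test go on to check its message or cause, which is harder to do compactly with a hand-written try/catch/fail pattern.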
Watch This Space
Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!