This Week in Elasticsearch and Apache Lucene: Algorithms that power Lucene and Elasticsearch
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Video of my #bbuzz talk is now online: Algorithms that power Lucene and Elasticsearch https://t.co/CVQQWQndAC
— Adrien Grand (@jpountz) June 3, 2015
Elasticsearch Core
- Bulk API: Allow null values in the bulk action/metadata line parameters (#11459, 2.0.0, 1.6.0)
- Internal: Use the smallest version rather than the default version (#11475, 1.6.0)
- Core: Reduce shard inactivity timeout to 5m (#11479, 2.0.0, 1.6.0)
- Aggs: Allow aggregations_binary to build and parse (#11473, 2.0.0, 1.6.0, 1.5.3)
- Aggs: Fix bug where moving_avg prediction keys are appended to previous prediction (#11465, 2.0.0)
- Network: Default to binding to loopback address (#11483, 2.0.0)
- Internal: Minimize the usage of guava classes in interfaces, return types, arguments (#11501, 2.0.0)
- Transport: ClusterHealth shouldn't fail with "unexpected failure" if master steps down while waiting for events (#11493, 2.0.0, 1.6.0)
- Dependencies: update maven-assembly-plugin to 2.5.5 (#11518, 2.0.0)
- Snapshot/Restore: Blob store shouldn't try deleting the write.lock file at the end of the restore process (#11517, 2.0.0)
- GatewayAllocator: reset rerouting flag after error (#11519, 2.0.0, 1.6.0)
- Build: Don't shade core artifacts (#11522, 2.0.0)
- Settings: Make prompt placeholders consistent with existing placeholders (#11514, 1.6.0)
- Scripting: Execute Scripting Engine before searching for inner templates in template query (#11512, 2.0.0)
- Plugins: deprecate addQuery methods that are going to be removed in 2.0 (#11532, 1.6.0)
- Plugins: one single global way to register custom query parsers (#11481, 2.0.0)
- Dependencies: use released lucene 5.2 jar (#11534, 2.0.0)
- Recovery: Fix recovered translog ops stat counting when retrying a batch (#11536, 2.0.0)
- Core: Improve exception message when shard has a partial commit (segments_N file) due to prior disk full (#11539, 1.6.0)
- Core: Add node setting to send SegmentInfos debug output to System.out (#11546, 2.0.0, 1.6.0)
- Suggest API: Deprecate filter option in PhraseSuggester collate (#11445, 1.6.0)
- Core: Fail shard if search execution uncovers corruption (#11440, 2.0.0, 1.6.0)
- Mapping: Refactor core index/query time properties into FieldType (#11422, 2.0.0)
- Mapping: Added epoch date formats to configure parsing of unix dates (#11453, 2.0.0)
- Snapshot/Restore: Sync up snapshot shard status on a master restart (#11450, 2.0.0, 1.6.0)
- Settings: Require units for time and byte-sized settings, take 2 (#11437, 2.0.0)
- Recovery: Restart recovery upon mapping changes during translog replay (#11363, 2.0.0)
- IdsQueryBuilder: Allow to add a list in addition to array (#11409, 2.0.0)
- Settings: Rename settings to prevent watcher clash (#11359, 2.0.0)
- Internal: Allow ActionListener to be called on the network thread (#10573, 2.0.0, 1.6.0)
- Snapshot/Restore: Improve snapshot creation and deletion performance on repositories with large number of snapshots (#8969, 2.0.0)
- REST: Add all meta fields to the top level json document in search response (#8131, 2.0.0)
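As an illustration of the "Require units for time and byte-sized settings" change (#11437) listed above, here is a minimal sketch of a parser that rejects a bare number. It is written in Python rather than Elasticsearch's Java, and the function name `parse_time_setting`, the setting name `example.timeout`, and the unit table are all hypothetical; none of this is Elasticsearch's actual API or unit list.

```python
import re

# Hypothetical unit table (multipliers in milliseconds).
# This is an assumption for illustration, not Elasticsearch's real list.
_UNITS_MS = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}

# A non-negative integer immediately followed by a lowercase unit suffix.
_PATTERN = re.compile(r"^(\d+)([a-z]+)$")


def parse_time_setting(name, value):
    """Return the setting value in milliseconds, refusing unitless values."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(
            f"setting [{name}] with value [{value}]: missing or malformed unit"
        )
    amount, unit = match.groups()
    if unit not in _UNITS_MS:
        raise ValueError(
            f"setting [{name}] with value [{value}]: unknown unit [{unit}]"
        )
    return int(amount) * _UNITS_MS[unit]


if __name__ == "__main__":
    print(parse_time_setting("example.timeout", "5m"))
    try:
        parse_time_setting("example.timeout", "300")
    except ValueError as err:
        print(err)
```

The point of the design is that a value such as "300" is ambiguous (seconds? milliseconds?), so failing loudly is safer than guessing a default unit.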
Apache Lucene
- 5.2.0 is released!
- There's a sudden push to find and fix the IBM J9 JVM bugs that Lucene's tests uncover, after this response on the Elasticsearch forums: let cyberneko peek into package-protected APIs, temporarily disable this test because something is wrong with how J9 handles unicode filenames, ignore the ClassCache reaper thread, ArrayIndexOutOfBoundsException in fieldcache (sidestepped if you pass -Xint to J9)
- Tests now assert that queries do not compute scores when they were asked not to
- Simplify Lucene's file-based lock API to prevent future bugs like this baddie
- Properly handle co-linear path segments in geo3d
- Fast experimental BKD tree based geo-spatial search has landed, and there's a new issue to speed up its polygon intersection queries, sharing logic from GeoPointField, with a new London, UK video showing the improvement
- Some small cleanups to the new document-based suggester APIs
- Split up one of Lucene's monster tests so devs with god-like boxes can run them concurrently
- Geo3D can now model the earth more accurately as a slightly squashed sphere
- Do not load norms if termquery won't compute scores
- Lucene's segments_N file should directly store the version that wrote it, and the version of the oldest segment in the index
- A new expert constructor will let you create IndexWriter from an already opened IndexReader, letting you efficiently upgrade reader to reader+writer
- QueryNodeImpl.removeFromParent was doing nothing in a very costly manner
- IndexWriter should not accept a lock timeout: such logic should be done at a higher level
- More iterations to add a fast point-in-shape geo-spatial API
- A commit with no user-data changes should also be reflected in NRT reopen
- ant idea was failing to copy code style settings for IDEA
- Improve how SpanMultiTermQueryWrapper limits which terms to search when there are too many
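To make the "slightly squashed sphere" item above concrete, here is a rough sketch of how a latitude/longitude pair maps to 3D coordinates on an oblate ellipsoid. It assumes the standard WGS84 constants and the textbook geodetic-to-Cartesian formula; it is written in Python, the function name `to_xyz` is hypothetical, and this is not Geo3D's code, which may use different constants or scaling.

```python
import math

# WGS84 ellipsoid constants (assumed here; Geo3D's own model may differ).
A = 6378137.0                # equatorial radius in meters
F = 1.0 / 298.257223563      # flattening
B = A * (1.0 - F)            # polar radius in meters
E2 = F * (2.0 - F)           # first eccentricity squared


def to_xyz(lat_deg, lon_deg):
    """Convert geodetic latitude/longitude (degrees) to a surface point in meters."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Radius of curvature in the prime vertical at this latitude.
    n = A / math.sqrt(1.0 - E2 * math.sin(lat) ** 2)
    x = n * math.cos(lat) * math.cos(lon)
    y = n * math.cos(lat) * math.sin(lon)
    z = n * (1.0 - E2) * math.sin(lat)
    return x, y, z


if __name__ == "__main__":
    print(to_xyz(0.0, 0.0))    # a point on the equator
    print(to_xyz(90.0, 0.0))   # the north pole
```

On a perfect sphere every surface point is the same distance from the center; on this ellipsoid the pole sits roughly 21 km closer to the center than the equator, which is the kind of difference a more accurate earth model accounts for.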
Watch This Space
Stay tuned to this blog, where we'll share more on the whole ELK ecosystem, including news, learning resources and cool use cases!