Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
We were among the first to use and contribute to @elastic… now, more than 1,500 government websites use #Elasticsearch #BigDataMed
— Taha Kass-Hout (@DrTaha_FDA) May 22, 2015
Elasticsearch Core
- Networking: Default value for socket reuse should not be null (#11255, 2.0.0, 1.6.0)
- Build: Remove useless outdated forbidden-api version in -Pdev config. (#11273, 2.0.0)
- Transport: Fix NPE when streaming commit stats (#11266, 2.0.0)
- Mappings: Cleanup names handling (#11272, 2.0.0)
- Shard: Check if the index can be opened and is not corrupted on state listing (#11269, 2.0.0, 1.6.0)
- Internal: Uid#createTypeUids to accept a collection of ids rather than a list (#11263, 2.0.0)
- Transport: Remove support for reading/writing list of strings, use arrays instead (#11276, 2.0.0)
- Renaming reducers to Pipeline Aggregators (#11275, 2.0.0)
- Transport: Add profile name to TransportChannel (#11261, 2.0.0)
- Mappings: Remove generics from FieldMapper (#11292, 2.0.0)
- Startup: Allow disabling of sigar via settings (#11293, 2.0.0, 1.6.0)
- Allocation: Async fetch of shard started and store during allocation (#11262, 1.6.0)
- Dependencies: Upgrade to lucene-5.2.0-snapshot-1681024 (#11296, 2.0.0)
- Mappings: Remove the compress/compress_threshold options of the BinaryFieldMapper. (#11280, 2.0.0)
- Logging: Add logging for failed TTL purges (#11302, 2.0.0)
- Dependencies: Upgrade to Netty 3.10.3 (#11304, 2.0.0, 1.6.0)
- Search: Minor refactor of MultiValueMode removing apply and reduce (#11290, 2.0.0)
- Plugins: Only load a plugin once from the classpath (#11301, 2.0.0, 1.6.0)
- Refactoring: Make some sec mgr / bootup classes package private and final. (#11312, 2.0.0)
- Build: Remove build duplication (#11315, 2.0.0)
- Internal: Absorb ImmutableSettings into Settings (#11321, 2.0.0)
- Refactoring: Rename TransportShardReplicationOperationAction to TransportReplicationAction (#11332, 2.0.0)
- Search: Don't truncate TopDocs after rescoring (#11342, 2.0.0, 1.6.0, 1.5.3)
- Build: Fix paths for final artifact (#11319, 2.0.0)
- Mapping: custom analyzers names and aliases must not start with _ (#11303, 2.0.0, 1.6.0)
- Recovery: Seal indices for faster recovery (#11179, 2.0.0, 1.6.0)
- Search: Make FilteredQuery a forbidden API. (#11224, 2.0.0)
- Allocator: Ensure we mark store as corrupted if we fail to read the segments info (#11230, 2.0.0, 1.6.0)
- Mappings: Remove SmartNameFieldMappers (#11216, 2.0.0)
- Cleanup: Removed generic types from ContextAndHeaderHolder and HasHeaders#putHeader() (#11231, 1.6.0)
- Core: Don't allow indices containing too-old segments to be opened (#11072, 2.0.0)
- Internal: Fix CompressedString.equals. (#11233, 2.0.0, 1.6.0, 1.5.3)
- Mappings: Remove document parse listener (#11243, 2.0.0)
- Cleanup: Remove generics need in ContextAndHeaderHolder (#11222, 2.0.0)
- Pre sync flush cleanups (#11252, 2.0.0)
- Recovery: Add engine failure on recovery finalization corruption back (#11241, 2.0.0)
- Async fetch of shard started and store during allocation (#11101, 1.6.0)
- Snapshot/Restore: Fix FSRepository location configuration (#11157, 2.0.0)
- Mappings: Make mapping updates atomic wrt document parsing. (#11205, 2.0.0)
- Aggregation fix: sampler agg could not be used with Terms agg’s order. (#10785, 2.0.0)
- Snapshot/Restore: check that reading indices is allowed before creating their snapshots (#11133, 2.0.0, 1.6.0)
- Snapshot/Restore: Batching of snapshot state updates (#10295, 2.0.0, 1.6.0)
- Build: Use elasticsearch-parent project (#9735, 2.0.0, 1.6.0)
- Packaging: Use of default CONF_DIR/CONF_FILE in plugin install (#10721, 2.0.0, 1.6.0)
- Aggregations: new sampler provides a filter for top-scoring docs (#8191, 2.0.0, 1.6.0)
Apache Lucene
- Finally, after more than 10 years, from back when Lucene issues only needed 3 digits, scoring FuzzyQuery is no longer horribly biased by low frequency matched terms
- Lucene 5.2.0 is coming (release branch is created: http://markmail.org/message/sasrttfaxkuij7di)
- Lucene 5.2.x snapshot upgrade (https://github.com/elastic/elasticsearch/pull/11296), and again (https://github.com/elastic/elasticsearch/pull/11311), this time from the release branch!
- Can we avoid building the full global ordinal map after refresh in some cases? (https://issues.apache.org/jira/browse/LUCENE-6496)
- Lucene has commit safety again on Java 1.9! (https://issues.apache.org/jira/browse/LUCENE-6169)
- TermsQuery and queries using MultiTermQuery now rewrite to disjunction when there are 16 terms or fewer, so skipping can be used (https://issues.apache.org/jira/browse/LUCENE-6360)
- This fun visualization (https://plus.google.com/+MichaelMcCandless/posts/6QveLQs7RW6) shows how the upcoming BKD spatial tree (https://issues.apache.org/jira/browse/LUCENE-6477) recursively finds all index points inside London, UK
- Improve geo3d to more accurately model the earth as an ellipsoid, bulging at the equator, not a sphere
- MoreLikeThis should let you specify the minimum number of terms to match (https://issues.apache.org/jira/browse/LUCENE-6493)
- Change enumeration of all finite strings from an automaton from an entire set to a memory-efficient iterator (https://issues.apache.org/jira/browse/LUCENE-6365)
- More iterations on a common suggest API that mirrors Lucene's Query/IndexSearcher APIs (https://issues.apache.org/jira/browse/LUCENE-6459)
- Fix a tricky interaction, uncovered by randomized tests, between auto-prefix terms and the query cache, by preventing fake terms from being sent to TermQuery (https://issues.apache.org/jira/browse/LUCENE-6491)
- This horrible Linux kernel bug may be causing the inexplicable hung builds in Lucene and Elasticsearch recently
- Payloads are now optional when iterating suggest documents using DocumentDictionary
- More SpanQuery love: getSpans moves from SpanQuery to SpanWeight, use SpanCollector to visit all matching spans, generalize PayloadSpanUtil to collect more than just payloads from postings (https://issues.apache.org/jira/browse/LUCENE-6371), but these changes were a little too rushed for 5.2 and have been pulled to 5.3 instead (https://issues.apache.org/jira/browse/LUCENE-6494?focusedCommentId=14555162&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14555162)
- EliasFanoDocIdSet is unused and is now removed (https://issues.apache.org/jira/browse/LUCENE-6484)
- In some cases, Lucene could pass a different core cache key than the original when a reader closed listener is invoked, causing possible memory leaks
- HandleTrackingFS must close the file handle even on an internal exception
Watch This Space
Stay tuned to this blog, where we'll share more on the whole ELK ecosystem, including news, learning resources and cool use cases!