This Week in Elasticsearch and Apache Lucene: 5.2.1 bugfix release is out
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Getting started w/#Elasticsearch? Join @ZacharyTong on June 24 for an in-depth webinar & Q&A http://t.co/CaZZMlJRTz pic.twitter.com/lXDNhytWny
— elastic (@elastic) June 15, 2015
Elasticsearch Core
- Recovery: Fix MapperException detection during translog ops replay (#11583, 2.0.0)
- Merging: Remove MergeScheduler pluggability (#11585, 2.0.0)
- Mapping: Shortcut exists/missing queries when no types/docs exist (#11586, 2.0.0)
- Internal: Bake in TieredMergePolicy (#11588, 2.0.0)
- Logging: Use task's class name if not a TimedPrioritizeRunnable (#11610, 2.0.0, 1.6.1, 1.5.3)
- Core: Cleanup MergeScheduler infrastructure (#11602, 2.0.0)
- Core: Consistently add one more maxMerge in ConcurrentMergeSchedulerProvider (#11613, 1.6.1, 1.5.3)
- Internal: AsyncShardFetch can hang if there are new nodes in cluster state (#11615, 2.0.0, 1.6.1)
- Aggregations: Allow users to perform simple arithmetic operations on histogram aggregations (#11601, 2.0.0)
- Internal: Create ShardSuggestService metrics manually outside of guice (#11605, 2.0.0)
- Core: Use System.nanoTime for ThreadPool's estimated time, since it's less likely to go backwards (#11626, 2.0.0)
- Search: Removing top-level filter parameter from search API (#11600, 2.0.0)
- Allocation: Simplify ShardRouting and centralize move to unassigned (#11634, 2.0.0, 1.7.0)
- Enhancement: Reduce the size of the XContent parsing exception (#11642, 2.0.0, 1.7.0, 1.6.1)
- Scripting: Remove deprecated script APIs (#11619, 2.0.0)
- Internal: Fold ShardGetService creation away from Guice into IndexShard (#11606, 2.0.0)
- Dependencies: Upgrade to Lucene 5.2.1 (#11662, 2.0.0)
- Fielddata: Remove non-default fielddata formats (#11669, 2.0.0)
- Mapping: Add equals/hashcode to fieldtypes (#11644, 2.0.0)
- Mapping: Remove SmartNameObjectMapper (#11686, 2.0.0)
- Build: clean pom.xml (#11676, 2.0.0)
- Build: mark elasticsearch as provided in plugins (#11637, 2.0.0)
- Build: update maven-shade-plugin to 2.4 (#11622, 2.0.0)
- Mapping: Move null value handling into MappedFieldType (#11544, 2.0.0)
- Mapping: Make index level mapping apis use MappedFieldType (#11559, 2.0.0)
- Mapping: Remove leftover sugar methods from FieldMapper (#11565, 2.0.0)
- Shadow replicas: Return empty CommitID from ShadowEngine#flush (#11554, 1.6.1)
- Internal: Make CompressedXContent.equals fast again (#11428, 2.0.0)
- More like this: Add back support for deprecated percent_terms_to_match REST parameter (#11574, 1.6.1)
- Snapshot/Restore: Change metadata file format (#11507, 2.0.0)
- Snapshot/Restore: Move in-progress snapshot and restore information from custom metadata to custom cluster state part (#11486, 2.0.0)
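
One of the Core items above moves ThreadPool's estimated time to System.nanoTime because it is less likely to go backwards. The snippet below is a minimal, standalone sketch of that general idea, not the Elasticsearch change itself: the class name ElapsedTimeSketch and the helper elapsedMillis are hypothetical and exist only for illustration. It relies on the documented JDK behaviour that System.nanoTime readings are meaningful only as differences, whereas System.currentTimeMillis follows the wall clock and can jump when the clock is adjusted.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of Elasticsearch.
public class ElapsedTimeSketch {

    /**
     * Converts the difference between two System.nanoTime() readings to milliseconds.
     * Subtracting first keeps the result correct even if the raw counter wraps around.
     */
    public static long elapsedMillis(long startNanos, long endNanos) {
        return TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(50); // stand-in for real work
        long elapsed = elapsedMillis(start, System.nanoTime());
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

The subtraction happens before the unit conversion on purpose: comparing two raw nanoTime values directly is not safe, but their difference is.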
Apache Lucene
- Lucene's BaseTokenStreamTestCase.checkRandomData uncovered this nasty new bug in System.arraycopy in the latest JDK9 snapshot, causing Elasticsearch Jenkins instances (http://build.elastic.co/) to light up as soon as we upgraded. Fortunately a fix is already in progress (http://cr.openjdk.java.net/~roland/8086046/webrev.00/).
- 5.2.1 bugfix release is out (http://lucene.markmail.org/thread/hikgsiv5ofkvab6j)
- A flurry of issues affecting IBM's J9 JVM (https://issues.apache.org/jira/browse/LUCENE-6557?jql=labels%20%3D%20IBM-J9) have been opened, with some good progress towards fixes
- Highlighters now work with CustomScoreQuery (https://issues.apache.org/jira/browse/LUCENE-6558)
- Norms are now accessed off-heap at search time (https://issues.apache.org/jira/browse/LUCENE-6504) using our RandomAccessInput for reading
- Add a couple missing getters (https://issues.apache.org/jira/browse/LUCENE-6552, https://issues.apache.org/jira/browse/LUCENE-6551)
- Now you can ask MMapDirectory to preload all pages on open to warm your index
- BKD polygon queries are much faster by avoiding per-hit filtering when a leaf cell is fully enclosed, with a new London, UK video showing the improvement
- UninvertingReader (used to create doc-values on-the-fly from postings) had a buggy, now gone, optimization (https://issues.apache.org/jira/browse/LUCENE-6529) affecting numeric fields
- Speed up spans iteration in NearSpansOrdered (https://issues.apache.org/jira/browse/LUCENE-6537)
- Pass a dummy similarity (https://issues.apache.org/jira/browse/LUCENE-6527) so queries do not load norms if scores are not requested
- Geo3d continues baking (https://issues.apache.org/jira/browse/LUCENE-6541, https://issues.apache.org/jira/browse/LUCENE-6535, https://issues.apache.org/jira/browse/LUCENE-6544)
- A new DocValuesNumbersQuery (https://issues.apache.org/jira/browse/LUCENE-6539) can sometimes efficiently match documents with a specified set of numbers
- Can we make this spatial test less fragile?
- More iterations on how to create an IndexWriter from an already opened IndexReader, letting you efficiently upgrade reader to reader+writer
- Lucene's segments_N file now stores the version that wrote it, and the version of the oldest segment in the index (https://issues.apache.org/jira/browse/LUCENE-5954)
- A new, simple geo-point API (https://issues.apache.org/jira/browse/LUCENE-6481) lets you index lat/lon points and search by polygon or bounding box, and soon by distance (https://issues.apache.org/jira/browse/LUCENE-6532), but it's doing more work than it should (https://issues.apache.org/jira/browse/LUCENE-6562)
- The geo-point queries will soon handle shapes that cross the international date line (https://issues.apache.org/jira/browse/LUCENE-6547), as will the BKD queries (https://issues.apache.org/jira/browse/LUCENE-6560)
- TimeLimitingCollector now checks for timeouts even when there are no hits
- Improve tests that spawn new processes by using Java7's ProcessBuilder.inheritIO, instead of making our own threaded IO baby-sitters
- Fix sneaky class loader deadlock during codec initialization
- A commit with no user-data changes should also be reflected in near-real-time reopen
- Speed up the default Terms.intersect implementation specifically for automata that match a limited set of terms, after benchmarks showed its performance is worse than the naive seekExact approach (https://issues.apache.org/jira/browse/LUCENE-3893)
- Live docs are fast to check, so they should apply in the first phase in CachingWrapperQuery (https://issues.apache.org/jira/browse/LUCENE-6543)
- Something makes multi-threaded tests horribly slow on Windows (https://issues.apache.org/jira/browse/LUCENE-6511)
- ToBlockJoinFieldComparator has a fatal flaw
- Now that Lucene has two-phase iteration, we may be able to apply live documents more efficiently and simplify the postings APIs
- Some further discussion on how to integrate geo3d and the new geo-point queries
- Soon PhraseQuery will be immutable
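
For readers who have not used it, the ProcessBuilder.inheritIO call mentioned in the list above makes a child process share the parent's stdin, stdout and stderr, so no extra threads are needed to drain its output. The following is a rough sketch of that pattern using only the JDK. It is not Lucene's test code: the class InheritIoSketch and its helpers runAndWait and javaBinary are hypothetical names made up for this example, and the checked exceptions are wrapped only to keep the sketch short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Paths;

// Hypothetical illustration class; not taken from the Lucene test framework.
public class InheritIoSketch {

    /** Path to the java launcher of the currently running JVM (assumes a standard JDK/JRE layout). */
    public static String javaBinary() {
        return Paths.get(System.getProperty("java.home"), "bin", "java").toString();
    }

    /** Starts the command with the parent's stdin/stdout/stderr and returns its exit code. */
    public static int runAndWait(String... command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.inheritIO(); // child output goes straight to our console, no pump threads required
        try {
            Process process = builder.start();
            return process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("interrupted while waiting for child process", e);
        }
    }

    public static void main(String[] args) {
        int exit = runAndWait(javaBinary(), "-version");
        System.out.println("child exited with " + exit);
    }
}
```

Without inheritIO, a parent that never reads the child's output pipes can stall the child once the pipe buffers fill, which is the usual reason for hand-written reader threads.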
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole ELK ecosystem including news, learning resources and cool use cases!