This Week in Elasticsearch and Apache Lucene: Faster Analytics through Elasticsearch
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Faster Analytics through Elasticsearch (written by me) http://t.co/czM1azFtX0
— Harold Neal (@metacreek) June 22, 2015
Elasticsearch Core
- Refactoring: Replace `Iterators#emptyIterator` by JDK one (#11741, 2.0.0)
- Internal: Add `DateTime` ctors without timezone to forbidden APIs (#11743, 2.0.0)
- Delete by query: Fix number of deleted/missing documents (#11745, 2.0.0)
- More like this: Support for deprecated `percent_terms_to_match` REST parameter (#11736, 1.7.0, 1.6.1, 1.5.3)
- Allocation: Optional delayed allocation on Node leave (#11712, 2.0.0, 1.7.0)
- Security Manager: Allow security rule for advanced SSL configuration (#11751, 2.0.0)
- Refactoring: Add `Iterators.emptyIterator` to forbidden apis (#11758, 2.0.0)
- Cluster: Reset `registeredNextDelaySetting` on reroute (#11759, 2.0.0, 1.7.0)
- Build: Add `@Repeat` to forbidden APIs (#11762, 2.0.0)
- Mapping: Replace `fieldType` access in mappers with getter (#11764, 2.0.0)
- Aggregations: `moving_avg` model parser should accept any numeric (#11778, 2.0.0)
- Mapping: Hide more fieldType access and cleanup `null_value` merging (#11770, 2.0.0)
- Query: `CommonTermsQuery` fix for ignored coordination factor (#11780, 2.0.0, 1.7.0, 1.6.1)
- Internal: Mark store as corrupted instead of deleting state file on engine failure (#11769, 2.0.0, 1.6.1)
- Mapping: Move merge simulation of fieldtype settings to fieldtype method (#11783, 2.0.0)
- Snapshot/Restore: Improve logging of repository verification exceptions. (#11763, 2.0.0, 1.6.1)
- Internal: Use `AbstractRunnable` in scheduled ping (#11795, 2.0.0)
- Build: Make `rest-spec-api` a project so eclipse build works (#11752, 2.0.0)
- Cleanup: Remove reroute with no reassign (#11804, 2.0.0, 1.7.0)
- Percolator: Load percolator queries before shard is marked `POST_RECOVERY` (#11799, 2.0.0)
- Mappings: Lockdown `_timestamp` field (#11794, 2.0.0)
- Stats: Add OS name to `_nodes` and _cluster/nodes (#11807, 2.0.0)
- Packaging: Create `PID_DIR` in init.d script (#11674, 2.0.0, 1.6.1)
- Cluster state: Add Unassigned meta data (#11653, 2.0.0, 1.7.0)
- Fielddata: Simplify doc values handling for `_timestamp` (#11693, 2.0.0)
- Rivers: Remove from master (#11568, 2.0.0)
- Search: Search `preference` based on node specification (#11464, 2.0.0, 1.7.0)
- Plugin: Add `delete-by-query` plugin (#11516, 2.0.0)
- Core: Balance new shard allocations more evenly on multiple `path.data` (#11185, 2.0.0)
- Packaging: Add LICENSE and NOTICE files for all core dependencies (#11705, 2.0.0)
- More like this: Renamed `ignore_like` to `unlike` (#11117, 2.0.0)
- Indexing: Show human readable Elasticsearch version that created index and date when index was created (#11509, 2.0.0)
- Snapshot/Restore: Add snapshot name validation logic to all snapshot operations (#11617, 2.0.0, 1.6.1)
- Query DSL: Add support for query boost to `SimpleQueryStringBuilder` (#11696, 2.0.0)
- Plugins (AWS): Upgrade AWS dependency to 1.10.0 (#11659, 2.0.0)
- Scripting: Allow executable expression scripts for aggregations (#11689, 2.0.0)
- Dates: Allow for negative unix timestamps (#11482, 2.0.0)
- Cluster Health: Add wait time for pending task and recovery percentage (#11393, 2.0.0)
- Aggregations: `moving_avg` forecasts should not include current point (#11641, 2.0.0)
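As a small illustration of what the "negative unix timestamps" item in the list above refers to: a negative epoch value simply denotes an instant before 1970-01-01T00:00:00Z. The sketch below uses the JDK's `java.time.Instant` (Java 8 or later), not Elasticsearch's own date parsing; the class and method names (`NegativeEpochSketch`, `fromEpochSeconds`, `fromEpochMillis`) are hypothetical and exist only for this example.

```java
import java.time.Instant;

public class NegativeEpochSketch {

    // Interpret a (possibly negative) count of seconds relative to the Unix epoch.
    static Instant fromEpochSeconds(long seconds) {
        return Instant.ofEpochSecond(seconds);
    }

    // Interpret a (possibly negative) count of milliseconds relative to the Unix epoch.
    static Instant fromEpochMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) {
        // One second before the epoch.
        System.out.println(fromEpochSeconds(-1L));    // 1969-12-31T23:59:59Z
        // One and a half seconds before the epoch.
        System.out.println(fromEpochMillis(-1500L));  // 1969-12-31T23:59:58.500Z
    }
}
```

A parser that rejects a leading minus sign would be unable to represent any pre-1970 date in epoch form, which is presumably the limitation the change lifts.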
Apache Lucene
- Lucene 5.2.1 bug fix release is out!
- Relax `ToChildBlockJoinQuery` so it works with `BooleanQuery` approximations (LUCENE-6593)
- `SynonymFilter` should produce more accurate graphs when the replacement is more tokens than the original (LUCENE-6582)
- Both `PhraseQuery` (LUCENE-6531) and `BooleanQuery` (LUCENE-6570) are now immutable, to make query caching safer
- Add defensive asserts to try to figure out whether this spooky JDK9 build failure is a JVM bug or a Lucene bug
- Some improvements to `geo3d`, including a new "surface distance" method, fixes to reduce false positive test failures (LUCENE-6502), and a new `arcDistanceToShape` method (LUCENE-6578)
- `FilteredQuery` has been removed (LUCENE-6583), in favor of `BooleanQuery`'s `FILTER` clause
- What happens when a committer experiences bad hardware on his dev box (LUCENE-6576)?
- Add defensive checks against invalid index checksums when writing (LUCENE-6577)
- Some span query improvements: remove unused `collectPayloads` parameter from `SpanNearQuery` (LUCENE-6371), simplify how `SpanPayloadCheckQuery` checks payloads (LUCENE-6567), span queries now score more consistently with other queries, remove `SpanNearPayloadCheckQuery`, a new `SpanQueryParser`, and can `SpanNearQuery`'s gaps act more like `PhraseQuery`'s? (LUCENE-6580)
- Don't create excessive arrays (LUCENE-6569) in `MultiFunction.anyExists` and `.allExists`
- `PrintStreamInfoStream`'s date formatting was not thread safe (LUCENE-6564)
- `DocValuesNumbersQuery` matches all documents containing any of a specified set of numbers in a given doc values field (LUCENE-6539)
- `BKDPointInBoxQuery` handles the date line correctly
- `TimeLimitingCollector` now checks for timeouts even when there are no hits (LUCENE-6559)
- `IndexWriter`'s write lock no longer accepts a timeout (LUCENE-6525)
- Never write negative vLongs (LUCENE-6591)
- `BooleanQuery` should flatten nested MUST clauses (LUCENE-6585)
- Can we simplify how query time boosting is implemented (LUCENE-6590)?
- `ToBlockJoinFieldComparator` has a fatal flaw (LUCENE-6554)
- A silly typo in `GermanStemmer` (LUCENE-6586) can cause invalid results
- A new `CheckJoinIndex` verifies you indexed correctly for block joins (LUCENE-6589)
- `BooleanQuery.equals` should ignore clause order (LUCENE-6305)
- `ToChildBlockJoinQuery` fails to compute parent score when the first child document is deleted
- Should we move `explain` to `Scorer`?
- The geo-point queries will soon handle shapes that cross the international date line and include a new `GeoPointDistance` query (LUCENE-6547), but are doing too much work, now (LUCENE-6562)
- Soon, you can create an `IndexWriter` from an already opened `IndexReader` (LUCENE-6524), letting you efficiently upgrade reader to reader+writer
- `IndexWriter` should abort if a merge hits an unexpected exception (LUCENE-6579) such as disk full, instead of manifesting insanity (http://www.brainyquote.com/quotes/quotes/a/alberteins133991.html) by retrying over and over
- Flatten the analyzers module (LUCENE-6574)
- Let's absorb all Lucene modules into a single one (LUCENE-6573) because cross dependencies are sneaking in (LUCENE-6572)?
- Speed up (LUCENE-6548) the default `Terms.intersect` implementation specifically for automata that match a limited set of terms
- Stop special casing deleted docs in our read-time APIs, and treat them just like a filter (LUCENE-6553)
- `IndexWriter.prepareCommit` should probably not be visible to near-real-time readers
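To make the "Never write negative vLongs" item above a little more concrete, here is a minimal Java sketch of a generic variable-length long encoding, assuming a scheme with 7 payload bits per byte and the high bit used as a continuation flag. It is not Lucene's actual code: the class and method names (`VLongSketch`, `encode`, `encodeChecked`) are hypothetical, and the guard in `encodeChecked` is only one plausible way to reject negative input.

```java
import java.io.ByteArrayOutputStream;

public class VLongSketch {

    // Encode a long as groups of 7 bits, low-order group first.
    // The high bit of each byte signals that more bytes follow.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0L) {
            out.write((int) ((value & 0x7FL) | 0x80L));
            value >>>= 7; // unsigned shift, so the loop terminates for negative input too
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Hypothetical guarded variant: refuse negative values up front.
    static byte[] encodeChecked(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("cannot write negative vLong (got: " + value + ")");
        }
        return encode(value);
    }

    public static void main(String[] args) {
        System.out.println(encode(127L).length); // 1
        System.out.println(encode(128L).length); // 2
        System.out.println(encode(-1L).length);  // 10
    }
}
```

In this scheme small non-negative values stay compact, while any negative value has its sign bit set and therefore always needs the maximum of 10 bytes, more than the 8 bytes of a fixed-width long. That is one reason an encoder may prefer to reject negative input outright.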
Watch This Space
Stay tuned to this blog, where we'll share more on the whole ELK ecosystem, including news, learning resources and cool use cases!