This Week in Elasticsearch and Apache Lucene - October 26th 2015
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
#Elasticsearch 2.0 is coming...join @ZacharyTong this Wed for a demo, feature Qs, and more!
https://t.co/BkIs22EfI4 pic.twitter.com/TjMsagHwgj
— elastic (@elastic) October 26, 2015
Elasticsearch Core
Changes in 2.0:
- Date math index names (eg <logstash-{now/D}>) must be properly escaped when used in URLs.
- Extensions which should have been bound as singletons were not actually singletons.
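As a rough illustration of the escaping requirement for date math index names, the sketch below percent-encodes the example name with Python's standard library before placing it in a URL path. The host, port and `_search` endpoint are assumptions picked for the example, not something this post specifies.

```python
from urllib.parse import quote


def escape_index_name(name: str) -> str:
    """Percent-encode every reserved character in an index name."""
    # safe="" makes quote() encode '/' too; otherwise the rounding part
    # of the date math expression (now/D) would read as a path separator.
    return quote(name, safe="")


index = "<logstash-{now/D}>"
escaped = escape_index_name(index)

# Hypothetical local node; adjust host and port for a real cluster.
url = "http://localhost:9200/" + escaped + "/_search"
print(url)
```

Only the index name is encoded here, so the slashes that separate real path segments are left untouched.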
Changes in 2.1:
- The count Java API has been deprecated in favour of using search with size=0. The REST API continues to exist as sugar.
- The not query has been deprecated in favour of the bool query's must_not clause.
- The optimize API has been deprecated in favour of the new _forcemerge API.
- The search-exists API has been deprecated in favour of search with size=0 and terminate_after=1.
- Concurrent rebalance constraints were not applied when shards were moved because of changes to shard allocation filtering.
- Sometimes, stats were still counted for indices that were closed or deleted.
- Improved the SysV startup script's ability to determine whether Elasticsearch has started correctly.
- MetaDataService and its semaphores are no longer needed to prevent old/new indices with the same name from colliding.
- Plugins can now request special privileges from the Java security manager to work around third party problems, and the user is warned about this request at installation time.
- Elasticsearch can now bind to multiple interfaces.
- The GCE plugin now backs off when it encounters too many errors.
- The ability to close indices can be disabled.
- The oal.search.Filter class has been removed and banned.
- More exceptions now preserve their context when being rethrown.
- The @Test annotation has been removed in favour of naming tests "test..."
- Removed "uninverted" and "binary" fielddata support for numeric and boolean fields
- The Sequence IDs project has been kicked off with the addition of primary terms.
- The index/cluster settings cleanup has been kicked off by replacing the @IndexSettings annotation with a full-fledged class.
- IndicesLifecycle has been replaced with a garbage collected per-index IndexEventListener.
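The 2.1 deprecations of count, the not query and search-exists each name a replacement, and those can be sketched as plain search request bodies. This is a minimal sketch, not the official client API: the helper names and the `user` field are hypothetical, and it assumes `size` and `terminate_after` are passed in the request body.

```python
import json


def count_as_search(query: dict) -> dict:
    """Stand-in for count: a search that returns no hits, only the total."""
    return {"size": 0, "query": query}


def exists_as_search(query: dict) -> dict:
    """Stand-in for search-exists: stop after the first matching document."""
    return {"size": 0, "terminate_after": 1, "query": query}


def negate(query: dict) -> dict:
    """Stand-in for the not query: wrap the clause in bool/must_not."""
    return {"bool": {"must_not": [query]}}


# Hypothetical field and value, used only to have something to wrap.
term = {"term": {"user": "kimchy"}}

print(json.dumps(count_as_search(term)))
print(json.dumps(exists_as_search(term)))
print(json.dumps(count_as_search(negate(term))))
```

The helpers compose, so counting the documents that do not match a clause is just `count_as_search(negate(...))`.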
Apache Lucene
- Upgrade ANTLR to version 4.5.1 for numerous bug fixes
- Add getters for the query cache and caching policy on IndexSearcher
- SpanOrQuery is now immutable
- Absorb SpanScorer into Spans to simplify how span queries are scored
- Improve accuracy of GeoPointDistanceQuery, and improve test coverage
- The low-level API for block-KD trees is added for Lucene 6.0, and phase 2, extending the codec with a new DimensionalFormat is started. This will eventually give Lucene first-class support for numerics, spatial, and large binary (BigInteger, BigDecimal, IPv6, etc.) values.
- TermQuery should clone the incoming term
- Lucene's classifier should be able to classify an already indexed document
- Add GeoPointDistanceRangeQuery to search a ring instead of a circle
- RAMDirectory.listAll should never return null file names; this was a "woops" by yours truly!
- PayloadScoreQuery now has an includeSpanScore option, just like PayloadTermQuery
- The flexible query parser should allow you to pass in your own NumberFormat implementation
- Optimize how BooleanQuery scores when it has a single clause
- IndexWriter should have an API to write segments to disk without refreshing the near-real-time reader to give applications more granular control over heap usage
- We need a virus-checker simulating file system to move forward with moving the windows-specific "delete file retry logic" from IndexWriter to the directory implementation
- Pole-crossing shapes are challenging for geo queries
- Should we add a new matchCost method to TwoPhaseDocIdSetIterator to better optimize query execution?
- It's confusing how PhraseQuery handles multiple terms at the same position, as a conjunction, versus how MultiPhraseQuery handles it, as a disjunction, maybe?
- We can greatly reduce (96% in one test!) the heap usage for certain doc values by moving the storage for ordinals to disk
- OfflineSorter now uses Lucene's Directory abstraction instead of secretly trying to consume temp directory space
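To make the "ring instead of a circle" idea behind GeoPointDistanceRangeQuery concrete, here is a conceptual sketch of the matching rule only: a point matches when its distance from the center lies between an inner and an outer radius. It is not Lucene's implementation. It assumes a spherical Earth and the haversine formula, and every function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical assumption)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def in_circle(center: tuple, point: tuple, radius_m: float) -> bool:
    """Plain distance query: everything within radius_m of the center."""
    return haversine_m(*center, *point) <= radius_m


def in_ring(center: tuple, point: tuple, min_m: float, max_m: float) -> bool:
    """Distance range query: only points between the two radii."""
    return min_m <= haversine_m(*center, *point) <= max_m


center = (0.0, 0.0)
near = (0.0, 0.1)  # roughly 11 km east of the center
far = (0.0, 1.0)   # roughly 111 km east of the center

print(in_circle(center, near, 120_000), in_ring(center, near, 100_000, 120_000))
print(in_circle(center, far, 120_000), in_ring(center, far, 100_000, 120_000))
```

The nearby point falls inside the circle but inside the hole of the ring, which is exactly the case a range query is meant to exclude.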
Watch This Space
Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!