This Week in Elasticsearch and Apache Lucene - 2017-01-09
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Finished write-up over holidays: Everything we know about #elasticsearch for e-commerce sites @elastic @sprysys https://t.co/y1tXjMU5Cw
— Martin Loetzsch (@martin_loetzsch) January 9, 2017
Changes in 5.1:
- The field stats API now has support for geo-point fields.
- Don't close store under CancellableThreads.
- Ensure shrunk indices carry over version information from their source.
- Primary relocation for shadow indices had a hidden bug that caused the source to fail itself before performing a full recovery.
Changes in 5.2:
- The cluster-allocation-explain API now uses the allocation process itself to explain shard allocation decisions, which means that the explanation will always be in sync with reality.
- Snapshot repositories containing pre 2.x snapshots compressed with LZF can now be read again (although the pre-2.x snapshots cannot be restored).
- The minimum_number_should_match parameter is deprecated in favour of
- Certain exceptions during index deletion weren't being caught and could cause cluster state application to fail.
- The low level node handshake has been moved from #connectToNode to #openConnection to prevent bypassing.
Changes in 5.3:
- Added infrastructure for storing sensitive settings (eg passwords) in a password-protected keystore.
- The new ToXContentObject interface represents complete objects, while ToXContent is to be used for object fragments which don't output opening and closing curlies.
- Painless strings didn't support escaping of quotes with backslashes.
- Add support for ca-central-1 region to EC2 and S3 plugins.
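
The distinction drawn in the ToXContentObject item above, complete objects versus fragments, can be illustrated with a small language-neutral sketch. The Python below is only an analogy: the names Fragment, CompleteObject, write_fields and to_json are hypothetical and are not the Elasticsearch Java API. It shows a fragment contributing fields to an object that someone else has opened, while a complete object is responsible for its own opening and closing curlies.

```python
import json


class Fragment:
    """Hypothetical fragment: contributes fields, but no enclosing braces."""

    def __init__(self, fields):
        self.fields = dict(fields)

    def write_fields(self, out):
        # Add this fragment's fields to an object the caller has already opened.
        out.update(self.fields)


class CompleteObject:
    """Hypothetical complete object: owns its opening and closing braces."""

    def __init__(self, *fragments):
        self.fragments = fragments

    def to_json(self):
        body = {}  # stands in for writing the opening curly
        for fragment in self.fragments:
            fragment.write_fields(body)
        # Serializing the dict emits both the opening and closing curlies.
        return json.dumps(body, sort_keys=True)


# Made-up example data, for illustration only.
shards = Fragment({"total": 5, "failed": 0})
timing = Fragment({"took": 12})
rendered = CompleteObject(shards, timing).to_json()
print(rendered)
```

A fragment on its own is not valid JSON here; it only becomes part of a valid document once a complete object wraps it.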
Changes in master:
- Socket, ServerSocket, and HttpServer usages in tests replaced with mocksocket versions to move SocketPermissions out of core. Also, moved IfConfig.logIfNecessary (which also requires socket perms) into bootstrap, before the security manager is applied.
- Added the first method, ping, to the new Java HighLevelRestClient.
- Version now implements Comparable.
- Disable the Netty recycler and pooled allocator as they seem to be more trouble than they are worth.
- #namedObject is replacing SearchExtRegistry, AggregatorParsers, Suggesters, and AllocationCommands.
- The syntax for lower/upper bounds of stddev can be simplified.
- All booleans everywhere should be strictly evaluated.
- Can field collapsing on search hits be done more efficiently and simply with search instead of top hits?
- Custom routing can be used to target a subset of shards instead of just one shard.
- Aggs over indices which return a mix of floats and integers should treat all numbers as doubles.
- Nested and parent-child queries are ignoring the
- Remove unneeded weak reference from prefix logger to resolve potential memory leak.
- S3 is being moved to use the new secure settings infra.
- Remove support for the _all field in 6.0.
- Sequence numbers allow for fast recovery when a replica has fallen out of sync with the master.
- The timezone and date format should be normalised when rewriting range queries for caching.
- An adjacency matrix aggregation can show co-occurrence of terms.
- Lucene should better optimize the case of costly multi-term and point queries AND'd with a fast, restrictive query
- WordDelimiterFilter, would finally work with positional queries correctly at search time, eventually fixing this Elasticsearch issue
- Dimensional points now try harder to split on all dimensions, even in slivery cases
- UnifiedHighlighter should let you customize how candidate passages are created, wrapping a sentence
- The GroupingSearch helper class could use some improvements
- The Surround query parser should be modernized to use the numerous new Lucene APIs added since it was created
- SynonymQuery scores multiple synonym terms as a single term; SpanSynonymQuery would do the same thing for span queries
- BooleanQuery could quite easily allow for per-document minimumShouldMatch, instead of the single global value you can provide today
- Now that LongValuesSource has moved to Lucene's core, the suggester module no longer needs to depend on the queries module. Likewise, the expressions module should use DoubleValuesSource, removing its dependency on the queries module, and the facets module should only use
- ComplexPhraseQueryParser should also handle a single multi-term query in quotes
- AnalyzingInfixSuggester no longer relies on the misc module since we promoted index sorting as a core feature
- FlattenGraphFilter, added so multi-token synonyms work correctly, was not correctly handling broken incoming token offsets
- AutomatonTermsEnum gave a confusing exception if you passed a special-case
- DrillSideways, letting you still see other facet counts even after you've drilled down, now uses threads to gain concurrency
- Lucene was not always enforcing that PositionLengthAttribute was > 0, possibly causing illegal cyclic token streams
- Query parsers can now handle graph token streams, finally fixing multi-token synonyms with positional queries to behave correctly
- A possible optimization to DocValuesRangeQuery may not pan out
- The obscure mailmap feature lets us coalesce commits by the same person using different names and/or email addresses over time
- We had to disable a test case on Java 9 but we are not exactly sure why
- The QueryNode.toQueryString API in the flexible query parser illegally claims to create a string which, when parsed, would result in exactly this same query node
- CustomAnalyzer was only applying character normalization to the last
- Java 9 breaks Lucene's efforts to estimate RAM usage of JDK runtime classes
- A tricky non-reproducing test failure turned out to be a concurrency bug in
- DoubleValuesSource have been promoted to Lucene's core module
- LeafFieldComparator.setScorer is now allowed to throw
- Index sorting was failing to ask the codec to create a mutable bit set while flushing a new segment
- Our source code checks (tabs vs whitespace, nocommits, etc.) should also check
- A Jenkins test failure in Geo3D resulted in adding a new threshold, Vector.MINIMUM_ANGULAR_RESOLUTION, to reject too-small slivers
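
To make the adjacency matrix aggregation item in the list above more concrete, here is a rough sketch of what "co-occurrence of terms" means in practice. It is plain Python over in-memory sets, not the aggregation's implementation or request syntax: the function name adjacency_matrix, the "a&b" bucket keys and the sample data are all assumptions made for illustration. Each named filter gets a count, and each pair of filters gets a count of the documents both match, with empty intersections left out.

```python
from itertools import combinations


def adjacency_matrix(filters):
    """Count documents per filter and per pair of filters.

    `filters` maps a filter name to the set of document IDs it matches.
    Pairs with no documents in common are omitted from the result.
    """
    buckets = {}
    for name, docs in filters.items():
        if docs:
            buckets[name] = len(docs)
    # Visit each unordered pair once, in a stable (sorted) order.
    for a, b in combinations(sorted(filters), 2):
        shared = filters[a] & filters[b]
        if shared:
            buckets[f"{a}&{b}"] = len(shared)
    return buckets


# Made-up data: which document IDs mention which term.
matches = {
    "lucene": {1, 2, 3},
    "kibana": {3, 4},
    "beats": {5},
}
buckets = adjacency_matrix(matches)
print(buckets)
```

In this sample only "kibana" and "lucene" share a document, so theirs is the only pair bucket produced; the non-empty pairs are the edges of the co-occurrence graph.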
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!