This Week in Elasticsearch and Apache Lucene - 2017-01-16

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Protecting Against Attacks that Hold Your Data for Ransom #elasticsearch https://t.co/Fg5wNC2aQJ

— elastic (@elastic) January 13, 2017

Elasticsearch Core

Multi-word synonyms and synonym graphs

There has been much work recently on improving Lucene's handling of graph token streams, where the analysis of text, whether from a document at index time or from a query at search time, produces multiple overlapping paths or interpretations for the tokens. Multi-word synonyms produce such graphs and have long been buggy when used with proximity queries. Thanks to the recent addition of SynonymGraphFilter, together with improvements to Lucene's query parsers that translate the token graph into separate queries, such analysis chains are finally handled correctly at search time. WordDelimiterFilter is also being fixed to produce correct graphs. These changes have already been exposed in Elasticsearch, and subsequently in Lucene, thanks to Matt Weber. Graph token streams still present challenges, though: FlattenGraphFilter must be used during indexing, but not during searching, since a Lucene index cannot represent a graph. A number of token filters that should produce a graph also do not yet do so, such as ShingleTokenFilter, EdgeNGramTokenFilter and the decompounders.
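To make the idea of a token graph more concrete, here is a minimal, library-free sketch in Python. It is not Lucene code. The `Token` type, the helper names and the sample synonym ("dns" for "domain name system") are all hypothetical and chosen only for illustration. Each token carries a position increment and a position length, which is roughly the information a graph-aware filter attaches to its output. Enumerating the paths through the graph mirrors, in spirit, how a query parser can turn one analyzed query into several separate queries. The last helper simply ignores position length to show what is lost when a graph is stored as a flat sequence of positions. It is an assumption-laden simplification and not the algorithm FlattenGraphFilter uses.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical stand-in for an analyzed token (not a Lucene class)."""
    term: str
    pos_inc: int      # position increment relative to the previous token
    pos_len: int = 1  # number of positions this token spans


def build_graph(tokens):
    """Map each start position to the (term, end position) arcs leaving it."""
    arcs = defaultdict(list)
    pos = -1
    end = 0
    for tok in tokens:
        pos += tok.pos_inc
        arcs[pos].append((tok.term, pos + tok.pos_len))
        end = max(end, pos + tok.pos_len)
    return arcs, end


def enumerate_paths(tokens):
    """Return every start-to-end path through the token graph as a string."""
    arcs, end = build_graph(tokens)
    paths = []

    def walk(node, prefix):
        if node == end:
            paths.append(" ".join(prefix))
            return
        for term, nxt in arcs[node]:
            walk(nxt, prefix + [term])

    walk(0, [])
    return paths


def drop_position_lengths(tokens):
    """Naive flattening: keep positions, forget how far each token spans."""
    return [Token(t.term, t.pos_inc, 1) for t in tokens]


# Assumed analysis of "domain name system is down" with the synonym "dns":
# "dns" starts at the same position as "domain" but spans three positions.
TOKENS = [
    Token("dns", 1, 3),
    Token("domain", 0),
    Token("name", 1),
    Token("system", 1),
    Token("is", 1),
    Token("down", 1),
]

print(enumerate_paths(TOKENS))
# ['dns is down', 'domain name system is down']
print(enumerate_paths(drop_position_lengths(TOKENS)))
# ['dns name system is down', 'domain name system is down']
```

With position lengths intact, the graph yields exactly the two intended readings. Once they are dropped, "dns" appears to be followed by "name", so a phrase query for "dns is down" would have nothing to match. This is the kind of proximity-query bug described above.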

Changes in 5.2:

Changes in 5.x:

Changes in master:

Upcoming changes:

Apache Lucene

Watch This Space

Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!