October 17, 2016

This Week in Elasticsearch and Apache Lucene - 2016-10-17

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Upgrade to #Elasticsearch 5.0.0-rc1 paying back. A small benchmark in my project shows consistent reduction in response time by 25%.
— DoHyung Kim (@_dohyung_) October 13, 2016

Elasticsearch Core

2.4:

The match_phrase_prefix query on the _all field was running a term query instead of a prefix query.
The position_increment_gap on not_analyzed fields should be ignored for backward compatibility (bwc) instead of throwing an exception.
The multi_match query should not accept an array of query strings.

5.0:

AbstractArrays that release their bytes more than once can lead to incorrect circuit breaker memory counting.
The FVH was not extracting terms from the SynonymQuery.
The results from script queries should not be cached.
Cluster settings updates should not be blocked by circuit breakers.
Changing how we split strings broke parsing of the S3 repository base path.
Shadow replicas should not increment their allocation ID when being promoted to primary.
The elasticsearch-plugin script no longer displays plugin version numbers as this introduces confusion about how plugins should be referred to.
Netty4 was not closing connections correctly.
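
The AbstractArrays item above concerns memory that is accounted for in a circuit breaker and then released more than once. As a minimal sketch of why a double release skews the accounting, and how an idempotent close avoids it, consider the following. The Breaker and TrackedArray classes are hypothetical illustrations and are not Elasticsearch's actual classes.

```python
class Breaker:
    """Hypothetical stand-in for a memory circuit breaker: tracks reserved bytes."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def add(self, n):
        # Refuse the reservation if it would exceed the configured limit.
        if self.used + n > self.limit:
            raise MemoryError("circuit breaker tripped")
        self.used += n

    def release(self, n):
        self.used -= n


class TrackedArray:
    """Hypothetical array whose bytes are accounted for in a Breaker."""

    def __init__(self, breaker, size):
        self.breaker = breaker
        self.size = size
        self.closed = False
        breaker.add(size)

    def close(self):
        # Guard so that a second close() does not subtract the bytes again.
        if self.closed:
            return
        self.closed = True
        self.breaker.release(self.size)


breaker = Breaker(limit=1024)
a = TrackedArray(breaker, 100)
b = TrackedArray(breaker, 200)
a.close()
a.close()  # second close is a no-op thanks to the guard
remaining = breaker.used  # only b's bytes are still counted
```

Without the guard in close(), the second call would subtract a's bytes a second time and the breaker would under-report the memory that is still in use.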

5.x:

Cat APIs are now sortable.
Mustache gains a {{#url}} function that knows how to do URL escaping.
Source filtering should be able to step into paths in dotted field names.
Update scripts now have the current timestamp available in ctx._now.
Self-referencing objects can result in stack overflow exceptions.
Source-filtering automatons should only be compiled once.
The whole multi-get request should not fail if an alias resolves to too many indices.
Alias filters should be parsed on the coordinating node instead of on each shard, so that filters are the same for all shards, and so they can be rewritten.
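
To illustrate the source filtering item above, here is a minimal sketch of a filter that steps into paths even when a field name itself contains dots. It is an assumption-laden illustration, not Elasticsearch's implementation (which, as noted in the Lucene section, is automaton-based): the filter_source function is hypothetical, it matches leaf paths only, and it ignores arrays.

```python
from fnmatch import fnmatchcase


def filter_source(source, include, prefix=""):
    """Keep only leaves whose full dotted path matches the include pattern."""
    kept = {}
    for key, value in source.items():
        # A key such as "user.name" already contains dots; joining it onto the
        # prefix yields the same full path as the equivalent nested objects.
        path = prefix + key
        if isinstance(value, dict):
            child = filter_source(value, include, path + ".")
            if child:
                kept[key] = child
        elif fnmatchcase(path, include):
            kept[key] = value
    return kept


dotted_outer = {"user.name": {"first": "Ada", "last": "Lovelace"}, "age": 36}
dotted_inner = {"user": {"name.first": "Ada", "name.last": "Lovelace"}}

filtered_outer = filter_source(dotted_outer, "user.name.first")
filtered_inner = filter_source(dotted_inner, "user.name.first")
```

Both documents resolve to the same logical path user.name.first, so the same include pattern selects the first name in each while preserving the original key layout.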

master:

Keep snapshot restore state and the routing table in sync to avoid failed communication resulting in an unrecoverable state.
Sequence numbers are now written to commits using Lucene's API, so that the maximum sequence number is accurately reflected in the commit, including deletes.
Bulk requests have been simplified and limited to DocumentRequests, which simplifies execution and error handling.
Threadpools no longer impose an artificial limit of 32 processors.
ObjectParser is now used by Score, Field, and ScriptSortBuilder.
Synonyms should be parsed with the analyzer chain specified in the analyzer, not just whitespace.

ongoing:

Logstash and Beats 5.x templates should be tested for bwc against master.
Alias names should undergo the same validation as index names.
Reindex and friends should be parallelisable.
Storing the execution context of Painless scripts can make them much faster.
Can the _all field be replaced by an _all query?
Negated index expressions should only be taken into consideration when there is a positive wildcard.
Template validation should take into account other possible matching templates.
The tribe node should be able to store custom cluster state metadata, in order to support licensing.
Searches should be cancellable in the task management API.
The term suggester should have the option to return exact matches.
Rank evaluation should support search templates.

Apache Lucene

SimpleQueryParser now parses * to MatchAllDocsQuery
The 7.0 codec now stores norms sparsely and will soon store sparse doc values more efficiently, taking advantage of the new iterator-based APIs for doc values
Lucene70NormsFormat was over-synchronized
A fun usage of Lucene's Automaton APIs to implement source filtering in Elasticsearch led to Lucene simplifications to make it clear that an automaton's initial state is always 0
FastVectorHighlighter failed to highlight SynonymQuery
The benchmark module now supports all Lucene highlighters
JapaneseNumberFilter should not invoke incrementToken on its input after it had already returned false
Lucene60SegmentInfoFormat gets its own dedicated test case
The code fragment in the javadocs for LRUQueryCache was stale
DisjunctionMaxQuery does not work correctly with sub-queries that return negative scores
Lucene's complex efforts to pretend it has no schema were buggy if you suddenly started indexing dimensional points into a pre-existing index; this does not affect Elasticsearch, since its mappings ensure that a given field never mixes points and non-points in the same index
Our AssertingDocValuesFormat, which tests use to validate all arguments passed to, and results returned from, the doc values APIs, was too lenient in checking the target argument to advance; a stricter check would have prevented this otherwise tricky-to-debug test failure
Should the dimensional points APIs take a field name up front, like most other LeafReader APIs?
A nasty JVM bug causing an unexpected AssertionError in Lucene's ByteSliceReader has finally been fixed
The UAX_URL_EMAIL tokenizer still needs to be modernized to detect all top-level domain names
Lucene's facets module does not let you get facets without hits today
The lack of synchronization or a volatile keyword means that, in theory, an NRT reader refresh might not see a recent change to the index, but in practice this is likely a non-issue
Lucene's classic QueryParser mis-parses OR as AND, but unfortunately this is a known limitation
LMDirichletSimilarity incorrectly ignores query terms that do not appear in the current segment
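
As a worked illustration of the DisjunctionMaxQuery item above: a disjunction-max score is the best sub-query score plus a tie-breaker multiplied by the sum of the remaining sub-scores. The list does not say what the actual defect was; the sketch below assumes, purely for illustration, a running maximum that starts at zero, which is one way negative sub-scores could be mishandled. The dismax_score function and its initial_max parameter are hypothetical.

```python
def dismax_score(sub_scores, tie_breaker=0.0, initial_max=float("-inf")):
    """Score = max sub-score + tie_breaker * (sum of the other sub-scores)."""
    best = initial_max
    total = 0.0
    for s in sub_scores:
        total += s
        best = max(best, s)
    return best + tie_breaker * (total - best)


scores = [-2.0, -0.5]

# Starting the running maximum at negative infinity finds the true maximum.
correct = dismax_score(scores)

# Assumed faulty variant: starting at zero means no negative score can win.
skewed = dismax_score(scores, initial_max=0.0)
```

Here correct evaluates to -0.5, the larger of the two sub-scores, while skewed evaluates to 0.0, a value that none of the sub-queries actually produced.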

Watch This Space

Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!
