November 23, 2015

This Week in Elasticsearch and Apache Lucene

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

My blog about using @Elasticsearch as a time series database: https://t.co/KaUitUuYvb
— Felix Barnsteiner (@felix_b) November 4, 2015

Elasticsearch Core

Changes in 2.x:

Benchmarks comparing the old completion suggester using payloads with the new completion suggester using doc values.
The Azure repository plugin now has support for multiple repository credentials.
Transport options are now immutable.
The thread interrupt flag is now restored properly after an InterruptedException. Work still needs to be done to make the BulkProcessor handle interrupts correctly.
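
The interrupt-flag item above refers to a general Java idiom: when code catches an InterruptedException and cannot rethrow it, it sets the thread's interrupt status again so that callers can still see the interruption. The following is a minimal sketch of that idiom, not the actual Elasticsearch change; the class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not Elasticsearch code.
final class InterruptAwareSleeper {

    /**
     * Sleeps for the given time. Returns true if the sleep completed and
     * false if the thread was interrupted. In the interrupted case the
     * interrupt flag is set again before returning.
     */
    static boolean sleepQuietly(long millis) {
        try {
            TimeUnit.MILLISECONDS.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            // The flag is cleared when InterruptedException is thrown,
            // so set it again for code further up the stack.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate a pending interrupt
        boolean completed = sleepQuietly(10);
        // prints: completed=false interrupted=true
        System.out.println("completed=" + completed
                + " interrupted=" + Thread.currentThread().isInterrupted());
    }
}
```

Without the call to `Thread.currentThread().interrupt()` in the catch block, the second value printed would be `false` and the interruption would be silently lost.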

Changes in master:

After a number of issues with delayed allocation were fixed, delayed allocation has been refactored to make it easier to follow.
As part of the change to wait for shard failures to be acknowledged, the acknowledgement step now respects a timeout.
The OS stats now include the CPU usage, which will also be exposed in Marvel.
Added a variable-length long encoding that supports negative values.
Response filtering now uses native Jackson streaming support, making it faster with less custom code. It also handles escaping of dots correctly and can be used to filter `_source`.
GeoShapes are now built with ShapeBuilders, much like we do with QueryBuilders. The next step is to make them serializable.
The move to Gradle continues with:

On top of that, Gradle can now build RPMs and Debian packages, and we'll soon have a test to check that packages are signed.
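
For the variable-length long encoding mentioned above: a plain 7-bits-per-byte encoding treats a negative long as a huge unsigned number, so even -1 needs the maximum of 10 bytes. One common way to support negative values compactly is zig-zag encoding, which interleaves negative and positive numbers before the variable-length step. The sketch below assumes that scheme (the item above does not say which one the change uses), and its class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration; not the Elasticsearch stream API.
final class ZigZagVLong {

    /** Maps signed to unsigned so small magnitudes get small codes: 0, -1, 1, -2 become 0, 1, 2, 3. */
    static long zigZagEncode(long v) {
        return (v << 1) ^ (v >> 63);
    }

    static long zigZagDecode(long z) {
        return (z >>> 1) ^ -(z & 1);
    }

    /** Writes 7 bits per byte, low bits first; a set high bit means more bytes follow. */
    static byte[] write(long value) {
        long z = zigZagEncode(value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
        return out.toByteArray();
    }

    /** Reads one value written by write(), starting at the first byte. */
    static long read(byte[] bytes) {
        long z = 0;
        int shift = 0;
        for (byte b : bytes) {
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) {
                break;
            }
        }
        return zigZagDecode(z);
    }

    public static void main(String[] args) {
        for (long v : new long[] {0, -1, 1, -64, 64, Long.MIN_VALUE}) {
            byte[] encoded = write(v);
            System.out.println(v + " -> " + encoded.length + " byte(s), decoded " + read(encoded));
        }
    }
}
```

With this scheme, values in the range -64 to 63 fit in a single byte, and only values near the extremes of the long range need all 10 bytes.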

Ongoing work:

Twelve aggregations have been refactored in the Aggs Refactoring branch... Only 30 left to do!
A PR for the Query Profiler is looking for reviews.
The tribe node doesn't play well with non-default configuration paths.
Make the BulkProcessor back off and retry after request handling has been rejected due to a full queue.
The new scripting language is much more succinct than it was before and is now able to call methods on string constants.
The first step in ensuring that writes are not lost while the primary is relocated is ready for review: this decouples the actual write from the notion of which node is in charge of ensuring that the write happens.
Sequence numbers are now being added to each write and work has started on local checkpoints.
We will store allocation IDs with each shard to ensure that we choose the most recent shard at recovery time.
A task management prototype has been implemented, but testing it is proving harder than implementing it.
Work has started on splitting the `string` field datatype into separate `text` and `keyword` field types.
Improvements to exceptions to stop swallowing stack traces.
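
The BulkProcessor item above describes a back-off-and-retry pattern. As a rough sketch of the general idea, and not the BulkProcessor API or the change under review, the hypothetical helper below retries an action with exponentially growing delays. It assumes a full queue surfaces as a `java.util.concurrent.RejectedExecutionException`; the class name, method name, and parameters are made up for illustration.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.function.Supplier;

// Hypothetical sketch; not the BulkProcessor API.
final class BackoffRetry {

    /**
     * Runs the action and retries it when it is rejected, doubling the delay
     * after each rejection. Rethrows the last rejection once maxRetries
     * retries have been used up or the waiting thread is interrupted.
     */
    static <T> T callWithBackoff(Supplier<T> action, int maxRetries, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 0; ; attempt++) {
            try {
                return action.get();
            } catch (RejectedExecutionException e) {
                if (attempt >= maxRetries) {
                    throw e;
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible
                    throw e;                            // and stop retrying
                }
                delay *= 2;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated request that is rejected twice before it succeeds.
        String result = callWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new RejectedExecutionException("queue full");
            }
            return "indexed";
        }, 5, 10);
        // prints: indexed after 3 attempts
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A production version would usually cap the delay and add some randomness to it so that many clients do not retry in lockstep.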

Apache Lucene

We have a volunteer release manager for 5.4.0!
Factor out components of the XML query parser so consumers can inherit from classes without requiring sandbox module code
Optimize stored fields retrieval to avoid skipping the last field in the document if it's not needed, though this will only help if the last field is more than 16 KB at default settings
Upgrade randomized testing to version 2.3.1, to get several improvements
Prefix-coding the values in block KD-tree leaf blocks gives a sizable reduction in the already small index size, with only a small slowdown in query performance
Add optimizations/specializations for bulk merging in the common case of one-dimensional values, resulting in sizable speedups that make indexing faster than with a numeric field
WordDelimiterFilter should respect KeywordAttribute
Don't use null to represent sorting by relevance!
PhraseQuery does, in fact, allow more than one term at the same position, but it's interpreted differently (conjunction) than MultiPhraseQuery (disjunction)
Many similarities incorrectly treat a norm value of 0 as an infinitely long document
A GeoDistanceRangeQuery that overlaps one or both of the poles is problematic, adding risk to Santa Claus's upcoming flight planning
UnicodeWhitespaceTokenizer splits tokens on any Unicode-defined whitespace character
The new matchCost method will let Lucene execute conjunction queries, with multiple two-phased clauses, more efficiently
If you search on a massive shape, such that it manages to wrap around and span the entire earth, we should rewrite that to just match all documents with the field
GeoPointDistanceQuery matches the wrong documents when it has to cross the dateline
IOUtils.fsync should not retry on hitting IOException since that means a great disturbance has occurred
JapaneseTokenizer should offer more than two possible tokenizations
Add some simple optimizations for filters in BooleanQuery.rewrite
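
To illustrate the prefix-coding item above: values that end up in the same leaf block of a sorted tree tend to share their leading bytes, so the shared prefix can be stored once per block instead of once per value. The sketch below shows that idea only. It is not Lucene's on-disk format; the class name and block layout are hypothetical, and it assumes fixed-width values of at most 127 bytes.

```java
import java.util.Arrays;

// Hypothetical illustration of prefix coding; not Lucene's block KD-tree format.
final class LeafPrefixCoder {

    /** Length of the longest prefix shared by all values (non-empty array, equal widths assumed). */
    static int commonPrefixLength(byte[][] values) {
        int prefix = values[0].length;
        for (int i = 1; i < values.length && prefix > 0; i++) {
            int j = 0;
            while (j < prefix && values[i][j] == values[0][j]) {
                j++;
            }
            prefix = j;
        }
        return prefix;
    }

    /** Layout: [prefix length][prefix bytes][suffix of value 0][suffix of value 1]... */
    static byte[] encode(byte[][] values) {
        int width = values[0].length;
        int prefix = commonPrefixLength(values);
        byte[] out = new byte[1 + prefix + values.length * (width - prefix)];
        out[0] = (byte) prefix;
        System.arraycopy(values[0], 0, out, 1, prefix);
        int pos = 1 + prefix;
        for (byte[] value : values) {
            System.arraycopy(value, prefix, out, pos, width - prefix);
            pos += width - prefix;
        }
        return out;
    }

    /** Rebuilds count values of the given width from a block produced by encode(). */
    static byte[][] decode(byte[] block, int count, int width) {
        int prefix = block[0];
        byte[][] values = new byte[count][width];
        int pos = 1 + prefix;
        for (int i = 0; i < count; i++) {
            System.arraycopy(block, 1, values[i], 0, prefix);
            System.arraycopy(block, pos, values[i], prefix, width - prefix);
            pos += width - prefix;
        }
        return values;
    }

    public static void main(String[] args) {
        byte[][] values = { {10, 20, 1, 2}, {10, 20, 3, 4}, {10, 20, 5, 6} };
        byte[] block = encode(values);
        // prints: raw=12 bytes, encoded=9 bytes
        System.out.println("raw=12 bytes, encoded=" + block.length + " bytes");
        System.out.println(Arrays.deepToString(decode(block, 3, 4)));
    }
}
```

The trade-off matches the one described above: the block gets smaller, but each value has to be reassembled from the prefix and its suffix at read time.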

Watch This Space

Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!
