October 19, 2015

This Week in Elasticsearch and Apache Lucene - October 19th 2015

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

Just released new #elasticsearch #python clients - 2.0.0 for upcoming Elasticsearch 2.0 and 1.8.0 for 1.x versions - https://t.co/VUmgunhAgP
— Honza Král (@HonzaKral) October 13, 2015

Elasticsearch Core

Last week we released Elasticsearch 1.7.3, which contains some good bug fixes for the Tribe node, synced flushing, and snapshot restore.

Changes in 2.0:

Testing of 2.0.0-RC1 uncovered a serious bug with Field/Document Level Security that caused OOMs during bulk indexing and required some major refactoring to fix (#14070, #14071, #14084).
File/directory permissions in RPM/Deb packages have been tightened up so that config files/plugin dirs are readable by elasticsearch but writeable only by root.
The `default_index_analyzer` has been renamed to `default_analyzer` to be consistent with the `index_analyzer` -> `analyzer` mapping changes.
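
To make the tightened package permissions above concrete, here is a minimal sketch of a check for "readable by elasticsearch but writeable only by root". It assumes the files are owned by root (uid 0) and that the elasticsearch user gets read access through the group bits. The function names are hypothetical, and the actual modes used by the RPM/Deb packages may differ.

```python
import stat

ROOT_UID = 0  # assumption: packaged config files and plugin dirs are owned by root


def writable_only_by_owner(mode: int) -> bool:
    """True if neither the group nor others have the write bit set."""
    return not (mode & (stat.S_IWGRP | stat.S_IWOTH))


def group_can_read(mode: int) -> bool:
    """True if the group (assumed here to be elasticsearch) can read."""
    return bool(mode & stat.S_IRGRP)


def is_tightened(mode: int, owner_uid: int) -> bool:
    """Check the 'group may read, only root may write' layout.

    `mode` is an st_mode value and `owner_uid` an st_uid value,
    for example taken from os.stat(path).
    """
    return (
        owner_uid == ROOT_UID
        and bool(mode & stat.S_IWUSR)
        and writable_only_by_owner(mode)
        and group_can_read(mode)
    )
```

On an installed system, the inputs would come from `os.stat` on a config file or plugin directory, passing `st_mode` and `st_uid` to `is_tightened`.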

Changes in 2.x/3.x:

Guava is gone! A big cleanup to remove a dependency which can often clash with a user's own dependencies.
The query part of search requests is now parsed on the coordinating node.

In progress:

There was a lot of debate about the syntax of the simple query string parser and how to make it more intuitive.
The new scripting language is taking shape. Now working on using invokeDynamic to reduce the need for strict typing.
In-memory fielddata support is being removed in 3.0 for field types which support doc values.
Trying to offload the memory used by global ordinals for multi-value fields to disk.
Aggs are being refactored to be parsed on the coordinating node, just like queries.
GeoPoints v2 PRs are almost ready to be merged into 2.1.
Multi-dimensional BKD trees may be promoted from Lucene sandbox to core, which would allow us to add support for 3D Geo plus much requested features like IPv6, BigInt, and BigDecimal.

Apache Lucene

Upgrade ANTLR to version 4.5.1 for numerous bug fixes
Add getters for the query cache and caching policy on IndexSearcher
SpanOrQuery is now immutable
OfflineSorter now uses Lucene's Directory abstraction instead of secretly trying to consume temp directory space
BooleanQuery hashCode and equals now ignore clause order
BoostQuery now adds parens around the boosted query, for the future Lucene 6.0 only
At long last we can deprecate the Filter class, now that its capabilities are fully folded into Query and all internal usage in Lucene has been cut over
Remove the slow RegexQuery from Lucene's sandbox: Lucene's core RegexpQuery (note the extra p!) is faster
Java 9 has stricter type inference
Simplify the base Query.equals method
Add GeoPointDistanceRangeQuery to search a ring instead of a circle
Refactor recent geo tests to improve test coverage and fix a few accuracy bugs
We should add a DimensionalFormat to Lucene's codec, to enable fast numeric and spatial searching on arbitrary byte[]
LZ4 decompression can be costly if you load too many stored documents
We can greatly reduce (96% in one test!) the heap usage for certain doc values by moving the storage for ordinals to disk
Should we add a new matchCost method to TwoPhaseDocIdSetIterator to better optimize query execution?
TermQuery should clone the incoming term
Lucene's classifier should be able to classify an already indexed document
JapaneseTokenizer should produce the top N possible tokenizations, not just the top 2
Move the "delete file retry logic" down under Directory, from IndexWriter, since this is really a Windows-only limitation
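
As an illustration of the BooleanQuery change listed above (hashCode and equals ignoring clause order), here is a minimal Python sketch of order-insensitive equality and hashing. It is not Lucene's implementation: the class name and the `(occur, term)` clause representation are hypothetical, and treating clauses as a multiset is an assumption about the intended semantics.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Tuple

# Hypothetical clause representation: (occur, term), e.g. ("MUST", "title:lucene")
Clause = Tuple[str, str]


class OrderInsensitiveBooleanQuery:
    """A boolean query whose equality and hash ignore clause order."""

    def __init__(self, clauses: Iterable[Clause]) -> None:
        self.clauses = list(clauses)

    def _key(self) -> FrozenSet[Tuple[Clause, int]]:
        # A multiset of clauses: order is discarded, duplicates still count.
        return frozenset(Counter(self.clauses).items())

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, OrderInsensitiveBooleanQuery):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self) -> int:
        # Must agree with __eq__, so it is derived from the same key.
        return hash(self._key())
```

Two queries built from the same clauses in a different order compare equal and hash identically, so either one can serve as the lookup key for an entry cached under the other.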

Watch This Space

Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!
