August 16, 2016

This Week in Elasticsearch and Apache Lucene - 2016-08-16

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

Powering Transactions Search with Elastic – Learnings from the Field https://t.co/BbcGgmgeGV
— PayPal Developer (@paypaldev) August 11, 2016

Elasticsearch Core

Changes in 2.x:

The mapper.allow_dots_in_name setting disables the dots-in-fieldname check, which will allow users stuck on 1.x to upgrade to 2.4.0.
Update Jackson to 2.6.6 Final.
Prebuild Japanese stopwords token filter.

Changes in master:

The completion suggester now returns documents as results instead of doc_values/payloads.
Add support for upgrading field mappings which have dots in the fieldname to treat dots as path separators.
Specifying more than one field name in the short query form now results in an exception instead of being silently ignored.
max_local_storage_nodes now defaults to 1 - it must be overridden to start multiple nodes in the same data directory.
Snapshot deletions first check whether a restore is already in progress.
Script compilation is now subject to a circuit breaker to enforce the use of named params.
Explain with DFS query now uses global term statistics.
Scroll requests in 5.0 were not being renewed.
The BoostingQuery didn't work with the fast vector highlighter.
Netty4 wasn't handling Expect: 100-continue headers correctly.
Added workarounds because Docker doesn't handle seccomp calls correctly.
The RoutingNodes interface has been cleaned up and minimised to make it easier to ensure its invariants.
Keyword fields now use binary doc values instead of string to avoid encoding as UTF8 twice (once for indexing and once for doc values).
The analyze API shouldn't result in caching tokenizers or filters.
Analyzer aliases are no longer supported. Old indices using aliases will be upgraded correctly.
Fatal errors such as OOM should cause the JVM to exit, even if thrown in unprivileged code like the scripting engine, but OOM and StackOverflow errors in Painless are safe to catch.
Groovy asserts should not cause the JVM to exit.
Most geo-distance helper methods in scripts have been removed in favour of arcDistance and planeDistance.
VersionFetchSubPhase was fetching the docId, even though it was already known.
The jvm.options file wasn't handling spaces correctly on Windows.
Internet Explorer can't handle multiple CORS headers, but expects comma-separated values instead.
The query slow log was missing node name and shard ID.
The lang-javascript plugin now works with reindex.
NamedWritableRegistry is now immutable and takes all readers at construction time, instead of relying on Guice for injection. Extension points exist for plugins.
Gradle now checks for // norelease again.
We should be explicit about which annotation processors should run.
Benchmarks show that the Java HTTP client performs as well as the TransportClient.
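
One of the changes above treats dots in field names as path separators. As a rough illustration of that idea only (this is not Elasticsearch's mapping code, and the function name `expand_dots` is hypothetical), here is a minimal Python sketch that expands a flat document with dotted keys into nested objects and rejects names that would be both a value and an object:

```python
def expand_dots(flat):
    """Expand keys such as "user.name" into nested dicts.

    Illustrative only: a hypothetical helper, not Elasticsearch's logic.
    """
    nested = {}
    for key, value in flat.items():
        parts = key.split(".")
        node = nested
        # Walk (or create) one object per path segment before the leaf.
        for part in parts[:-1]:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{key!r} conflicts with the value at {part!r}")
            node = child
        leaf = parts[-1]
        if leaf in node:
            raise ValueError(f"{key!r} conflicts with an existing field")
        node[leaf] = value
    return nested


doc = expand_dots({"user.name": "kim", "user.address.city": "Oslo", "age": 3})
```

With this sketch, `user.name` and `user.address.city` end up under a single `user` object, while a document containing both `a` and `a.b` is rejected as ambiguous.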

Ongoing:

The rank evaluation framework continues to evolve.
Should we add an option to not return metadata with hits?
Apps like Kibana need to be able to reindex their indices when upgrading.
If we can improve Lucene's caching, we can speed up primary key lookups during indexing.
Should the _all field be disabled by default?
Suppressing AlreadyClosedException with mmap can be disastrous.
Figuring out which shard allocations have been made should be less costly.
Geo-distance calculations should only use arc, to be consistent with geo-distance queries.
Log4j is being upgraded to Log4j2.
Should implicit casting in Painless be implemented as its own phase, instead of piggybacking on the analysis phase?
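
To make the arc versus plane distinction in the geo-distance item concrete, here is a small Python sketch. It assumes that "arc" means great-circle distance (computed here with the haversine formula) and that "plane" means a flat-earth approximation; the function names and the Earth radius constant are choices made for this example, not Elasticsearch's implementation.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumed constant)


def arc_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def plane_distance(lat1, lon1, lat2, lon2):
    """Flat-earth (equirectangular) approximation in meters."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(x, y)


# Two points in Paris, roughly a kilometer apart.
near_arc = arc_distance(48.8566, 2.3522, 48.8606, 2.3376)
near_plane = plane_distance(48.8566, 2.3522, 48.8606, 2.3376)
```

Over short distances the two results are nearly identical, but the planar approximation drifts as the points move further apart, which is one reason to prefer a single, consistent calculation.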

Apache Lucene

Lucene will soon try harder in its best-effort check to detect when MMapDirectory is being used after being closed, since that can cause a SIGSEGV, which terminates the JVM
A user's question about Lucene's newish query cache led to a clarifying comment being added to Lucene's sources
The Lagrangian bounds computation in geo3d had a degenerate corner case
Switching even numeric doc values to an iterator API instead of random access is challenging
MockDirectoryWrapper, used during Elasticsearch and Lucene tests to ensure the store level APIs are being used correctly, will now detect when a clone of a closed IndexInput is being used
PrefixQuery and Automaton now make slightly fewer object allocations
A new regular expression engine using Memory Occurrence Automata may lead to better regular expression queries
The MoreLikeThisTest test keeps failing
Concordance searching, letting you click through every term hit and its surroundings in your result set, is now available for Lucene via Maven
Making delete-by-query work with doc-values queries is horribly complex and it may make more sense to remove doc-values queries instead
The APIs to track external data structures along with Lucene's LeafReaders are trappy
IntRangeField, FloatRangeField and LongRangeField, letting you index a range and search by ranges overlapping the indexed ranges, are coming soon
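
The range fields item describes indexing a range and matching queries whose range overlaps it. The Python sketch below shows only the overlap test, assuming inclusive bounds; `IntRange`, `overlaps` and `search` are hypothetical names for this example, and the linear scan stands in for whatever index structure Lucene actually uses.

```python
from typing import NamedTuple


class IntRange(NamedTuple):
    lo: int  # inclusive lower bound (assumption for this sketch)
    hi: int  # inclusive upper bound (assumption for this sketch)


def overlaps(indexed, query):
    """Two closed ranges overlap when each starts before the other ends."""
    return indexed.lo <= query.hi and query.lo <= indexed.hi


def search(indexed_ranges, query):
    """Return IDs of documents whose range overlaps the query (linear scan)."""
    return [doc_id for doc_id, r in indexed_ranges.items() if overlaps(r, query)]


docs = {"a": IntRange(1, 5), "b": IntRange(10, 20), "c": IntRange(5, 10)}
hits = search(docs, IntRange(4, 6))
```

Here the query range 4 to 6 matches documents `a` and `c`, whose ranges touch it, but not `b`.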

Watch This Space

Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!
