May 17, 2016

This Week in Elasticsearch and Apache Lucene - 2016-05-17

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

Video of my "Ingest Node: (re)indexing and enriching documents within #elasticsearch" talk given at #DevCon16 is up

— Luca Cavanna (@lucacavanna) May 13, 2016

Elasticsearch Core

Changes in 2.x:

Reindex throttling has been backported.
A wildcard on a stopword produced an NPE in simple_query_string.
CORS handling no longer checks for User-Agent.
CORS permits same-origin requests.
Fixed CONTAINS relationship in geo_shape query.

Changes in master:

Dots in field names are supported again!
Shell scripts in Elasticsearch now depend on bash instead of sh.
The plugin script no longer accepts Java system properties as command line params.
The fieldstats API now only returns info for fields that exist in the Lucene index.
Scripting engine settings no longer support the "sandbox" option - they accept only true or false.
Scripting engines can now register only a single script type name and file extension.
The reindex and update-by-query APIs now both return a BulkIndexByScrollResponse.
The Painless scripting language received many low-level performance and usability improvements.
Elasticsearch min/max heap now defaults to 256MB and 2GB.
Shard routing is now immutable.
Fixed a concurrency bug in IndexingMemoryController which could result in miscounts and even OOM.
Iterables.flatten should not pre-cache the first iterator.
Reindex batches default to 1000 docs, instead of 100.
Added missing setting: `discovery.ec2.tag.project`.
The cat-fielddata API now returns fields as rows instead of columns.
The significant terms agg can now be used on fields indexed as points (i.e. date, numeric, ip).
Dangling indices are no longer imported if a tombstone for the index exists.
The fingerprint analyzer now dedupes tokens after ASCII-folding.
The in_flight requests circuit breaker now excludes PingRequest and MasterPingRequest.
Terms aggs on IP fields return IP addresses as string keys.
The `fuzziness` parameter now throws an exception when used in `multi_match` queries of type `cross_fields`, `phrase`, or `phrase_prefix`, instead of being silently ignored.
Fuzzy, regexp, prefix, and wildcard queries can now be used only on text/keyword fields. Attempting to use them on numeric, date, ip, _id, or _uid fields will throw an exception.
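
To see why the order matters in the fingerprint analyzer change above, here is a minimal Python sketch. It is not the Elasticsearch or Lucene implementation: the function names are hypothetical, and whitespace splitting plus Unicode decomposition stand in for the real tokenizer and ASCII-folding filter.

```python
import unicodedata


def ascii_fold(token: str) -> str:
    """Approximate ASCII-folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fingerprint_dedupe_first(text: str) -> str:
    """Dedupe BEFORE folding: 'café' and 'cafe' both survive."""
    unique = sorted(set(text.lower().split()))
    return " ".join(ascii_fold(token) for token in unique)


def fingerprint_fold_first(text: str) -> str:
    """Fold first, then dedupe: tokens that fold to the same form collapse."""
    folded = [ascii_fold(token) for token in text.lower().split()]
    return " ".join(sorted(set(folded)))


if __name__ == "__main__":
    sample = "Café cafe latte"
    print(fingerprint_dedupe_first(sample))  # duplicate 'cafe' remains
    print(fingerprint_fold_first(sample))    # duplicate removed
```

With the sample input, deduping first leaves two copies of the folded token in the fingerprint, while folding first leaves one.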

Ongoing:

Profiler being refactored to make way for profiling more than just queries.
Snapshot index files should be written atomically and reflect the true contents of the snapshot.
Deprecation warnings should be returned to HTTP clients as headers.
Matrix aggregations bring multi-field correlations.
Work continues on making new point-based IP fields backwards compatible.
Splitting scroll requests for processing by multiple consumers.
Block indexing requests until their changes have been refreshed and are visible to search.
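
The scroll-splitting item above does not say how the work will be divided. One common way to split a result set across consumers is to hash each document ID and take it modulo the number of consumers. The Python sketch below illustrates that idea only: `slice_of` and `split_for_consumers` are hypothetical names, and the hashing scheme is an assumption, not the design under discussion.

```python
import zlib


def slice_of(doc_id: str, num_slices: int) -> int:
    """Map a document ID to a slice with a stable hash (CRC32 here)."""
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    return zlib.crc32(doc_id.encode("utf-8")) % num_slices


def split_for_consumers(doc_ids, num_slices):
    """Partition IDs so each one lands in exactly one consumer's slice."""
    slices = [[] for _ in range(num_slices)]
    for doc_id in doc_ids:
        slices[slice_of(doc_id, num_slices)].append(doc_id)
    return slices


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10)]
    for number, chunk in enumerate(split_for_consumers(ids, 3)):
        print(number, chunk)
```

A stable hash is used rather than Python's built-in `hash()`, which is randomized per process for strings, so every consumer computes the same assignment independently.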

Apache Lucene

Index-time sorting is now supported directly in Lucene core, but we still need to take advantage of a sorted index at search time by default, and explore sorting during flush as well
The legacy SlowCompositeReaderWrapper, an awful class that inefficiently tries to pretend you have only one segment in your index, can at long last move away from Lucene
Lucene's classic query parser should let the analyzer handle splitting tokens on whitespace if necessary rather than do that itself
The legacy spatial module could wrap the new GeoPointField as a SpatialStrategy
Maybe we should more aggressively compress the terms dictionary?
Our ant clean-jars task struggles with symbolic links
DateRangePrefixTree lets you control the calendar template
JapaneseTokenizer sometimes unexpectedly throws ArrayIndexOutOfBoundsExceptions
More improvements to geo3d polygon handling
Geo3d needs a doc-values field to enable sort-by-distance
Upgrading our JFlex-based tokenizers to Unicode 8.0 is tricky
WebStart's security manager is angry about Lucene's Constants class checking system properties
Why are equals and hashCode not abstract in the Query base class?
Should MatchNoDocsQuery include a reason for its creation?
TestMoreLikeThis is still failing likely due to these recent changes
A new LSH (locality sensitive hashing) TokenFilter and query is an alternative to the standard MoreLikeThisQuery
We have too many confusing versions in Jira
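
For readers new to locality sensitive hashing, mentioned in the LSH item above, here is a rough Python sketch of MinHash, one widely used LSH scheme for estimating set similarity. It assumes MinHash purely for illustration and does not mirror the Lucene TokenFilter or query. All names are hypothetical.

```python
import hashlib


def _hash(token: str, seed: int) -> int:
    """Seeded 64-bit hash built from MD5 (used here only for mixing)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(tokens, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over the tokens."""
    token_set = set(tokens)
    if not token_set:
        raise ValueError("need at least one token")
    return [min(_hash(t, seed) for t in token_set) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    a = minhash_signature("the quick brown fox jumps".split())
    b = minhash_signature("the quick brown dog jumps".split())
    print(estimated_jaccard(a, b))
```

Documents with similar token sets tend to share signature positions, so comparing short signatures can stand in for comparing full term sets.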

Watch This Space

Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!
