May 9, 2016

This Week in Elasticsearch and Apache Lucene - 2016-05-09

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

Detecting Geo-Temporal anomalies with #Elasticsearch pipeline aggs Blog post: http://bit.ly/1TKfoMf #gis

— Dave Erickson (@davebenigno) May 4, 2016

Elasticsearch Core

Changes in 2.x:

Don't try to compute completion stats on a closed reader - can cause the JVM to crash.
HTTP compression didn't work when CORS was enabled.
Using a wildcard to specify which fields to highlight will only return string fields.

Changes in master:

HTTP compression is enabled by default (compression level 3) and compressed HTTP requests can always be accepted regardless of whether compression is enabled or not.
Lucene expressions gained the doc[field].empty property and support for geo_point fields.
Painless field and method accesses got a 20% speed boost, and gained support for .value/.values (with a 400% speed boost) and support for geo_point fields.
Painless now supports single quotes, which makes scripts in JSON much more readable.
Scripting docs had a big rewrite.
Bootstrap checks now check that the server JVM is in use.
Added an escape hatch for those times when you don't have control over system properties that are checked in the bootstrap production checks.
Aggregations like `top_hits` which need access to the _score in a sub-aggregation can now use `breadth_first`.
Nodes now exchange a handshake when connecting, to exchange node information, cluster name, and Elasticsearch version. This will allow us to extend APIs that are used during initial cluster recovery without a major version change.
Binary values like the new `ip` fields can now be used for sorting.
The REST query string parser now understands semi-colons as separators, as well as ampersands.
ES_JAVA_OPTS is now passed to elasticsearch-plugin.
The deprecated `string` field cannot be combined with the new `index:true|false` setting.
All nodes requests now include any node failures that occurred.
All uses of Strings#splitStringToArray have been replaced with String#split.
Reindex throttling requests are now parsed more strictly.
Ingest now checks for missing processors instead of throwing an NPE, and a bug was fixed when collecting pipeline stats.
We now test AUTOSENSE source snippets in the docs. This tester is being moved to a plugin to make it available to other projects.
RPMs no longer specify parent directories like /var/run, to avoid clashing with other packages, and edited config files are preserved on upgrade.
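
To illustrate the query string change mentioned above (semi-colons accepted as separators alongside ampersands), here is a minimal sketch in Python. It is not the Elasticsearch implementation, which is written in Java; the function name `parse_query_string` and the handling of empty segments are assumptions made for illustration only.

```python
import re
from urllib.parse import unquote_plus


def parse_query_string(query):
    """Split a raw query string on '&' or ';' and decode each key/value pair.

    Hypothetical helper for illustration; later duplicates of a key overwrite
    earlier ones, which may differ from what a real server does.
    """
    params = {}
    for pair in re.split(r"[&;]", query):
        if not pair:
            continue  # skip empty segments, e.g. from a trailing separator
        key, _, value = pair.partition("=")
        params[unquote_plus(key)] = unquote_plus(value)
    return params
```

With this sketch, `pretty=true;size=10&q=user%3Akimchy` is split into three parameters regardless of which separator appears between them.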

Ongoing changes:

Work has started on refactoring snapshot/restore, with the first step being to remove the Snapshot class in favour of SnapshotInfo. More tasks listed here.
The allocation explain API will indicate whether shard stats are still pending.
Delete-by-query plugin is being moved to the reindex infrastructure.
Remove the `es.` prefix from settings on the command line.
Add `exclude_template` parameter to templates to avoid matching hidden indices.

Apache Lucene

geo3d can now run the Russia polygon, with ~11.6 K vertices, producing nearly the same hit count as LatLonPoint
Numerous geo3d polygon optimizations, including improvements to the polygon query tree; geo3d is now included in Lucene's nightly geo benchmarks going forward
Optimize TermsQuery to use a boolean instead of HashSet to record whether only one field is queried
The horrible schema ghost case in Lucene, where some documents in a segment use feature X, but then they were all deleted and merged away, yet the ghosts of feature X remain, continues haunting us
Index-time sorting should be better supported in Lucene's core
XMLQueryParser makes it hard for subclasses to create span queries
Collecting filter hits is now faster by making better use of index statistics up front to save work
A user finally hit one of our particularly satisfying recently added Lucene exceptions, but the exception message was missing some information
A new patch on this long-standing and controversial issue adds support to encrypt doc values
Add Logistic Regression support to Lucene's classification module
We failed to retag master issues in our issue tracker after releasing 6.0
Japanese (Kuromoji) tokenizer should allow filtering tokens from a provided parts-of-speech set
800+ new top-level-domains have been created since we last fixed StandardTokenizer to detect them, but the JFlex release is taking too long so we will proceed with Plan B
QueryParser should let you sometimes use unescaped internal operator characters
Doc values could optimize for the sparse use case, if we change their read-time API to be an iterator like postings
How to grow a bitset as you add hits to it is tricky
It's not clear our Arrays.TimSort is any better than the JDK's builtin Arrays.sort
Can we speed up how to skip fields in compressed space when loading a subset of fields from a document?
Lucene's default offset gap is sometimes surprising
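
As a rough illustration of why growing a bitset while adding hits is tricky, the sketch below shows one common approach: oversize the backing storage whenever a hit lands beyond the current capacity, so that repeated growth has amortized cost. This is not Lucene's code; the class name `GrowableBitSet` and the doubling growth policy are assumptions chosen for the example, and the right policy is exactly what the item above leaves open.

```python
class GrowableBitSet:
    """Bitset backed by a bytearray that grows on demand (illustrative only)."""

    def __init__(self, initial_bits=64):
        self._bits = bytearray((initial_bits + 7) // 8)

    def _ensure_capacity(self, index):
        needed = (index >> 3) + 1  # bytes required to hold bit `index`
        if needed > len(self._bits):
            # Assumed policy: at least double, to amortize repeated growth.
            new_size = max(needed, 2 * len(self._bits))
            self._bits.extend(bytes(new_size - len(self._bits)))

    def set(self, index):
        self._ensure_capacity(index)
        self._bits[index >> 3] |= 1 << (index & 7)

    def get(self, index):
        byte = index >> 3
        if byte >= len(self._bits):
            return False  # bits beyond the current capacity are unset
        return bool(self._bits[byte] & (1 << (index & 7)))

    def cardinality(self):
        # Count set bits across all bytes.
        return sum(bin(b).count("1") for b in self._bits)
```

The trade-off is visible even in this small version: growing too eagerly wastes memory when hits are sparse, while growing too conservatively forces frequent copies when hits arrive in increasing order.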

Watch This Space

Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!
