August 1, 2016

This Week in Elasticsearch and Apache Lucene - 2016-08-01

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

The customer asked, "What was that?" We replied, "Oh, that was Elastic." https://t.co/OwY8E4L3eX via @elastic
— Christoph Wurm (@ChristophWurm) July 26, 2016

Elasticsearch Core

Changes in 2.x:

A multi-match query with wildcard field names which result in no matching fields now returns a no-match query.
The plain highlighter should ignore parent/child queries.
Upgraded to Lucene 5.5.2.
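
The multi-match change above can be pictured with a small sketch. This is not the Elasticsearch implementation: the function names, the use of shell-style patterns, and the dictionary shapes (including the `match_none` stand-in) are assumptions made for illustration only.

```python
from fnmatch import fnmatchcase


def expand_fields(patterns, mapped_fields):
    """Resolve field-name patterns (e.g. "title_*") against the known fields."""
    resolved = []
    for pattern in patterns:
        for field in mapped_fields:
            if fnmatchcase(field, pattern) and field not in resolved:
                resolved.append(field)
    return resolved


def build_multi_match(text, patterns, mapped_fields):
    """Build a query body, falling back to a no-match query if nothing resolves."""
    fields = expand_fields(patterns, mapped_fields)
    if not fields:
        # Hypothetical stand-in for "a query that matches no documents".
        return {"match_none": {}}
    return {"multi_match": {"query": text, "fields": fields}}


print(build_multi_match("lucene", ["foo_*"], ["title", "body"]))
print(build_multi_match("lucene", ["t*"], ["title", "body"]))
```

The point of the sketch is the explicit empty-fields branch: a pattern that expands to nothing yields a query that matches nothing, rather than an error or a query over every field.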

Changes in master:

All CRUD requests return an _operation to indicate what action was taken, although this will be renamed to result. The created and found responses have been deprecated, and the Java methods removed.
The foreach ingest processor did not allow mutating other fields in the document.
The template query has been deprecated in favour of search templates.
Reindex-from-remote now supports basic authentication, and doesn't need as many threads as provided by the REST client by default. The _version field is only requested if needed.
An overflow bug in the JVM was causing disk-free-space on massive drives to be reported as negative.
Rolled over index names are zero-padded by default, unless a name for the new index is provided.
The default shard_size used for aggregations was overly aggressive, resulting in more memory use than required.
Snapshot UUIDs are used to identify associated blobs, making blob deletion safer.
The count API now accepts an empty search body, just like search.
forced_refresh should only be included in the response if forced refresh was requested.
The Java REST client has simpler sniffer initialisation.
Refactored variable chains in Painless to make the AST much more natural.
Time values have case-insensitive units except for m (minutes) to avoid confusion with M (months).
Index wildcards in cluster state requests are now also applied to the routing table.
The cat-shards request didn't support index patterns either.
The elasticsearch-translog command line tool allows truncation of corrupted translogs to salvage data in the index.
The netty4 module now depends on a released version of Netty, rather than our own temporary fork.
Jars required by the transport client have been renamed to include -client in the artifact ID.
The get-pipeline request now returns a named hash instead of an array for consistency with other APIs.
Fixed explanation for function_score queries where no filters match.
Index, update, and create PUT requests now return the LOCATION header of the new document.
The _gce_ networking special value is available in the GCE discovery plugin, even when not using gce discovery.
The EC2 discovery plugin now uses the recommended DefaultAWSCredentialsProviderChain to discover credentials.
The YAML REST tests are now called ClientYamlTest... to separate them from Java tests which test the REST layer.
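
As an illustration of the zero-padded rollover names mentioned above, here is a minimal sketch of how a next index name could be derived. The function name, the `name-<number>` pattern check, and the default padding width of six digits are assumptions for this example, not details taken from the change itself.

```python
import re

# Assumed naming convention: "<prefix>-<counter>", e.g. "logs-1" or "logs-000001".
_SUFFIX = re.compile(r"^(.*)-(\d+)$")


def next_rollover_name(current, width=6):
    """Increment the numeric suffix and zero-pad it to a fixed width."""
    match = _SUFFIX.match(current)
    if match is None:
        # Without a numeric suffix a name for the new index must be supplied.
        raise ValueError(f"cannot derive a rollover name from {current!r}")
    prefix, counter = match.groups()
    return f"{prefix}-{int(counter) + 1:0{width}d}"


print(next_rollover_name("logs-1"))
print(next_rollover_name("logs-000009"))
```

Padding to a fixed width keeps generated names in the same order whether they are sorted numerically or lexically, which is the practical benefit of the change.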

Ongoing changes:

The completion suggester should return documents instead of individual fields.
Matching documents returned by the search relevance framework should include the _index and _type.
Elasticsearch should be able to bind to virtual network interfaces.
Upgrade Jackson to 2.8.0 to fix a bug producing invalid JSON.
The recoverySource field in ShardRouting provides explicit information about where the shard is recovering from.
Inlined parameters in scripts cause too frequent script compilations.
Concurrent Store metadata listing has race conditions with index writes.
A bug could cause dangling indices to be deleted instead of imported.
The write_consistency parameter will be replaced with wait_for_active_shards.
Cancellable threads should not allow Thread#interrupt().
Regular histograms should be split from date histograms to allow fractional and negative buckets.
A primary should only be able to fail a replica if it has the current primary term.
The function score query can use a script to combine scores from other functions.
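
To show why fractional and negative buckets need care in a numeric histogram, the following sketch computes a bucket key by flooring rather than truncating. The function name and the `offset` parameter are assumptions for illustration; this is not the aggregation code being worked on.

```python
import math


def bucket_key(value, interval, offset=0.0):
    """Return the lower bound of the histogram bucket that contains value."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    # floor() rounds toward negative infinity, so -7 with interval 5 lands in
    # the [-10, -5) bucket; truncating toward zero would wrongly give -5.
    return math.floor((value - offset) / interval) * interval + offset


print(bucket_key(1.2, 0.5))
print(bucket_key(-0.3, 0.5))
print(bucket_key(-7, 5))
```

Integer-only rounding, as used for date histograms over millisecond timestamps, cannot express an interval such as 0.5, which is one reason to treat the two kinds of histogram separately.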

Apache Lucene

Writing dimensional points will be substantially faster in 6.2 thanks to two separate changes, which together show a 38% overall speedup when indexing 1.2 billion NYC taxi rides and make an earlier change obsolete
Tokenizing the Myanmar language got unexpectedly worse so we've restored the old syllable tokenization
AssertingPointsFormat had a silly bug preventing it from checking the wrapped points implementation correctly
MinHashFilter intentionally uses fall-through in its switch statements
IndexWriter was way too verbose when indexing threads become stalled because writing new segments can't keep up
The "thin wrapper" Lucene demo server has been moved to its own github project
Near-real-time replication was missing some public APIs, uncovered when folding it into the Lucene demo server
MemoryIndexReader.fields is no longer accidentally 5X slower
Dimensional points were failing to enforce maximum per-dimension byte count correctly
The new RangeFieldQuery, to index intervals and search by overlapping ranges, had a buggy equals implementation
Nested span queries somehow broke between 4.10.x and today
FastVectorHighlighter may have a performance regression since 4.10.x
Efforts to provide analyzers based on the OpenNLP project are progressing after years of dying on the vine
A test bug in MoreLikeThisTest still remains tricky to fix
MemoryIndex needs some cleanup, including a builder API to create an immutable instance
A new GeoBoundingBoxField will index a lat/lon bounding box as a single 4D point, and could also index altitude as a 3rd dimension
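
The idea of indexing a bounding box as a single 4D point can be sketched as follows. This is only a conceptual model: the class and function names are hypothetical, the encoding is not Lucene's actual byte layout, and boxes crossing the dateline are ignored.

```python
from typing import NamedTuple


class BoxPoint(NamedTuple):
    """A lat/lon bounding box stored as one point with four dimensions."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float


def encode_box(min_lat, min_lon, max_lat, max_lon):
    """Validate the box and pack it into a 4D point."""
    if min_lat > max_lat or min_lon > max_lon:
        raise ValueError("min must not exceed max (dateline crossing unsupported)")
    return BoxPoint(min_lat, min_lon, max_lat, max_lon)


def intersects(indexed, query):
    """Two boxes overlap iff each 'min' dimension is <= the other's 'max'."""
    return (indexed.min_lat <= query.max_lat and indexed.max_lat >= query.min_lat
            and indexed.min_lon <= query.max_lon and indexed.max_lon >= query.min_lon)


nyc = encode_box(40.0, -75.0, 41.0, -73.0)
print(intersects(nyc, encode_box(40.5, -74.0, 42.0, -72.0)))
print(intersects(nyc, encode_box(50.0, 0.0, 51.0, 1.0)))
```

Written this way, an overlap search is just a range constraint on each of the four dimensions, which is the kind of query a multi-dimensional points index is built to answer. An altitude range would add further dimensions in the same manner.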

Watch This Space

Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!
