This Week in Elasticsearch and Apache Lucene - 2016-08-01
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Elasticsearch Core
Changes in 2.x:
- A multi-match query with wildcard field names which result in no matching fields now returns a no-match query.
- The plain highlighter should ignore parent/child queries.
- Upgraded to Lucene 5.5.2.
- All CRUD requests return an _operation to indicate what action was taken, although this will be renamed to
found responses have been deprecated, and the Java methods removed.
- The foreach ingest processor did not allow mutating other fields in the document.
- The template query has been deprecated in favour of search templates.
- Reindex-from-remote now supports basic authentication, and doesn't need as many threads as provided by the REST client by default. The _version field is only requested if needed.
- An overflow bug in the JVM was causing disk-free-space on massive drives to be reported as negative.
- Rolled over index names are zero-padded by default, unless a name for the new index is provided.
- The default shard_size used for aggregations was overly aggressive, resulting in more memory use than required.
- Snapshot UUIDs are used to identify associated blobs, making blob deletion safer.
- The count API now accepts an empty search body, just like
- forced_refresh should only be included in the response if forced refresh was requested.
- The Java REST client has simpler sniffer initialisation.
- Refactored variable chains in Painless to make the AST much more natural.
- Time values have case-insensitive units except for m (minutes) to avoid confusion with
- Index wildcards in cluster state requests are now also applied to the routing table.
- The cat-shards request didn't support index patterns either.
- The elasticsearch-translog command line tool allows truncation of corrupted translogs to salvage data in the index.
- The netty4 module now depends on a released version of Netty, rather than our own temporary fork.
- Jars required by the transport client have been renamed to include -client in the artifact ID.
- The get-pipeline request now returns a named hash instead of an array for consistency with other APIs.
- Fixed explanation for function_score queries where no filters match.
- Index, update, and create PUT requests now return the Location header of the new document.
- The _gce_ networking special value is available in the GCE discovery plugin, even when not using
- The EC2 discovery plugin now uses the recommended DefaultAWSCredentialsProviderChain to discover credentials.
- The YAML REST tests are now called ClientYamlTest... to separate them from Java tests which test the REST layer.
- The completion suggester should return documents instead of individual fields.
- Matching documents returned by the search relevance framework should include the
- Elasticsearch should be able to bind to virtual network interfaces.
- Upgrade Jackson to 2.8.0 to fix a bug producing invalid JSON.
- ShardRouting provides explicit information about where the shard is recovering from.
- Inlined parameters in scripts cause too frequent script compilations.
- Concurrent Store metadata listing has race conditions with index writes.
- A bug could cause dangling indices to be deleted instead of imported.
- The write_consistency parameter will be replaced with
- Cancellable threads should not allow
- Regular histograms should be split from date histograms to allow fractional and negative buckets.
- A primary should only be able to fail a replica if it has the current primary term.
- The function score query can use a script to combine scores from other functions.
Apache Lucene
- Writing dimensional points will be substantially faster in 6.2 thanks to two separate changes, showing a 38% overall speedup when indexing 1.2 billion NYC taxi rides and obsoleting this prior change
- Tokenizing the Myanmar language got unexpectedly worse so we've restored the old syllable tokenization
- AssertingPointsFormat had a silly bug preventing it from checking the wrapped points implementation correctly
- MinHashFilter intentionally uses fall-through in its
- IndexWriter was way too verbose when indexing threads become stalled because writing new segments can't keep up
- The "thin wrapper" Lucene demo server has been moved to its own github project
- Near-real-time replication was missing some public APIs, uncovered when folding it into the Lucene demo server
- MemoryIndexReader.fields is no longer accidentally 5X slower
- Dimensional points were failing to enforce maximum per-dimension byte count correctly
- The new RangeFieldQuery, to index intervals and search by overlapping ranges, had a buggy
- Nested span queries somehow broke between 4.10.x and today
- FastVectorHighlighter may have a performance regression since 4.10.x
- Efforts to provide analyzers based on the OpenNLP project are progressing after years of dying on the vine
- A test bug in MoreLikeThisTest still remains tricky to fix
- MemoryIndex needs some cleanup, including a builder API to create an immutable instance
- A new GeoBoundingBoxField will index a lat/lon bounding box as a single 4D point, and could also index altitude as a 3rd dimension
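
To make the last item a little more concrete, here is a minimal sketch of how a lat/lon bounding box can be treated as a single 4D point, and how an "intersects" search then becomes a plain range check on each dimension. This is not the GeoBoundingBoxField implementation and uses no Lucene APIs: the class name, the method names and the dimension order (minLat, minLon, maxLat, maxLon) are assumptions made for illustration, and the sketch ignores boxes that cross the dateline as well as the optional altitude dimension.

```java
public class BoundingBoxAsPoint {

    // Hypothetical encoding: one 4D point per box, ordered {minLat, minLon, maxLat, maxLon}.
    public static double[] encode(double minLat, double minLon, double maxLat, double maxLon) {
        if (minLat > maxLat || minLon > maxLon) {
            throw new IllegalArgumentException("min must not exceed max (dateline crossing is not handled)");
        }
        return new double[] {minLat, minLon, maxLat, maxLon};
    }

    // Two boxes intersect when the indexed box starts before the query box ends
    // and ends after the query box starts, on both axes. Written against the 4D
    // point, that is a range [lower, upper] applied independently to each dimension.
    public static boolean intersects(double[] indexed, double[] query) {
        double[] lower = {
            Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY, query[0], query[1]
        };
        double[] upper = {
            query[2], query[3], Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY
        };
        for (int dim = 0; dim < 4; dim++) {
            if (indexed[dim] < lower[dim] || indexed[dim] > upper[dim]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Rough, illustrative coordinates only.
        double[] manhattan = encode(40.70, -74.02, 40.88, -73.91);
        double[] brooklyn = encode(40.57, -74.04, 40.74, -73.83);
        double[] london = encode(51.28, -0.51, 51.69, 0.33);

        System.out.println(intersects(manhattan, brooklyn)); // true
        System.out.println(intersects(manhattan, london));   // false
    }
}
```

Because the test reduces to one lower and one upper bound per dimension, it has the same shape as a multi-dimensional range query, which is presumably why a points-based index suits this kind of field.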
Watch This Space
Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!