This Week in Elasticsearch and Apache Lucene - 2016-07-26
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Wrote a post about combining #elasticsearch RestClient with #jtwig templates for creating and executing queries: https://t.co/LO8fCj7lqk
— Jettro Coenradie (@jettroCoenradie) July 25, 2016
Changes in 2.x:
- S3 repository now supports path style access for virtual hosting of buckets.
- S3 and EC2 should allow for different AWS key pairs.
- not_analyzed string fields should reject
Changes in master:
- The new scaled_float field type allows, e.g., percentages to benefit from compression techniques used for
- The result cache can be explicitly enabled per-request even when it returns search hits.
- Reindex throttling is now disabled with
- Automatically created indices should honour
- The analyze API now supports defining custom character filters, token filters, and tokenizers inline.
- Indexing into a relocating primary while replicas are recovering will no longer result in document loss.
- Elasticsearch should reject dynamic templates with unknown
- The Java REST client now supports async and blocking requests, and a benchmark compares transport client to HTTP.
- Blocking tasks should not be run on the Scheduler thread.
- Resetting a recovery respects reference counting and locking, closes streams and removes all files.
- The cardinality aggregation now has a fixed default precision which is easier for the user to understand.
- The request circuit breaker now takes aggregation buckets into account.
- Automatically generated node names now persist after the node restarts.
- Scripts used in ingest pipelines preserve the original exception for easier debugging.
- on_failure should halt any further ingest processing.
- Analyzer aliases now work correctly, but will be removed for 5.0.
- CORS default settings for allow-headers were not being used.
- A Netty4 module is available, but depends on as yet unreleased bug fixes in Netty.
- Nested queries can be used inside nested aggregations.
- Static methods on Store class need to be shard lock aware to avoid race conditions.
- All aggs have been moved to use NamedWritable instead of AggregationStreams.
- Plugins registering queries should use the
- Mappings introduced a _parent#null field when parent/child was not used.
- TCP transports should map their internal exceptions to those defined by the TCPTransport class.
- Index, update, and create REST requests return a LOCATION header.
- The search relevance framework gains a REST interface and support for reciprocal rank and discounted cumulative gain.
- A command line tool will let you discard a corrupt transaction log, accepting the loss of the data in it, while keeping the data already in the index.
- Histograms and date histograms are being split so that the former can bucket on decimal values.
- Elasticsearch should be able to listen to virtual network interfaces.
- The write_consistency setting will be replaced with wait_for_active_shards, which better describes the intent.
- The new completion suggester should return matching documents.
- Similarities should be dynamically updatable.
- Reindex from remote should support reading from clusters that require authentication.
- Rally should decouple job scheduling from execution, which will allow support for multiple load generators.
- Configuring network partitions in tests should be easier.
- Shard copies should only be marked as stale after an acknowledged write.
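
For readers unfamiliar with the two metrics mentioned for the search relevance framework, here is a minimal Python sketch of reciprocal rank and discounted cumulative gain. It uses common textbook definitions (linear gain for DCG). The function names are hypothetical, and this is not the Elasticsearch implementation, which may use a different gain formula.

```python
import math
from typing import Sequence


def reciprocal_rank(relevant: Sequence[bool]) -> float:
    """Return 1 / (position of the first relevant hit), or 0.0 if none is relevant.

    `relevant[i]` says whether the hit at rank i + 1 was judged relevant.
    """
    for position, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            return 1.0 / position
    return 0.0


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with linear gain (an assumed variant).

    Each graded relevance value is divided by log2(rank + 1), so relevant
    documents that appear lower in the result list contribute less.
    """
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(gains, start=1))


if __name__ == "__main__":
    # First relevant document appears at rank 3.
    print(reciprocal_rank([False, False, True, True]))
    # Graded relevance judgments for the top six hits of one query.
    print(dcg([3, 2, 3, 0, 1, 2]))
```

Averaging `reciprocal_rank` over a set of queries gives mean reciprocal rank, which is how the metric is typically reported.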
- CustomAnalyzer was accidentally switched to use the wrong default attribute factory, but Uwe fixed it
- A new DoubleRangeField will index a multi-dimensional range, such as a day range in a calendar, using dimensional points, which RangeFieldQuery can then search by overlapped range; LongRangeField are coming next
- TermQuery no longer seeks the terms dictionary
- Indexing performance tests on the New York City taxi rides corpus (1.2 billion documents) uncovered performance problems when writing dimensional points to disk using a large indexing buffer
- MemoryIndexReader.fields accidentally became 5X slower recently
- Dimensional points were failing to enforce maximum per-dimension byte count correctly
- The flexible query parser will also be fixed to not pre-split on whitespace, letting the analyzer do that instead
- DecimalDigitFilter has problems with digits that use Unicode's non-BMP supplemental characters
- ant 1.9.6 somehow breaks Lucene's generated file formats javadocs link
- Now that coord is gone, BooleanQuery can be optimized to flatten any nested disjunctions, often created by rewritten queries
- Some small javadocs improvements were made to
- The obsoleted
- Lucene's builds now run to completion even if some tests fail
- JGit is upgraded to version 4.4.1 in Lucene's build scripts
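
To illustrate what flattening nested disjunctions means, here is a toy Python model. The `Term` and `Or` classes and the `flatten` function are hypothetical stand-ins, not Lucene's BooleanQuery API, and the sketch assumes pure disjunctions with no minimum-should-match constraint. The idea is that an OR of ORs matches the same documents as a single OR over all the leaf clauses.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Term:
    """A leaf query matching a single term (toy stand-in)."""
    text: str


@dataclass(frozen=True)
class Or:
    """A pure disjunction: matches if any clause matches (toy stand-in)."""
    clauses: Tuple[Union["Term", "Or"], ...]


def flatten(query: Union[Term, Or]) -> Union[Term, Or]:
    """Recursively inline nested Or clauses into their parent Or."""
    if isinstance(query, Term):
        return query
    flat = []
    for clause in query.clauses:
        clause = flatten(clause)
        if isinstance(clause, Or):
            # Hoist the child's clauses into the parent disjunction.
            flat.extend(clause.clauses)
        else:
            flat.append(clause)
    return Or(tuple(flat))


if __name__ == "__main__":
    nested = Or((Term("a"), Or((Term("b"), Or((Term("c"), Term("d")))))))
    print(flatten(nested))
```

In this toy model the rewrite only changes the shape of the tree; whether a real query can be flattened also depends on details (such as per-clause settings) that the sketch leaves out.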
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!