This Week in Elasticsearch and Apache Lucene - 2016-05-31
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Using #Elasticsearch for #geohazards & working w' @esa to map ground deformation @terradue https://goo.gl/vcKs4X
— elastic (@elastic) May 27, 2016
Changes in 2.x:
- Highlighting on queries with geo-points no longer throws an exception.
- The _only_nodes search preference now handles multiple node names and attributes, and round robins between all matching nodes.
- Named queries, especially range or fuzzy queries, got a significant speed boost.
- Time settings were not handling decimal points for seconds correctly.
- Ensure that network messages have been successfully sent before trace logging them.
- An empty filter in the percolator no longer throws an NPE.
- The cat-indices API now expands wildcards to include closed indices.
- Joda-time upgraded to fix loading of time zones from scripts.
- Time zone rounding had some bugs in edge cases and DST transitions.
- Reindex API now defaults to using batches of 1,000 hits.
- The Azure repository now deletes files correctly.
- Dynamic templates with a match_mapping_type are no longer ignored if followed by a match: "*".
Changes in master:
- The new shrink API allows a multi-shard index to be merged down to a single shard.
- Command line settings no longer use the `es.` prefix, and system properties can no longer be set with `-D`.
- Dots in field names are supported again.
- Delete-by-query plugin has been removed as the feature has returned to core.
- Epoch datetimes now support the full range of Java Long.
- Painless scripting:
- The status of long-running tasks like reindex is now persisted to an index after the task has finished.
- The cluster allocation explain API now reports whether it is still waiting for shard information.
- The _source parameter in nested hits now uses absolute paths.
- Elasticsearch now warns if the `minimum_master_nodes` setting is lower than a quorum.
- XContent objects and arrays must be closed explicitly to avoid bugs with incorrect nesting.
- When recovering from the transaction log, don't add the same transactions back into the log.
- Custom plugins path is no longer supported.
- Doc stats are now pulled from IndexWriter instead of IndexReader for more accurate values.
- A relocating replica shard will no longer fail the target shard if the old replica fails.
- Doc-values for the _type field can now be accessed correctly.
- Plugins can no longer register shard allocation commands.
- The internal ingest CompoundProcessor now exposes the actual processor throwing an exception.
- Liveness requests should never trip a circuit breaker.
- Our custom Base64 implementation has been replaced with Java's implementation.
- The percolator has moved to a module, and is able to optimise more queries.
- Nodes log their OS and JVM version during startup.
- Delayed shard allocation has been refactored and simplified.
- Lists of modules and plugins are now maintained in generated resource files.
- The Java HTTP client needs to target Java 7 as well as 8 and 9.
- The cluster name should not be appended to path.data.
- How should we deal with empty queries in 5.x?
- How can we implement scroll slicing without requiring users to provide a field to slice on?
- We should be able to reindex from a remote Elasticsearch cluster.
- It should be possible to sort/aggregate ip fields coming from 2.x/5.0 indices.
- REST responses should include warnings when deprecated syntax has been used.
- CRUD requests will be able to wait until their changes are visible to search.
- Index deletion requests should only be acknowledged after the cluster state has been persisted.
- Global check points for sequence IDs should be merged soon.
- Function score query should be able to combine scores from different queries using a script.
- Index templates should be validated when created or updated.
- Reindex and update-by-query should support document deletion.
- Rally's benchmark script should be separate from the tracks being benchmarked, which allows benchmarking different versions of Elasticsearch. A logging data set is also being added.
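To make the quorum warning mentioned above concrete, here is a minimal sketch of the arithmetic, assuming the usual definition of a quorum as a strict majority of master-eligible nodes. The function names `quorum` and `is_below_quorum` are hypothetical and are not part of Elasticsearch.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest node count that is a strict majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def is_below_quorum(minimum_master_nodes: int, master_eligible_nodes: int) -> bool:
    """True when the configured value is lower than a quorum (hypothetical helper)."""
    return minimum_master_nodes < quorum(master_eligible_nodes)


if __name__ == "__main__":
    # With 3 master-eligible nodes a quorum is 2, so a setting of 1 would be flagged.
    print(quorum(3), is_below_quorum(1, 3))
```

A value below a quorum lets two halves of a partitioned cluster each elect their own master, which is the case such a warning is meant to catch.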
- The Lucene 6.0.1 release vote has passed, so the bits will be set free shortly
- The leaky-abstraction `SlowCompositeReaderWrapper` is finally gone from Lucene, a process at least 7 years in the making, since Lucene first switched to near-real-time searching!
- Lucene now supports half-float points, using 2 bytes to represent a floating point number
- Some as yet unexplained scary bug lurks deep inside `IndexWriter` when you mix updates of documents and doc values
- A new Ukrainian lemmatizer leads to discussions about how it differs from the existing hunspell-based tokenizer
- The confusion matrix computation in Lucene's classifier module now uses the macro average to avoid bias
- Don't throw a `NullPointerException` if you try to run on a non-existing field
- A new directory wrapper uses hard links when possible to optimize copying files
- More release script improvements
- Lucene's internal `BytesRefHash` class, used in several hot spots including buffering postings in RAM, gets a nice speedup by switching to a radix sort, and it looks like dimensional points, also sort-intensive during indexing, will get a nice speedup as well
- `ArrayUtil` had accumulated some rust
- `explain` now includes the explanation from its children
- Can we use doc values instead of a heap-resident bitset to implement block-joins?
- Another user falls into the unfortunately common trap of thinking Lucene's stored fields store all information about a field
- A Java 1.9 javadocs bug causes our javadocs to fail
- `TermAutomatonQuery`, a fun query letting you query with complex graph-like phrases, had a ridiculously costly
- `DocIdSetBuilder`, a hot spot used to efficiently gather matching docIDs, gets a nice speedup, so much so that we were able to switch the `LatLonPoint` queries back to it
- `IndexWriter` should tell you the effective order of concurrent operations
- We upgraded `ForbiddenAPIs` to version 2.1 to get a number of improvements including better Java 1.9 support
- methods are now abstract in the
- Remove the added dependency from highlighter to spatial
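For readers curious what storing a floating point number in 2 bytes costs in precision, as with the half-float points mentioned above, here is a small sketch using Python's standard `struct` module and the IEEE 754 half-precision format. It only illustrates the size and precision trade-off. It is not Lucene's actual point encoding, and the helper names are hypothetical.

```python
import struct


def to_half_bytes(value: float) -> bytes:
    """Encode a float as IEEE 754 half precision (2 bytes, big-endian)."""
    return struct.pack(">e", value)


def from_half_bytes(data: bytes) -> float:
    """Decode 2 bytes of IEEE 754 half precision back into a Python float."""
    return struct.unpack(">e", data)[0]


def round_trip(value: float) -> float:
    """Show what a value looks like after being squeezed through 2 bytes."""
    return from_half_bytes(to_half_bytes(value))


if __name__ == "__main__":
    for v in (1.0, 0.5, 3.14159):
        encoded = to_half_bytes(v)
        print(v, len(encoded), round_trip(v))
```

Values such as 1.0 and 0.5 survive the round trip exactly, while most others come back slightly rounded, which is the price of the smaller footprint.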
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!