This Week in Elasticsearch and Apache Lucene - 2016-04-05
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
#Elasticsearch 5.0.0-alpha1 release w/#Lucene 6, ingest node, & more!
— elastic (@elastic) April 5, 2016
Changes in 2.3:
- Check that a translog is still open when asking for a new view on it.
- Some columns in the cat APIs had duplicate column aliases.
- Fixed an ArrayOutOfBounds exception when running aggregations on shards without values.
Changes in master:
- The new /_cluster/allocation/explain API explains why a shard can or cannot be allocated to nodes in the cluster.
- Type filters no longer impact query time when there is only one type.
- Dynamically added string fields now add a main "text" field and a sub "keyword" field. Text fields have fielddata disabled by default.
- New dynamically settable soft limits added to protect unaware users from dangerous practices:
- Node attributes must now be specified with `node.attr.xxx` instead of `node.xxx`.
- The `node.client` setting has been removed in favour of `node.master|data|ingest`.
- Throttling of an in-flight reindex request can now be updated dynamically.
- The task management API can now return tasks grouped by parent task.
- Explain on percolator queries now only runs on queries which could match.
- The percolator query now supports scoring.
- Fixed a bug allowing OOMs when recovering from the translog.
- Removed the deprecated "reverse" option from sorting.
- Don't hide stack traces when throwing exceptions.
- Translog configuration is now immutable.
- Cluster health checks should wait for the state to be applied, not ignore in-flight requests.
- Inner hits has been refactored, which means that the search refactoring is now complete, bar some minor cleanups.
- The convert ingest processor now supports an auto option to auto-detect date, boolean, and numeric types.
- The IndexOperationListener now reports whether a document was created or not.
- The Painless code has been cleaned up: all Java code has moved out of the ANTLR grammars, error messages are better, and access to _score is optimized.
- Work continues on removing PROTOTYPE from our code base.
- Adding index deletion tombstones to the cluster state to prevent old indices from popping back into existence.
- The task management API should indicate which tasks can be cancelled.
- The function_score query will learn how to combine scores from multiple queries.
- It looks like we will release Lucene 6.1.0 before 6.0.0!
- The second release candidate for 6.0.0 is out! Go test it and vote!
- Distance queries get much faster with a better test for whether a BKD cell overlaps a circle on the earth's surface, but required this cool whole-earth debugger to help understand the tricky cases
- We now have much better `Polygon` support, including multi-polygons, optionally containing holes, such that we can run real-world polygons, like Russia, without exhausting a 10 GB Java heap
- The newly created `GeoTestUtil` now has useful APIs for making random surprise-me polygons like these exotic nuclear-warfare-like shapes, and the base test class is now simpler
- Spatial tests now use
or better reproducibility
- The bare essential geo spatial utility APIs are moving to core and being consolidated so all spatial modules can share them
should quantize in exactly the same way
- We now use precisely the same constant for the mean radius of the earth when it's modeled (approximately) as a sphere
- It's tricky to get javadocs working across our spatial modules
- Our release tools still have remnants of subversion, and struggle with how we name our release branches
- Geo3d gets easy-to-use APIs matching our geo2d APIs
- The document classifier confusion matrix had buggy accuracy and precision calculations
- The `spatial-extras` module has cut over to points
- `OfflineSorter` more efficiently handles fixed-width values used by dimensional points
- We are struggling with query-time quantization issues with
- Reduce the number of polygon utility methods
- We now sometimes test triangle shapes in our geo tests
s not work with multi-valued documents
- `MoreLikeThisQuery` should keep track of which terms came from which fields, but this seems to cause at least one test failure
- Improve testing for long ordinals in `BKDWriter` without having to index 2.1 billion points
- `OfflineSorter` should not always merge down to one segment in the end
- `GeoPointField` should use the same full 64 bit encoding as
- Geo3d will also support polygons with holes, but handling "sideness" of a polygon is somewhat tricky for
- We can optimize polygon queries with faster checks for whether BKD cells overlap the query polygon
explain method can lie about its score
- Document classifier should also look at numeric fields
- Should Lucene support boolean subset matching?
- The legacy
class won't get multi-valued points support
- `SpanNearQuery` can assign the wrong score when inner clauses overlap
- Our web site still embarrassingly shows the latest subversion commits!
- Another randomized geo test failure, this time on a tiny radius (14.3 cm!)
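
For readers wondering what the confusion matrix fix in the list above is about, here is a minimal Python sketch of how accuracy and per-class precision are conventionally derived from a confusion matrix. It is not Lucene's classifier code: the nested-dict layout (`matrix[actual][predicted]`), the function names, and the spam/ham counts are all assumptions made up for illustration.

```python
from typing import Dict

# Hypothetical layout (an assumption, not Lucene's API):
# matrix[actual_label][predicted_label] = number of documents.
ConfusionMatrix = Dict[str, Dict[str, int]]


def accuracy(matrix: ConfusionMatrix) -> float:
    """Fraction of all documents whose predicted label equals the actual label."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(row.get(label, 0) for label, row in matrix.items())
    return correct / total if total else 0.0


def precision(matrix: ConfusionMatrix, label: str) -> float:
    """Of the documents predicted as `label`, the fraction that really are `label`."""
    true_positives = matrix.get(label, {}).get(label, 0)
    predicted_as_label = sum(row.get(label, 0) for row in matrix.values())
    return true_positives / predicted_as_label if predicted_as_label else 0.0


def recall(matrix: ConfusionMatrix, label: str) -> float:
    """Of the documents that really are `label`, the fraction predicted as `label`."""
    row = matrix.get(label, {})
    actual_label_count = sum(row.values())
    return row.get(label, 0) / actual_label_count if actual_label_count else 0.0


# Made-up counts for a two-class example.
example: ConfusionMatrix = {
    "spam": {"spam": 8, "ham": 2},  # 10 actual spam documents
    "ham": {"spam": 1, "ham": 9},   # 10 actual ham documents
}

if __name__ == "__main__":
    print(accuracy(example))           # 17 correct out of 20
    print(precision(example, "spam"))  # 8 of the 9 documents predicted as spam
    print(recall(example, "spam"))     # 8 of the 10 actual spam documents
```

The easy mistake is mixing up the two axes: precision sums over a column (everything predicted as the label), while recall sums over a row (everything that actually carries the label).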
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!