This Week in Elasticsearch and Apache Lucene - 2016-03-14
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
How does @Zymergen build, test, & analyze DNA mods to microbes at scale? https://t.co/7OUmnHavz8 #Elasticsearch pic.twitter.com/dfPZCPmPwc
— elastic (@elastic) March 8, 2016
Changes in 2.x:
- The Tribe node now passes an explicit whitelist of settings through to the client nodes which connect to each cluster. Later, plugins will have an extension point for adding plugin-specific settings to the whitelist.
- Any deprecated parameters parsed by ParseFieldMatcher now get deprecation logging for free.
- Trying to close or delete an index while it is being restored will now fail the close/delete request.
- The `lat_lon` and `precision_step` parameters to `geo_point` fields are deprecated as they are no longer configurable with the new geo-point format. The `validate` and `normalize` parameters now have deprecation logging.
- The geo distance and geo range distance queries no longer support the `.geohash` suffix as it is not needed and makes the query ambiguous.
- The `has_child` query now respects the configured similarity.
- Multi-index expressions starting with `*` no longer ignore exclude expressions.
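Deprecated parameters get deprecation logging "for free" presumably because parsers match field names through a shared `ParseFieldMatcher`. Here is a minimal sketch of that idea. It assumes a matcher that knows each field's preferred and deprecated names. `DeprecatedField` and `SketchParseFieldMatcher` are hypothetical names, not Elasticsearch's classes, and the in-memory list stands in for a real deprecation logger.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical field definition: one preferred name plus deprecated aliases.
final class DeprecatedField {
    final String preferredName;
    final List<String> deprecatedNames;

    DeprecatedField(String preferredName, String... deprecatedNames) {
        this.preferredName = preferredName;
        this.deprecatedNames = Arrays.asList(deprecatedNames);
    }
}

// Hypothetical matcher: any parser that goes through it gets deprecation warnings without extra code.
final class SketchParseFieldMatcher {
    // Stands in for a real deprecation logger.
    final List<String> deprecationLog = new ArrayList<>();

    boolean match(String fieldName, DeprecatedField field) {
        if (field.preferredName.equals(fieldName)) {
            return true;
        }
        if (field.deprecatedNames.contains(fieldName)) {
            // The warning is emitted centrally, so individual parsers never have to remember it.
            deprecationLog.add("Deprecated field [" + fieldName + "] used, expected ["
                    + field.preferredName + "] instead");
            return true;
        }
        return false;
    }
}
```

The design point is centralisation. Because the warning lives in the matcher, every parser that matches through it inherits the logging.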
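The exclude-expression fix is easier to picture with a toy resolver. This sketch makes three assumptions: expressions are applied left to right, `*` is the only wildcard, and a leading `-` removes matches from what has been resolved so far. `IndexExpressionSketch` is hypothetical and much simpler than Elasticsearch's real index-name resolution.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical, simplified index-expression resolver.
final class IndexExpressionSketch {
    // Matches a name against a pattern where '*' is the only wildcard.
    static boolean simpleMatch(String pattern, String name) {
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.matches(regex.toString(), name);
    }

    // Applies expressions left to right: plain ones add matches, "-" prefixed ones remove them.
    static List<String> resolve(List<String> allIndices, String... expressions) {
        Set<String> result = new LinkedHashSet<>();
        for (String expression : expressions) {
            boolean exclude = expression.startsWith("-");
            String pattern = exclude ? expression.substring(1) : expression;
            for (String index : allIndices) {
                if (simpleMatch(pattern, index)) {
                    if (exclude) {
                        result.remove(index);
                    } else {
                        result.add(index);
                    }
                }
            }
        }
        return new ArrayList<>(result);
    }
}
```

With the indices `logs-1`, `logs-2` and `metrics`, resolving `"*", "-logs-*"` leaves only `metrics`. The bug described above amounted to the exclusion step being skipped when the first expression began with `*`.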
Changes in master:
- Index lookups now use the index UUID instead of the index name, and index names are resolved to UUIDs as early as possible.
- `string` fields will be replaced by `text` and `keyword` fields in 5.0, with the following backwards-compatibility (bwc) layer:
  - String mappings in old indices will not be upgraded.
  - Text/Keyword mappings can be added to old and new indices.
  - String mappings on new indices will be upgraded automatically to text/keyword mappings, if possible, with deprecation logging.
  - If it is not possible to upgrade automatically, an exception will be thrown.
- Norms can no longer be lazily loaded. This is no longer needed as they are no longer loaded into memory. The `norms` setting now takes a boolean. Index-time boosts are no longer stored as norms.
- Command-line settings can no longer use the `--` style. Instead, they should be specified with the `-E` prefix.
- Trying to close or delete an index while it is being snapshotted will now fail the close/delete request.
- Scripting engines no longer try to compile hidden files in the script directory.
- Elasticsearch now starts the JVM with the `-XX:+AlwaysPreTouch` flag, so all heap memory pages are committed at startup.
- The deprecated `ignore_unmapped` parameter has been removed from sorting.
- Queries deprecated in 2.0 have now been removed.
- The `multi_field` field datatype, deprecated in 1.0, has been removed.
- The generic thread pool is now bound to 4x the number of processors.
- The `collect_payloads` parameter to `span_near` is deprecated. Payloads are now loaded when needed.
- The cat-recovery API now supports the raw values `bytes_recovered` and `files_recovered`, and the `translog` and `translog_ops` columns have been renamed to be more explicit.
- Dynamic field addition now happens at the end of doc parsing, in preparation for supporting dots in field names.
- The search refactoring is nearing its end with only suggesters, sort, and inner hits outstanding.
- The percolator API will be deprecated in favour of a percolator query, which will deliver a number of requested features to the percolator.
- Once "primary terms" have been added to master, we will be able to enable the acked indexing test.
- The reindex API will support throttling.
- Index data folders will be named according to the index UUID, rather than the index name.
- Storing the cluster UUID in index metadata will allow Elasticsearch to no longer import dangling indices which were deleted while a node was disconnected from the cluster.
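The `string`-to-`text`/`keyword` backwards-compatibility rules for 5.0 boil down to a small decision function. This sketch assumes that `analyzed` strings become `text` and `not_analyzed` strings become `keyword`. It treats any other `index` value as not upgradable, which is an assumption of the sketch. `StringMappingUpgradeSketch` is a hypothetical name, and the real upgrade logic has to inspect more mapping parameters than this. The rule that text/keyword mappings may be added to any index needs no code and is not modelled.

```java
// Hypothetical sketch of the 5.0 string-mapping bwc rules; not Elasticsearch's code.
final class StringMappingUpgradeSketch {
    enum Outcome { KEEP_STRING, TEXT, KEYWORD }

    // indexSetting is the string field's "index" parameter, or null when it was not set.
    static Outcome upgrade(boolean createdBefore5x, String indexSetting) {
        if (createdBefore5x) {
            // Old indices keep their string mappings untouched.
            return Outcome.KEEP_STRING;
        }
        // Assumption: an unset "index" parameter means the old default, "analyzed".
        String index = indexSetting == null ? "analyzed" : indexSetting;
        switch (index) {
            case "analyzed":
                return Outcome.TEXT;    // a real implementation would also log a deprecation here
            case "not_analyzed":
                return Outcome.KEYWORD; // likewise
            default:
                // Anything this sketch cannot translate fails loudly instead of guessing.
                throw new IllegalArgumentException(
                        "Cannot automatically upgrade string mapping with index=" + index);
        }
    }
}
```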
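The move from `--` style to `-E` settings can be illustrated with a toy parser. It accepts only the attached `-Ekey=value` form and rejects `--` arguments. `CommandLineSettingsSketch` is a hypothetical name, and Elasticsearch's real command-line handling likely covers more cases than this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical command-line settings parser; not Elasticsearch's implementation.
final class CommandLineSettingsSketch {
    // Parses arguments such as "-Enode.name=node-1" into a settings map.
    static Map<String, String> parse(String... args) {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String arg : args) {
            if (arg.startsWith("--")) {
                // The old style is rejected outright rather than silently ignored.
                throw new IllegalArgumentException(
                        "-- style settings are no longer supported, use -E instead: " + arg);
            }
            if (arg.startsWith("-E")) {
                String keyValue = arg.substring(2);
                int eq = keyValue.indexOf('=');
                if (eq <= 0) {
                    throw new IllegalArgumentException("expected -Ekey=value but got: " + arg);
                }
                // Split on the first '=' so values may themselves contain '='.
                settings.put(keyValue.substring(0, eq), keyValue.substring(eq + 1));
            }
            // Other arguments are ignored by this sketch.
        }
        return settings;
    }
}
```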
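Bounding the generic thread pool means it can no longer grow without limit under load. Here is a minimal sketch of a pool capped at 4x the processor count, built on the JDK's `ThreadPoolExecutor`. Two details are assumptions of this sketch and not Elasticsearch's actual settings: the 30-second keep-alive, and rejecting tasks once the cap is reached. `GenericPoolSketch` is a hypothetical name.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical bounded "generic" pool; not Elasticsearch's thread pool code.
final class GenericPoolSketch {
    // The cap described in the change: four threads per available processor.
    static int maxThreads(int processors) {
        return 4 * processors;
    }

    static ThreadPoolExecutor newGenericPool() {
        int max = maxThreads(Runtime.getRuntime().availableProcessors());
        // Core size 0 plus a SynchronousQueue: threads are created on demand up to `max`
        // and reclaimed after 30 s idle; once `max` threads are busy, new tasks are rejected.
        return new ThreadPoolExecutor(0, max, 30, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
    }
}
```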
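Reindex throttling was still upcoming at this point, so this is only a generic illustration of how a batch-based throttle could compute its pause. The idea is to wait long enough that each batch takes at least `batchSize / requestsPerSecond` seconds. `ThrottleSketch` and its parameters are hypothetical and say nothing about how the reindex API will actually be throttled.

```java
// Hypothetical batch throttle; not the reindex API's implementation.
final class ThrottleSketch {
    // Nanoseconds to pause after a batch so the average rate stays at requestsPerSecond.
    // Double.POSITIVE_INFINITY means "unthrottled".
    static long delayNanos(int batchSize, double requestsPerSecond, long elapsedNanos) {
        if (!(requestsPerSecond > 0)) { // also rejects NaN
            throw new IllegalArgumentException("requestsPerSecond must be positive");
        }
        if (Double.isInfinite(requestsPerSecond)) {
            return 0L;
        }
        // Minimum time this batch should take at the requested rate.
        long targetNanos = (long) (batchSize * 1_000_000_000L / requestsPerSecond);
        // If the batch already took longer than that, no pause is needed.
        return Math.max(0L, targetNanos - elapsedNanos);
    }
}
```

For example, a 1,000-document batch at 500 requests per second should take 2 seconds. If it finished in 0.5 seconds, the throttle waits the remaining 1.5 seconds.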
Apache Lucene:
- The new dimensional points feature for Lucene 6.0.0 is getting intense pre-release scrutiny, which is uncovering a number of fun bugs and API usability issues and causing us to delay the first 6.0.0 release candidate:
- A query's `equals` method was broken, returning false when the queries were in fact the same
- Sparse points fields were not always handled correctly on merge; a dedicated sparse points test should help uncover any other sparse points issues
- The copy constructor for `FieldType` completely ignored points!
- A newly added test, which checks that you can index more than 2.1 billion points in a single segment, uncovered an int overflow bug after running for 22 hours
- The default codec's points implementation was missing some
- The legacy `SlowCompositeReaderWrapper`, an awful class that inefficiently tries to pretend you have only one segment in your index, does not support points, and has now been moved out of Lucene's core
- The `SimpleText` codec falsely failed its `CheckIndex` if a points field had zero points
- The `CHANGES.txt` descriptions of dimensional points are better now
- The `newSetQuery` API now also conveniently accepts a `Collection` of boxed values in addition to the existing varargs of each primitive type
- `CheckIndex` forgot to tell you it was in fact checking points
- Dead code is being removed
- Cut over existing users from legacy numeric fields to the new dimensional points:
- The legacy uninverting `FieldCache` can now un-invert single-valued points fields
- Both the flexible and XML query parsers now support points
- `UninvertingReader` still needs to support multi-valued points
- The `join` module still needs to support points
l needs to switch to points
- `MemoryIndex` does not yet support points
- Lucene's default codec will now also use prefix compression on fixed-width doc values data (e.g. derived from
- A new
finds the nearest indexed point to a query point, something KD trees excel at, but the latest patch is still vulnerable to adversaries
- Spatial3d now exposes only the WGS84 planet model, which is the most accurate one it supports
- `OfflineSorter` will be faster in 6.1.0 by reducing unnecessary byte copying
- Group search hits by Hamming distance
- All queries should be immutable since they can be used as cache keys
- `ant precommit` now fails on code comparing identical values, and also on useless assignments
`TopDocs` by docs while keeping the result ranks
- A rare test bug, which only happens when we randomly generate exactly the same bytes already in an index file, is fixed
- `NRTCachingDirectory` optionally logs to
- Codec level encryption remains controversial
- Add doc values support to
- 800+ new top-level domains have been created since we last fixed `StandardTokenizer` to detect them!
- The nightly smoke tester was confused by newly old back compat indices
- This test was too evil, taking more than 2 hours to run with just the right seed
- `PointRangeQuery` now optimizes the likely common case when all documents will match
- A few missing
s'stouched a lot of source files
- Split out the geo3d math-only APIs under a
- A troublesome facets test has been removed since it tested floats (and struggled with 1 ulp differences) when in fact facets only supports doubles
- `MemoryIndex` now also accepts an `IndexDocument` instead of a document
- `FilterX` classes are now abstract
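The "more than 2.1 billion points" figure is `Integer.MAX_VALUE` (2,147,483,647), the largest Java `int`. The item does not say where the overflow was, so this sketch only shows the general failure mode and two common fixes: a `long` counter, or `Math.addExact` to fail fast. `PointCountSketch` and its methods are hypothetical names, not Lucene's code.

```java
// Hypothetical helpers illustrating a point-count overflow; not Lucene's code.
final class PointCountSketch {
    // Buggy: an int count silently wraps to a negative number past Integer.MAX_VALUE.
    static int addInt(int count, int added) {
        return count + added;
    }

    // Fix 1: count in a long, which has room for far more points.
    static long addLong(long count, int added) {
        return count + added;
    }

    // Fix 2: keep the int but throw ArithmeticException instead of wrapping.
    static int addChecked(int count, int added) {
        return Math.addExact(count, added);
    }
}
```

`addInt(Integer.MAX_VALUE, 1)` returns `-2147483648`. Bugs like this only surface once a test actually crosses the 2^31 boundary, which is why it took a 22-hour run to find.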
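The `newSetQuery` change follows a common Java overload pattern: keep the primitive varargs method and add one that unboxes a `Collection`. The class below is a hypothetical stand-in for an int point set query, written only to show the overloads. It is not Lucene's implementation.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical stand-in for an int point set query; not Lucene's class.
final class IntSetQuerySketch {
    private final String field;
    private final int[] sortedValues;

    private IntSetQuerySketch(String field, int[] values) {
        this.field = field;
        int[] copy = values.clone(); // defensive copy so callers cannot mutate the query
        Arrays.sort(copy);
        this.sortedValues = copy;
    }

    // Existing style: primitive varargs.
    static IntSetQuerySketch newSetQuery(String field, int... values) {
        return new IntSetQuerySketch(field, values);
    }

    // Convenience overload: boxed values from any Collection are unboxed into an int[].
    static IntSetQuerySketch newSetQuery(String field, Collection<Integer> values) {
        int[] unboxed = new int[values.size()];
        int i = 0;
        for (Integer v : values) {
            unboxed[i++] = v; // a null element throws NullPointerException here
        }
        return new IntSetQuerySketch(field, unboxed);
    }

    // True if the value is one of the query's set members.
    boolean matches(int value) {
        return Arrays.binarySearch(sortedValues, value) >= 0;
    }
}
```

Java's overload resolution picks the `Collection` overload for a `List<Integer>` argument before it ever considers varargs, so both call styles coexist without ambiguity.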
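Nearest-point search fits the block k-d tree behind Lucene's dimensional points well. Whole subtrees can be skipped once they are provably farther away than the best hit found so far. The in-memory 2D k-d tree below is a hypothetical sketch of that pruning idea only. It is not the patch under review, and it uses plain Euclidean distance rather than geographic distance.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical in-memory 2D k-d tree; illustrates nearest-neighbour pruning only.
final class KdTreeSketch {
    private static final class Node {
        final double[] point;
        final Node left;
        final Node right;

        Node(double[] point, Node left, Node right) {
            this.point = point;
            this.left = left;
            this.right = right;
        }
    }

    // Mutable holder for the best candidate found so far.
    private static final class Best {
        double[] point;
        double distSq = Double.POSITIVE_INFINITY;
    }

    private final Node root;

    KdTreeSketch(double[][] points) {
        // Sort a shallow copy so the caller's array order is left untouched.
        root = build(points.clone(), 0, points.length, 0);
    }

    // Splits on x at even depths and y at odd depths, using the median point as the node.
    private static Node build(double[][] pts, int from, int to, int depth) {
        if (from >= to) {
            return null;
        }
        int axis = depth % 2;
        Arrays.sort(pts, from, to, Comparator.comparingDouble((double[] p) -> p[axis]));
        int mid = (from + to) >>> 1;
        return new Node(pts[mid], build(pts, from, mid, depth + 1), build(pts, mid + 1, to, depth + 1));
    }

    // Returns the indexed point closest to (x, y), or null if the tree is empty.
    double[] nearest(double x, double y) {
        Best best = new Best();
        search(root, new double[] {x, y}, 0, best);
        return best.point;
    }

    private static void search(Node node, double[] q, int depth, Best best) {
        if (node == null) {
            return;
        }
        double dx = node.point[0] - q[0];
        double dy = node.point[1] - q[1];
        double distSq = dx * dx + dy * dy;
        if (distSq < best.distSq) {
            best.distSq = distSq;
            best.point = node.point;
        }
        int axis = depth % 2;
        double diff = q[axis] - node.point[axis];
        Node near = diff < 0 ? node.left : node.right;
        Node far = diff < 0 ? node.right : node.left;
        search(near, q, depth + 1, best);
        // Visit the other side only if the splitting line is closer than the best hit so far.
        if (diff * diff < best.distSq) {
            search(far, q, depth + 1, best);
        }
    }
}
```

On well-spread data the pruning keeps lookups cheap. Carefully constructed inputs can still force many subtrees to be visited, which is one way a nearest-point search can stay "vulnerable to adversaries".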
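Grouping hits by Hamming distance usually means treating each hit's signature (for example a 64-bit hash) as a bit string. Signatures that differ in only a few bits are then clustered together. This sketch assumes `long` signatures and a simple greedy grouping. `HammingGroupingSketch` is hypothetical, and the actual proposal may group hits quite differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical greedy grouping by Hamming distance; not Lucene's grouping module.
final class HammingGroupingSketch {
    // Number of differing bits between two 64-bit signatures.
    static int hamming(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    // Each signature joins the first group whose first member is within maxDistance bits,
    // otherwise it starts a new group.
    static List<List<Long>> group(long[] signatures, int maxDistance) {
        List<List<Long>> groups = new ArrayList<>();
        for (long sig : signatures) {
            List<Long> target = null;
            for (List<Long> g : groups) {
                if (hamming(g.get(0), sig) <= maxDistance) {
                    target = g;
                    break;
                }
            }
            if (target == null) {
                target = new ArrayList<>();
                groups.add(target);
            }
            target.add(sig);
        }
        return groups;
    }
}
```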
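The reason queries should be immutable can be shown in a few lines. A query cache stores queries as hash map keys, and a key whose `equals`/`hashCode` changes after insertion can no longer be found. The toy classes below, with hypothetical names, show the failure and the fix using a plain `HashMap` in place of a real query cache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable query: its hashCode changes whenever `value` changes.
final class MutablePointQuery {
    int value;

    MutablePointQuery(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MutablePointQuery && ((MutablePointQuery) o).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }
}

// Hypothetical immutable query: the final field pins equals/hashCode for its whole lifetime.
final class ImmutablePointQuery {
    private final int value;

    ImmutablePointQuery(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ImmutablePointQuery && ((ImmutablePointQuery) o).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }
}

final class QueryCacheDemo {
    // Caches a mutable query, mutates it, then tries to find the entry again.
    static boolean mutableLookupSucceeds() {
        Map<MutablePointQuery, String> cache = new HashMap<>();
        MutablePointQuery query = new MutablePointQuery(1);
        cache.put(query, "cached doc ids");
        query.value = 2; // the entry is still filed under the hash of value 1
        return cache.containsKey(query) || cache.containsKey(new MutablePointQuery(1));
    }

    // Caches an immutable query and looks it up with an equal instance.
    static boolean immutableLookupSucceeds() {
        Map<ImmutablePointQuery, String> cache = new HashMap<>();
        cache.put(new ImmutablePointQuery(1), "cached doc ids");
        return cache.containsKey(new ImmutablePointQuery(1));
    }
}
```

After the mutation, neither the mutated query nor a fresh equal-to-the-original query finds the entry. It just sits in the cache, unreachable.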
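The all-documents-match shortcut in `PointRangeQuery` can be sketched as a comparison between the query range and a segment's minimum and maximum values. This one-dimensional sketch assumes the shortcut is only safe when every document in the segment has a value for the field. Otherwise documents without a value would wrongly match. `RangeShortcutSketch` is hypothetical and not Lucene's code.

```java
// Hypothetical one-dimensional sketch; not Lucene's PointRangeQuery code.
final class RangeShortcutSketch {
    enum Decision { MATCH_ALL, MATCH_NONE, CHECK_POINTS }

    // Query range [lower, upper] and segment range [segmentMin, segmentMax] are inclusive.
    static Decision decide(long lower, long upper, long segmentMin, long segmentMax,
                           int docsWithValue, int maxDoc) {
        if (upper < segmentMin || lower > segmentMax) {
            return Decision.MATCH_NONE; // the range misses every value in the segment
        }
        if (lower <= segmentMin && segmentMax <= upper && docsWithValue == maxDoc) {
            return Decision.MATCH_ALL; // every value is inside and every doc has a value
        }
        return Decision.CHECK_POINTS; // fall back to visiting the points index
    }
}
```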
Watch This Space
Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!