This Week in Elasticsearch and Apache Lucene - 2016-03-07
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Wondering why queries don't always work? @gmoskovicz dives into the details of phrase-matching in #Elasticsearch: https://t.co/b5N96J246P
— elastic (@elastic) March 7, 2016
Changes in 2.x:
- Debian's init script was not waiting for the pidfile.
- GCE Discovery plugin was missing permissions and tests.
- Index deletions missed by disconnected nodes will no longer be re-imported when the node rejoins.
- Terms queries are now considered costly, which means they will be cached more eagerly.
- A has_parent query on non-parent types no longer causes an NPE.
- Update mapping should update the metadata for all affected types.
- Sped up the shard allocator when using include/exclude shard allocation rules.
- Fixed a bug with empty buckets in the stats aggregator.
- Azure Storage client upgraded to 4.0.0.
- Deprecation logs added for:
Changes in master:
- Upgrade to Lucene 6 snapshot.
- Reindex API has landed, and now supports ingest pipelines.
- The index stats API now supports the include_segment_file_sizes parameter, which reports how much disk space is used by each Lucene file.
- Ingest nodes and available processors are reported by nodes info and by the _cat/nodes API, and the ingest_took time is available in bulk requests.
- Ingest metadata now uses cluster state diffs for lighter weight updates.
- Usage of guice has been reduced by removing DiscoveryService.
- Cygwin is not tested and not supported, so a cygwin block in bin/elasticsearch has been removed.
- Replacing string fields with text/keyword fields:
- The mapper attachment plugin has been deprecated in favour of the ingest attachment plugin.
- Client nodes are no longer special, and will be connected to (and report stats) like all other nodes.
- Bootstrap checks are now in their own class and are enforced if networking is configured. The check for file handles is lower on OS X because of the difficulty of setting it and the low likelihood of using OS X in production. Checks have also been added for max processes and to verify that mlockall was successful.
- The _optimize end point has been removed in favour of _forcemerge.
- Index-time field boosting is now applied as a query time boost, and payloads for per-field boosts in the _all field now use 1 byte instead of 4.
- The shard writeLockTimeout is no longer required.
- Rewrite range queries to match_all/match_none where the range covers all or none of the docs in a shard for better result caching.
- You shouldn't be able to delete or close an index while it is being restored.
- Keyword fields should support limited analysis.
- Removing node.client setting in favour of setting other node roles to false.
- Add ingest stats to node stats API.
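The range-query rewrite mentioned in the list above (a range that covers all or none of the docs in a shard becomes match_all or match_none) can be illustrated with a small sketch. This is not Elasticsearch's implementation: `ShardFieldStats` and `rewrite_range` are hypothetical names, the field is assumed to hold integers, and every document in the shard is assumed to have a value for the field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShardFieldStats:
    """Hypothetical per-shard min/max for one field; None means no values."""
    min_value: Optional[int]
    max_value: Optional[int]


def rewrite_range(gte: int, lte: int, stats: ShardFieldStats) -> str:
    """Return the query type a closed range [gte, lte] could be rewritten to."""
    if stats.min_value is None or stats.max_value is None:
        return "match_none"  # no document in the shard has a value
    if lte < stats.min_value or gte > stats.max_value:
        return "match_none"  # the range and the shard's values are disjoint
    # Assumption: every document has a value, so covering [min, max]
    # means covering every document in the shard.
    if gte <= stats.min_value and lte >= stats.max_value:
        return "match_all"
    return "range"  # partial overlap: keep the original range query


stats = ShardFieldStats(min_value=100, max_value=200)
print(rewrite_range(0, 300, stats))    # range covers the whole shard
print(rewrite_range(300, 400, stats))  # range misses the shard entirely
print(rewrite_range(150, 400, stats))  # partial overlap
```

One plausible reason this helps caching: two requests with different bounds that both cover the whole shard rewrite to the same match_all query, so they can share a cached result.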
Apache Lucene
- Both 6.x and 6.0.x branches are now cut, requiring fun changes to switch Lucene's master branch to a 7.x world, including the rare but exciting time when TestBackwardsCompatibility has no indices to test!
- Point values finally support earth surface distance queries, with a delightfully simple and accurate implementation, allowing for exact accuracy for testing (no fuzz!) besides quantization error. It also has performance on par with
despite potent possible future optimizations if we can make the 2D geo math more accurate.
- Merging 2D point values across segments is suddenly 21% faster
- MultiPhraseQuery is now immutable
- Clean up the overlapping methods in LegacyNumericUtils, and bring back lost test cases
- Add missing getters for various queries
- Optimize point range queries that match all documents, likely a common case in time-based indices
- Point values now expose statistics per field, for example letting us compute whether a point field is multi-valued, in addition to the existing per-dimension global min and max values
- The spatial4j dependency for the spatial-extras module is now upgraded to version 0.6
- The semantics of point values intersect API is now sharper: in the 1D case, all points are visited in order
- The new (in 6.0) point queries get a simpler API
- Duplicate code from
- Uwe tweaks TSTLookup to dodge an old javac compiler bug
- The useful checkReader, used in many Lucene tests, was failing to check points
- The LatLonPoint API becomes simpler
- Sometimes randomized tests are a bit too evil
- RandomCodec now also randomizes the points format
- The legacy spatial code, with optional external
has moved to a new spatial-extras module, but required some javadocs hacks since the same package name now appears in two modules
- Improve randomized testing for the new point distance query
- Don't try to estimate match count while collecting: it's inaccurate in multi-valued cases, and doesn't seem to help performance
- The sometimes costly TermsQuery and point queries are now cached more aggressively
- Make it easier to understand why your environment prevents Lucene's MMapDirectory unmap hack from working
- Lucene now always sorts in Unicode order, allowing us to consolidate and remove some of the numerous
- The debate rages on about how to refactor the spatial3d module
- Global ordinals query time join does not explain itself very well
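The idea behind caching costly queries such as TermsQuery and point queries more aggressively, mentioned in the list above, can be sketched with a toy usage-tracking policy: a query is cached once it has been seen often enough, and costly query types get a lower threshold. The class name, the method name and both thresholds below are assumptions made for illustration, not Lucene's actual code or values.

```python
from collections import Counter


class UsageTrackingPolicy:
    """Toy policy: cache a query once it has been used often enough."""

    # Illustrative thresholds only; not taken from Lucene.
    COSTLY_MIN_USES = 2
    DEFAULT_MIN_USES = 5

    def __init__(self, costly_types):
        self.costly_types = frozenset(costly_types)
        self.uses = Counter()

    def on_use(self, query_type: str, query_key: str) -> bool:
        """Record one use and report whether the query should now be cached."""
        key = (query_type, query_key)
        self.uses[key] += 1
        if query_type in self.costly_types:
            threshold = self.COSTLY_MIN_USES
        else:
            threshold = self.DEFAULT_MIN_USES
        return self.uses[key] >= threshold


policy = UsageTrackingPolicy({"TermsQuery", "PointRangeQuery"})
for attempt in range(1, 4):
    cached = policy.on_use("TermsQuery", "tags:a,b")
    print(f"use {attempt}: cache={cached}")
```

With these made-up thresholds, a costly query becomes cacheable on its second use, while other queries have to be seen five times first.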
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!