This Week in Elasticsearch and Apache Lucene - 2016-12-12
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
End-to-end Recommender System with #ApacheSpark and #Elasticsearch https://t.co/g6OmcVaz0D
— Spark Tech Center (@apachespark_tc) December 5, 2016
Changes in 2.x:
- Add a HostFailureListener to the Transport client to notify client code if a node is disconnected.
- Fix NPE when referencing a non-existent field data type in an
Changes in 5.1:
- Field names with dots should not allow intermediate
- Netty should be able to read the system-wide configuration for socket connection backlog on Linux from
- Preserve the original hostname in DiscoveryNode and TransportAddress, and when pinging.
- SearchTemplateRequest should work with wildcard indices when running with Security enabled.
- Reduce memory pressure and garbage collection when sending very large term queries.
- Wildcard and span queries were not working on specialised fields like
- Add an option to skip install-time configuration of vm.max_map_count on systemd distros.
- Set JVM thread stack size on Windows for 64-bit JVMs.
- Fixed handling of multiple spaces in paths on Windows.
- Return correct term statistics when a field is not found in a shard.
- In-sync shard lists should only be trimmed when the list grows, to avoid removing valid shard copies too early.
- Create requests should reject external versioning.
- A shard marked locally as relocated should be allowed to flush and forced_merge.
- The slow log should be single line, not pretty-printed.
- The FiltersAggregationBuilder should rewrite filters early to avoid exceptions when using
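To illustrate the slow log item above: a pretty-printed request body spans many lines, which is awkward for line-oriented log tooling. The following is a minimal Python sketch, not Elasticsearch's logging code; the query body and the `line_count` helper are hypothetical and only show the difference between the two serializations.

```python
import json

# Hypothetical search body; the field name and value are illustrative only.
query = {"query": {"match": {"title": "lucene"}}, "size": 10}

pretty = json.dumps(query, indent=2)               # multi-line, pretty-printed
single = json.dumps(query, separators=(",", ":"))  # compact, fits on one line

def line_count(text):
    """Number of lines the text would occupy in a log file."""
    return len(text.splitlines())
```

Both strings parse back to the same object; only the compact form keeps one log entry on one line.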
Changes in 5.x:
- The synonyms_graph token filter now allows multi-token synonyms to work correctly with phrase and proximity queries.
- Added new numeric and date range fields, along with
- Synonyms in cross-field multi-match query were not being expanded to all fields.
- Replaced connectToNodeLight with connection profiles, and use profiles to reduce the number of connections needed for non-data nodes. Also support per-node connection timeouts.
- Fail node joins, index restore, open, or upgrade for index versions which are not supported by all nodes in the cluster.
- When reindex and friends have to retry a request, they should do so with the same context as the original request.
- Enabled system call filters which cannot be applied should prevent node startup.
- The task manager now returns human readable descriptions for ongoing snapshot and restore tasks, reindex, delete-by-query, and update-by-query tasks, search tasks, and bulk tasks.
- Scripts should treat ip fields as strings.
- Unindexed fields are now visible in the field-stats API.
- X-Content parsing has been moved from RestAction to be centralised in RestRequest.
- Nodes should complete the handshake process before publishing the connection.
- Fuzzy query is no longer deprecated.
Changes in master:
- Removed the old default store type (hybrid NioFS and MMAPFS).
- Removed the deprecated
- Do not reply to ping requests from other clusters.
- Do not update nodes list when stepping down as master.
- Work has started on adding a high level Java REST client with Java dev-friendly requests, builders and response parsers.
- When promoting a replica to be primary in a mixed node cluster, choose a replica on a node with the lower version.
- Deprecate shadow replicas.
- Single index and delete operations can be replaced by the bulk API internally.
- Enable strict duplicate checks in JSON.
- Expose disk usage in node stats.
- A low-level protocol handshake would exchange version API info whenever a new transport connection is made, allowing for version changes without cluster restarts.
- indices-boost query learns to support wildcard indices and aliases.
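As an illustration of what strict duplicate checking in JSON means: many parsers silently keep the last value when a key is repeated within one object, while a strict parser rejects the document. The sketch below uses Python's standard `json` module; the names `reject_duplicates` and `strict_loads` are hypothetical, and this is not how Elasticsearch implements the check.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises if a key repeats within one JSON object."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate field '{key}'")
        seen[key] = value
    return seen

def strict_loads(text):
    """Parse JSON, failing on duplicate keys instead of keeping the last one."""
    return json.loads(text, object_pairs_hook=reject_duplicates)
```

The same key may still appear in different objects, for example in a parent and in a nested child; only repeats inside a single object are rejected.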
Apache Lucene:
- The latest Java 9 early access build broke Lucene's best-effort attempt to use unmap with MMapDirectory, causing Uwe Schindler to post a stern email to the Jigsaw dev list
- Dimensional points now require substantially less search-time heap in some cases, as seen by the Lehman Brothers magnitude drop here (~59% reduction, annotation
- Numeric doc values should not let outliers, such as taxi cabs that can drive faster than the speed of light, blow up index storage of all documents
- UnifiedHighlighter now lets you highlight text from other fields
- Lucene should sort new segments when they are initially written, giving a nice speedup to sorted indexing throughput
- Buffering up small leaf-block writes gives a small boost to dimensional points indexing (annotation
- If a merge hits a tragic exception while commit is running it may cause
- Lucene should not let you update a doc values field if it is used in the index sort
- A new SpansTreeQuery recurses the tree of span queries to compose the score based on the type of sub queries, implementing a 9 year old idea suggested on Lucene's users list
- MultiFacetQuery are new sugar classes to make it easier to implement facet drill downs
- Programs like virus checkers can cause commit to fail when it tries to
- We should enable more pre-commit checks from the Eclipse ecj compiler we already use in our build
- DrillSideways, letting you see other facet counts even after you've drilled down, should use threads to gain concurrency
- We should use the same string constants for the same parameter names across our analysis factories
- Terms.intersect API is less trappy now
- Terms.intersect API is trappy
- IndexWriter's javadocs concerning NFS and
- PrefixCodedTerms should perhaps cache its hash-code, though it's controversial and perhaps only queries should do so
- UnifiedHighlighter should use the SpanCollector API to be more accurate with nested span queries, and it should let you highlight text from other fields too
- If IndexWriter hits a tragic event when too many merges are running, it can lead to deadlock
- Lucene's maven build fails to generate the javadocs jar
- smartcn tokenizer does not handle appended Chinese punctuation marks correctly
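The numeric doc values item above is easier to see with numbers. The following is a minimal Python sketch under a simplifying assumption: each value is stored as an offset from the minimum, using a fixed number of bits. The helper names, the block size and the sample data are hypothetical, and this is not Lucene's actual codec. With one bit width shared by all values, a single outlier widens every document's entry; computing the width per block confines the cost to the block that holds the outlier.

```python
def bits_per_value(values):
    """Bits needed to store each value as an offset from the minimum."""
    return max(1, (max(values) - min(values)).bit_length())

def total_bits_global(values):
    """Storage cost when one bit width is shared by all values."""
    return bits_per_value(values) * len(values)

def total_bits_blocked(values, block_size=4):
    """Storage cost when each block of block_size values gets its own width."""
    total = 0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        total += bits_per_value(block) * len(block)
    return total

# Hypothetical speeds in km/h, with one absurd outlier at the end.
speeds = [30, 45, 50, 38, 42, 55, 47, 33, 40, 36, 52, 1_000_000_000]
```

The sketch ignores the per-block overhead of recording each block's minimum and bit width, which a real format would have to pay.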
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!