This Week in Elasticsearch and Apache Lucene - September 15, 2015
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Looking for some weekend #elasticsearch reading? Check out @BennacerSamir’s primer on “Hot-Warm” architecture http://t.co/yCuXIyTkyf
— elastic (@elastic) September 11, 2015
Elasticsearch Core
- Aggregations: Pipeline Aggregations at the root of the agg tree are now validated (#13475, 3.0.0, 2.1.0)
- Parent/Child: Deprecate the `score_type` option in favour of the `score_mode` option (#13478, 2.1.0)
- CAT API: Rename start to verify_index in cat test (#13491, 3.0.0, 2.1.0, 2.0.0-beta2)
- Internal: Remove and forbid use of `com.google.common.base.Preconditions#checkNotNull` (#13493, 3.0.0)
- Internal: Remove and forbid use of `com.google.common.collect.Queues` (#13498, 3.0.0)
- Tests: Reorganize sharing of constants for mock cluster info service (#13497, 3.0.0, 2.1.0)
- Plugin Cloud GCE: EC2/Azure discovery plugins must declare their `UnicastHostsProvider` (#13501, 3.0.0, 2.1.0, 2.0.0-beta2)
- Docs: Add documentation for the default execution value (range filter) (#13504, 1.7.2)
- Query DSL: Remove deprecated support for `boost_factor` in the function score query (#13510, 3.0.0)
- Allocation: Take relocating shards into consideration during awareness allocation (#13512, 3.0.0, 2.1.0, 2.0.0-beta2, 1.7.2)
- Docs: Fix the `mappings->_source` example in the docs (#13515, 3.0.0, 2.0.0-beta2, 1.7.2)
- Settings: Stop `o.e.c.s.Settings` from leaking the Guava dependency (#13517, 3.0.0, 2.1.0, 2.0.0-beta2)
- Internal: Clean up `SearchRequest` and `SearchRequestBuilder` (#13518, 3.0.0)
- Internal: Forbid Guava everywhere instead of only in core (#13523, 3.0.0)
- Internal: Remove and forbid use of several `com.google.common.util` classes (#13524, 3.0.0)
- Internal: Remove and forbid use of `com.google.common.collect.ImmutableSortedMap` (#13525, 3.0.0)
- Internal: Ban `setAccessible` from core code, restore monitoring stats under Java 9 (#13531, 3.0.0)
- Internal: Hack around the AWS security hole of accessing `sun.security.ssl`; the S3 repository works on Java 9 again (#13538, 3.0.0)
- Tests: Remove easy uses of `setAccessible` in tests (#13537, 3.0.0)
- Tests: Remove all uses of `setAccessible` in tests and forbid it (#13539, 3.0.0)
- Recovery: Translog ops were not properly acked while waiting on mapping changes (#13535, 3.0.0, 2.1.0, 2.0.0-beta2)
- Allocation: Fix the `AwarenessAllocationIT.testAwarenessZones` test (#13519, 3.0.0)
- Internal: Use `Supplier` instead of reflection (#13545, 3.0.0)
- Internal: Clean up `InternalClusterInfoService` (#13543, 3.0.0, 2.1.0)
- Internal: Remove and forbid use of Guava `Function`, `Charsets` and `Collections2` (#13533, 3.0.0)
- Internal: Replace `LoadingCache` usage with a simple `ConcurrentHashMap` (#13552, 3.0.0)
- Packaging: Fix `service.bat` start/stop issues (#13398, 3.0.0, 2.1.0, 2.0.0-beta2)
- Build: Fix compiler warnings (#13410, 3.0.0, 2.1.0)
- Internal: Do not dump stack traces of threads on test failure (#13440, 3.0.0)
- Internal: Remove and forbid use of `com.google.common.collect.Maps` (#13438, 3.0.0)
- Tests: Remove noise when IDE test runners try to `System.exit` (#13442, 3.0.0)
- Core: Remove support for deprecated queries (#13418, 3.0.0)
- Mapping: The parent field mapper should always store the doc values join field (#13430, 3.0.0)
- Nested Docs: When sorting by a nested field, `nested_path` should always be specified (#13429, 3.0.0, 2.1.0)
- Internal: `has_child`/`has_parent` query parsers shouldn't require a search context (#13432, 3.0.0, 2.1.0)
- Internal: Several other parent/child cleanups (#13470, 3.0.0)
- Internal: Remove and forbid use of `com.google.common.collect.Sets` (#13463, 3.0.0)
- Tests: Move static backwards-compatibility (bwc) indexes to a shared location (#13443, 3.0.0, 2.1.0)
- Internal: Upgrade Lucene to r1702265 (#13439, 3.0.0)
- Docs: Document the meaning of "FST" and "FSTs" (#13455, 3.0.0, 2.1.0, 2.0.0-beta2, 1.7.2)
- Aggregations: Weighted centroid for `geohash_grid` (#13433, 2.0.0-beta2)
- Snapshot/Restore: Simplify the `BlobContainer` blob writing interface (#13434, 3.0.0, 2.1.0)
- Packaging: Packaging tests use Java 8 (#13422, 3.0.0, 2.1.0, 2.0.0-beta2)
- Packaging: Test that packages don't start elasticsearch (#13275, 3.0.0, 2.1.0, 2.0.0-beta2)
- Packaging: Start plugins in package tests (#13258, 3.0.0, 2.1.0, 2.0.0-beta2)
- Tests: Re-enable spaces in paths during integration tests (#13211, 2.1.0, 2.0.0-beta2)
- Allocation: Remove `DisableAllocationDecider` (#13313, 3.0.0)
- Snapshot/Restore: Snapshot restore request should accept indices options (#13357, 3.0.0, 2.1.0, 2.0.0-beta2, 1.7.2)
- Discovery: Add a dedicated queue for incoming `ClusterStates` (#13303, 3.0.0)
- Stats: Add a stats counter for failed indexing requests (#13130, 3.0.0, 2.1.0)
- Fix test for `_cat/nodeattrs` (#13189, 3.0.0)
- Search: Limit the size of the result window (`from` + `size`) via a dynamic index setting (#13188, 3.0.0, 2.1.0)
- Plugin Cloud AWS: Move integration tests to unit tests (#12844, 2.1.0)
- Allocation: Take initializing shards into consideration during awareness allocation (#12551, 3.0.0, 2.1.0, 2.0.0-beta2, 1.7.2)
- Plugins: Output plugin info only in verbose mode (#12908, 3.0.0, 2.1.0)
- Discovery: Add two-phase commit to cluster state publishing (#13062, 3.0.0)
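The "Use `Supplier` instead of reflection" item fits the wider push in this list to ban `setAccessible`. Below is a generic sketch of the pattern under our own assumptions. The `Service` and `DefaultService` types are hypothetical and unrelated to the actual change. Instead of instantiating a class reflectively, the caller passes in a factory such as a constructor reference, and the compiler checks it.

```java
import java.util.function.Supplier;

// Hypothetical types illustrating the pattern; not taken from Elasticsearch.
class SupplierVsReflection {
    interface Service { String name(); }

    static class DefaultService implements Service {
        @Override public String name() { return "default"; }
    }

    // Reflective style: needs an accessible no-arg constructor, declares checked
    // exceptions, and a missing constructor only shows up at runtime.
    static Service createReflectively(Class<? extends Service> type) throws ReflectiveOperationException {
        return type.getDeclaredConstructor().newInstance();
    }

    // Supplier style: the caller passes a factory, no reflection is involved, and a
    // bad constructor reference is a compile error rather than a runtime failure.
    static Service create(Supplier<? extends Service> factory) {
        return factory.get();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(createReflectively(DefaultService.class).name()); // default
        System.out.println(create(DefaultService::new).name());              // default
    }
}
```

Both calls build the same object. The `Supplier` version simply moves the "how do I construct this?" decision to the call site, where no special permissions are needed.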
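The `LoadingCache` item in the Elasticsearch list replaces a Guava lazy-loading cache with a plain JDK map. Here is a minimal sketch of that general pattern, not the actual Elasticsearch change. The `SimpleLazyCache` class and its `expensiveLookup` loader are hypothetical. The key point is that `ConcurrentHashMap.computeIfAbsent` runs the loader atomically, at most once per key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a lazily populated cache on ConcurrentHashMap instead of
// Guava's LoadingCache. Names are illustrative, not Elasticsearch code.
class SimpleLazyCache {
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();
    // Counts loader invocations so we can see that values are reused.
    final AtomicInteger loads = new AtomicInteger();

    // Hypothetical "expensive" loader standing in for a Guava CacheLoader.
    private Integer expensiveLookup(String key) {
        loads.incrementAndGet();
        return key.length();
    }

    Integer get(String key) {
        // computeIfAbsent invokes the loader only when the key is missing, and does so
        // atomically, so concurrent callers for one key trigger a single load.
        return cache.computeIfAbsent(key, this::expensiveLookup);
    }

    public static void main(String[] args) {
        SimpleLazyCache cache = new SimpleLazyCache();
        System.out.println(cache.get("lucene")); // 6, computed by the loader
        System.out.println(cache.get("lucene")); // 6, served from the map
        System.out.println(cache.loads.get());   // 1
    }
}
```

A plain map has no eviction, expiry or size bound. This swap therefore suits caches over a small, effectively fixed key space; an unbounded key space would still need a real cache.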
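The result-window item caps how deep a search can page. As far as we know, the limit surfaces as the dynamic index setting `index.max_result_window`, which defaults to 10,000. The check below is a hypothetical sketch of the idea, not Elasticsearch's code. Each shard has to collect its top `from + size` hits before the coordinating node merges them, so the sum is what gets bounded, not `size` alone.

```java
// Hypothetical sketch of a result-window guard; not Elasticsearch's implementation.
class ResultWindowCheck {
    // Assumed limit, mirroring the 10,000 default of index.max_result_window.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    static void checkResultWindow(int from, int size, int maxResultWindow) {
        // Each shard must collect from + size hits, so the sum is what costs memory.
        long window = (long) from + size; // widen to long so huge values cannot overflow
        if (window > maxResultWindow) {
            throw new IllegalArgumentException(
                "from + size [" + window + "] exceeds the max result window [" + maxResultWindow + "]");
        }
    }

    public static void main(String[] args) {
        checkResultWindow(0, 10, DEFAULT_MAX_RESULT_WINDOW); // first page: accepted
        try {
            checkResultWindow(9_995, 10, DEFAULT_MAX_RESULT_WINDOW); // 10,005 hits deep
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Because the property is dynamic, the limit can be changed on a live index. For genuinely deep traversal, the scroll API is the usual alternative to raising it.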
Apache Lucene
- Unfortunately, if you attempt to negate the smallest negative integer (`Integer.MIN_VALUE`, in Java) you still have a negative number!
- At long last, we can change Lucene's default scoring to BM25, which is widely regarded as giving better relevance than the current TF/IDF default
- Soon we will finally be able to deprecate `Filter`
- Why does a trailing space change how JapaneseTokenizer tokenizes an input?
- The 5.3.1 bugfix release is coming soon
- The new `BoostQuery` decouples `Query` from boosting
- `TermsQuery.toString` did not work with binary terms
- Geo3d now handles planets more squashed than Earth, also handles degenerate tiny-radius circles, and uses factory APIs to create `XYZSolid` and circles
- Reduce GC load for the new geo point queries by re-using `BytesRef` instances
- Early access Jigsaw testing resulted in many small Lucene fixes
- `TestSecurityManager` now passes within popular IDEs
- Simplify `IndexFileDeleter`: remove the dangerous `refresh` method that deletes unreferenced per-segment files on exception
- Contain reflection on internal JDK classes by `MockFileSystem` to Lucene's test-framework
- Enable the `IndexSearcher` query cache by default
- Nested conjunctions are now flattened
- Even javadocs cannot use forbidden APIs
- Fix Lucene's default scoring (`DefaultSimilarity`) so that it does not assign a zero score when one of the `SHOULD` clauses in a `BooleanQuery` is against a field with no documents
- `FuzzyLikeThisQuery.rewrite` should not have side effects that alter the state of the original query
- `GeoPointDistanceQuery` is buggy with large distances
- Highlighting with nested span queries can highlight the wrong terms
- Can we simplify Lucene's search APIs by merging `rewrite` and `createWeight`?
- `SpanNearQuery` behaves differently from `PhraseQuery` when there is a run of more than one stopword between terms
- The `hashCode` implementation for `NumericRangeQuery` produces frequent collisions
- An initial patch is up for a fast point-within-distance query implemented with BKD trees
- `GeoPointDistanceRangeQuery` will match points within a min/max distance range
- `AssertingIndexSearcher` should detect when a query's `explain` implementation lies
- When handling a tragic exception, `IndexWriter` fails to wait for any concurrent commits to finish first
- Improve `FSDirectory` javadocs to explain why symlinks to an index directory are problematic
- `BooleanQuery` should discard duplicate non-scoring clauses
- `BooleanQuery.equals` is unfortunately sensitive to clause order, but fixing it is controversial
- Reduce heap used by `CompressingStoredFieldsWriter` when writing large strings during indexing
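The `Integer.MIN_VALUE` item is a classic two's-complement trap. A 32-bit `int` spans -2^31 to 2^31 - 1, so +2^31 has no representation, and negation silently wraps back to -2^31. This small program is our own illustration, not code from the Lucene issue. It shows the overflow and two ways around it: widening to `long`, or using `Math.negateExact`, which throws instead of wrapping.

```java
// Illustration of negating Integer.MIN_VALUE with Java's 32-bit two's-complement int.
class MinValueNegation {
    // Widening to long first gives the mathematically correct result.
    static long negateWidened(int x) {
        return -(long) x;
    }

    public static void main(String[] args) {
        int min = Integer.MIN_VALUE;             // -2147483648
        System.out.println(-min);                // -2147483648: +2^31 does not fit in an int
        System.out.println(Math.abs(min));       // -2147483648: Math.abs has the same problem
        System.out.println(negateWidened(min));  // 2147483648
        try {
            Math.negateExact(min);               // Java 8+: throws rather than wrapping
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

The same wrap also bites hashing code. `Math.abs(hash) % n` can return a negative bucket when `hash` is `Integer.MIN_VALUE`, whereas `Math.floorMod(hash, n)` always lands in `[0, n)` for positive `n`.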
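For readers new to BM25, here is the textbook per-term formula written out in Java. It is a sketch, not Lucene's `BM25Similarity`. We assume the common defaults k1 = 1.2 and b = 0.75 and the IDF form log(1 + (N - n + 0.5) / (n + 0.5)), and we ignore details such as Lucene's lossy encoding of document length. The property that matters for relevance is saturation: repeated occurrences of a term add less and less, and the term-frequency factor can never exceed k1 + 1.

```java
// Textbook Okapi BM25 score contribution of one query term to one document.
// A sketch under stated assumptions, not Lucene's BM25Similarity.
class Bm25Sketch {
    static final double K1 = 1.2;  // controls how quickly term frequency saturates
    static final double B = 0.75;  // strength of document-length normalization

    // docCount = N documents in the collection, docFreq = n documents containing the term.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    static double score(double freq, double docLength, double avgDocLength,
                        long docCount, long docFreq) {
        // Longer-than-average documents get a larger denominator, hence a lower score.
        double lengthNorm = 1.0 - B + B * (docLength / avgDocLength);
        // Approaches K1 + 1 as freq grows, instead of growing without bound.
        double tfFactor = freq * (K1 + 1.0) / (freq + K1 * lengthNorm);
        return idf(docCount, docFreq) * tfFactor;
    }

    public static void main(String[] args) {
        // A term found in 10 of 1,000 documents, scored in an average-length document.
        for (double freq : new double[] {1, 10, 100}) {
            System.out.printf("freq=%.0f score=%.3f%n", freq, score(freq, 100, 100, 1000, 10));
        }
        System.out.printf("upper bound=%.3f%n", idf(1000, 10) * (K1 + 1.0));
    }
}
```

By contrast, the term-frequency factor in Lucene's classic `DefaultSimilarity` is `sqrt(freq)`, which keeps growing as a term repeats.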
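For matching purposes, flattening nested conjunctions is safe because AND is associative. `a AND (b AND c)` matches exactly what `a AND b AND c` matches, and a single intersection can then advance all clauses together rather than nesting one intersection inside another. Lucene does this on its own internal classes; the sketch below only demonstrates the idea on a hypothetical mini query tree (`Term`, `And`) of our own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mini query tree (not Lucene classes) to illustrate flattening.
class FlattenConjunctions {
    interface Node {}

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    static final class And implements Node {
        final List<Node> clauses;
        And(Node... clauses) { this.clauses = Arrays.asList(clauses); }
        @Override public String toString() { return "AND" + clauses; }
    }

    // Recursively pulls the clauses of nested ANDs up into their parent AND.
    static Node flatten(Node node) {
        if (!(node instanceof And)) {
            return node; // leaves are already flat
        }
        List<Node> flat = new ArrayList<>();
        for (Node child : ((And) node).clauses) {
            Node f = flatten(child);
            if (f instanceof And) {
                flat.addAll(((And) f).clauses); // splice grandchildren in directly
            } else {
                flat.add(f);
            }
        }
        return new And(flat.toArray(new Node[0]));
    }

    public static void main(String[] args) {
        Node q = new And(new Term("a"), new And(new Term("b"), new And(new Term("c"), new Term("d"))));
        System.out.println(q);          // AND[a, AND[b, AND[c, d]]]
        System.out.println(flatten(q)); // AND[a, b, c, d]
    }
}
```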
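On the `BooleanQuery.equals` item: comparing clauses as an ordered list makes `a OR b` and `b OR a` unequal, even though they match the same documents. One order-insensitive option is to compare clause counts as a multiset. The sketch below uses plain strings to stand in for clauses, an assumption made for brevity that does not reflect Lucene's API. A plain set would be wrong here, because a duplicated scoring clause can change the score; only duplicate non-scoring clauses are redundant.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Strings stand in for query clauses; a hypothetical simplification.
class ClauseOrderEquality {
    // Order-sensitive: the same clauses in a different order compare unequal.
    static boolean listEquals(List<String> a, List<String> b) {
        return a.equals(b);
    }

    // Order-insensitive but duplicate-aware: compare how often each clause occurs.
    static boolean multisetEquals(List<String> a, List<String> b) {
        return counts(a).equals(counts(b));
    }

    static Map<String, Integer> counts(List<String> clauses) {
        Map<String, Integer> counts = new HashMap<>();
        for (String clause : clauses) {
            counts.merge(clause, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> ab = Arrays.asList("title:lucene", "body:search");
        List<String> ba = Arrays.asList("body:search", "title:lucene");
        List<String> aab = Arrays.asList("title:lucene", "title:lucene", "body:search");
        System.out.println(listEquals(ab, ba));      // false
        System.out.println(multisetEquals(ab, ba));  // true
        System.out.println(multisetEquals(ab, aab)); // false: the duplicate is kept
    }
}
```

Whatever equality is chosen, `hashCode` must agree with it. Hashing the counts map (for example `counts(clauses).hashCode()`) is order-insensitive in the same way.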
Watch This Space
Stay tuned to this blog, where we'll share more on the whole ELK ecosystem, including news, learning resources and cool use cases!