This Week in Elasticsearch and Apache Lucene - July 15, 2015
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Elasticsearch Core
- Cluster: Remove double call to elect primaries (#12147, 2.0.0)
- Index: Use IndexWriter.hasPendingChanges() to detect if a flush is needed (#12146, 1.7.0, 1.6.1)
- Testing: allow settings to be passed to client for external test cluster (#12155, 2.0.0)
- Plugins: Fix pluginmanager permissions for bin/ scripts (#12157, 2.0.0)
- Plugins: remove elasticsearch- from name of official plugins (#12158, 2.0.0)
- Plugins: strip elasticsearch- and es- from any plugin name (#12160, 2.0.0)
- Backport to 1.7: Snapshot info should contain version of elasticsearch that created the snapshot (#12162, 1.7.0)
- Internal: Make 2.0.0.beta1-SNAPSHOT the current version (#12151, 2.0.0)
- Aggregations: Adds new script API to ValuesSourceMetricsAggregationBuilder (#12152, 2.0.0)
- Search: Free all pending search contexts if index is closed or removed (#12180, 2.0.0, 1.7.0, 1.6.1)
- Build: Allow rpm to be built as part of package phase (#12181, 2.0.0)
- Aliases: Don't require fields in alias filters to exist in the mapping (#12150, 2.0.0)
- Packaging: jsr166e was left out of shaded jar (#12194, 2.0.0)
- Term Query: Be more strict during parsing (#12195, 2.0.0)
- Core: Only clear open search ctx if the index is deleted or closed via API (#12199, 2.0.0, 1.7.0, 1.6.1)
- Exceptions: Throw LockObtainFailedException exception when we can't lock index directory (#12203, 2.0.0, 1.7.0, 1.6.1)
- Versioning: updated the elasticsearch versioning format (#12210, 2.0.0)
- Stats: Failure during the fetch phase of scan should invoke the failed fetch… (#12087, 2.0.0)
- Packaging: Give a better exception when a jar contains same classfile twice. (#12093, 2.0.0)
- Stats: Update fs stats (#12053, 2.0.0)
- Scripting: Simplify CacheKey used for scripts (#12092, 2.0.0)
- Packaging: Allow use of bouncycastle (#12102, 2.0.0)
- Tests: Add unit tests for JarHell (#12106, 2.0.0)
- Query DSL: Default fuzzy transpositions to true (#12090, 2.0.0)
- Testing: Harden integration tests setup/teardown more (#12107, 2.0.0)
- Internal: Change JarHell to operate on Path instead of URL (#12109, 2.0.0)
- Test: print PID when starting ES in integ tests too. (#12111, 2.0.0)
- Mapping: Remove AbstractFieldMapper (#12089, 2.0.0)
- Testing: Fix tophints noise (#12112, 2.0.0)
- Testing: Get integration tests working on windows. (#12117, 2.0.0)
- phonetic plugin: move integration tests to REST tests (#12095, 2.0.0)
- Internal: Consolidate ShardRouting construction (#12125, 2.0.0)
- Internal: Cleanup ShardRoutingState uses and hide implementation details of ClusterInfo (#12126, 2.0.0)
- Stats: Update process stats (#12043, 2.0.0)
- Plugins: move integration tests to REST tests (#12128, 2.0.0)
- Query DSL: Expose Lucene's new TopTermsBlendedFreqScoringRewrite (#12129, 2.0.0)
- Stats: Update OS stats (#12049, 2.0.0)
- Internal: Remove mapper references from Engines (#12130, 2.0.0)
- Bulk: Add support for retrieving fields in bulk updates (#12114, 2.0.0)
- Snapshot/Restore: Add validation of snapshot FileInfo during parsing (#12108, 2.0.0, 1.7.0, 1.6.1)
- Search: Clean up handling of missing values when merging shard results on the coordinating node. (#12127, 2.0.0)
- Build: Calculate artifact checksums in maven (#12085, 2.0.0)
- Internal: Refactor MetaData to split off the concrete index name logic to a dedicated service (#12058, 2.0.0)
- Plugins: Simplify Plugin Manager for official plugins (#11805, 2.0.0)
- Shadow Replicas: Fail engine without marking it as corrupt when recovering on SharedFS (#11933, 2.0.0, 1.7.0)
- Query API: Fix FuzzyQuery to properly handle Object, number, dates or String. (#12020, 2.0.0)
- Query DSL: Special case the _index field in queries (#12027, 2.0.0)
- Build: cloud-aws doesn't register s3 repos anymore (#12036, 2.0.0)
- Term Vectors: Only load term statistics if required (#11737, 2.0.0, 1.7.0, 1.6.1)
- Startup: Do not prompt for node name twice (#11668, 2.0.0, 1.7.0, 1.6.1)
- Aggregations: Add cost minimizer to tune moving_avg parameters (#11881, 2.0.0)
- Inner hits: Properly support named queries for both nested and parent child inner hits (#11880, 2.0.0)
- Snapshot/Restore: Snapshot info should contain version of elasticsearch that created the snapshot (#11985, 2.0.0)
- Allocation: Shard Started messages should be matched using an exact match (#11999, 2.0.0, 1.7.0)
- Java API: Treat path object as a simple value instead of Iterable in XContentBuilder (#11903, 2.0.0)
- Settings: change CORS allow origin default to allow no origins (#11890, 2.0.0)
- Cluster: Add MetaData.uuid to ClusterState.toXContent (#11832, 2.0.0)
- Query DSL: Minor fixes to the match query (#8352, 2.0.0)
- Aggregations: add serial differencing pipeline aggregation (#11196, 2.0.0)
- Scripting: Add script type and script name to error messages (#11449, 2.0.0)
Apache Lucene
- The number of occurrences of "the the" and "a a" in our javadocs is scary
- IndexUpgrader fails to upgrade the index if there are 0 segments
- SortedSet and SortedNumeric doc values should optimize the case when the number of unique values is low
- Remove the hard limit on maximum concurrency in IndexWriter
- Speed up how Lucene computes union of many small postings lists
- List directory files only once when creating an IndexWriter
- Fix false test failures when the JVM and/or OS can't handle non-ascii filenames
- The new command-line CheckJoinIndex tool confirms you indexed your documents correctly for block joins
- MappingCharFilter produces broken offsets, but fixing the bug is tricky and contentious
- Solr's ValueSourceRangeFilter should move to the Lucene queries module
- A new SynonymGraphFilter fixes sneaky graph bugs in the current SynonymFilter, but requires a separate graph flattening filter if synonyms are applied during indexing
- Maybe geo3d should be factored out to its own spatial3d module
- SynonymFilter clears custom token attributes, but it's not clear we should fix it
- Query cache was failing to notice a query was used if the first segment in the index had no hits
- Filter is eradicated from the join module, and soon also spatial module
- Switch to a more efficient iterator API to pull all finite strings from an automaton, and add a toposort implementation to Lucene's automaton APIs
- TermAutomatonQuery should not throw NullPointerException when one segment is missing some terms
- Add K nearest neighbor and simple naive bayes document classifiers to Lucene's classifier module
- How can a given query opt out of caching?
- A spooky new IBM J9 crash, uncovered by Lucene's test
- We have an off-by-a-factor-of-12 bug in how we enforce the maximum determinized states for an automaton
- More tests mysteriously hang, triggering a build timeout at 2 hours
- Soon an exception while merging will be considered tragic, and IndexWriter will close itself to safeguard the index
Watch This Space
Stay tuned to this blog, where we'll share more on the whole ELK ecosystem, including news, learning resources and cool use cases!