15 July 2015

This Week in Elasticsearch and Apache Lucene - July 15, 2015

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Elasticsearch Core

Cluster: Remove double call to elect primaries (#12147, 2.0.0)
Index: Use IndexWriter.hasPendingChanges() to detect if a flush is needed (#12146, 1.7.0, 1.6.1)
Testing: allow settings to be passed to client for external test cluster (#12155, 2.0.0)
Plugins: Fix pluginmanager permissions for bin/ scripts (#12157, 2.0.0)
Plugins: remove elasticsearch- from name of official plugins (#12158, 2.0.0)
Plugins: strip elasticsearch- and es- from any plugin name (#12160, 2.0.0)
Backport to 1.7: Snapshot info should contain version of elasticsearch that created the snapshot (#12162, 1.7.0)
Internal: Make 2.0.0.beta1-SNAPSHOT the current version (#12151, 2.0.0)
Aggregations: Adds new script API to ValuesSourceMetricsAggregationBuilder (#12152, 2.0.0)
Search: Free all pending search contexts if index is closed or removed (#12180, 2.0.0, 1.7.0, 1.6.1)
Build: Allow rpm to be built as part of package phase (#12181, 2.0.0)
Aliases: Don't require fields in alias filters to exist in the mapping (#12150, 2.0.0)
Packaging: jsr166e was left out of shaded jar (#12194, 2.0.0)
Term Query: Be more strict during parsing (#12195, 2.0.0)
Core: Only clear open search ctx if the index is deleted or closed via API (#12199, 2.0.0, 1.7.0, 1.6.1)
Exceptions: Throw LockObtainFailedException exception when we can't lock index directory (#12203, 2.0.0, 1.7.0, 1.6.1)
Versioning: updated the elasticsearch versioning format (#12210, 2.0.0)
Stats: Failure during the fetch phase of scan should invoke the failed fetch… (#12087, 2.0.0)
Packaging: Give a better exception when a jar contains same classfile twice. (#12093, 2.0.0)
Stats: Update fs stats (#12053, 2.0.0)
Scripting: Simplify CacheKey used for scripts (#12092, 2.0.0)
Packaging: Allow use of bouncycastle (#12102, 2.0.0)
Tests: Add unit tests for JarHell (#12106, 2.0.0)
Query DSL: Default fuzzy transpositions to true (#12090, 2.0.0)
Testing: Harden integration tests setup/teardown more (#12107, 2.0.0)
Internal: Change JarHell to operate on Path instead of URL (#12109, 2.0.0)
Test: print PID when starting ES in integ tests too. (#12111, 2.0.0)
Mapping: Remove AbstractFieldMapper (#12089, 2.0.0)
Testing: Fix tophints noise (#12112, 2.0.0)
Testing: Get integration tests working on windows. (#12117, 2.0.0)
phonetic plugin: move integration tests to REST tests (#12095, 2.0.0)
Internal: Consolidate ShardRouting construction (#12125, 2.0.0)
Internal: Cleanup ShardRoutingState uses and hide implementation details of ClusterInfo (#12126, 2.0.0)
Stats: Update process stats (#12043, 2.0.0)
Plugins: move integration tests to REST tests (#12128, 2.0.0)
Query DSL: Expose Lucene's new TopTermsBlendedFreqScoringRewrite (#12129, 2.0.0)
Stats: Update OS stats (#12049, 2.0.0)
Internal: Remove mapper references from Engines (#12130, 2.0.0)
Bulk: Add support for retrieving fields in bulk updates (#12114, 2.0.0)
Snapshot/Restore: Add validation of snapshot FileInfo during parsing (#12108, 2.0.0, 1.7.0, 1.6.1)
Search: Clean up handling of missing values when merging shard results on the coordinating node. (#12127, 2.0.0)
Build: Calculate artifact checksums in maven (#12085, 2.0.0)
Internal: Refactor MetaData to split off the concrete index name logic to a dedicated service (#12058, 2.0.0)
Plugins: Simplify Plugin Manager for official plugins (#11805, 2.0.0)
Shadow Replicas: Fail engine without marking it as corrupt when recovering on SharedFS (#11933, 2.0.0, 1.7.0)
Query API: Fix FuzzyQuery to properly handle Object, number, dates or String. (#12020, 2.0.0)
Query DSL: Special case the _index field in queries (#12027, 2.0.0)
Build: cloud-aws doesn't register s3 repos anymore (#12036, 2.0.0)
Term Vectors: Only load term statistics if required (#11737, 2.0.0, 1.7.0, 1.6.1)
Startup: Do not prompt for node name twice (#11668, 2.0.0, 1.7.0, 1.6.1)
Aggregations: Add cost minimizer to tune moving_avg parameters (#11881, 2.0.0)
Inner hits: Properly support named queries for both nested and parent child inner hits (#11880, 2.0.0)
Snapshot/Restore: Snapshot info should contain version of elasticsearch that created the snapshot (#11985, 2.0.0)
Allocation: Shard Started messages should be matched using an exact match (#11999, 2.0.0, 1.7.0)
Java API: Treat path object as a simple value instead of Iterable in XContentBuilder (#11903, 2.0.0)
Settings: change CORS allow origin default to allow no origins (#11890, 2.0.0)
Cluster: Add MetaData.uuid to ClusterState.toXContent (#11832, 2.0.0)
Query DSL: Minor fixes to the match query (#8352, 2.0.0)
Aggregations: add serial differencing pipeline aggregation (#11196, 2.0.0)
Scripting: Add script type and script name to error messages (#11449, 2.0.0)
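
For readers unfamiliar with the serial differencing idea behind the new pipeline aggregation listed above (#11196), here is a minimal Python sketch of the general technique: each value has the value from `lag` positions earlier subtracted from it. This is an illustration only, not Elasticsearch's implementation. The function name `serial_diff` and its parameters are hypothetical, and the real aggregation works on histogram buckets and has to deal with gaps in the data, which this sketch ignores.

```python
from typing import List, Optional


def serial_diff(values: List[float], lag: int = 1) -> List[Optional[float]]:
    """Return values[i] - values[i - lag] for each position i.

    Hypothetical helper for illustration; positions with no value
    `lag` steps back are reported as None.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [
        None if i < lag else values[i] - values[i - lag]
        for i in range(len(values))
    ]


if __name__ == "__main__":
    daily_totals = [10, 12, 15, 11]
    # First-order differences remove a constant trend from the series.
    print(serial_diff(daily_totals, lag=1))
    # A larger lag compares each point with the one a full period earlier.
    print(serial_diff(daily_totals, lag=2))
```

A lag of 1 removes a linear trend, while a lag equal to the length of a seasonal period (for example 7 for daily data with a weekly cycle) removes that seasonality.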

Apache Lucene

The number of occurrences of "the the" and "a a" in our javadocs is scary
IndexUpgrader fails to upgrade the index if there are 0 segments
SortedSet and SortedNumeric doc values should optimize the case when the number of unique values is low
Remove the hard limit on maximum concurrency in IndexWriter
Speed up how Lucene computes union of many small postings lists
List directory files only once when creating an IndexWriter
Fix false test failures when the JVM and/or OS can't handle non-ascii filenames
The new command-line CheckJoinIndex tool confirms you indexed your documents correctly for block joins
MappingCharFilter produces broken offsets, but fixing the bug is tricky and contentious
Solr's ValueSourceRangeFilter should move to the Lucene queries module
A new SynonymGraphFilter fixes sneaky graph bugs in the current SynonymFilter, but requires a separate graph flattening filter if synonyms are applied during indexing
Maybe geo3d should be factored out to its own spatial3d module
SynonymFilter clears custom token attributes, but it's not clear we should fix it
Query cache was failing to notice a query was used if the first segment in the index had no hits
Filter is eradicated from the join module, and soon also spatial module
Switch to a more efficient iterator API to pull all finite strings from an automaton, and add a toposort implementation to Lucene's automaton APIs
TermAutomatonQuery should not throw NullPointerException when one segment is missing some terms
Add K nearest neighbor and simple naive bayes document classifiers to Lucene's classifier module
How can a given query opt out of caching?
A spooky new IBM J9 crash, uncovered by Lucene's tests
We have an off-by-a-factor-of-12 bug in how we enforce the maximum determinized states for an automaton
More tests mysteriously hang, triggering a build timeout at 2 hours
Soon an exception while merging will be considered tragic, and IndexWriter will close itself to safeguard the index

Watch This Space

Stay tuned to this blog, where we'll share more on the whole ELK ecosystem, including news, learning resources and cool use cases!
