This Week in Elasticsearch and Apache Lucene - August 25 2015
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Why did we remove the Delete by Query API in #Elasticsearch 2.0?
https://t.co/jnlVVi6rRN
— elastic (@elastic) August 18, 2015
Elasticsearch Core
- Network: Remove support for address resolving in InetSocketTransportAddress (#13020, 2.0.0-beta1)
- Network: Don't print lots of noise on IPv4 only hosts. (#13026, 2.0.0-beta1)
- Exceptions: Fix formatting of startup/configuration errors (#13029, 2.0.0-beta1)
- Network: Move multicast discovery to a plugin (#13027, 2.1.0, 2.0.0-beta1)
- Search: Never cache match_all queries. (#13032, 2.0.0-beta1)
- Aggregations: Aggregation: Fix AggregationPath.subPath() to not throw ArrayStoreException (#13035, 2.0.0-beta1)
- Packaging: Improve java version comparison and explicitly enforce a version format (#13010, 2.1.0)
- Java API: Add missing support for escape to QueryStringQueryBuilder (#13016, 2.1.0, 2.0.0)
- Index APIs: Limit type name length (#13036, 2.0.0-beta1, 2.0.0)
- Settings: Do not swallow exceptions thrown while parsing settings (#13039, 2.0.0-beta1)
- Exceptions: Use StartupError to format all exceptions hitting the console (#13041, 2.0.0-beta1)
- Settings: Do not permit multiple settings files (#13043, 2.1.0, 2.0.0-beta1)
- Exceptions: Improve startup exceptions (especially file permissions etc) (#13050, 2.0.0-beta1)
- Plugins: Lucene SPI support for plugins. (#13051, 2.1.0, 2.0.0-beta1)
- Internal: Cleanup bootstrap package. (#13053, 2.1.0, 2.0.0-beta1)
- Internal: Remove SpawnModules (#13034, 2.0.0)
- Internal: Add plugin modules before (almost all) others (#13061, 2.1.0, 2.0.0-beta1)
- Stats: The queue_size value should be shown as an integer. (#13063, 2.1.0)
- Release: Prevent running whole mvn lifecycle twice (#13066, 2.0.0-beta1)
- Query DSL: Remove unsupported rewrite option from match query builder (#13069, 2.1.0)
- Core: Don't check if directory is present to prevent races (#13049, 2.1.0, 2.0.0)
- REST: Add favicon (#13054, 2.1.0, 2.0.0)
- Exceptions: Make mlockall configuration easier. (#13057, 2.1.0, 2.0.0)
- Query DSL: Remove unsupported rewrite from multi_match query builder (#13073, 2.1.0)
- : Fix some issues developing Elasticsearch with Eclipse (#13070, 2.1.0, 2.0.0)
- Plugin Cloud AWS: [cloud-aws] Update AWS SDK to 1.10.12 (#13090, 2.1.0, 2.0.0)
- Core: Detect duplicate settings keys on startup (#13086, 2.0.0)
- Query DSL: Query DSL: remove attempted (not working) support for array in not query parser (#12890, 2.1.0, 2.0.0)
- Network: Remove usage of InetAddress#getLocalHost (#12959, 2.0.0-beta1)
- Internal: Simplify custom repository type setup (#12948, 2.0.0-beta1)
- Plugins: Ensure additionalSettings() do not conflict (#12967, 2.0.0-beta1)
- Dependencies: Drop commons-lang dependency (#12972, 2.0.0-beta1)
- Java: Workaround JDK bug 8034057 (#12970, 2.0.0-beta1)
- Plugins: Simplify Plugin API for constructing modules (#12952, 2.0.0-beta1)
- Exceptions: Improve console logging on startup exception (#12976, 2.0.0-beta1)
- Exceptions: Add serialization support for InterruptedException (#12981, 2.0.0-beta1)
- Dates: Add millisecond parser for dynamic date fields mapped from yyyy/MM/dd (#12977, 2.0.0-beta1)
- Network: Log network configuration at debug level (#12979, 2.0.0-beta1)
- Plugins: Add build short hash to the download manager headers to identify staging builds (#12936, 2.0.0-beta1)
- Network: Only resolve host if explicitly allowed. (#12986, 2.0.0-beta1)
- Aggregations: Throw error if cardinality aggregator has sub aggregations (#12989, 2.0.0-beta1)
- Testing: fail with better error message if elasticsearch was started already (#12965, 2.0.0-beta1)
- Aggregations: Make ValueParser.DateMath aware of timezone setting (#12886, 2.1.0, 2.0.0)
- Internal: Remove CachedDfSource (#12973, 2.1.0)
- Network: Deduplicate addresses from resolver. (#12995, 2.0.0-beta1)
- Build: RPM module should be build on machine with /usr/bin/rpmbuild (#12990, 2.0.0-beta1)
- Scroll: Optimize sorted scroll when sorting by _doc. (#12983, 2.1.0)
- Search: Deprecate the scan search type. (#12994, 2.1.0)
- Discovery: Default to unicast discovery, with default host list of 127.0.0.1, [::1] (#12999, 2.0.0-beta1)
- Allocation: Add expectedShardSize to ShardRouting and use it in path.data allocation (#12947, 2.1.0, 2.0.0-beta1)
- Plugin Cloud AWS: Remove cloud.account/key settings (#12978, 2.1.0)
- Index Templates: Accumulate validation errors when validating index templates (#12901, 2.1.0)
- Build: Refactor script for RC creation (#12985, 2.1.0, 2.0.0-beta1)
- REST: Suppress rest exceptions by default and log them instead (#12991, 2.1.0, 2.0.0-beta1)
- Packaging: Clean up the tar tests (#12903, 2.1.0)
- Java API: Remove execution from TermsQueryBuilder as it has no effect (#12884, 2.1.0, 2.0.0)
- Build: [maven] Remove elasticsearch- prefix from artifacts (#12879, 2.0.0-beta1)
- Plugins: Use 'name' from plugin descriptor file to determine plugin name (#12775, 2.1.0, 2.0.0-beta1)
- Plugin Cloud AWS: Don't show access_key/filter_key in S3 repository settings (#12845, 2.0.0-beta1)
- Plugin Cloud AWS: Update AWS SDK to 1.10.10 (#12859, 2.1.0, 2.0.0)
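The Packaging entry above (#13010) mentions comparing Java versions and enforcing a version format. As a rough illustration of why that matters, here is a minimal plain-Java sketch; it is not the code from that change. The class name VersionSketch, its methods, and the accepted format (dot-separated groups of digits only) are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Elasticsearch's implementation.
final class VersionSketch {

    // Parses a version such as "1.7" or "1.10.2" into its numeric components.
    // Assumed format: one or more dot-separated groups of digits, nothing else.
    static List<Integer> parse(String version) {
        if (!version.matches("\\d+(\\.\\d+)*")) {
            throw new IllegalArgumentException("unexpected version format: " + version);
        }
        List<Integer> parts = new ArrayList<>();
        for (String part : version.split("\\.")) {
            parts.add(Integer.parseInt(part));
        }
        return parts;
    }

    // Compares component by component; missing trailing components count as 0,
    // so "1.7" and "1.7.0" compare as equal.
    static int compare(String left, String right) {
        List<Integer> a = parse(left);
        List<Integer> b = parse(right);
        int length = Math.max(a.size(), b.size());
        for (int i = 0; i < length; i++) {
            int x = i < a.size() ? a.get(i) : 0;
            int y = i < b.size() ? b.get(i) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }
}
```

With this sketch, VersionSketch.compare("1.10", "1.9") is positive, whereas a plain string comparison would order "1.10" before "1.9". A value such as "1.8.0_45" is rejected outright rather than silently mis-compared.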
Apache Lucene
- RAMInputStream's clone method is not thread safe, and we fixed that, then discovered that neither is MMapDirectory's, which is hard to fix, so we changed our mind and decided that IndexInput.clone should not be expected to be thread safe
- Don't cache trivial queries
- Improve testing of ToChildBlockJoinQuery
- A scary randomized test failure turned out to be a simple test bug, not using a deterministic merge policy
- Remove the esoteric get/setIndexingChain methods from IndexWriterConfig
- Fix a few test bugs in BKD and RangeTree tests
- LovinsStemmer and FinnishStemmer were completely broken, and ant regenerate created un-compilable java code, both discovered/fixed thanks to a big effort to reduce compiler warnings
- RangeTree and BKDTree would sometimes fail to close their open file handles, uncovered thanks to randomized testing on Windows
- Make Math.random forbidden
- BooleanQuery with a SHOULD clause on an empty field returns 0.0 score with DefaultSimilarity
- MatchAllDocsQuery gets a dedicated bulk scorer
- FingerprintFilter emits a single token which is a sorted, de-duplicated set of all of its input tokens, to normalize text for use cases such as clustering
- Can we improve Directory.openInput to avoid cryptic thread-safety issues like this one?
- Many fast iterations to integrate BKDTree and Geo3D to provide accurate and fast earth-surface "point in shape" queries
- Recent graph improvements to WordDelimiterFilter may have caused this regression
- Ivy has an improved option for file-locking, to sidestep the "leftover lock" problem you can hit when building Lucene/Solr in two different directories
- Optimize IndexSearcher.count for simple queries like MatchAllDocsQuery and TermQuery
- Randomized tests have uncovered several recent failures
- Some compound queries fail to create sub-weights via IndexSearcher, and miss out on caching
- When implementing an equals method, should you use instanceof, or compare classes directly?
- Requiring version 1.9 of ant, versus the 1.8 that we require today, is controversial
- This scary test failure, showing that IndexWriter is trying to delete a file that does not exist, only happens on Windows, and we still don't know why
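The FingerprintFilter entry above describes collapsing a token stream into one sorted, de-duplicated token. The idea can be sketched in plain Java without any Lucene classes. This is an illustration of the concept only, not Lucene's implementation; the class and method names are hypothetical, and the separator is passed in as a parameter rather than assumed.

```java
import java.util.List;
import java.util.TreeSet;

// Plain-Java illustration of the "fingerprint" idea; NOT Lucene's FingerprintFilter.
final class FingerprintSketch {

    // Hypothetical helper: joins the sorted, de-duplicated tokens into a single string.
    static String fingerprint(List<String> tokens, String separator) {
        // A TreeSet drops duplicates and iterates in natural (lexicographic) order.
        TreeSet<String> unique = new TreeSet<>(tokens);
        return String.join(separator, unique);
    }
}
```

For example, the token lists ["the", "quick", "the", "brown"] and ["brown", "the", "quick"] both produce "brown quick the" with a space separator, which is what makes the output useful as a normalized key for grouping similar texts.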
Watch This Space
Stay tuned to this blog, where we'll share more on the whole ELK ecosystem, including news, learning resources and cool use cases!