This Week in Elasticsearch and Apache Lucene - August 12 2015
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Same topic, new expert tips! .@ZacharyTong presents 'Staying in Control w/Moving Avgs - Pt 2.' http://t.co/iQka1Hdmya pic.twitter.com/34V7SLcdFp
— elastic (@elastic) August 12, 2015
Elasticsearch Core
- Bats testing: Remove useless systemctl check (#12724, 2.0.0-beta1)
- Stats: Add script compilation stats (#12733, 2.0.0-beta1)
- Tests: Move qa's convert-plugin-name macrodef to dev-tools. (#12765, 2.0.0-beta1)
- PluginManager: Do not try other URLs if specific URL was configured (#12766, 2.0.0-beta1)
- Settings: Remove lenient store type parsing (#12735, 2.0.0-beta1)
- Packaging: Makes sure all POMs contain a description (#12771, 2.0.0-beta1)
- Core: Remove Settings.getAsClass (#12744, 2.0.0-beta1)
- Plugins: Apply additional plugin settings only if settings are not explicit (#12796, 2.0.0-beta1)
- Build: Add multi-node IT infrastructure (#12797, 2.0.0-beta1)
- Query DSL: Remove (not working) support for alternate formats in simple query string (#12798, 2.0.0)
- Packaging: Fix rpm -e removing /etc/elasticsearch (#12785, 2.0.0-beta1)
- Query DSL: Add support for disable_coord param in terms query (#12756, 2.0.0-beta1)
- Build: allow specifying additional arguments in startup-elasticsearch macrodef (#12802, 2.0.0-beta1)
- Build: use spaces liberally in integration tests and fix space handling (#12710, 2.0.0-beta1)
- Mapping: Fix field type compatibility check to work when only one previous type exists (#12779, 2.0.0-beta1)
- Logging: Adds a setting to control source output in indexing slowlog (#12806, 2.0.0-beta1)
- Internal: Flatten Allocation modules and add back ability to plugin ShardsAllocators (#12818, 2.0.0-beta1)
- Tests: improve integration tests output when ES cannot be started. (#12660, 2.0.0-beta1)
- PluginManager: Fix elastic.co download URLs, add snapshot ones (#12641, 2.0.0-beta1)
- Recovery: Rethrow exception during recovery finalization even if source is not broken (#12667, 2.0.0-beta1, 1.7.2, 1.6.3)
- Core: Use explicit flag if index should be created on engine creation (#12671, 2.0.0-beta1)
- Suggestions: Prevent DirectCandidateGenerator to reuse an unclosed analyzer (#12670, 2.0.0-beta1, 1.7.2, 1.6.3)
- Scripting: Add path.scripts directory (#12668, 2.0.0-beta1)
- Testing: Get plugin smoketester running in jenkins. (#12681, 2.0.0-beta1)
- Tests: Add unittest for DiskThresholdDecider#getShardSize / #sizeOfRelocatingShards (#12687, 2.0.0-beta1)
- Search: Speed up the function_score query when scores are not needed. (#12693, 2.0.0-beta1)
- Query DSL: Do not track named queries that are null (#12691, 2.0.0-beta1, 1.7.2)
- Allocation: Avoid extra reroutes of delayed shards in RoutingService (#12678, 2.0.0-beta1)
- Core: Improve jvmcheck error failure (#12696, 2.0.0-beta1)
- Scripting: Allow scripts to expose whether they use the _score. (#12695, 2.0.0-beta1)
- Tests: Support jenkins randomization in integration tests (#12703, 2.0.0-beta1)
- Search: Only compute scores when necessary with FiltersFunctionScoreQuery (#12707, 2.0.0-beta1)
- Allocation: Make RoutingNodes read-only by default (#12690, 2.0.0-beta1)
- Packaging: rpm and deb create scripts directory (#12704, 2.0.0-beta1)
- Testing: Run package tests in vagrant (#12646, 2.0.0-beta1)
- Packaging: Let OSX build rpms for linux (#12706, 2.0.0-beta1)
- Snapshot/Restore: Add support for bulk delete operation in snapshot repository (#12587, 2.0.0-beta1)
- Packaging: Fix upgrade RPM script (#12630, 1.7.2)
- Release: Update build release script to reflect latest changes (#12553, 2.0.0-beta1)
- Core: Improve toString on EsThreadPoolExecutor (#12535, 2.0.0-beta1)
- Allocation: Avoid extra reroutes of delayed shards in RoutingService (#12532, 1.7.2)
- Transport: allow to de-serialize arbitrary objects given their name (#12571, 2.0.0-beta1)
- Mapping: Move the _size mapper to a plugin. (#12582, 2.0.0-beta1)
- Aggregations: Fix setting timezone on default DateTime formatter (#12581, 2.0.0-beta1)
- Mapping: Disallow type names to start with dots for new indices except for .percolator (#12561, 2.0.0-beta1)
- AWS plugin: Adding retry when checking s3 snapshot repository (#12498, 2.0.0-beta1)
- Recovery: Check for incompatible mappings while upgrading old indices (#12406, 2.0.0-beta1)
- PluginManager: Add Support for basic auth (#12445, 2.0.0-beta1)
- Cluster state: Changes in unassigned info and version might not be transferred as part of cluster state diffs (#12387, 2.0.0-beta1)
- Query DSL: multi_match query applies boosts too many times. (#12294, 2.0.0-beta1, 1.7.2, 1.6.3)
- Packaging: Do not kill process on service shutdown (#12298, 2.0.0-beta1)
- Exceptions: Include stacktrace in rendered exceptions (#12260, 2.0.0-beta1)
- Packaging: Improve JVM Arch Detection (#12274, 2.0.0-beta1)
- Aggregations: Preparing ValuesSourceAggregatorFactory/Parser for refactoring (#12233, 2.0.0-beta1)
Apache Lucene
- Lucene 5.3.0 branch is cut
- A user struggles with a JVM bug causing crashes on Solaris
- It's too early to require Java 1.8 in Lucene 5.x
- Should ant precommit also run ant clean?
- Upgrade ANTLR (used by the expressions module) from version 3.5 to 4.5
- The new FunctionRangeQuery provides a range filter on any ValueSource, but ValueSourceScorer does not use FunctionValues.exists
- Simplify Lucene's over-specialized TopFieldCollector classes, and don't call score more than once per hit
- NumericUtils.getMin and getMax should not throw NullPointerException when the field has no terms
- Work around a Java 1.9 ea date parsing bug
- TooComplexToDeterminizeException claims to be serializable but isn't
- GeoPointField now uses doc values to filter hits on boundary cells, giving a nice speedup and smaller index, and greatly reducing the worst-case large heap usage
- Randomized testing uncovered a failure caused by an unsorted flushed segment when the test expected a sorted merged segment
- JoinUtil.createJoinQuery failed to rewrite queries when creating its Weights
- Allow defined-width gaps in SpanNearQuery
- Use much less heap when sorting by a ValueSource
- Payloads should be able to change the score of SpanOrQuery, and PayloadTermQuery and PayloadNearQuery are now deprecated in favor of the more general PayloadScoreQuery, which may soon be moved to the sandbox module
- CorruptIndexException was missing getters
- EarlyTerminatingSortingCollector should unwrap FilterLeafReader to see if the underlying reader is sorted
- KNearestNeighborClassifier should also use the class ranking
- Another example of the powers of randomized testing, this time finding a corner-case bug in the stupid-slow-yet-hopefully-bug-free numeric range filter that the test used to check for correctness
- GeoPointField now uses the full 64 bits, up from 62, to encode lat/lon
- Utility APIs to compute geohashes, and to find the neighboring geohash cells
- Add a new RangeTree data structure, a 1D version of the BKD spatial tree, for fast and small numeric and byte[] range filters
- GeoPointDistanceQuery was visiting too many terms
- ContextSuggestField now allows adding contexts dynamically via subclass
- A randomized test caught inaccurate RAM accounting for the new table-encoded SortedSetDocValues
- TermAutomatonQuery, which generalizes on positional queries like PhraseQuery, allowing you to run an arbitrary automaton, should implement two-phased and needsScores support
- Integrate Geo3D and BKD trees to provide accurate and fast earth-surface "point in shape" queries
- Javadoc errors uncover silly APIs
- The new SynonymGraphFilter, which correctly handles multi-token synonyms at least at query time, is stalled until we work out a backwards compatibility policy for analysis chains and graph token filters
- Change SpanPayloadCheckQuery from Collection<byte[]> to List<BytesRef>
- Similarity classes should use docCount, if available, not maxDoc
- More test failures with JDK 1.9 ea-b72
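For readers who have not met geohashes before (one of the items above adds utility APIs for computing them), here is a minimal Python sketch of the standard geohash encoding: the longitude and latitude ranges are bisected alternately, each bisection yields one bit, and every five bits become one base-32 character. This is an illustration of the general technique only. The function name geohash_encode is hypothetical and this is not the Lucene Java API.

```python
# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=12):
    """Encode a latitude/longitude pair as a geohash string.

    Hypothetical helper for illustration; not part of Lucene.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # bits collected for the current character
    bit_count = 0     # how many bits are in `bits` so far
    use_lon = True    # the first bit refines longitude, then we alternate

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            # Five bits select one character of the base-32 alphabet.
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(57.64911, 10.40744, 5))
```

Because each extra character only narrows the cell further, nearby points tend to share a prefix, which is what makes prefix and neighbor-cell lookups useful for spatial filtering.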
Watch This Space
Stay tuned to this blog, where we'll share more on the whole ELK ecosystem, including news, learning resources and cool use cases!