This Week in Elasticsearch and Apache Lucene - August 04 2015
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Staying in Control with Moving Avgs: outlier detection via new #elasticsearch pipeline aggs https://t.co/SnmY07QgSs pic.twitter.com/c4FZ28kPNX
— elastic (@elastic) August 3, 2015
Elasticsearch Core
- Build: Remove profile to create attached RPM (#12530, 2.0.0)
- Build: Add a `run.sh` to run from current source code with debugger (#12540, 2.0.0)
- Internal: Add `RealtimeRequest` marker interface to group realtime operations together (#12537, 2.0.0)
- Cat API: Add `_cat/nodeattrs` API (#12534, 2.0.0)
- Aggs: Upgrade HDRHistogram to version `2.1.6`. (#12554, 2.0.0)
- Mapping: Updating default `position_offset_gap` to 10. (#12538, 2.0.0)
- Plugins: don't represent site plugins with `null` anymore (#12577, 2.0.0)
- Internal: Fix `ShardUtils#getElasticsearchDirectoryReader()` (#12594, 2.0.0)
- Startup: Fix Bootstrap to not call `System.exit` (#12586, 2.0.0)
- Aggs: Full path validation for pipeline aggregations (#12595, 2.0.0)
- Packaging: Fix shaded jar packaging (#12589, 2.0.0)
- Build: rename site example plugin to site-example (#12604, 2.0.0)
- SecurityManager: improve sanity of securitymanager file permissions (#12609, 2.0.0)
- Build: Give unit tests and integ tests separate load balancing (#12615, 2.0.0)
- Testing: Cut over some remaining integration tests to IT (#12616, 2.0.0)
- Testing: cleanup more abstract test class -> `TestCase` and integ -> IT (#12624, 2.0.0)
- Testing: `NamingConventionTests` should test subclasses of ESIntegTestCase end with IT (#12625, 2.0.0)
- Build: Fix coverage analysis. Two versions of `jacoco` were being used and creating jar hell (#12626, 2.0.0)
- Testing: Improve site-example integ test to test served contents (#12627, 2.0.0)
- Core: Move `Streams.copyTo(String|Bytes)FromClasspath()` into StreamsUtils (#12598, 2.0.0)
- Internal: Fix concurrency issue in PrioritizedEsThreadPoolExecutor. (#12599, 2.0.0, 1.7.2, 1.6.3)
- Allocation: Add more debugging information to the Awareness Decider (#12490, 2.0.0)
- Dates: Default date formats to use underscores via PUT (#12509, 2.0.0)
- Aggs: Protect against `size` and offset larger than total number of documents in a shard in top hits agg (#12518, 2.0.0)
- Allocation: Fix messaging about delayed allocation (#12515, 2.0.0, 1.7.2)
- Index API: Add date math support in index names (#12209, 2.0.0)
- Startup: Remove getopt parsing in shell script, use java CLITool (#12165, 2.0.0)
- Build: Consolidate .gitignore entries for eclipse (#12521, 2.0.0)
- Allocation: Reroute shards when a node goes under disk watermarks (#12452, 2.0.0)
- Snapshot/Restore: Create a directory during repository verification (#12323, 2.0.0)
- Search: Make fetch sub phases pluggable (#12400, 2.0.0)
- Plugin script: Fix `ES_HOME` with spaces (#12508, 1.7.2)
- Cleanup: Remove `Environment.homeFile()` (#12351, 2.0.0)
Apache Lucene
- Lucene 5.3.0 release process will start soon
- Here's a fun video showing how `GeoPointDistanceQuery` breaks its shape into cells
- `ContextSuggestField` now allows adding contexts dynamically via subclass
- Remove some unused variables from the Polish Stempel stemmer
- `PhraseQuery`'s `Builder` now allows chaining
- `FastVectorHighlighter` now handles `PhraseQuery` matches across more than one value of a multi-valued field
- `BlendedTermQuery` blends term statistics across all terms
- When `IndexWriter` resolves a deleted `Query` to document IDs, it now uses the `Query` API not the `Filter` API
- Rework `build.xml` files to avoid running out of permgen space
- Use 1d version of spatial BKD trees for faster, smaller numeric range filtering, including support for arbitrary `byte[]` to handle values larger than 64 bits
- More iterations on `SynonymGraphFilter`, to correctly handle multi-token synonyms, at least at query time
- Simplify Lucene's over-specialized `TopFieldCollector` classes, and don't call `score` more than once per hit
- Payloads should be able to change the score of `SpanOrQuery`
- How can we integrate BKD tree and `geo3d` to provide fast earth-surface shape intersection queries?
- Add K nearest neighbor and simple naive bayes document classifiers to Lucene's classifier module
- `KNearestNeighborClassifier` should also use the class ranking
- `GeoPointDistanceQuery` sometimes uses way too much heap
- `FunctionValues` should be able to set their values into an external `MutableValue`
- Query parsers can throw `IllegalArgumentException` if you try to parse an invalid regexp
- `EarlyTerminatingSortingCollector` should unwrap `FilterLeafReader` to see if the underlying reader is sorted
- Pull query boosting out of the `Query` into a new `BoostQuery` class
- Allow defined-width gaps in `SpanNearQuery`
- We should move all Lucene queries into the `queries` module
- Don't use symlinks for a Lucene index directory
- Make `SpanMultiTermQueryWrapper` smarter about how it picks which terms to keep when there are too many
- BKD tree should implement distance queries very well
- Upgrade `ANTLR` (used by the expressions module) from version 3.5 to 4.5
Watch This Space
Stay tuned to this blog, where we'll share more on the whole ELK ecosystem, including news, learning resources and cool use cases!