This Week in Elasticsearch and Apache Lucene - 2016-01-18
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Looking to upgrade your #Elasticsearch deployment from 1.x to 2.x? We’ve got the video for you & it’s OnDemand now! https://t.co/YX5v4H0yaX
— elastic (@elastic) January 15, 2016
Elasticsearch Core
Changes in 2.2:
- An unrecognised content type passed to an update request used to throw an NPE.
- The transport client now throws an exception when plugin.types is used, to help point users to addPlugin.
- Support for secondary accounts on Azure plugins broke setups with only a primary account.
- Filter/Filters aggregations were creating weights more often than needed, resulting in a performance regression.
- Pending tasks were reporting incorrect (by 1000x) time-in-queue because of a bad conversion from nano to milliseconds.
- A circular reference on an AlreadyClosedException could cause a stack overflow during rendering.
- ignore_unavailable wasn't being respected when applied to aliases with closed indices.
- Multiple types in the search URL were not properly filtering when an unknown type was present.
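To make the 1000x time-in-queue error above concrete, here is a minimal sketch of how a nanosecond-to-millisecond conversion can go wrong. The class and method names are hypothetical and this is not the actual Elasticsearch code; it only assumes the mistake was a division by the wrong factor, which is one common way to end up off by exactly 1000x.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not Elasticsearch source code.
final class TimeInQueueSketch {

    // Assumed bug shape: dividing nanoseconds by 1,000 yields microseconds,
    // so a value labelled "millis" ends up 1000x too large.
    static long buggyMillis(long nanos) {
        return nanos / 1_000;
    }

    // Letting TimeUnit do the conversion avoids hand-written factors.
    static long millis(long nanos) {
        return TimeUnit.NANOSECONDS.toMillis(nanos);
    }
}
```

For a task that has waited five seconds (5,000,000,000 ns), the first method reports 5,000,000 while the second reports 5,000.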
Changes in 2.x:
- A URL filter on type could leak the type name into a highlighting request.
- Percolate queries which use "now" in a date range were not working with the mpercolate API.
- Cross-fields queries on non-string fields were broken.
- The disk allocator didn't play nicely with file systems that don't report file system usage.
Changes in master:
- 5-minute and 15-minute load averages are now available on Linux again, and now on FreeBSD as well, but the format will probably change from an array to an object.
- Shards with heavy indexing loads will get a greater share of the indexing buffer.
- Master stopped using Java serialization a long time ago and, to guard against reintroduction, Serializable is now banned.
- All dynamic index settings have been moved from the shard level to the index level as part of the great settings cleanup.
- Get-alias and Cat-alias now return open and closed indices by default.
Ongoing:
- Ingest node:
  - Pipeline configuration is now stored in the cluster state, instead of in an index, in order to simplify update notifications.
  - Ingest requests (which specify a pipeline) will now be forwarded to ingest nodes.
  - Proper ingest methods added to the Java API.
  - Ingest now uses the indexing threadpool instead of having a dedicated threadpool.
  - Added the de-dot processor for converting dots in fieldnames to underscores.
  - The simulate API now supports tracking of processor IDs across on_failure/compound processors, for easier tracking on the client side.
- Search refactoring:
  - Validation of geoshapes now happens in ShapeBuilders.
  - All aggregations, highlighters, and rescorers are now refactored!
  - Still waiting on suggesters, sort, rescore, and inner hits, which depend on everything else.
- The reindex API has been merged into feature/reindex, but still needs to be integrated with the task management API.
- The task management API will soon be able to connect parent tasks with their children.
- The new scripting language is gaining throw and try/catch functionality, and the ability to detect infinite loops.
- Possibly adding a fixed-point mapping type.
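As a rough idea of what the de-dot processor mentioned above does, here is a minimal sketch that renames dotted field names in a document represented as a nested map. The class name, method name, and map-based document model are assumptions for illustration only; the real ingest processor may behave differently, for example in how it handles name collisions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not the ingest node's de-dot processor.
final class DeDotSketch {

    // Returns a copy of the document with '.' replaced by '_' in every field
    // name, recursing into nested objects. If "a.b" and "a_b" both exist at
    // the same level, the later entry silently wins in this sketch.
    static Map<String, Object> deDot(Map<String, Object> source) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) value;
                value = deDot(nested);
            }
            result.put(entry.getKey().replace('.', '_'), value);
        }
        return result;
    }
}
```

A field called "user.name" would come out as "user_name", and the input map is left unmodified.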
Apache Lucene
- Lucene continues to migrate from Subversion to git and we still have improvements to the workaround script in the meantime
- A number of improvements to TeeSinkTokenFilter, including removing the confusing SinkFilter
- We are simultaneously releasing Lucene 5.3.2 and 5.4.1 and discussing the next major (6.0.0) Lucene release, exposing interesting challenges
- A rare corner-case bug in reading 5.4.0 doc values, uncovered by Lucene's randomized testing, is quite nasty, prompting the upcoming 5.4.1 release
- Lucene's release smoke tester should not check future versions for backwards compatibility
- An invalid long-to-int cast causes an ArrayIndexOutOfBoundsException when loading large (2.1+ GB) field cache entries
- The confusion matrix in the classifier module can now give you its overall precision and recall
- The SimpleText codec was not writing dimensional values correctly
- LuceneTestCase will now use standardized language tags to represent the randomized Locale
- Our default BytesTermAttribute implementation hits NullPointerException if the term is null
- StemmerOverrideFilter may be buggy
- Minimum should match and synonyms struggle to co-exist in query parsers in Lucene 5.x
- More tricky geo query test failures
- We should add a query to test for precisely equal dimensional values
- Remove StoredDocument and friends before releasing Lucene 6.0.0
- Missing @Override annotations should fail the build
- PrefillTokenStream lets you specify exactly which tokens to iterate
- JapaneseAnalyzer's decompounding messes up PhraseQuery matching
- JapaneseTokenizer now offers more than two possible tokenizations
- A new LSH (locality sensitive hashing) TokenFilter and query is an alternative to the standard MoreLikeThisQuery
- MoreLikeThisQuery should keep track of which terms came from which fields
- RAMDirectory sometimes fails to throw EOFException if you try to seek beyond the end of the file
- Unordered span queries differ in how they measure the allowed span from ordered span queries
- SpanPositionQueue could be specialized to improve JIT performance
- Codec level encryption offers fine-grained control over which parts of the index need encryption
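The long-to-int cast item in the list above is a classic Java pitfall, so here is a small sketch of the general failure mode. It is not the Lucene field cache code, and the class and method names are made up for illustration: a plain cast silently wraps once the value exceeds Integer.MAX_VALUE (about 2.1 billion), and using the wrapped value as an array index is what would surface as an ArrayIndexOutOfBoundsException.

```java
// Hypothetical illustration, not Lucene source code.
final class LongToIntCastSketch {

    // Unchecked narrowing: values above Integer.MAX_VALUE wrap around,
    // typically to a negative number.
    static int uncheckedIndex(long offset) {
        return (int) offset;
    }

    // Checked narrowing: Math.toIntExact throws ArithmeticException
    // instead of returning a wrapped value.
    static int checkedIndex(long offset) {
        return Math.toIntExact(offset);
    }
}
```

With an offset of 2,200,000,000 the unchecked version returns -2,094,967,296, while the checked version fails fast at the point of conversion, which tends to be easier to diagnose.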
Watch This Space
Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!