This Week in Elasticsearch and Apache Lucene - 2016-01-18
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Looking to upgrade your #Elasticsearch deployment from 1.x to 2.x? We’ve got the video for you & it’s OnDemand now! https://t.co/YX5v4H0yaX
— elastic (@elastic) January 15, 2016
Changes in 2.2:
- An unrecognised content type passed to an update request used to throw an NPE.
- The transport client now throws an exception when plugin.types is used, to help point users to addPlugin.
- Support for secondary accounts on Azure plugins broke setups with only a primary account.
- Filter/Filters aggregations were creating weights more often than needed, resulting in a performance regression.
- Pending tasks were reporting incorrect (by 1000x) time-in-queue because of a bad conversion from nano to milliseconds.
- A circular reference on an AlreadyClosedException could cause a stack overflow during rendering.
- ignore_unavailable wasn't being respected when applied to aliases with closed indices.
- Multiple types in the search URL were not filtered properly when an unknown type was present.
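The time-in-queue fix above is a reminder of how easy it is to get a unit conversion wrong by hand. The sketch below is not the Elasticsearch code; it assumes the bug was a wrong scale factor (the post does not say in which direction the value was off), and the class and method names are hypothetical. It contrasts a hand-rolled division with `TimeUnit`, which carries the scale factor for you.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from the Elasticsearch source.
class TimeInQueue {
    // Assumed buggy variant: dividing nanoseconds by 1,000 yields microseconds,
    // so a value labelled "milliseconds" ends up 1000x too large.
    public static long buggyMillis(long nanos) {
        return nanos / 1_000;
    }

    // Letting TimeUnit convert avoids hand-written scale factors.
    public static long millis(long nanos) {
        return TimeUnit.NANOSECONDS.toMillis(nanos);
    }

    public static void main(String[] args) {
        long queuedNanos = TimeUnit.SECONDS.toNanos(2); // a task queued for 2 seconds
        System.out.println(buggyMillis(queuedNanos));   // 2000000
        System.out.println(millis(queuedNanos));        // 2000
    }
}
```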
Changes in 2.x:
- A URL filter on type could leak the type name into a highlighting request.
- Percolate queries which use "now" in a date range were not working with the mpercolate API.
- Cross-fields queries on non-string fields were broken.
- The disk allocator didn't play nicely with file systems that don't report file system usage.
Changes in master:
- 5-minute and 15-minute load averages are now available on Linux again, and now on FreeBSD as well, but the format will probably change from an array to an object.
- Shards with heavy indexing loads will get a greater share of the indexing buffer.
- Master stopped using Java serialization a long time ago and, to guard against reintroduction, Serializable is now banned.
- All dynamic index settings have been moved from the shard level to the index level as part of the great settings cleanup.
- Get-alias and Cat-alias now return open and closed indices by default.
- Ingest node:
- Pipeline configuration is now stored in the cluster state, instead of in an index, in order to simplify update notifications.
- Ingest requests (which specify a pipeline) will now be forwarded to ingest nodes.
- Proper ingest methods added to the Java API.
- Ingest now uses the indexing threadpool instead of having a dedicated threadpool.
- Added the de-dot processor for converting dots in fieldnames to underscores.
- The simulate API now supports tracking of processor IDs across on_failure/compound processors, for easier tracking on the client side.
- Search refactoring:
- The reindex API has been merged into feature/reindex, but still needs to be integrated with the task management API.
- The task management API will soon be able to connect parent tasks with their children.
- The new scripting language is gaining throw and try/catch functionality, and the ability to detect infinite loops.
- Possibly adding a fixed-point mapping type.
- Lucene continues to migrate from Subversion to git and we still have improvements to the workaround script in the meantime
- A number of improvements to TeeSinkTokenFilter, including removing the confusing
- We are simultaneously releasing Lucene 5.3.2 and 5.4.1 and discussing the next major (6.0.0) Lucene release, exposing interesting challenges
- A rare corner-case bug in reading 5.4.0 doc values, uncovered by Lucene's randomized testing, is quite nasty, prompting the upcoming 5.4.1 release
- Lucene's release smoke tester should not check future versions for backwards compatibility
- An invalid long-to-int cast causes an ArrayIndexOutOfBoundsException when loading large (2.1+ GB) field cache entries
- The confusion matrix in the classifier module can now give you its overall precision and recall
- SimpleText codec was not writing dimensional values correctly
- LuceneTestCase will now use standardized language tags to represent the randomized
- Our default
NullPointerException if the term is null
- StemmerOverrideFilter may be buggy
- Minimum should match and synonyms struggle to co-exist in query parsers in Lucene 5.x
- More tricky geo query test failures
- We should add a query to test for precisely equals dimensional values
StoredDocument and friends before releasing Lucene 6.0.0
should fail the build
- PrefillTokenStream lets you specify exactly which tokens to iterate
decompounding messes up
- JapaneseTokenizer now offers more than two possible tokenizations
- A new LSH (locality sensitive hashing) TokenFilter and query is an alternative to the standard
- MoreLikeThisQuery should keep track of which terms came from which fields
- RAMDirectory sometimes fails to throw EOFException if you try to seek beyond the end of the file
- Unordered span queries differ in how they measure the allowed span from ordered span queries
- SpanPositionQueue could be specialized to improve JIT performance
- Codec level encryption offers fine-grained control over which parts of the index need encryption
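The long-to-int cast bug in the list above comes down to how Java narrows integers. The sketch below is not Lucene's code; the class name, method names and the offset value are hypothetical. It shows how an unchecked cast of a value just past 2.1 GB wraps to a negative number (which then blows up as an array index), and how `Math.toIntExact` fails loudly instead.

```java
// Hypothetical illustration only; not taken from the Lucene source.
class LongToIntCast {
    // A plain narrowing cast silently wraps values above Integer.MAX_VALUE.
    public static int uncheckedCast(long value) {
        return (int) value;
    }

    // Math.toIntExact throws ArithmeticException rather than wrapping.
    public static int checkedCast(long value) {
        return Math.toIntExact(value);
    }

    public static void main(String[] args) {
        long offset = 2_200_000_000L; // assumed offset just past the 2.1 GB mark
        int index = uncheckedCast(offset);
        System.out.println(index); // -2094967296

        byte[] buffer = new byte[16];
        try {
            byte b = buffer[index]; // negative index
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("wrapped index rejected: " + e.getMessage());
        }

        try {
            checkedCast(offset);
        } catch (ArithmeticException e) {
            System.out.println("checked cast failed fast: " + e.getMessage());
        }
    }
}
```

A checked conversion does not make a 2.1+ GB entry loadable by itself, but it turns a confusing out-of-bounds error into an explicit overflow at the point of the cast.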
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!