Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Why multiple #Elasticsearch clusters can be easier to manage than one giant cluster: https://t.co/1yqZNMDAID pic.twitter.com/L3mKk7HHXU
— elastic (@elastic) December 1, 2015
Elasticsearch Core
Misc:
- The mapper attachments plugin has been released for 2.0.1 and 2.1.0.
- The Homebrew formula for Logstash has been updated to 2.1.0, and building from master is also supported.
- The Homebrew formula for Kibana has been updated to 4.3.0.
Changes in 2.1:
- It was possible in 1.x to add an alias that had the same name as an index, which caused failures when upgrading to 2.x. We now throw a more meaningful exception, and prevent users from creating aliases with the same name as an index template.
- Multi-fields accepted field names containing dots.
- The Lucene request cache sometimes threw NPEs when refreshing.
- Indexing into an index, then deleting the index and indexing again could lead to a dramatic slowdown of indexing speed.
- Kibana 4.3 uses the field stats API but didn't have permissions to do so in Shield.
- If we can't get a MAC address for the node (needed, e.g., to generate UUIDs), we use a dummy one.
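The MAC address fallback can be sketched as follows. This is an illustrative Python sketch, not Elasticsearch's Java code: the function name node_address is hypothetical, and setting the multicast bit on the dummy address is an assumption borrowed from the RFC 4122 convention for random node IDs, which keeps a dummy from ever looking like a real hardware address.

```python
import secrets
from typing import Optional


def node_address(real_mac: Optional[bytes] = None) -> bytes:
    """Return 6 bytes identifying this node, for use when generating UUIDs.

    Hypothetical helper: uses the real MAC address when one was found,
    otherwise falls back to a random dummy address.
    """
    if real_mac is not None and len(real_mac) == 6:
        return bytes(real_mac)
    dummy = bytearray(secrets.token_bytes(6))
    # Set the least significant bit of the first octet (the multicast bit).
    # Unicast hardware addresses have this bit cleared, so the dummy cannot
    # collide with a genuine network card's address.
    dummy[0] |= 0x01
    return bytes(dummy)
```

Calling node_address() with no argument simulates a host where no MAC address could be read; each call then yields a fresh random address with the multicast bit set.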
Changes in 2.x:
- A number of important mapping bugs have been fixed:
- Don't ignore mapping merge failures.
- Don't treat _default_ as a regular type.
- Treat mappings as an index-level feature.
- Check mapping compatibility up-front.
- Remove MergeMappingException.
- Simplify MetaDataMappingService.
- Cluster state batching is making progress. Update tasks have been split into distinct roles, which are being used for shard-started and shard-failed events.
- "Modules" are plugins that will be shipped by default with Elasticsearch core, such as lang-expression and lang-groovy.
- The plugin manager used to throw a confusing exception when a plugin was missing the plugin-descriptor.properties file.
- The mapper attachments plugin was adding dots in field names, which are illegal.
- The Java bulk loader no longer accepts "refresh" at an item level, but does accept it at the top level.
- RuntimePermission("getClassLoader") has been banned.
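The cluster state batching item can be illustrated with a small model: update tasks carry a role (such as shard-started or shard-failed), pending tasks are grouped by role, and each group is applied in one pass, so the master computes one new state per batch rather than one per task. This is a minimal sketch under those assumptions; UpdateTask, mark_shards and run_batched are hypothetical names, and the real cluster state is far richer than a dict of shard statuses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]  # shard ID -> status; a stand-in for the real cluster state


@dataclass(frozen=True)
class UpdateTask:
    role: str      # hypothetical role name, e.g. "shard-started"
    shard_id: str


Executor = Callable[[State, List[UpdateTask]], State]


def mark_shards(status: str) -> Executor:
    """Build an executor that applies a whole batch of tasks in one pass."""
    def execute(state: State, tasks: List[UpdateTask]) -> State:
        new_state = dict(state)  # never mutate the previously published state
        for task in tasks:
            new_state[task.shard_id] = status
        return new_state
    return execute


def run_batched(state: State, queue: List[UpdateTask],
                executors: Dict[str, Executor]) -> Tuple[State, int]:
    """Drain the queue, group tasks by role, and compute one new state per group."""
    batches: Dict[str, List[UpdateTask]] = {}
    for task in queue:
        batches.setdefault(task.role, []).append(task)
    queue.clear()
    updates = 0
    for role, tasks in batches.items():
        state = executors[role](state, tasks)
        updates += 1
    return state, updates
```

With three shard-started tasks and one shard-failed task queued, run_batched produces two state updates instead of four. Note that grouping changes the order in which tasks of different roles are applied; this sketch ignores that concern.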
Changes in master:
- The ?fields parameter no longer loads fields from _source, just from stored fields, which means it can now support wildcards.
- Allocation IDs are now being persisted to index metadata. Next step is to use them to choose the most recent shard copy during recovery.
- Type-unsafe empty Collections fields (such as Collections.EMPTY_LIST) are now forbidden.
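The allocation ID item says the next step is to use the persisted IDs to choose the most recent shard copy during recovery. The following minimal sketch shows that idea, assuming index metadata records the set of allocation IDs belonging to up-to-date copies; choose_primary and the selection rule are hypothetical, not Elasticsearch's actual allocator.

```python
from typing import Dict, Optional, Set


def choose_primary(copies_on_disk: Dict[str, str], active_ids: Set[str]) -> Optional[str]:
    """Return the node whose shard copy carries an active allocation ID, if any.

    copies_on_disk maps node name -> allocation ID found with that copy;
    active_ids stands in for the allocation IDs persisted in index metadata.
    A copy whose ID is not in active_ids is treated as stale and never chosen.
    """
    eligible = sorted(node for node, alloc_id in copies_on_disk.items()
                      if alloc_id in active_ids)
    # Deterministic tie-break by node name when several copies are eligible.
    return eligible[0] if eligible else None
```

Returning None rather than falling back to a stale copy reflects the point of persisting the IDs: a copy that missed writes should not silently become the primary.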
Ongoing:
- More mapping fixes in the works:
- It is possible for fields to be specified more than once in the same mapping.
- Multi-fields should throw an exception if they attempt to copy_to another field, as should completion fields.
- The ingest node continues to make progress:
- It now supports operations on lists.
- Custom Grok patterns can be specified within a processor.
- A Grok pattern which doesn't match will throw an exception.
- All values will be deep-copied to avoid issues when modifying data structures.
- Shards will soon maintain local check points for sequence numbers.
- Google Cloud Storage will be usable as a snapshot/restore repository.
- The S3 repository is adding path-style access, as an alternative to virtual-hosted-style bucket addressing.
- Azure snapshots will support timeouts.
- Ancient, deprecated, and alternative recovery settings are being removed.
- An early PoC is available for the reindex API (or index-by-query).
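One simple way a field ends up specified more than once is a repeated key in the mapping JSON. Python's json module silently keeps the last value for a repeated key, which hides exactly this kind of mistake; the sketch below uses the real object_pairs_hook parameter of json.loads to reject duplicates instead. It only illustrates the idea: the helper names are hypothetical, and this is not how Elasticsearch's mapping parser is implemented.

```python
import json


def _reject_duplicates(pairs):
    """object_pairs_hook that fails on a repeated key instead of keeping the last one."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate field [{key}] in mapping")
        seen[key] = value
    return seen


def parse_mapping(text):
    """Parse mapping JSON, rejecting repeated keys at any nesting level."""
    return json.loads(text, object_pairs_hook=_reject_duplicates)
```

Because the hook runs for every JSON object, a duplicate inside a nested "properties" block is caught just like one at the top level.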
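Several of the ingest node items (custom Grok patterns inside a processor, an exception when a pattern doesn't match, and deep-copying values) fit together in a small model. This Python sketch expands %{NAME} and %{NAME:field} references into a regular expression, merges processor-level custom patterns over a built-in library, and works on a copy of the document. The pattern library, compile_grok and grok_process are hypothetical and far smaller than real Grok; this is not the ingest node's implementation.

```python
import copy
import re

# A tiny illustrative pattern library; real Grok libraries ship many more.
BUILTIN_PATTERNS = {
    "WORD": r"\w+",
    "INT": r"[+-]?\d+",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
}

# Matches %{NAME} or %{NAME:field}.
_REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def compile_grok(pattern, custom_patterns=None, depth=0):
    """Expand Grok references into a plain Python regular expression string."""
    if depth > 10:
        raise ValueError("Grok patterns nested too deeply")
    # Custom patterns given to the processor override the built-in ones.
    library = {**BUILTIN_PATTERNS, **(custom_patterns or {})}

    def expand(match):
        name, field = match.group(1), match.group(2)
        if name not in library:
            raise KeyError(f"unknown Grok pattern [{name}]")
        inner = compile_grok(library[name], custom_patterns, depth + 1)
        # Named group when a target field is given, non-capturing group otherwise.
        return f"(?P<{field}>{inner})" if field else f"(?:{inner})"

    return _REFERENCE.sub(expand, pattern)


def grok_process(document, field, pattern, custom_patterns=None):
    """Return a deep copy of document with the captured values added."""
    regex = re.compile(compile_grok(pattern, custom_patterns))
    match = regex.search(document[field])
    if match is None:
        # A pattern that does not match is an error, not a silent no-op.
        raise ValueError(f"Grok pattern does not match field [{field}]")
    result = copy.deepcopy(document)  # the caller's document is never modified
    result.update(match.groupdict())
    return result
```

For example, applying "%{IP:client} %{METHOD:verb} %{INT:status}" with a custom METHOD pattern of GET|POST to the message "10.0.0.1 GET 200" adds client, verb and status fields to the returned copy, while the input document stays unchanged.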
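A local checkpoint for sequence numbers is commonly defined as the highest sequence number such that every operation at or below it has been processed on that shard. Assuming that definition, a minimal in-memory tracker looks like the sketch below; LocalCheckpointTracker is a hypothetical name, and the real Elasticsearch work may differ in detail.

```python
class LocalCheckpointTracker:
    """Track the highest sequence number below which no operation is missing.

    Simplified sketch: sequence numbers start at 0, and the checkpoint is -1
    until operation 0 has been processed.
    """

    def __init__(self):
        self.checkpoint = -1
        self._pending = set()  # processed sequence numbers above the checkpoint

    def mark_processed(self, seq_no: int) -> None:
        if seq_no <= self.checkpoint:
            return  # already covered by the checkpoint
        self._pending.add(seq_no)
        # Advance while the next sequence number has been seen, closing gaps.
        while self.checkpoint + 1 in self._pending:
            self.checkpoint += 1
            self._pending.remove(self.checkpoint)
```

If operations 0, 2 and 3 arrive first, the checkpoint stays at 0 because 1 is missing; once 1 is processed, it jumps straight to 3.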
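The S3 item concerns how bucket URLs are formed. Amazon S3 supports two addressing styles: virtual-hosted style, where the bucket name is part of the host name, and path style, where the bucket name is the first path segment (useful, for instance, with some S3-compatible services). The sketch below only shows how the two URL forms differ; s3_object_url and its parameters are hypothetical and are not settings of the repository plugin.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, key: str, endpoint: str = "s3.amazonaws.com",
                  path_style: bool = False) -> str:
    """Build an object URL in either S3 addressing style."""
    encoded_key = quote(key, safe="/")  # keep "/" so keys can look like paths
    if path_style:
        # Path style: the bucket name is the first path segment.
        return f"https://{endpoint}/{bucket}/{encoded_key}"
    # Virtual-hosted style: the bucket name becomes part of the host name.
    return f"https://{bucket}.{endpoint}/{encoded_key}"
```

For a bucket named backups and key snap/index-0, the default call yields https://backups.s3.amazonaws.com/snap/index-0, while path_style=True yields https://s3.amazonaws.com/backups/snap/index-0.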
Apache Lucene
- Lucene/Solr's Subversion-to-Git mirror will soon be turned off unless we can help the Apache infrastructure team find a workaround for excessive resource usage by the git-svn mirroring tool.
- An audit of GC and heap usage from the nightly Elasticsearch benchmarks uncovered excessive heap used by static tokenizers generated by JFlex.
- The 5.4.0 release is still iterating.
- Grouping collectors are more careful in their needsScores methods instead of always returning true.
- Fix an integer overflow issue in doc values updates.
- Do not call LRUQueryCache.onDocIdSetEviction when nothing was evicted.
- Remove dead code from CheckIndex and the benchmark module's TrecContentSource, as well as the important-sounding but actually no-op method StandardQueryParser.getMultiFields.
- BooleanQuery now optimizes certain cases, such as rewriting +*:* #filter to just #filter.
- JapaneseTokenizer now offers more than two possible tokenizations.
- Upgrade morfologik to version 2.0.1, to improve how dictionary URIs are passed.
- Improve the accuracy of the sandbox geo utility APIs to enable distance-based dimensional values queries.
- Can we improve DisjunctionScorer to advance more lazily but without making specialized copies of it?
- SpanQueryParser is popular: it has many votes, lives on its own branches, and people are using it and asking questions, but it has not yet been committed.
- Another test failure seems to have hit the strange "timeouts after 2 hours" bug.
- JoinUtil should support joins on numeric doc values fields.
- NumericField and NumericRangeQuery will move to the backward-codecs module in 6.0, replaced by dimensional values.
- DecimalDigitFilter has problems with digits that use Unicode's non-BMP supplemental characters.
- Use try-with-resources in BaseDirectoryTestCase.
- Equals methods are tricky.
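The BooleanQuery rewrite of +*:* #filter is safe for matching because *:* matches every document, so requiring it alongside a filter does not change which documents match. Below is a minimal sketch of that single rule, using a hypothetical (occur, query-string) tuple representation rather than Lucene's actual classes. Lucene's real rewrite covers more cases, and a real rewrite also has to account for scoring, since *:* contributes a score while a FILTER clause does not.

```python
from typing import List, Tuple

MATCH_ALL = "*:*"
Clause = Tuple[str, str]  # (occur, query); occur is "MUST", "FILTER", "SHOULD" or "MUST_NOT"


def rewrite(clauses: List[Clause]) -> List[Clause]:
    """Drop a required match-all clause when filters already restrict the matches.

    Matching is unchanged: "+*:* #filter" matches exactly the documents that
    "#filter" matches. The rule only fires when the query consists of
    match-all MUST clauses plus at least one FILTER clause.
    """
    match_all = [c for c in clauses if c == ("MUST", MATCH_ALL)]
    filters = [c for c in clauses if c[0] == "FILTER"]
    others = [c for c in clauses if c not in match_all and c[0] != "FILTER"]
    if match_all and filters and not others:
        # Scoring caveat: *:* contributed a constant score that this sketch discards.
        return filters
    return clauses
```

A query such as [("MUST", "*:*"), ("FILTER", "status:active")] rewrites to just the filter clause, while a query that also has a SHOULD clause is left alone.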
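The DecimalDigitFilter item concerns characters outside the Basic Multilingual Plane. In Java, such a character occupies two char values (a surrogate pair), so code that examines one char at a time can miss it; that is one plausible source of such problems, not a confirmed diagnosis. The Python sketch below folds Unicode decimal digits (category Nd) to ASCII by code point and includes a hypothetical helper that counts UTF-16 code units to show why a supplementary digit is two Java chars.

```python
import unicodedata


def fold_decimal_digits(text: str) -> str:
    """Replace every Unicode decimal digit (category Nd) with its ASCII form.

    Python strings are indexed by code point, so supplementary-plane digits
    such as U+1D7CE MATHEMATICAL BOLD DIGIT ZERO are handled as one character.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            out.append(str(unicodedata.decimal(ch)))
        else:
            out.append(ch)
    return "".join(out)


def utf16_units(text: str) -> int:
    """Number of UTF-16 code units (Java chars) needed to encode text."""
    return len(text.encode("utf-16-le")) // 2
```

Here U+1D7CE is a single Python character but two UTF-16 code units, which is exactly the case a char-by-char Java loop must handle via code point iteration.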
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!