This Week in Elasticsearch and Apache Lucene - 2015-12-07
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
Top News
Why multiple #Elasticsearch clusters can be easier to manage than one giant cluster: https://t.co/1yqZNMDAID pic.twitter.com/L3mKk7HHXU
— elastic (@elastic) December 1, 2015
Elasticsearch Core
Misc:
- The mapper attachments plugin has been released for 2.0.1 and 2.1.0.
- The Homebrew formula for Logstash has been updated to 2.1.0, and building from master is also supported.
- The Homebrew formula for Kibana has been updated to 4.3.0.
Changes in 2.1:
- It was possible in 1.x to add an alias that had the same name as an index, which caused failures when upgrading to 2.x. We now throw a more meaningful exception, and prevent users from creating aliases with the same name as an index template.
- Multi-fields accepted field names containing dots.
- The Lucene request cache sometimes threw NPEs when refreshing.
- Indexing into an index, then deleting the index and indexing again could lead to a dramatic slowdown of indexing speed.
- Kibana 4.3 uses the field stats API but didn't have permissions to do so in Shield.
- If we can't get a MAC address for the node, we now use a dummy one, e.g. to generate UUIDs.
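The MAC address fallback in the last item can be sketched roughly as follows. This is a minimal illustration, not the Elasticsearch implementation: the function name `node_id_bytes` is hypothetical, and setting the multicast bit on the dummy address is an assumption borrowed from the convention RFC 4122 describes for random node IDs, so that a made-up address can never collide with a real hardware address.

```python
import secrets


def node_id_bytes(mac=None):
    """Return 6 bytes to use as the node part of a time-based UUID.

    `mac` is the real hardware address if one could be read, else None.
    (Hypothetical helper; the name and signature are illustrative only.)
    """
    if mac is not None:
        if len(mac) != 6:
            raise ValueError("a MAC address must be exactly 6 bytes")
        return bytes(mac)

    # No usable MAC address: fall back to a random dummy address.
    dummy = bytearray(secrets.token_bytes(6))
    # Assumption: set the multicast bit (lowest bit of the first octet),
    # which real network interface addresses never have set.
    dummy[0] |= 0x01
    return bytes(dummy)


real = bytes.fromhex("001a2b3c4d5e")
print(node_id_bytes(real).hex())   # the real address is used unchanged
print(node_id_bytes().hex())       # a random dummy address
```

The point of the fallback is that UUID generation keeps working on hosts where no interface address is readable, at the cost of the node part no longer identifying the hardware.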
Changes in 2.x:
- A number of important mapping bugs have been fixed:
  - Don't ignore mapping merge failures.
  - Don't treat _default_ as a regular type.
  - Treat mappings as an index-level feature.
  - Check mapping compatibility up-front.
  - Remove MergeMappingException.
  - Simplify MetaDataMappingService.
- Cluster state batching is making progress. Update tasks have been split into distinct roles, which are being used for shard-started and shard-failed events.
- "Modules" are plugins that will be shipped by default with Elasticsearch core, such as lang-expression and lang-groovy.
- The plugin manager used to throw a confusing exception when a plugin was missing the plugin-descriptor.properties file.
- The mapper attachment plugin was adding dots in field names, which are illegal.
- The Java bulk loader no longer accepts "refresh" at an item level, but does accept it at the top level.
- RuntimePermission("getClassLoader") has been banned.
Changes in master:
- The ?fields parameter no longer loads fields from _source, just from stored fields, which means it can now support wildcards.
- Allocation IDs are now being persisted to index metadata. Next step is to use them to choose the most recent shard copy during recovery.
- Type-unsafe empty Collections fields are now forbidden.
Ongoing:
- More mapping fixes in the works:
  - It is possible for fields to be specified more than once in the same mapping.
  - Multi-fields should throw an exception if they attempt to copy_to another field, as should completion fields.
- The ingest node continues to make progress:
  - It now supports operations on lists.
  - Custom Grok patterns can be specified within a processor.
  - A Grok pattern which doesn't match will throw an exception.
  - All values will be deep-copied to avoid issues when modifying data structures.
- Shards will soon maintain local check points for sequence numbers.
- Google Cloud will be usable for snapshot restore repositories.
- The S3 repository is adding path-style access for virtual hosting of buckets.
- Azure snapshots will support timeouts.
- Removing ancient deprecated and alternative recovery settings.
- An early PoC is available for the reindex API (or index-by-query).
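The deep-copy item in the ingest node list above addresses a general hazard: if a processor mutates nested lists or maps that are shared with the caller, the original document changes too. The sketch below only illustrates that idea with Python's standard `copy.deepcopy`; the names `run_processor` and `append_tag` are hypothetical and do not reflect the ingest node's actual API.

```python
import copy


def run_processor(document, processor):
    """Apply `processor` to a deep copy of `document` and return the copy.

    Hypothetical helper: copying first means nested lists and dicts in the
    caller's document are never mutated by the processor.
    """
    working = copy.deepcopy(document)
    processor(working)
    return working


def append_tag(doc):
    # Mutates a nested list in place, which is exactly what makes sharing risky.
    doc["tags"].append("processed")


original = {"message": "hello", "tags": ["raw"]}
result = run_processor(original, append_tag)

print(original["tags"])  # the caller's list is untouched
print(result["tags"])    # the copy carries the modification
```

A shallow copy would not be enough here, because the `tags` list itself would still be shared between the two dictionaries.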
Apache Lucene
- Lucene/Solr's subversion-to-git mirror will soon be turned off unless we can help the Apache infrastructure team find a workaround for excessive resource usage by the git-svn mirroring tool
- An audit of GC and heap usage from the nightly Elasticsearch benchmarks uncovered excessive heap used by static tokenizers generated by JFlex
- The 5.4.0 release is still iterating
- Grouping collectors are more careful in their needsScores methods instead of always returning true
- Fix an integer overflow issue in doc values updates
- Do not call LRUQueryCache.onDocIdSetEviction when nothing was evicted
- Remove dead code from CheckIndex and the benchmark module's TrecContentSource, and the important-sounding but actually no-op method, StandardQueryParser.getMultiFields
- BooleanQuery now optimizes certain cases such as +*:* #filter to just #filter
- JapaneseTokenizer now offers more than two possible tokenizations
- Upgrade morfologik to version 2.0.1, to improve how dictionary URIs are passed
- Improve the accuracy of the sandbox geo utility APIs to enable distance-based dimensional values queries
- Can we improve DisjunctionScorer to advance more lazily but without making specialized copies of it?
- SpanQueryParser is popular, with many votes, living on its own branches, and people using it and asking questions, but has not yet been committed
- Another test failure seems to have hit the strange "timeouts after 2 hours" bug
- JoinUtil should support joins on numeric doc values fields
- NumericField and NumericRangeQuery will move to the backward-codecs module in 6.0, replaced by dimensional values
- DecimalDigitFilter has problems with digits that use Unicode's non-BMP supplemental characters
- Use try-with-resources in BaseDirectoryTestCase
- Equals methods are tricky
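The DecimalDigitFilter item above does not spell out the exact failure, so the following is only a sketch of why non-BMP digits need care. A digit such as MATHEMATICAL BOLD DIGIT SEVEN (U+1D7D5) occupies two UTF-16 code units, so code that folds digits one 16-bit unit at a time can miss it, while code that iterates by code point handles it. The helper names `fold_decimal_digits` and `utf16_units` are hypothetical and are not Lucene APIs.

```python
import unicodedata


def utf16_units(text):
    """Number of 16-bit code units needed to encode `text` in UTF-16."""
    return len(text.encode("utf-16-be")) // 2


def fold_decimal_digits(text):
    """Replace every Unicode decimal digit with its ASCII equivalent.

    Python strings iterate by code point, so supplementary (non-BMP)
    digits are seen as single characters here.
    """
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)
        out.append(str(value) if value is not None else ch)
    return "".join(out)


arabic_three = "\u0663"      # ARABIC-INDIC DIGIT THREE, inside the BMP
bold_seven = "\U0001D7D5"    # MATHEMATICAL BOLD DIGIT SEVEN, outside the BMP

print(utf16_units(arabic_three), utf16_units(bold_seven))
print(fold_decimal_digits("id-" + arabic_three + bold_seven))
```

In a language whose strings are sequences of UTF-16 code units, the equivalent loop has to step by code point rather than by `char` to get the same result.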
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!