This Week in Elasticsearch and Apache Lucene - 2016-06-27
Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.
“How Airbnb manages to monitor customer issues at scale” by @AirbnbEng https://t.co/5BtU7Yc9Y6 #nodejs
— Joe McCann (@joemccann) June 15, 2016
Changes in 2.x:
- The .scripts index now obeys the number_of_shards setting.
- Deprecation logging for `_timestamp` and `_ttl`.
- Failed synced flushes were reporting an incorrect number of failures.
- The index-exists request shouldn't fail if the index is being recovered.
- A valid translog file can be deleted incorrectly after a disk full exception and multiple attempts to recover.
Changes in master:
- The low-level Java REST client has landed. It is functionally equivalent to the REST clients available in other languages.
- The `index.store.preload` setting can preload the specified Lucene files (eg doc values, norms) into MMAP before a segment comes online. This completes the replacement of warmers.
- The cluster health no longer turns red when creating an index, unless there is a problem assigning shards.
- The default similarity is now BM25.
- The `_timestamp` and `_ttl` fields will not be supported on indices created in 5.x.
- The `fields` parameter has been removed in favour of `stored_fields`, `docvalue_fields` and (for `text` fields only) `fielddata_fields`.
- Some percolator queries don't need in-memory validation to ensure that they match.
- Painless now has capturing lambdas, supports adding static methods like `each` to whitelisted classes, and has syntax for initialising arrays, lists and maps.
- Nested inner hits no longer return `_index`, `_type`, and `_id`, and parent/child inner hits don't return `_index`.
- `string` fields weren't upgraded to `text`/`keyword` if `include_in_all` was specified.
- Getting a task with wait_for_completion will return the task result.
- Nodes info returns the calculated size of the total indexing buffer.
- Analysis factories are now MultiTermAware, which will help to remove the lowercase_expanded_terms from the query string query, and to support keyword analyzers on the `keyword` field.
- JNA is now a required dependency.
- Guice has been removed from the script service.
- Sequence number checkpoints are persisted to disk when a segment is flushed.
- Reindex-from-remote now uses the Java REST client.
- Ensure that primary handover while indexing does not cause a deadlock.
- The index file which lists the snapshots in a repository should be written atomically.
- The `discovery-azure` plugin doesn't work with the security manager.
- It shouldn't be necessary to wait for status yellow before working with a newly created index.
- Add helpers to make JSON easier to render in Mustache.
- The SynonymQuery should be used for alternative terms, instead of the Bool query.
- More time zone edge case bug fixes.
- Changes to shard store fetching are required in order to allow for inline rerouting during node join.
- Analysis components should implement AnalysisPlugin instead of calling registerTokenizer, allowing Guice to be removed from Hunspell.
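To give a feel for what the switch to BM25 as the default similarity means for scoring, here is a minimal sketch of the textbook BM25 formula for a single term. It is not Lucene's actual implementation (which, among other things, stores document lengths in a lossy encoding). The function and parameter names are hypothetical, and the defaults `k1=1.2` and `b=0.75` are assumed to mirror the commonly used BM25 defaults.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """Inverse document frequency: rarer terms get a higher weight."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Score one term in one document (hypothetical helper, not a Lucene API).

    freq        -- occurrences of the term in the document
    doc_len     -- length of the document in terms
    avg_doc_len -- average document length in the index
    doc_count   -- number of documents in the index
    doc_freq    -- number of documents containing the term
    """
    # Length normalisation: longer-than-average documents are penalised,
    # with b controlling how strongly.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term frequency saturates: each extra occurrence adds less than the last.
    return bm25_idf(doc_count, doc_freq) * freq * (k1 + 1) / (freq + norm)
```

Unlike classic TF/IDF, the term-frequency component here is bounded: no matter how often a term repeats, its contribution never exceeds `idf * (k1 + 1)`.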
- 5.5.2 RC2 release vote is underway
- A tricky randomized `explain` test failure turns out to be a test bug in a recently added test case
- `Math.toRadians` and `Math.toDegrees` are now banned, since their implementation changes slightly across Java versions, impacting our geo tests
- `RandomAccessFilterStrategy` comes back to life for faster filter intersection in some cases
- Multi-term queries that match no terms rewrite to `MatchNoDocsQuery` instead of an empty `BooleanQuery`, making it much simpler to add a helpful reason to
- The new Ukrainian lemmatizer uses `MorfologikFilter` with a custom dictionary for efficient dictionary-based Ukrainian analysis
- Lucene's confusing and bushy `IndexReader` hierarchy strikes again
- `RAMDirectory` now also enforces write-once files, and `MockDirectoryWrapper` now tries harder to corrupt unsync'd index files on close
- `GeoPoint` gets some code cleanups
- Eclipse now also fails on unused imports
- Auto-prefix terms have been removed since dimensional points are better
- `CompressionTools` has been removed
- `ForbiddenAPIs` is upgraded to version 2.2
- It's important to fsync files after copying them via Lucene's
- A tricky test failure was holding up the 5.5.2 release process
- Some minor code improvements to
- Can we improve the default behavior of query parsers and multi-term queries?
- A test bug in `MoreLikeThisTest` still remains tricky to fix
- `MoreLikeThis` should not invoke
- `ScandinavianNormalizationFilterFactory` are safe for multi-term queries
- In the possibly not-rare case where many documents share the same point value, we can better compress the
- The ancient query norm and coord blocks progress and should be removed
- Should we add a lightweight Ukrainian stemmer?
- Updating doc values and then using delete-by-query with a doc values query doesn't always work, but fixing it is likely not feasible
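The `Math.toRadians` / `Math.toDegrees` item above comes down to floating-point arithmetic: two mathematically equivalent conversion formulas can round differently in the last bit, which is enough to make exact-match geo tests flaky. The sketch below illustrates the idea in Python for brevity. Both formulas and all names are illustrative assumptions, not quotations from any JDK or from Lucene; the point is only that pinning one explicit formula in your own code removes the dependency on the runtime's choice.

```python
import math

# One fixed conversion constant, defined once in our own code.
DEG_TO_RAD = math.pi / 180.0


def to_radians(degrees):
    """Convert using a single multiplication by a fixed constant."""
    return degrees * DEG_TO_RAD


def to_radians_divide_first(degrees):
    """Mathematically equivalent variant that divides before multiplying."""
    return degrees / 180.0 * math.pi


def max_relative_gap(values):
    """Largest relative difference between the two variants over `values`."""
    gap = 0.0
    for d in values:
        a, b = to_radians(d), to_radians_divide_first(d)
        if a != b:
            gap = max(gap, abs(a - b) / abs(a))
    return gap
```

Calling `max_relative_gap(range(1, 360))` shows how far apart the two variants can drift. Any gap is on the order of floating-point rounding error, which is harmless for most uses but matters when a test compares results bit for bit.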
Watch This Space
Stay tuned to this blog, where we'll share more news on the whole Elastic ecosystem including news, learning resources and cool use cases!