June 27, 2016

This Week in Elasticsearch and Apache Lucene - 2016-06-27

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Top News

“How Airbnb manages to monitor customer issues at scale” by @AirbnbEng https://t.co/5BtU7Yc9Y6 #nodejs
— Joe McCann (@joemccann) June 15, 2016

Elasticsearch Core

Changes in 2.x:

The .scripts index now obeys the number_of_shards setting.
Deprecation logging for `_timestamp` and `_ttl`.
Failed synced flushes were reporting an incorrect number of failures.
The index-exists request shouldn't fail if the index is being recovered.
A valid translog file can be deleted incorrectly after a disk full exception and multiple attempts to recover.

Changes in master:

The low-level Java REST client has landed. It is functionally equivalent to the REST clients available in other languages.
The `index.store.preload` setting can preload the specified Lucene files (e.g. doc values, norms) into MMAP before a segment comes online. This completes the replacement of warmers.
The cluster health no longer turns red when creating an index, unless there is a problem assigning shards.
The default similarity is now BM25.
The `_timestamp` and `_ttl` fields will not be supported on indices created in 5.x.
The `fields` parameter has been removed in favour of `stored_fields`, `docvalue_fields` and (for `text` fields only) `fielddata_fields`.
Some percolator queries don't need in-memory validation to ensure that they match.
Painless now has capturing lambdas, supports adding static methods like `each` to whitelisted classes, and has syntax for initialising arrays, lists and maps.
Nested inner hits no longer return _index, _type, and _id, and parent/child inner hits no longer return _index.
`string` fields weren't upgraded to `text`/`keyword` if `include_in_all` was specified.
Getting a task with wait_for_completion will return the task result.
Nodes info returns the calculated size of the total indexing buffer.
Analysis factories are now MultiTermAware, which will help to remove the lowercase_expanded_terms from the query string query, and to support keyword analyzers on the `keyword` field.
JNA is now a required dependency.
Guice has been removed from the script service.
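
To give a feel for what the switch to BM25 means for scoring, here is a minimal sketch of the per-term BM25 formula. The function names are hypothetical, and the defaults `k1 = 1.2` and `b = 0.75` are assumed to match Lucene's `BM25Similarity` defaults. Lucene stores document lengths in a lossy encoded form, so real scores will differ slightly from this idealised version.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Inverse document frequency in the form log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(term_freq: float, doc_len: float, avg_doc_len: float,
                    doc_count: int, doc_freq: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Score one term in one document (illustrative helper, not a Lucene API)."""
    if term_freq <= 0:
        return 0.0
    # Longer-than-average documents are penalised, shorter ones are boosted.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Term frequency saturates: each extra occurrence adds less than the last.
    tf_part = (term_freq * (k1 + 1.0)) / (term_freq + k1 * length_norm)
    return bm25_idf(doc_count, doc_freq) * tf_part
```

Unlike classic TF/IDF, the term-frequency component is bounded by `k1 + 1`, so repeating a term many times cannot inflate a document's score without limit.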

Ongoing changes:

Sequence number checkpoints are persisted to disk when a segment is flushed.
Reindex-from-remote now uses the Java REST client.
Ensure that primary handover while indexing does not cause a deadlock.
The index file which lists the snapshots in a repository should be written atomically.
The `discovery-azure` plugin doesn't work with the security manager.
It shouldn't be necessary to wait for status yellow before working with a newly created index.
Add helpers to make JSON easier to render in Mustache.
The SynonymQuery should be used for alternative terms, instead of the Bool query.
More time zone edge case bug fixes.
Changes to shard store fetching are required in order to allow for inline rerouting during node join.
Analysis components should implement AnalysisPlugin instead of calling registerTokenizer, allowing Guice to be removed from Hunspell.

Apache Lucene

5.5.2 RC2 release vote is underway
A tricky randomized explain test failure turns out to be a test bug in a recently added test case
Math.toRadians and Math.toDegrees are now banned, since their implementation changes slightly across Java versions, impacting our geo tests
RandomAccessFilterStrategy comes back to life for faster filter intersection in some cases
Multi term queries that match no terms rewrite to MatchNoDocsQuery instead of an empty BooleanQuery, making it much simpler to add a helpful reason to MatchNoDocsQuery
The new Ukrainian lemmatizer uses MorfologikFilter with a custom dictionary for efficient dictionary-based Ukrainian analysis
Lucene's confusing and bushy IndexReader hierarchy strikes again
RAMDirectory now also enforces write-once files, and MockDirectoryWrapper now tries harder to corrupt unsync'd index files on close
GeoPoint gets some code cleanups
Eclipse now also fails on unused imports
Auto-prefix terms have been removed since dimensional points are better
CompressionTools has been removed
ForbiddenAPIs is upgraded to version 2.2
It's important to fsync files after copying them via Lucene's Directory!
A tricky test failure was holding up the 5.5.2 release process
Some minor code improvements to SearchGroup
Can we improve the default behavior of query parsers and multi-term queries?
A test bug in MoreLikeThisTest still remains tricky to fix
MoreLikeThis should not invoke toString on a Field object
ScandinavianFoldingFilterFactory and ScandinavianNormalizationFilterFactory are safe for multi-term queries
In the possibly not-rare case where many documents share the same point value, we can better compress the docIDs
The ancient query norm and coord block progress and should be removed
Should we add a lightweight Ukrainian stemmer?
Updating doc values and then using delete-by-query with a doc values query doesn't always work, but fixing it is likely not feasible
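
The point about fsyncing copied files applies to file copies in general, not only to Lucene's Directory. As a rough illustration, here is a minimal sketch in Python of copying a file and forcing it to stable storage. `copy_and_fsync` is a hypothetical helper and not a Lucene API; the directory fsync step is assumed to be available on POSIX systems only.

```python
import os
import shutil


def copy_and_fsync(src_path: str, dst_path: str) -> None:
    """Copy src to dst and flush the copy to disk (illustrative, not Lucene code)."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
        dst.flush()             # push Python's buffer to the OS
        os.fsync(dst.fileno())  # ask the OS to push its cache to the device
    if os.name == "posix":
        # Also fsync the parent directory so the new directory entry is durable.
        dir_fd = os.open(os.path.dirname(os.path.abspath(dst_path)), os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Without the fsync calls, a copy that appears complete may still live only in the operating system's cache and can be lost or truncated on power failure.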

Watch This Space

Stay tuned to this blog, where we'll share more on the whole Elastic ecosystem, including news, learning resources and cool use cases!
