20 July 2018

This Week in Elasticsearch and Apache Lucene - 2018-07-20

Tom Callahan

Colin Goodheart-Smithe

Elasticsearch

Field aliases

We have merged a new field aliases feature, which lets users define fields of a new alias field type. An alias field has a property, “path”, which points to a different field; whenever the alias is used in searches and aggregations, it resolves to that field. This helps users migrate from old mappings to new ones when only field names are changing: they can set field aliases on the old indices that use the new field names, so the old indices remain searchable by applications using the new schema.
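
As a rough illustration, the mapping properties for an old index might look like the sketch below, written as a Python dictionary that mirrors the JSON body. The field names `user_name` and `username` and the helper `resolve_field` are hypothetical; the helper only mimics the resolution idea locally and is not Elasticsearch code.

```python
# Mapping properties for an old index. Field names are made up for illustration.
old_index_properties = {
    # The field as it was originally named in the old index.
    "user_name": {"type": "keyword"},
    # Alias exposing the old field under the name used by the new schema.
    "username": {"type": "alias", "path": "user_name"},
}


def resolve_field(properties, name):
    """Return the concrete field a name refers to, following one alias hop."""
    field = properties[name]
    if field.get("type") == "alias":
        return field["path"]
    return name
```

An application written against the new schema can keep querying `username`, and on the old index that name resolves to `user_name`.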

Kerberos

The Kerberos work is ready to be merged. The initial implementation will rely on native role mapping from the Kerberos principal name. Lookups for users from LDAP/AD will come as a follow-up, based on lookup realms.

Zen2

This week in the internal Zen2 PoC we merged the node retirement API that allows for safely reducing a 2-node cluster to a 1-node one. This is an important API for Elastic Cloud, where many users use a 1-node cluster which can grow to 2 and back to 1 during upgrades. This is the last missing feature in our PoC.

We are making good progress on the zen2 branch of the public ES repo, which has seen PRs for the addition of terms to cluster state, the heart of the coordination layer, the beginning of a testing framework, and a gossip discovery protocol.

JDK Bug

Several of our users recently encountered a JVM bug that manifests on machines supporting AVX-512 (e.g., Skylake X) when running Elasticsearch on JDK 10. As our 6.3.0 and 6.3.1 Docker images contain JDK 10, this issue potentially affects many users. A workaround is described on the Elasticsearch GitHub repo.

Low Memory Resiliency

We made an important step forward this week in our efforts to run on smaller heaps. To date, when evaluating whether or not to service a user’s request, Elasticsearch examined memory explicitly reserved by other queries as compared to a total limit (70% of heap). However, these explicit reservations did not accurately capture memory usage. With a rebuilt circuit breaker, Elasticsearch now measures real heap memory usage before servicing the request. The result? Elasticsearch is now more resilient to OutOfMemoryErrors, while at the same time being able to use more of the heap (95% now).
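
To make the difference concrete, here is a minimal sketch of the two admission checks. The function names and the way the incoming request size is added are assumptions for illustration only; the actual circuit breaker is more involved.

```python
def reservation_check(reserved_bytes, request_bytes, heap_bytes, limit=0.70):
    """Old style: compare explicitly reserved memory against a share of heap."""
    return reserved_bytes + request_bytes <= limit * heap_bytes


def real_memory_check(used_heap_bytes, request_bytes, heap_bytes, limit=0.95):
    """New style: compare measured heap usage against a share of heap."""
    return used_heap_bytes + request_bytes <= limit * heap_bytes
```

With a heap of 1000 units, 100 units reserved but 940 actually in use, a 50-unit request passes the reservation check yet fails the real-memory check, which is the case where the old accounting risked running out of memory.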

Fetch tasks are no longer rejected

The search thread pool needs to run two main kinds of tasks: query tasks and fetch tasks. We have pushed a change that force-adds fetch tasks to the queue even when the queue is already full. The reasoning is that fetch tasks can only be follow-ups to query tasks, so the number of additional fetch tasks that may enter the thread pool is expected to be reasonable.
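
The idea can be sketched with a toy bounded queue. The class `SearchQueue` and its `submit` method are hypothetical names, not the Elasticsearch thread pool implementation.

```python
from collections import deque


class SearchQueue:
    """Bounded task queue where fetch tasks bypass the capacity check."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tasks = deque()

    def submit(self, kind, task):
        """Queue a task; return False if a non-fetch task is rejected."""
        if kind != "fetch" and len(self.tasks) >= self.capacity:
            # Query tasks are still rejected when the queue is full.
            return False
        # Fetch tasks are always added, even beyond the nominal capacity.
        self.tasks.append((kind, task))
        return True
```

Because every fetch task follows a query task that was already admitted, the overshoot beyond `capacity` stays bounded by the number of queries in flight.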

Rollup capabilities now available on index-level

We have added the ability to request the rollup capabilities for a rollup index in addition to the existing API which lists the rollup capabilities for all jobs targeting a pattern of source indexes (i.e. indexes where the original non-rolled up data lives). This will help the rollup UI determine the fields and aggregations available to Kibana index patterns which include rollup indexes.
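
The two lookups differ mainly in where the index name appears in the request path. The sketch below assumes the 6.x X-Pack endpoint layout; the helper names and the index names in the usage note are hypothetical, and the paths should be checked against the rollup API reference for your version.

```python
def job_caps_path(source_pattern):
    """Path for capabilities of all jobs targeting a pattern of source indexes."""
    return "/_xpack/rollup/data/" + source_pattern


def index_caps_path(rollup_index):
    """Path for capabilities stored in one concrete rollup index."""
    return "/" + rollup_index + "/_xpack/rollup/data"
```

For example, `job_caps_path("sensor-*")` asks what has been rolled up from the source data, while `index_caps_path("sensor_rollup")` asks what a given rollup index contains.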

Changes in 6.0:

Fix put mappings java API documentation #31955

Changes in 6.3:

A replica can be promoted and started in one cluster state update #32042
Adjust translog after versionType is removed in 7.0 #32020
Disable C2 from using AVX-512 on JDK 10 #32138

Changes in 6.4:

Rest HL client: Add put watch action #32026
Add support for field aliases. #32172
add support for write index resolution when creating/updating documents #31520
ECS Task IAM profile credentials ignored in repository-s3 plugin #31864
Fix rollup on date fields that don’t support epoch_millis #31890
Revert "Introduce a Hashing Processor (#31087)" #32179
Call setReferences() on custom referring tokenfilters in _analyze #32157
Add more contexts to painless execute api #30511
Fix range queries on _type field for single type indices #31756
BREAKING: Configurable password hashing algorithm/cost(#31234) #32092
Handle missing values in painless (#30975) #31903
Painless: Fix Bug with Duplicate PainlessClasses #32110
Ensure to release translog snapshot in primary-replica resync #32045
Check that client methods match API defined in the REST spec #31825
Update monitoring template version to 6040099 #32088
Add exclusion option to keep_types token filter #32012
Add Index UUID to /_stats Response #31871
Bypass highlight query terms extraction on empty fields #32090
Core: Backport java time date formatters #31997
[Rollup] Add new capabilities endpoint based on concrete rollup indices #30401
SQL: allow LEFT and RIGHT as function names #32066
Watcher: Store username on watch execution #31873
Scripting: Remove Dead Code from Painless Module #32064

Changes in 7.0:

Handle missing values in painless #32207
Revert "Introduce a Hashing Processor (#31087)" #32178
Adjust SSLDriver behavior for JDK11 changes #32145
Remove versionType from translog #31945
Replace TokenizerFactory with Supplier #32063
Relax TermVectors API to work with textual fields other than TextFieldType #31915

Lucene

Low-level highlighting components

One reason (out of many!) why building a highlighter is challenging is that many things need to be configured: what is a good snippet, how much context should be kept around matches, should snippets that occur next to each other be merged, etc. We are exploring building highlighter components that can be used together rather than a full-fledged highlighter. As a start, we are using the matches API to break text up into passages that contain hits.
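
A minimal sketch of that first step, assuming match positions are already available as character offsets. This is not the Lucene API: the function name is hypothetical and the sentence splitting is deliberately naive.

```python
import re


def passages_with_hits(text, match_offsets):
    """Split text into sentences and keep those containing a match.

    match_offsets is a list of (start, end) character offsets of hits.
    """
    passages = []
    # Treat each run of non-terminator characters plus its terminator as a sentence.
    for sentence in re.finditer(r"[^.!?]+[.!?]?", text):
        s_start, s_end = sentence.span()
        # Keep only the hits that fall entirely inside this sentence.
        hits = [(a, b) for a, b in match_offsets if s_start <= a and b <= s_end]
        if hits:
            passages.append((sentence.group().strip(), hits))
    return passages
```

Decisions such as how much context to keep or whether to merge neighbouring passages would then be separate components layered on top of this output.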

Other

When sorting by a field, it would be more efficient to run a second search to compute the scores of the top hits than to compute them at collection time.
Since computing sort values is cheap, we don't need a boolean to enable it.
Soft deletes support had an optimization that never kicked in because it relied on outdated assumptions.
The matches API is getting support for extracting sub matches. This is especially useful for phrase queries with slops in order to know where terms of the query occurred.
Should we allow term vectors to only store a subset of the terms that exist in the inverted index?
A missing double cast made TieredMergePolicy's getter for the max segment size return a wrong value.
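
The last item is an instance of a common class of bug, illustrated below. This is a sketch and not the Lucene code: the function names are hypothetical, and Python's `//` stands in for integer division on Java longs.

```python
def max_segment_mb_truncating(max_bytes):
    """Integer division first, so the fractional part is already lost."""
    return float(max_bytes // (1024 * 1024))


def max_segment_mb(max_bytes):
    """Floating-point division keeps the fractional megabytes."""
    return max_bytes / 1024.0 / 1024.0
```

For a limit of 1.5 MB (1572864 bytes), the truncating version reports 1.0 while the floating-point version reports 1.5.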
