This Week in Elasticsearch and Apache Lucene - 2018-07-20

Elasticsearch

Field aliases

We have merged a new field aliases feature, which lets users define fields of a new alias type. An alias field has a single property, “path”, that points to a different field; wherever the alias is used in searches and aggregations, it resolves to that target field. This helps users migrate from old mappings to new ones when only field names are changing: they can add aliases to the old indices under the new field names, so the old indices remain searchable by applications using the new schema.
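As a rough illustration, the mapping for an old index might look like the sketch below. The field names "hostname" and "host_name" are made up for the example, and resolve_field is a toy helper that only mimics the resolution described above; it is not Elasticsearch code.

```python
import json

# Hypothetical field names. Only the "alias" type and its "path" property
# come from the feature description; everything else is illustrative.
old_index_properties = {
    "hostname": {"type": "keyword"},                    # field name used by the old mapping
    "host_name": {"type": "alias", "path": "hostname"}, # new name, pointing at the old field
}


def resolve_field(properties, name):
    """Toy resolver: return the concrete field that `name` refers to."""
    field = properties[name]
    if field.get("type") == "alias":
        return field["path"]
    return name


# Mapping fragment as it could appear in a put-mapping request body.
print(json.dumps({"properties": old_index_properties}, indent=2))
```

With this mapping, an application that queries "host_name" would be served by the "hostname" field on the old index.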

Kerberos

The Kerberos work is ready to be merged. The initial implementation will rely on native role mapping from the Kerberos principal name. Lookups for users from LDAP/AD will come as a follow-up based on lookup realms.

Zen2

This week in the internal Zen2 PoC we merged the node retirement API that allows for safely reducing a 2-node cluster to a 1-node one. This is an important API for Elastic Cloud, where many users use a 1-node cluster which can grow to 2 and back to 1 during upgrades. This is the last missing feature in our PoC.

We are making good progress on the zen2 branch of the public ES repo, which has seen PRs for the addition of terms to cluster state, the heart of the coordination layer, the beginning of a testing framework, and a gossip discovery protocol.

JDK Bug

Several of our users recently encountered a JVM bug that manifests on machines supporting AVX-512 (e.g., Skylake X) when running Elasticsearch on JDK 10. As our 6.3.0 and 6.3.1 Docker images contain JDK 10, this issue potentially affects many users. A workaround is described on the Elasticsearch GitHub repo.

Low Memory Resiliency

We made an important step forward this week in our efforts to run on smaller heaps. To date, when evaluating whether or not to service a user’s request, Elasticsearch examined memory explicitly reserved by other queries as compared to a total limit (70% of heap). However, these explicit reservations did not accurately capture memory usage. With a rebuilt circuit breaker, Elasticsearch now measures real heap memory usage before servicing the request. The result? Elasticsearch is now more resilient to OutOfMemoryErrors, while at the same time being able to use more of the heap (95% now).
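The difference between the two approaches can be sketched as follows. This is a simplified model, not the actual circuit breaker: the function names and byte counts are invented for illustration, and only the 70% and 95% limits come from the description above.

```python
def reservation_breaker_allows(reserved_bytes, request_bytes, heap_bytes, limit=0.70):
    """Old-style check: compare explicitly reserved memory against 70% of heap."""
    return reserved_bytes + request_bytes <= limit * heap_bytes


def real_memory_breaker_allows(used_heap_bytes, request_bytes, heap_bytes, limit=0.95):
    """New-style check: compare measured heap usage against 95% of heap."""
    return used_heap_bytes + request_bytes <= limit * heap_bytes


HEAP = 1000  # illustrative heap size in bytes

# Case 1: reservations under-count real usage (300 reserved, 900 actually used).
# The old check admits the request; the new one rejects it.
risky_old = reservation_breaker_allows(300, 200, HEAP)
risky_new = real_memory_breaker_allows(900, 200, HEAP)

# Case 2: usage is accurately tracked and sits just below the old 70% limit.
# The old check rejects the request; the new one can still admit it.
safe_old = reservation_breaker_allows(650, 100, HEAP)
safe_new = real_memory_breaker_allows(700, 100, HEAP)
```

The first case shows where the extra resilience comes from, and the second shows how more of the heap becomes usable.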

Fetch tasks are no longer rejected

The search thread pool needs to run two main kinds of tasks: query tasks and fetch tasks. We have pushed a change that force-adds fetch tasks to the queue even when the queue is already full. The reasoning is that fetch tasks can only be follow-ups to query tasks that were already accepted, so the number of additional fetch tasks that may enter the thread pool is expected to be reasonable.
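A minimal sketch of the idea, assuming a simple bounded queue; SearchQueue, QueueFull, and submit are hypothetical names and do not correspond to the Elasticsearch thread pool classes.

```python
from collections import deque


class QueueFull(Exception):
    """Raised when a task is rejected because the queue is at capacity."""


class SearchQueue:
    """Bounded task queue with an escape hatch for follow-up work."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tasks = deque()

    def offer(self, task):
        # Normal path: reject new work once the queue is full.
        if len(self.tasks) >= self.capacity:
            raise QueueFull(task)
        self.tasks.append(task)

    def force_put(self, task):
        # Forced path: always enqueue, ignoring the capacity limit.
        self.tasks.append(task)


def submit(queue, kind, task):
    """Query tasks respect the limit; fetch tasks are force-added."""
    if kind == "fetch":
        queue.force_put(task)
    else:
        queue.offer(task)
```

Because each fetch task follows a query task that was already admitted, the overshoot beyond the capacity stays bounded by the number of accepted queries.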

Rollup capabilities now available on index-level

We have added the ability to request the rollup capabilities for a rollup index in addition to the existing API which lists the rollup capabilities for all jobs targeting a pattern of source indexes (i.e. indexes where the original non-rolled up data lives). This will help the rollup UI determine the fields and aggregations available to Kibana index patterns which include rollup indexes.

Changes in 6.0:

  • Fix put mappings java API documentation #31955

Changes in 6.3:

  • A replica can be promoted and started in one cluster state update #32042
  • Adjust translog after versionType is removed in 7.0 #32020
  • Disable C2 from using AVX-512 on JDK 10 #32138

Changes in 6.4:

  • Rest HL client: Add put watch action #32026
  • Add support for field aliases. #32172
  • add support for write index resolution when creating/updating documents #31520
  • ECS Task IAM profile credentials ignored in repository-s3 plugin #31864
  • Fix rollup on date fields that don’t support epoch_millis #31890
  • Revert "Introduce a Hashing Processor (#31087)" #32179
  • Call setReferences() on custom referring tokenfilters in _analyze #32157
  • Add more contexts to painless execute api #30511
  • Fix range queries on _type field for single type indices #31756
  • BREAKING: Configurable password hashing algorithm/cost (#31234) #32092
  • Handle missing values in painless (#30975) #31903
  • Painless: Fix Bug with Duplicate PainlessClasses #32110
  • Ensure to release translog snapshot in primary-replica resync #32045
  • Check that client methods match API defined in the REST spec #31825
  • Update monitoring template version to 6040099 #32088
  • Add exclusion option to keep_types token filter #32012
  • Add Index UUID to /_stats Response #31871
  • Bypass highlight query terms extraction on empty fields #32090
  • Core: Backport java time date formatters #31997
  • [Rollup] Add new capabilities endpoint based on concrete rollup indices #30401
  • SQL: allow LEFT and RIGHT as function names #32066
  • Watcher: Store username on watch execution #31873
  • Scripting: Remove Dead Code from Painless Module #32064

Changes in 7.0:

  • Handle missing values in painless #32207
  • Revert "Introduce a Hashing Processor (#31087)" #32178
  • Adjust SSLDriver behavior for JDK11 changes #32145
  • Remove versionType from translog #31945
  • Replace TokenizerFactory with Supplier #32063
  • Relax TermVectors API to work with textual fields other than TextFieldType #31915

Lucene

Low-level highlighting components

One reason (out of many!) why building a highlighter is challenging is that many things need to be configured: what is a good snippet, how much context should be kept around matches, should snippets that occur next to each other be merged, etc. We are exploring building highlighter components that can be used together rather than a full-fledged highlighter. As a start, we are using the matches API to break text up into passages that contain hits.
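To make the passage-building step concrete, here is a small sketch that turns match offsets into passages, with a configurable amount of context and merging of neighbouring snippets. It does not use the Lucene matches API; the passages function and its (start, end) offset format are assumptions made for the example.

```python
def passages(text, matches, context=10):
    """Split `text` into passages around matches.

    `matches` is a list of (start, end) character offsets (end exclusive).
    Each match is widened by `context` characters on both sides, and
    passages that touch or overlap are merged into one.
    """
    spans = []
    for start, end in sorted(matches):
        lo = max(0, start - context)
        hi = min(len(text), end + context)
        if spans and lo <= spans[-1][1]:
            # Close enough to the previous passage: extend it.
            spans[-1][1] = max(spans[-1][1], hi)
        else:
            spans.append([lo, hi])
    return [text[lo:hi] for lo, hi in spans]
```

Each of the configuration questions above (snippet size, context, merging) maps to a separate decision in this function, which is the kind of separation that composable components would allow.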

Other