16 December 2019

This Week in Elasticsearch and Apache Lucene - 2019-12-13

Jay Modi

Elasticsearch Highlights

Asynchronous Search

Today the _search endpoint is a blocking API: users send a search request and wait until the response arrives or a timeout occurs. For complex queries that span many shards, this blocking design can be limiting because there is no way to follow the progress of the search. Users may wait minutes, sometimes hours, hoping that the search eventually finishes.

For these reasons we've decided to add a new API called _async_search that lets you follow the progress of a search and retrieve partial results while the query is still running. As the name implies, this new API is asynchronous: you submit a search but, instead of waiting for the final response, you receive an id from the endpoint that can be used to follow the progress in a non-blocking fashion:

# Submit an _async_search and wait up to 100ms for a final response
GET my_index_pattern*/_async_search?wait_for_completion=100ms
{
  "aggs": {
    "date_histogram": {
      "field": "@timestamp",
      "fixed_interval": "1h"
    }
  }
}

If after 100ms (wait_for_completion) the final response is not available, a partial_response is included in the body:

{
  "id": "9N3J1m4BgyzUDzqgC15b",
  "version": 1,
  "is_running": true,
  "partial_response": {
    "total_shards": 100,
    "successful_shards": 5,
    "failed_shards": 0,
    "total_hits": {
      "value": 1653433,
      "relation": "eq"
    },
    "aggs": {
      ...
    }
  }
}

The id can then be used to follow the progress using the _async_search get API:

GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms
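The submit-then-poll flow described above can be sketched on the client side as follows. This is a minimal illustration, not an official client: the function name follow_async_search and the injected submit and poll callables are hypothetical stand-ins for whatever HTTP layer you use, and only the id, is_running and partial_response fields shown above are assumed to exist in the response body.

```python
import time
from typing import Any, Callable, Dict


def follow_async_search(
    submit: Callable[[], Dict[str, Any]],
    poll: Callable[[str], Dict[str, Any]],
    on_progress: Callable[[Dict[str, Any]], None] = lambda body: None,
    pause_seconds: float = 0.0,
    max_polls: int = 100,
) -> Dict[str, Any]:
    """Submit a search, then poll by id until is_running is no longer true.

    submit() stands for the initial _async_search request and poll(id) for
    the follow-up GET by id. Both are hypothetical callables supplied by the
    caller; each must return the decoded JSON body as a dict.
    """
    body = submit()
    polls = 0
    while body.get("is_running"):
        # Hand the partial results to the caller while the search runs.
        on_progress(body)
        if polls >= max_polls:
            raise TimeoutError("search still running after %d polls" % max_polls)
        time.sleep(pause_seconds)
        body = poll(body["id"])
        polls += 1
    return body
```

Passing the transport in as callables keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.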

The final response is stored in an index, so we're now working on adding the security layer that will prevent users from accessing this index directly.

Improving SQL support

As a step toward improving SQL support in Kibana, syntax highlighting has been turned on in Console. We've received some great feedback on this so far, and we should be merging this enhancement soon.

Faster Wildcard Query

We opened a PR which introduces a new field type in basic called wildcard. This new type is derived from a simple keyword field, but it indexes additional ngrams in order to limit the number of expansions required at query time when searching with wildcards. Leading wildcards (e.g. *foo) are very expensive, but users often need to match patterns that appear anywhere in the field, such as *Exc*, so this field would speed up these queries at the cost of slower indexing. We'll now benchmark the field to measure its real cost, both to get a better idea of the trade-off and to mitigate the impact on indexing and disk size.
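To illustrate why extra ngrams help, here is a toy in-memory sketch of the general technique: index every ngram of each value, use the ngrams of the literal parts of a pattern to narrow down candidates, then verify each candidate against the full pattern. It is not the implementation in the PR; the class name NgramWildcardIndex, the gram size of 3 and the use of fnmatch for verification are all assumptions made for the example.

```python
import fnmatch
import re
from collections import defaultdict
from typing import Dict, List, Set

GRAM_SIZE = 3  # assumed gram size, chosen only for this example


def ngrams(text: str, n: int = GRAM_SIZE) -> Set[str]:
    """Return all substrings of length n (empty if text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NgramWildcardIndex:
    """Toy index: keeps the original values plus one posting set per ngram."""

    def __init__(self) -> None:
        self.values: List[str] = []
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, value: str) -> int:
        doc_id = len(self.values)
        self.values.append(value)
        # The extra indexing cost: one posting entry per distinct ngram.
        for gram in ngrams(value):
            self.postings[gram].add(doc_id)
        return doc_id

    def search(self, pattern: str) -> List[str]:
        """Match a pattern that uses only '*' and '?' as wildcards."""
        # Every ngram of a literal run must occur in any matching value.
        required: Set[str] = set()
        for literal in re.split(r"[*?]+", pattern):
            required |= ngrams(literal)
        if required:
            candidates = set.intersection(
                *(self.postings.get(gram, set()) for gram in required)
            )
        else:
            # No literal run is long enough: fall back to scanning everything.
            candidates = set(range(len(self.values)))
        # Verification step: the ngram filter can produce false positives.
        return [
            self.values[doc_id]
            for doc_id in sorted(candidates)
            if fnmatch.fnmatchcase(self.values[doc_id], pattern)
        ]
```

With a pattern such as *Exc*, only values whose posting set contains the gram Exc are verified, so the leading wildcard no longer forces a scan of every term.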

JDK 13 now required for compilation

We merged the changes necessary to upgrade the build to use Gradle 6.0. Gradle 6 enables us to compile and run the build with JDK 13 and we followed up with a PR to require JDK 13 in order to build.

'CONTAINS' support for LatLonShape and XYShape

We merged a PR in Lucene that adds "CONTAINS" support for LatLonShape and XYShape. A PR is under review in ES to upgrade to a new Lucene snapshot so that it is available for implementation. This brings us one step closer to feature parity between BKD shapes and the old prefix trees.

ODBC compression

We're adding the ability to compress payloads in the Elasticsearch SQL ODBC driver.

The compression uses the zlib library to efficiently reduce the size of the data that is sent through HTTP. We also added the ability to activate the compression in the DSN graphical editor.
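As a rough illustration of what zlib does to a repetitive payload, the sketch below compresses a JSON body and restores it. It uses Python's standard zlib module rather than the driver's own code, and the payload shape, the function names and the compression level are assumptions made for the example; it says nothing about the HTTP headers or wire format the driver actually uses.

```python
import json
import zlib


def compress_payload(payload: dict, level: int = 6) -> bytes:
    """Serialize a payload to JSON and compress it with zlib."""
    raw = json.dumps(payload).encode("utf-8")
    return zlib.compress(raw, level)


def decompress_payload(blob: bytes) -> dict:
    """Reverse compress_payload: inflate the bytes and parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# A made-up, highly repetitive result set, similar in spirit to tabular rows.
example_payload = {
    "columns": ["host", "status"],
    "rows": [["web-%d" % (i % 10), "green"] for i in range(500)],
}
example_raw_size = len(json.dumps(example_payload).encode("utf-8"))
example_compressed_size = len(compress_payload(example_payload))
```

Tabular results tend to repeat column values, which is the kind of redundancy a general-purpose compressor removes well; the actual ratio depends entirely on the data.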

Sunsetting old versions in the docs

We've done a few things recently to identify, in the docs, versions of Elasticsearch that are no longer supported.

- The "out of maintenance" notice has been updated and is now automatically applied as needed when we release new minor versions.

- The version drop down now shows just the "live" versions by default. Clicking other versions lets you access the full list.

- The edit buttons have been removed from OOM/EOL versions. (Soon to be removed from all legacy & translated docs.)

Apache Lucene Highlights

Lucene 8.4

Final blockers are being resolved and we expect the build of the first release candidate to occur soon.

CONTAINS support on shapes

Queries on geo-shapes now support the CONTAINS relation, which finds all indexed shapes that contain the query shape. Supporting this relation required adding information to the indexed triangles about which edges are shared with the original polygon, in order to skip shapes that have an edge that crosses the query shape.

Distance queries on shapes

We're reviving an old pull request that adds support for distance queries on shapes, also known as buffer queries. Such a query finds all shapes that intersect a disk of configurable center and radius.
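Since shapes are indexed as triangles, the core test behind such a query can be pictured as "does this triangle intersect this disk?". The sketch below shows one textbook way to answer that in plain planar coordinates. It is not the code from the pull request, it ignores geodesic distance on the sphere, and all function names are made up for the example.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: a and b are the same point.
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Project p onto the line through a and b, clamped to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def point_in_triangle(p: Point, a: Point, b: Point, c: Point) -> bool:
    """True if p lies inside or on the boundary of triangle a-b-c."""
    def cross(o: Point, u: Point, v: Point) -> float:
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    # p is inside when it is on the same side of all three edges.
    return not (has_negative and has_positive)


def disk_intersects_triangle(
    center: Point, radius: float, a: Point, b: Point, c: Point
) -> bool:
    """True if the disk (center, radius) shares any point with the triangle."""
    if point_in_triangle(center, a, b, c):
        return True
    # Otherwise the disk must reach at least one of the three edges.
    return any(
        point_segment_distance(center, u, v) <= radius
        for u, v in ((a, b), (b, c), (c, a))
    )
```

A shape made of several triangles would match as soon as any one of its triangles passes this test.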

Other

We noticed that we can get better performance out of BKD trees by reducing the number of points on leaves, especially for costly queries. This has implications for memory usage, so we are discussing trade-offs on the issue.

- We reported a bug with the JapaneseNumberFilter, which might mistakenly tag numbers.

- A committer is making ObjectInputStream and ObjectOutputStream forbidden APIs.

Changes

Changes in Elasticsearch

Changes in 8.0:

Support "accept_enterprise" param in get license #50067

Changes in 7.6:

SingleBucket aggs need to reduce their bucket's pipelines first #50103
cat.indices.json bytes enum not exhaustive #49369
Ensure meta and document field maps are never null in GetResult #50112
Remove reserved roles for code search #50068
CSV processor #49509
Add the REST API specifications for SLM Status / Start / Stop endpoints. #49759
Allow skipping ranges of versions #50014
Fix query analyzer logic for mixed conjunctions of terms and ranges #49803
Add setting to restrict license types #49418
Enable dependent settings values to be validated #49942
Add Validation for maxQueryTerms to be greater than 0 for MoreLikeThisQuery #49966

Changes in 7.5:

Make elasticsearch-node tools custom metadata-aware #48390
Make testclusters registry extension name unique #49956

Changes in 7.4:

SQL: COUNT DISTINCT returns 0 instead of NULL for no matching docs #50037

Changes in 6.8:

Log attachment generation failures #50080
Fix delete- and update-by-query on indices without sequence numbers #50077
