18 December 2017

This Week in Elasticsearch and Apache Lucene - 2017-12-18

By Clinton Gormley and Adrien Grand

Welcome to This Week in Elasticsearch and Apache Lucene! With this weekly series, we're bringing you an update on all things Elasticsearch and Apache Lucene at Elastic, including the latest on commits, releases and other learning resources.

Optimised version map for append-only indexing

#27752 automatically skips version tracking for in-memory buffered documents during indexing when every document in the RAM buffer uses an auto-generated ID and is guaranteed to have no duplicates. This drastically reduces GC overhead in high-throughput scenarios (by up to 50%) and improves indexing throughput by 5-10%, depending on the workload. This change will come in 6.2.
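To make the idea concrete, here is a toy Python model of the optimisation. It is only a sketch and not the code from #27752: the class `ToyVersionMap`, its methods and the `on_refresh` hook are hypothetical names, and the model assumes that a refresh empties the in-memory buffer.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyVersionMap:
    """Toy model (hypothetical, not Elasticsearch code): versions of
    buffered documents are tracked only when a duplicate is possible."""

    versions: Dict[str, int] = field(default_factory=dict)
    # True while every buffered document has an auto-generated ID.
    append_only: bool = True

    def index(self, doc_id: str, version: int, auto_generated_id: bool) -> None:
        if not auto_generated_id:
            # An explicit ID may refer to an existing document,
            # so version tracking has to start from here on.
            self.append_only = False
        if not self.append_only:
            self.versions[doc_id] = version
        # In append-only mode nothing is recorded, which avoids
        # allocating one map entry per indexed document.

    def on_refresh(self) -> None:
        # Assumption: after a refresh the buffer is empty, so the map
        # can be cleared and the optimised mode re-enabled.
        self.versions.clear()
        self.append_only = True
```

In this model, a stream of documents with auto-generated IDs never touches the map, which is where the reduced allocation and GC pressure would come from.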

Elasticsearch 6.2.0 supporting JDK 9

Elasticsearch 6.2.0 will be the first release of Elasticsearch to officially support JDK 9, and it will run out-of-the-box on both JDK 8 and JDK 9. We recommend that users stay on JDK 8 because JDK 9 is not an LTS release of the JDK, but Elasticsearch will move forward with the JDK ecosystem. When JDK 9 reaches end-of-life in March 2018, releases of Elasticsearch will stop supporting it; we intend to support JDK 10, but there is no guarantee of that at this time. Support for JDK 8 will continue until its end-of-life in September 2018, when JDK 11 will be the next LTS release of the JDK.

New ranking evaluation API

A new ranking evaluation endpoint (_rank_eval) has been added to master and is planned to be backported to 6.2. The ranking evaluation API can be used to evaluate the quality of ranked search results over a set of typical search queries. Users can supply a set of typical queries together with a list of manually rated documents, and the API will run the queries and calculate common information retrieval metrics such as mean reciprocal rank, precision or discounted cumulative gain on the results.
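For readers unfamiliar with the metrics named above, the following Python sketch computes them by hand from lists of manual ratings. It does not call the _rank_eval endpoint. The function names, the relevance threshold of 1 and the exponential gain used for DCG are assumptions chosen for illustration, and the API may define the metrics slightly differently.

```python
import math
from typing import Sequence


def precision_at_k(ratings: Sequence[int], k: int, threshold: int = 1) -> float:
    """Fraction of the top-k results rated at or above the threshold."""
    top = ratings[:k]
    if not top:
        return 0.0
    return sum(1 for r in top if r >= threshold) / len(top)


def reciprocal_rank(ratings: Sequence[int], threshold: int = 1) -> float:
    """1 / position of the first relevant result, or 0 if there is none."""
    for position, rating in enumerate(ratings, start=1):
        if rating >= threshold:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(queries: Sequence[Sequence[int]], threshold: int = 1) -> float:
    """Average of the reciprocal rank over all queries."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(q, threshold) for q in queries) / len(queries)


def dcg_at_k(ratings: Sequence[int], k: int) -> float:
    """Discounted cumulative gain with gain 2^rating - 1 and a log2 discount."""
    return sum(
        (2 ** rating - 1) / math.log2(position + 1)
        for position, rating in enumerate(ratings[:k], start=1)
    )


# Each list holds the manual ratings of one query's results, in ranked order.
query_a = [0, 1, 0, 3]
query_b = [2, 0, 0, 0]

print(precision_at_k(query_a, k=4))
print(mean_reciprocal_rank([query_a, query_b]))
print(dcg_at_k(query_b, k=4))
```

Each input list represents one query: position i holds the rating a human assigned to the document returned at rank i, with 0 meaning not relevant.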

The API is currently marked as experimental and will probably change a bit in the foreseeable future. More details about the current state can be found in the documentation.

Ranking via the API is a very manual process at the moment, so we only expect to see traction around this feature once we have a UI that makes the interaction much more point-and-click. Brainstorming with the Kibana team is in progress.

Changes in 5.6:

  • update ingest-attachment to use Tika 1.17 and newer deps #27824
  • Do not use system properties when building the HttpAsyncClient #27829

Changes in 6.0:

  • Use AmazonS3.doesObjectExist() method in S3BlobContainer #27723

Changes in 6.1:

  • Add version support for inner hits in field collapsing (#27822) #27833
  • No longer unidle shard during recovery #27757

Changes in 6.2:

  • BREAKING: Remove operationThreaded from Java API #27836
  • Allow _doc as a type. #27816
  • Use lastSyncedGlobalCheckpoint in deletion policy #27826
  • Add NioGroup for use in different transports #27737
  • Optimize version map for append-only indexing #27752
  • Fixes ByteSizeValue to serialise correctly #27702
  • also extract match_all queries when indexing percolator queries #27585
  • Allow custom service names when installing on windows #25255
  • Remove potential nio selector leak #27825
  • Clean Up Painless Cast Object #27794
  • Use CountedBitSet in LocalCheckpointTracker #27793
  • Keep commits and translog up to the global checkpoint #27606
  • Painless: Only allow Painless type names to be the same as the equivalent Java class. #27264
  • Fix performance of RoutingNodes#assertShardStats #27747
  • Use typeName() to check field type in GeoShapeQueryBuilder #27730
  • X-Pack:
    • [Watcher] Use index.auto_expand_replicas: 0-1 #3284
    • Watcher: Set index and type dynamically in index action #3264
    • Fix license messaging for Logstash functionality #3268
    • Check for existing x-pack directory on users commands #3271

Changes in 7.0:

  • Add ranking evaluation API #27478
  • Fail restore when the shard allocations max retries count is reached #27493
  • Remove pre 6.0.0 support from InternalEngine #27720
  • String distance algorithms cleanup #27640

Apache Lucene

Lucene 7.2.0

There is an ongoing vote to release Lucene 7.2.0, which is going well so far.

New committer / PMC member

Ahmet Arslan is now a Lucene/Solr committer and Ishan Chattopadhyaya is now a PMC member.

Other