30 November 2018

This Week in Elasticsearch and Apache Lucene - 2018-11-30

By Jake Landis, Daniel Mitterdorfer, Zachary Tong, Jim Ferenczi, Jay Modi, Bill McConaghy, Yannick Welsch, Paul Sanwald

Elasticsearch

LatLonShapes are coming

We merged a PR that lays the groundwork for using Lucene's LatLonShape (aka BKD-backed GeoShapes) in Elasticsearch. LatLonShape tessellates a shape into triangles, which are then stored in a BKD tree, instead of the prefix-encoded quad cells used today. This particular PR adds various shape builder primitives needed to support the change. A follow-up PR will cut geo_shape over to the new BKD-backed strategy.
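To make the idea of tessellation concrete, here is a minimal sketch that splits a convex polygon into triangles with a simple fan and computes a bounding box per triangle. This is an illustration only: fan triangulation works for convex polygons, it is not Lucene's tessellator, and the function names are hypothetical. How Lucene actually encodes triangles in the BKD tree is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (lat, lon)
Triangle = Tuple[Point, Point, Point]


def fan_triangulate(polygon: List[Point]) -> List[Triangle]:
    """Split a convex polygon into triangles that all share the first vertex.

    The polygon is given without a repeated closing vertex.
    Hypothetical helper for illustration; not Lucene's algorithm.
    """
    if len(polygon) < 3:
        raise ValueError("a polygon needs at least three vertices")
    anchor = polygon[0]
    return [
        (anchor, polygon[i], polygon[i + 1])
        for i in range(1, len(polygon) - 1)
    ]


def bounding_box(tri: Triangle) -> Tuple[float, float, float, float]:
    """Return (min_lat, min_lon, max_lat, max_lon) for one triangle."""
    lats = [p[0] for p in tri]
    lons = [p[1] for p in tri]
    return (min(lats), min(lons), max(lats), max(lons))


square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
triangles = fan_triangulate(square)
```

A polygon with n vertices yields n - 2 triangles, so the square above produces two. A per-triangle bounding box is the kind of key a spatial tree can index and prune on.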

Cross Cluster Replication

We have continued work on the remote clusters UI. We have finished the UI layer and worked on the app layer to integrate the UI with endpoints. So far our work has assumed that add/edit/delete all operate on persistent settings. Next, we will work on handling transient settings as well as settings that come from elasticsearch.yml.

Auto-follow UI: We have continued work on the auto-follow UI, completing the list view. Next up will be the UI and functionality for adding an auto-follow pattern.

We opened a meta issue for the "recovery from remote" work, which is used to bootstrap a follower index from a remote leader index. The bootstrapping implementation reuses internal snapshot/restore infrastructure. After prototyping different approaches, we have opened a PR to automatically register or unregister internal CcrRepository instances depending on the presence of remote cluster definitions.

We refactored the auto-follow coordinator to track leader indices per remote cluster and replaced the poll interval setting with a hardcoded poll interval. The hardcoded interval will be removed in a follow-up change by making use of the newly added wait_for_metadata_version parameter of the cluster state API.
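As a rough sketch of how a caller could use this parameter instead of polling on a timer, the helper below builds a cluster state request path that waits for a metadata version newer than the last one seen. The function name, the "last seen + 1" convention, and the 30s timeout are assumptions made for illustration; this is not the auto-follow coordinator's code.

```python
from urllib.parse import urlencode


def cluster_state_path(last_seen_metadata_version: int, timeout: str = "30s") -> str:
    """Build a cluster state request that waits for newer metadata.

    Hypothetical helper: asks for the metadata section only and waits
    until the metadata version is at least last_seen + 1, or until the
    timeout expires.
    """
    params = {
        "wait_for_metadata_version": last_seen_metadata_version + 1,
        "wait_for_timeout": timeout,
    }
    return "/_cluster/state/metadata?" + urlencode(params)


path = cluster_state_path(42)
```

With this pattern the caller issues the next request as soon as the previous one returns, so changes are picked up when they happen rather than on the next poll tick.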

Frozen and closed indices

We added HLRC support for frozen indices. The focus is now on benchmarks to better understand the scaling and performance characteristics of frozen indices, for example how long it takes to dynamically open and close Lucene indices.

We opened a WIP PR for the "Close Index API". This work also requires stronger ordering guarantees in our shard permit acquisition logic, which is a shard-local coordination mechanism to ensure that writes can be temporarily blocked while the shard internals undergo some transformation.
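The general idea of permit-based blocking can be sketched with a counting semaphore: each write takes one permit, and a blocking operation must take all of them, which only succeeds once no write is in flight. The class and method names below are hypothetical, and this simplified, non-blocking version is not the Elasticsearch implementation; in particular it does not provide the ordering guarantees mentioned above.

```python
import threading


class OperationPermits:
    """Toy model of shard-local permits (illustrative names only)."""

    def __init__(self, total_permits: int = 8) -> None:
        self._total = total_permits
        self._sem = threading.Semaphore(total_permits)

    def try_acquire_operation(self) -> bool:
        """Take one permit for a write; fails while all permits are held."""
        return self._sem.acquire(blocking=False)

    def release_operation(self) -> None:
        """Return the permit taken by a finished write."""
        self._sem.release()

    def try_block_all(self) -> bool:
        """Try to take every permit; roll back if any write is in flight."""
        taken = 0
        for _ in range(self._total):
            if not self._sem.acquire(blocking=False):
                break
            taken += 1
        if taken == self._total:
            return True
        for _ in range(taken):
            self._sem.release()
        return False

    def unblock_all(self) -> None:
        """Return every permit after the blocking work is done."""
        for _ in range(self._total):
            self._sem.release()


permits = OperationPermits(total_permits=4)
```

A caller would wrap the transformation between try_block_all() and unblock_all(), retrying or waiting if the block cannot be taken yet.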

Audit Request ID

We have merged the audit request id work. The request id will follow a request and sub-requests through the cluster to enable tracing of a request using the audit log.

Search response format in 7.0

The format of the search response will change in ES 7. Instead of a single number, the hits.total section will include a value and a relation (e.g. "total": { "value": 100, "relation": "gte"}). The value represents the number of hits and the relation indicates if the number is accurate ("eq") or a lower bound ("gte") of the total hits that match the query.
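A client reading the new format has to look at both fields. The following minimal sketch, with a hypothetical function name, turns the hits.total object described above into a human-readable statement.

```python
from typing import Any, Dict


def describe_total(search_response: Dict[str, Any]) -> str:
    """Describe hits.total in the object form (value plus relation)."""
    total = search_response["hits"]["total"]
    value, relation = total["value"], total["relation"]
    if relation == "eq":
        return f"exactly {value} hits"
    if relation == "gte":
        return f"at least {value} hits"
    raise ValueError(f"unexpected relation: {relation!r}")


response = {"hits": {"total": {"value": 100, "relation": "gte"}, "hits": []}}
summary = describe_total(response)
```

Here summary is "at least 100 hits", since "gte" marks the value as a lower bound.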

To help users migrate to 7 in a rolling fashion, we also introduced a new query parameter called rest_total_hits_as_int in master and 6.x. This option allows users to opt out of the new REST format in 7 so that queries to 6.x and 7 nodes can be handled seamlessly.

The PR to change the format is ready to be merged into master, and we have communicated to the other dev teams the plan to merge it next week.
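During a rolling upgrade a client has two options: send the parameter with every search, or accept either shape of hits.total. The sketch below shows both, using hypothetical helper names; treating an integer total as exact is an assumption of this example.

```python
from typing import Any, Dict, Tuple
from urllib.parse import urlencode


def search_path(index: str, total_hits_as_int: bool) -> str:
    """Build a search path, optionally opting out of the object format."""
    path = f"/{index}/_search"
    if total_hits_as_int:
        path += "?" + urlencode({"rest_total_hits_as_int": "true"})
    return path


def total_hits(search_response: Dict[str, Any]) -> Tuple[int, str]:
    """Return (value, relation) for either shape of hits.total.

    An integer total is reported with relation "eq" (assumption of this
    sketch); an object total is passed through unchanged.
    """
    total = search_response["hits"]["total"]
    if isinstance(total, int):
        return total, "eq"
    return total["value"], total["relation"]


path = search_path("logs", total_hits_as_int=True)
```

The first helper keeps existing integer-based parsing working against both versions; the second lets a client drop the parameter once it understands the object form.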

SQL

We added Phone Home support to SQL, which lets us understand how users are using SQL. We also merged support for INTERVAL in SQL and finished the implementation of the NULL functions.

Changes

Changes in 6.5:

  • [CCR] Only auto follow indices when all primary shards have started #35814
  • Build: Fix jdbc jar pom to not include deps #36036
  • SQL: Fix issue with wrong data type for scripted Grouping keys #35969
  • Add missing entries to conffiles #35810
  • Scripting: Actually add joda time back to whitelist #35965
  • Watcher: Only trigger a watch if new or schedule/changed #35908
  • SQL: Fix translation of math functions to painless #35910
  • Add a More Like This query routing requirement check (#29678) #33974
  • Scripting: Add back joda to whitelist #35915
  • Build: Fix jdbc jar to include deps #35602
  • Handles exists query in composite aggs #35758
  • Revert "Use more precise does S3 bucket exist method" #35732
  • Fix analyzed prefix query in query_string #35756

Changes in 6.6:

  • Support content type application/x-ndjson in DeprecationRestHandler #36025
  • BREAKING: Throw a parsing exception when boost is set in span_or query (#28390) #34112
  • BREAKING: Disallow boosts on inner span queries #35967
  • Build: Fix reproduce info for methods with ( or ) #35712
  • Fix Watcher NotificationService’s secure settings #35610
  • Security: improve exact index matching performance #36017
  • SQL: Make INTERVAL millis optional #36043
  • Fix IndexAuditTrail rolling restart on rollover edge #35988
  • Cache the score of the parent document in the nested agg #36019
  • Add realm information for Authenticate API #35648
  • Ensure TokenFilters only produce single tokens when parsing synonyms #34331
  • ActiveShardCount should not fail when closing the index #35936
  • HLRC: Add ability to put user with a password hash #35844
  • [HLRC] Added support for CCR Delete Auto Follow Pattern API #35981
  • Add "request.id" to file audit logs #35536
  • BACKPORT: Add "request.id" to file audit logs (#35536) #36027
  • HLRC: Add delete user action #35294
  • LLREST: Add PreferHasAttributeNodeSelector #36005
  • Move creation of temporary directory to Java #36002
  • Adds deprecation logging to ScriptDocValues#getValues. #34279
  • Add high-level REST client API for _freeze and _unfreeze #35723
  • Tasks: Only require task permissions #35667
  • Fix custom AUTO issue with Fuzziness#toXContent #35807
  • Raise a 404 exception when document source is not found (#33384) #34083
  • SQL: DATABASE() and USER() system functions #35946
  • Undeprecate /_license endpoints #35974
  • [HLRC] XPack ML info action #35777
  • SQL: Lock down JDBC driver #35798
  • [Kerberos] Add support for Kerberos V5 Oid #35764
  • [Rollup] Add more diagnostic stats to job #35471
  • Build: Fix gradle build for Mac OS #35968
  • [Monitoring] Make Exporters Async #35765
  • Remove fromXContent from IndexUpgradeInfoResponse #35934
  • Copy checkpoint atomically when rolling generation #35407
  • Geo: better handling of malformed geo_points #35554
  • Build: Pick default test jvms for macOS #35789
  • SQL: Implement data type verification for conditionals #35916
  • [HLRC] Added support for CCR Put Auto Follow Pattern API #35780
  • Add Tests for findSamlRealm #35905
  • Backport: Move XContent generation to HasPrivilegesResponse (#35616) #35892
  • ingest: grok fix duplicate patterns JAVACLASS and JAVAFILE #35886
  • SQL: Implement GREATEST and LEAST functions #35879
  • SQL: SYS COLUMNS returns ODBC specific schema #35870
  • Respect indices options on _msearch #35887
  • SQL: Implement null safe equality operator <=> #35873
  • SQL: Add filtering to SYS TYPES #35852
  • [HLRC] Add support for get roles API #35787
  • Added wait_for_metadata_version parameter to cluster state api. #35535
  • SQL: XPack FeatureSet functionality #35725
  • Always return false from refreshNeeded on ReadOnlyEngine #35837
  • Wrap can_match reader with ElasticsearchDirectoryReader #35857
  • SQL: Implement NULLIF(expr1, expr2) function #35826
  • Deprecate negative scores in function_score query #35865
  • SQL: Polish grammar for intervals #35853
  • Expose all permits acquisition in IndexShard and TransportReplicationAction #35540
  • Add read-only repository verification #35731

Changes in 7.0:

  • Remove X-Pack centric graph endpoints #36010
  • SQL: deprecate X-Pack SQL translate endpoint #36030
  • Fix kerberos setting registration #35986
  • Deprecate X-Pack centric Migration endpoints #35976
  • Deprecate X-Pack centric license endpoints #35959
  • BREAKING: Remove deprecated Graph endpoints #35956
  • Deprecate X-Pack centric SQL endpoints #35964
  • Deprecate types in search and multi search templates. #35669
  • BREAKING: Validate metadata on _msearch #35938
  • BREAKING: Always enforce cluster-wide shard limit #34892

Apache Lucene

Lucene 7.6

There are no remaining blockers, but we are hitting some issues while trying to build the release, which we hope to address soon.

We merged a change that makes indexing geo shapes 70% faster by not sorting data-only dimensions. See last week's update for details about the change.

Other