This Week in Elasticsearch and Apache Lucene - 2018-07-13
Elasticsearch
Kerberos
The Kerberos realm has been merged into the feature branch, and we are now working on a QA test that runs against an actual KDC. Once the lookup realms feature is complete, we will need to integrate the Kerberos realm with lookup realms, but this will not block completion of the Kerberos realm.
Structured Audit Logging
We have opened a work-in-progress PR for structured audit logging. The PR uses log4j's StringMapMessage for audit events. Map messages give the greatest flexibility in the final format of the log line, because log4j layouts format each line printf-style using the values from the map message. This should let us experiment with multiple formats by changing only the log4j configuration.
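The following Python analogy (it is not log4j code) illustrates the idea. The audit event is kept as a flat map of named fields, and the final line comes from a separate template that can be swapped out. The field names and both templates are illustrative assumptions. In the actual PR the map is a log4j StringMapMessage and the layout lives in the log4j configuration.

```python
from string import Template

# A structured audit event: a flat map of field name -> string value,
# analogous to a log4j StringMapMessage. Field names are illustrative.
event = {
    "event_type": "access_granted",
    "user": "alice",
    "action": "indices:data/read/search",
    "origin": "192.168.1.10",
}

# Two interchangeable "layouts" (hypothetical). Switching the output format
# means switching the template, not the code that builds the event.
KEY_VALUE_LAYOUT = Template("[$event_type] user=[$user] action=[$action] origin=[$origin]")
COMPACT_LAYOUT = Template("$event_type $user $action")


def render(layout: Template, fields: dict) -> str:
    """Format one audit event with the given layout, printf-style."""
    return layout.substitute(fields)


if __name__ == "__main__":
    print(render(KEY_VALUE_LAYOUT, event))
    print(render(COMPACT_LAYOUT, event))
```

The event-building code never changes between the two outputs. This is the property that makes a map message attractive for experimenting with formats.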
Cross Cluster Replication
We have merged the rewrite of the ShardFollowingTask. The rewrite simplifies the logic and gives us better insight into the task's internals for monitoring and control. One example is how many operations have been fetched from the leader and are still buffered, waiting to be written on the follower.
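A toy Python model of that buffering metric follows. It describes a follow task that records what it has fetched from the leader, buffers those operations, writes them in batches, and reports how many are still waiting. Every name here (ShardFollowSketch, FollowerStats, on_fetched, write_batch) is hypothetical. The sketch makes no claim about how the real ShardFollowingTask is structured.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FollowerStats:
    """Hypothetical snapshot of a follow task's internal state."""
    last_fetched_seq_no: int
    last_written_seq_no: int
    buffered_operations: int


@dataclass
class ShardFollowSketch:
    """Toy follow task: fetch from the leader, buffer, then write to the
    follower. Not the real ShardFollowingTask."""
    buffer: deque = field(default_factory=deque)
    last_fetched_seq_no: int = -1
    last_written_seq_no: int = -1

    def on_fetched(self, operations):
        # Operations arrive as (seq_no, payload) pairs from the leader.
        for seq_no, payload in operations:
            self.buffer.append((seq_no, payload))
            self.last_fetched_seq_no = max(self.last_fetched_seq_no, seq_no)

    def write_batch(self, max_ops: int) -> int:
        # Drain up to max_ops buffered operations and "apply" them locally.
        written = 0
        while self.buffer and written < max_ops:
            seq_no, _payload = self.buffer.popleft()
            self.last_written_seq_no = max(self.last_written_seq_no, seq_no)
            written += 1
        return written

    def stats(self) -> FollowerStats:
        # Fetched-but-unwritten operations are exactly what sits in the buffer.
        return FollowerStats(self.last_fetched_seq_no, self.last_written_seq_no, len(self.buffer))
```

Suppose five operations are fetched and a batch of two is written. The stats then report three buffered operations, with the last fetched seq_no at 4 and the last written at 1.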
Zen2
The Zen2 project is now managed in our public repo and tracked in a meta issue. Due to the complexity of the project, the POC phase was quite elaborate, and the code is relatively mature for a POC. Work will now focus on bringing the POC code to production level, simplifying it and adding tests. This is a major milestone, and congratulations to David and Yannick for reaching it!
Auto-Interval Date Histogram
We have merged a PR for a new aggregation called auto_date_histogram. It works like a date_histogram, but you specify the maximum number of buckets you'd like instead of a time interval. The aggregation then chooses the interval whose bucket count comes closest to that maximum without going over.
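In the request, the target is passed as buckets alongside field, replacing the interval a date_histogram would take. The Python sketch below shows the selection rule in simplified form. The list of candidate intervals is a hypothetical stand-in for the aggregation's own ladder of roundings. The bucket count is approximated as the span divided by the interval, which ignores calendar alignment.

```python
import math
from datetime import timedelta

# Hypothetical candidate intervals, finest to coarsest. The real aggregation
# uses its own set of roundings (seconds up to years).
CANDIDATE_INTERVALS = [
    ("1s", timedelta(seconds=1)),
    ("1m", timedelta(minutes=1)),
    ("1h", timedelta(hours=1)),
    ("1d", timedelta(days=1)),
    ("7d", timedelta(days=7)),
    ("30d", timedelta(days=30)),
    ("365d", timedelta(days=365)),
]


def bucket_count(span: timedelta, interval: timedelta) -> int:
    """Approximate number of fixed-size buckets needed to cover span."""
    return max(1, math.ceil(span / interval))


def choose_interval(span: timedelta, max_buckets: int) -> str:
    """Pick the finest interval whose bucket count does not exceed max_buckets.
    Finer intervals give more buckets, so the first fit is the closest to the
    maximum without going over."""
    for name, interval in CANDIDATE_INTERVALS:
        if bucket_count(span, interval) <= max_buckets:
            return name
    # Nothing fits: fall back to the coarsest interval.
    return CANDIDATE_INTERVALS[-1][0]
```

For a 90-day span and a maximum of 10 buckets, the sketch picks "30d", which yields 3 buckets, because "7d" would need 13. With a maximum of 100 buckets, it picks "1d", which yields 90.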
SQL Drivers
Work on the ODBC and JDBC drivers continues, with the goal of reaching parity. We have been adding parameterised execution to the ODBC driver, which lets a statement be prepared once and then executed multiple times with different data parameters. For JDBC, we have added single-parameter text manipulation functions to SQL. These let users transform text with functions such as LENGTH, UCASE, LCASE and LTRIM.
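The prepare-once, execute-many pattern is shown below using Python's built-in sqlite3 module as a stand-in. This is not the Elasticsearch ODBC driver. In ODBC terms, the equivalent steps are to call SQLPrepare once and then SQLBindParameter and SQLExecute for each set of values. The table and column names are hypothetical.

```python
import sqlite3

# SQLite stands in for the Elasticsearch SQL endpoint; the point is the
# pattern, not the driver. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp (name, dept) VALUES (?, ?)",
    [("Ann", "eng"), ("Bob", "eng"), ("Cid", "ops")],
)

# One statement with a '?' placeholder. The SQL text never changes; only the
# bound value does, so the compiled statement can be reused (sqlite3 caches
# compiled statements per connection).
QUERY = "SELECT COUNT(*) FROM emp WHERE dept = ?"


def count_by_dept(dept: str) -> int:
    (count,) = conn.execute(QUERY, (dept,)).fetchone()
    return count
```

Binding parameters instead of splicing values into the SQL string also avoids quoting and injection problems.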
Changes in 6.3:
- SQL: HAVING clause should accept only aggregates #31872
- Fix building AD URL from domain name #31849
- Watcher: Add ssl.trust email account setting #31684
- Watcher: Increase HttpClient parallel sent requests #31859
- Inconsistency between description and example #31858
- SQL: Fix incorrect HAVING equality #31820
Changes in 6.4:
- Slack message empty text #31596
- Date: Add DateFormatters class that uses java.time #31856
- Tests: Remove use of joda time in some tests #31922
- Add Get Snapshots High Level REST API #31980
- Force execution of fetch tasks #31974
- Add Expected Reciprocal Rank metric #31891
- SQL: Add support for single parameter text manipulating functions #31874
- SQL: Support for escape sequences #31884
- Add Snapshots Status API to High Level Rest Client #31515
- ingest: date_index_name processor template resolution #31841
- Test: fix null failure in watcher test #31968
- Added lenient flag for synonym token filter #31484
- Fix wrong NaN check in MovingFunctions#stdDev() #31888
- Add opaque_id to audit logging #31878
- add support for is_write_index in put-alias body parsing #31674
- Handle missing values in painless #30975
- BREAKING: High Level REST Client: Add x-pack-info API #31870
- Do not return all indices if a specific alias is requested via get aliases api. #29538
- Ingest: Enable Templated Fieldnames in Rename #31690
- Ingest: Add ignore_missing option to RemoveProc #31693
- Add template config for Beat state to X-Pack Monitoring #31809
- SQL: Remove restriction for single column grouping #31818
- Check timeZone argument in AbstractSqlQueryRequest #31822
- Fix profiling of ordered terms aggs #31814
Changes in 6.5:
- [X-Pack] Beats centralized management: security role + licensing #30520
Changes in 7.0:
- Tests: Fix SearchFieldsIT.testDocValueFields #31995
- BREAKING: Remove the ability to index or query context suggestions without context #31007
Apache Lucene
Points-based geo shapes
Lucene 6 introduced a new API called points, which is implemented under the hood with a BKD tree. Since then, points have been the way to go for indexing and searching numerics (1 dimension), geo points and numeric ranges (2 dimensions). BKD trees provide faster searching and better compression than the previous inverted-index-based support.
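A numeric range counts as 2-dimensional because a range [lo, hi] can be stored as the point (lo, hi). An "intersects" query then becomes a box query over those points. The Python sketch below assumes this min/max encoding, which is the idea behind Lucene's range fields. It simplifies by scanning linearly where a BKD tree would prune whole cells.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    lo: int
    hi: int

    def as_point(self) -> tuple:
        # A 1-D range [lo, hi] becomes the 2-D point (lo, hi).
        return (self.lo, self.hi)


def intersects_query(point: tuple, q_lo: int, q_hi: int) -> bool:
    """A stored range intersects [q_lo, q_hi] iff lo <= q_hi and hi >= q_lo,
    i.e. the point lies in the box lo <= q_hi, hi >= q_lo."""
    lo, hi = point
    return lo <= q_hi and hi >= q_lo


def search(points, q_lo: int, q_hi: int):
    # A BKD tree would skip cells whose per-dimension min/max cannot match;
    # this linear scan only evaluates the same predicate point by point.
    return [p for p in points if intersects_query(p, q_lo, q_hi)]
```

For the stored ranges [1, 5], [6, 10] and [12, 20], a query for [4, 7] matches the first two.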
The next logical step would be to implement shape support on top of the points API, but this raises questions. Shapes are more commonly indexed with R-trees than with BKD trees, which are designed for points. However, adding support for a new type of data structure to Lucene isn't a small task, and the proposal to add R-tree support didn't get much traction.
We are back with a new proposal that uses the BKD tree as an R-tree by tessellating shapes into triangles that are indexed as 6-dimensional points. The BKD tree provides the minimum and maximum values for each dimension on its inner levels. These values can be used to compute the minimum bounding rectangle that contains all triangles in the leaves below, just as an R-tree would. We are also confident that the existing split mechanism will perform well. Initial results are very promising.
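The Python sketch below shows how per-dimension min/max values yield a bounding rectangle. It assumes the simplest possible encoding, where a triangle with vertices (x1, y1), (x2, y2), (x3, y3) becomes the 6-d point (x1, y1, x2, y2, x3, y3). Lucene's actual encoding (quantization, vertex ordering) may differ. The cell_min_max helper stands in for the values a BKD inner node covers.

```python
from typing import Tuple

Triangle = Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]


def encode(tri: Triangle) -> tuple:
    """Flatten a triangle into a 6-dimensional point (x1, y1, x2, y2, x3, y3)."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return (x1, y1, x2, y2, x3, y3)


def cell_min_max(points):
    """Per-dimension min and max over a set of 6-d points, like the bounds
    a BKD inner node has for the points below it."""
    mins = tuple(min(p[d] for p in points) for d in range(6))
    maxs = tuple(max(p[d] for p in points) for d in range(6))
    return mins, maxs


def bounding_rectangle(mins, maxs):
    """Minimum bounding rectangle of every triangle in the cell.
    x coordinates live in dims 0, 2, 4 and y coordinates in dims 1, 3, 5."""
    min_x = min(mins[0], mins[2], mins[4])
    max_x = max(maxs[0], maxs[2], maxs[4])
    min_y = min(mins[1], mins[3], mins[5])
    max_y = max(maxs[1], maxs[3], maxs[5])
    return (min_x, min_y, max_x, max_y)
```

Consider the triangles ((0, 0), (2, 0), (1, 3)) and ((5, 1), (6, 4), (4, 2)) in one cell. Their bounding rectangle comes out as (0, 0, 6, 4), and nothing beyond the cell's per-dimension bounds is needed to compute it.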
Other
- We fixed a bug that could make the word-delimiter filter add unnecessary gaps.
- We are iterating on making the unified highlighter leverage the new matches API.
- We submitted a first patch to reclaim deletes more aggressively through natural merges.
- The new setMinCompetitiveScore API that was introduced for top-k queries needs to be used carefully with MultiCollector.
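MultiCollector needs care because all of its sub-collectors share one scorer. A top-k collector may raise the minimum competitive score, while another collector (for example, one counting all hits) still needs every hit. The Python sketch below shows one safe strategy, which is to forward only the minimum threshold across all sub-collectors. The Scorer class and the wrapper are hypothetical stand-ins, and this is not presented as Lucene's implementation.

```python
import math


class Scorer:
    """Stand-in for a scorer that may skip hits scoring below a threshold."""

    def __init__(self):
        self.min_competitive_score = -math.inf

    def set_min_competitive_score(self, score: float) -> None:
        self.min_competitive_score = score


class MultiCollectorScorerView:
    """Hypothetical wrapper handed to one of N sub-collectors. It records that
    collector's threshold and forwards only the minimum across all of them,
    so the shared scorer never skips a hit some collector still needs."""

    def __init__(self, scorer: Scorer, thresholds: list, index: int):
        self._scorer = scorer
        self._thresholds = thresholds
        self._index = index

    def set_min_competitive_score(self, score: float) -> None:
        self._thresholds[self._index] = score
        self._scorer.set_min_competitive_score(min(self._thresholds))


def wrap(scorer: Scorer, n_collectors: int):
    # Every collector starts with no threshold, i.e. every hit is competitive.
    thresholds = [-math.inf] * n_collectors
    return [MultiCollectorScorerView(scorer, thresholds, i) for i in range(n_collectors)]
```

Suppose only the top-k view raises its threshold. The shared scorer's threshold then stays at negative infinity until every other view has raised its own.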