This Week In Elasticsearch and Apache Lucene - 2019-09-13 | Elastic Blog

Highlights

Rally 1.3.0

Rally 1.3.0 was released. The most notable change is that Rally now stores the track and team revision in the metrics store, making it easier to reproduce benchmark results over longer periods of time when tracks or Elasticsearch configurations change. This release also drops support for Elasticsearch 1.x.

Packaging

The next major release of macOS will require applications distributed outside the Mac App Store to be signed and notarized. As part of our effort towards signing and notarizing the components of Elasticsearch (the JDK, JNA, ML binaries), we need a signed and notarized JDK. The Oracle OpenJDK that we are using is not signed and notarized, and it doesn't appear that it will be for the next major release of the JDK either. AdoptOpenJDK is a newer distribution of OpenJDK that we were already in the process of adding support for, starting with 7.4.0, and it turns out that AdoptOpenJDK already signs and notarizes its distributions. We therefore discussed and agreed to switch our bundled JDK to AdoptOpenJDK, and opened a PR this week.

Snapshots

We enhanced the documentation to note that the S3 repository plugin supports the S3 One Zone-Infrequent Access storage class.

We investigated issues with restoring large snapshots on Cloud. These failures were caused by S3 closing connections mid-download, which the S3 SDK treats as a fatal error and does not retry. We opened a PR to add retries for this scenario on top of the S3 SDK.
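A minimal sketch of the resume-on-retry idea in Python (hypothetical names; the actual fix wraps the S3 SDK's response stream in Java): when the connection drops, reopen the stream at the last byte received, as an HTTP Range request would, rather than failing the whole restore.

```python
DATA = bytes(range(256)) * 64  # a 16 KiB "blob"

def download_with_retries(open_at, total_len, max_retries=3):
    """Read total_len bytes, resuming from the last good offset after an IOError."""
    buf = bytearray()
    failures = 0
    while len(buf) < total_len:
        stream = open_at(len(buf))  # reopen from the last byte we received
        try:
            while len(buf) < total_len:
                chunk = stream.read(1024)
                if not chunk:
                    break
                buf.extend(chunk)
        except IOError:
            failures += 1
            if failures > max_retries:
                raise
    return bytes(buf)

class DroppingSource:
    """Simulates S3 closing the connection once, partway through a download."""
    def __init__(self):
        self.dropped = False

    def open_at(self, offset):
        source, pos = self, offset

        class Stream:
            def read(stream_self, n):
                nonlocal pos
                if not source.dropped and pos >= 4096:
                    source.dropped = True
                    raise IOError("connection reset by peer")
                chunk = DATA[pos:pos + n]
                pos += len(chunk)
                return chunk

        return Stream()

src = DroppingSource()
restored = download_with_retries(src.open_at, len(DATA))
```

Despite the mid-download failure, the blob is restored in full because the retry resumes where the previous attempt stopped.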

We continued work on enhancing our integration tests for Cloud provider backed repositories and opened various PRs to create a mocked GCS endpoint and enhance other mock HTTP endpoints to allow for testing multiple failure scenarios and validating the retry-on-failure behaviour of the cloud provider's SDKs. This work was already used for adding tests on the handling of broken connections during S3 blob downloads.
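To illustrate the testing pattern (a stdlib-only Python toy, not the actual Java test fixtures): a mock HTTP handler that fails the first few requests lets a test verify that a client retries transient errors before succeeding.

```python
import http.server
import threading
import urllib.error
import urllib.request

class FlakyHandler(http.server.BaseHTTPRequestHandler):
    """Fails the first N requests with 503, then serves the blob."""
    failures_left = 2

    def do_GET(self):
        cls = type(self)
        if cls.failures_left > 0:
            cls.failures_left -= 1
            self.send_error(503)  # transient failure the client should retry
            return
        body = b"blob-contents"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet

def fetch_with_retries(url, attempts=5):
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError:
            continue
    raise RuntimeError("exhausted retries")

server = http.server.HTTPServer(("127.0.0.1", 0), FlakyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
data = fetch_with_retries(f"http://127.0.0.1:{server.server_port}/blob")
server.shutdown()
server.server_close()
```

The same shape of fixture, with the cloud provider's REST API mocked instead of a plain blob, is what lets the tests exercise the SDKs' retry-on-failure behaviour deterministically.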

We continued work on adjusting the snapshot metadata format to speed up snapshots and limit the negative effects of S3's eventual consistency model on the S3 snapshot repository implementation.

G1GC

We investigated real-memory circuit breaker issues and found that nearly all of the reported issues on the Discuss forum came from deployments using G1GC. We tracked the problem down to sub-optimal G1GC defaults in our default jvm.options and opened a PR to adjust them so that collections run more aggressively, ensuring the real-memory circuit breaker threshold is not exceeded under moderate memory pressure. We also conducted extensive investigations into the performance impact of the proposed JVM defaults to validate that they would not degrade performance.
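For reference, a change of this kind amounts to a few lines in jvm.options along these lines (illustrative values; the exact flags and numbers are in the PR). Lowering the initiating heap occupancy makes G1 start concurrent collections earlier, and a larger reserve keeps headroom so the heap doesn't fill faster than the collector can react:

```
## G1GC configuration (JDK 10+; illustrative values)
10-:-XX:+UseG1GC
10-:-XX:G1ReservePercent=25
10-:-XX:InitiatingHeapOccupancyPercent=30
```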

Better storage of _source

We built a quick prototype that stores the schema and the data of documents in separate fields in order to help compression. It already seems to help on geonames, but we want to run more tests, especially on adversarial cases like CSV-style content with random integers. This idea would be complementary to storing top-level JSON fields in different stored fields: as we noticed the last time we discussed source compression, the Elastic Common Schema stores everything under objects, so storing top-level fields in separate stored fields alone wouldn't help ECS users much. It also relates to a previous discussion on making ingestion faster by letting users send a bulk request that contains one schema and the data for many documents, which would allow parsing field names and looking up field mappers only once per batch.
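As a toy illustration of the idea (hypothetical Python, not the actual prototype): split each flat document into a schema part and a values part. Since many documents share the same schema, the schema stream becomes highly redundant and compresses well, while the documents remain losslessly reconstructible.

```python
import json

def split_doc(doc):
    """Split a flat JSON document into (schema, values), keyed by sorted field name."""
    keys = sorted(doc)
    return json.dumps(keys), json.dumps([doc[k] for k in keys])

def join_doc(schema, values):
    """Reassemble the original document from its schema and values parts."""
    return dict(zip(json.loads(schema), json.loads(values)))

docs = [
    {"name": "Amsterdam", "population": 872680, "country": "NL"},
    {"name": "Utrecht", "population": 361924, "country": "NL"},
]

parts = [split_doc(d) for d in docs]
# Every document shares a single schema string, so only the values differ per doc.
assert len({schema for schema, _ in parts}) == 1
assert [join_doc(s, v) for s, v in parts] == docs
```

The adversarial case mentioned above is visible here too: if the values are random integers (CSV-style content), the values stream itself has little redundancy left for the compressor to exploit.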

UI: Giving hugs by fixing bugs

The ES UI team dedicated the first week of the release cycle to fixing some of our top bugs, and fixed a number of them this week. We'll make this a standard practice for each release to keep the bug backlog from growing too large.

Pivot in SQL

We introduced PIVOT-ing in SQL.

A popular transformation in BI tools, PIVOT creates a statistics table around the pivoting column; in fact, one of the most-voted issues in Kibana is about pivoting: https://github.com/elastic/kibana/issues/5049. For example:

SELECT * FROM (SELECT browser, request_bytes, country FROM logs)
PIVOT (AVG(request_bytes) FOR country IN ('NL', 'US', 'RO'))

   browser  |       NL        |       US        |       RO
------------+-----------------+-----------------+-----------------
 Chrome     |48396.28571428572|53216.28571428572|78353.39941364497
 IE         |47058.90909090909|34698.31544913364|85082.12264984912
 Mozilla    |49767.22342622222|46463.87613649497|43761.97631565941
 Other      |44103.90909090290|44323.10345673210|22134.11231234329

Geo

We opened a PR to add support for the new xy (cartesian) shape fields to SQL. We also started fixing geo_shape edge-case bugs that were uncovered during the geo_shape/shape refactoring, including the handling of west-to-east linestrings and of very long linestrings.

We completed the initial implementation adding spatial projection support as an X-Pack feature extension to the open-source geo_shape field type. The following mapping example demonstrates indexing geospatial geometries projected in UTM Zone 14N, using the GeoJSON Coordinate Reference System (CRS) format.

"properties" : { "location" : { 'type" : "geo_shape", "crs": { "type": "name", "properties": { "name": "EPSG:32614" // UTM Zone 14N } } } }

The incoming CRS is handled in the X-Pack spatial extension plugin, which indexes the incoming UTM geometries using Lucene's new XYShape field without reprojecting to WGS84 (previously a requirement for all geo types). Queries are likewise built based on the defined CRS.

Since the Maps application cannot yet visualize geospatial geometries in anything other than WGS84 Web Mercator, we wrote a new (prototype) geometry-reprojection ingest processor that lets users reproject incoming documents to any supported coordinate reference system. Users can thus index their data in its native coordinate reference system and, if it is not already in WGS84 Web Mercator, use the ingest processor to reproject it into a separate field for visualization in the Maps app.
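As a sketch of the kind of math such a processor runs (stdlib Python; a real reprojection pipeline would transform from the source CRS, e.g. UTM, typically via a library like PROJ): the forward spherical Web Mercator (EPSG:3857) projection from WGS84 longitude/latitude.

```python
import math

R = 6378137.0  # radius used by spherical Web Mercator (EPSG:3857)

def wgs84_to_web_mercator(lon, lat):
    """Project WGS84 degrees to Web Mercator metres."""
    x = math.radians(lon) * R
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# The projected x axis spans roughly +/- 20037508 m at +/- 180 degrees longitude.
```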

Snapshot Lifecycle Management (SLM)

SLM retention has been merged to master and 7.x (#46407). Retention is the next phase of Snapshot Lifecycle Management; it lets users specify how many snapshots to keep, and for how long.
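For illustration, a policy with retention might look like this (the repository name and values here are made up; check the SLM docs for the final syntax):

```
PUT _slm/policy/nightly-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_repository",
  "config": { "indices": ["*"] },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}
```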

What about Cloud and SLM?

Astute readers will notice that SLM is approaching the same capabilities as Cloud's snapshotting. 7.4, the initial release of SLM, allows users to schedule when snapshots occur and choose the repository in which they land. Cloud does not give users direct access to its snapshots, so users who want more control can push their own snapshots to their own repository. This has always been possible, but required extra tooling (like Curator) to take snapshots on a schedule; with 7.4, users can schedule snapshots natively. Cloud takes a snapshot every 30 minutes, and these are usually fairly quick thanks to the incremental nature of snapshots. However, only one snapshot process can run in a cluster at a time, so Cloud-initiated snapshots can collide with SLM (or manual) snapshots. If this happens, the first one wins and the second ends up in an error state. This isn't ideal, nor is it a new issue (though SLM potentially exacerbates it). The root cause of the collision is two snapshot orchestrators (Cloud and SLM) that are not aware of each other.

Retention is the biggest gap in functionality between Cloud snapshotting and SLM. Now that retention is stabilizing, this gap is closing, and there is an opportunity for the two snapshot orchestrators to converge on one (SLM). There are ongoing discussions with Cloud to help ensure that SLM meets Cloud's requirements. The UI team is also working on a Kibana UI for SLM retention.

Apache Lucene

Faster parallel search

After many discussions with other Lucene committers, a PR was opened to share information between threads to help skip non-competitive hits during concurrent search. Today, to avoid contention, each slice (thread) keeps track of its local competitive hits in a priority queue, and we merge the queues when all threads have finished. This method is very effective, but it doesn't take advantage of the fact that a high score on one thread could help skip non-competitive hits on another. For this reason we are investigating a way to share the minimum score among threads in order to maintain a global minimum score. There are multiple ways to achieve this, and we are now leaning towards a simple solution that tracks the minimum score by taking the maximum bottom score across all priority queues. Threads that haven't yet reached this global minimum score can then use it locally to skip non-competitive hits efficiently.
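The idea can be sketched in a few lines of Python (hypothetical names, and the slices run sequentially here for determinism; the real implementation is Lucene's Java collectors running in parallel). The global floor is the maximum of every slice's local bottom score: a hit below that floor cannot be in the global top-k, because some slice already holds k hits at or above it.

```python
import heapq
import threading

class SharedMinScore:
    """The global floor for competitive hits is the max of all slices' bottom scores."""
    def __init__(self):
        self._lock = threading.Lock()
        self._bottoms = {}

    def publish(self, slice_id, bottom_score):
        with self._lock:
            self._bottoms[slice_id] = bottom_score
            return max(self._bottoms.values())

def collect_top_k(slice_id, scores, k, shared):
    queue, floor, skipped = [], float("-inf"), 0
    for score in scores:
        if len(queue) == k and score <= floor:
            skipped += 1  # provably non-competitive: k better hits exist somewhere
            continue
        heapq.heappush(queue, score)
        if len(queue) > k:
            heapq.heappop(queue)
        if len(queue) == k:
            # publish our bottom score, and adopt the global floor if it is higher
            floor = max(floor, shared.publish(slice_id, queue[0]))
    return queue, skipped

shared = SharedMinScore()
q1, _ = collect_top_k(0, [1, 5, 9, 3, 7], k=3, shared=shared)
q2, skipped = collect_top_k(1, [2, 8, 4, 1, 10, 3, 6], k=3, shared=shared)
top3 = sorted(q1 + q2, reverse=True)[:3]
```

In this run the second slice skips two hits (1 and 3) outright because they fall below the floor published by the first slice, yet the merged top-3 is still exact.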

Faster nearest neighbor search

The old logic looked at the maximum distance at which a hit could still be competitive, computed a bounding box from it, and searched for hits within that bounding box. Ignacio optimized this logic to instead compute the minimum distance between the point of interest and a cell of the BKD tree, and only look at the points within that cell if this minimum distance is competitive. This helps skip a few more cells than the previous approach.
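The cell-pruning idea in miniature (a hypothetical 2D sketch, not Lucene's actual BKD code): compute the minimum distance from the query point to a cell's bounding box, and only descend into the cell when that distance beats the best hit found so far.

```python
import math

def min_dist_to_cell(px, py, bounds):
    """Minimum distance from (px, py) to an axis-aligned cell (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bounds
    dx = max(minx - px, 0.0, px - maxx)
    dy = max(miny - py, 0.0, py - maxy)
    return math.hypot(dx, dy)

def nearest(px, py, cells):
    """cells: list of (bounds, points). Visit cells in order of minimum distance,
    skipping any whose whole extent is farther than the best hit found so far."""
    best, visited = math.inf, 0
    for bounds, points in sorted(cells, key=lambda c: min_dist_to_cell(px, py, c[0])):
        if min_dist_to_cell(px, py, bounds) >= best:
            continue  # pruned without looking at its points
        visited += 1
        for x, y in points:
            best = min(best, math.hypot(x - px, y - py))
    return best, visited

best, visited = nearest(2, 2, [
    ((0, 0, 1, 1), [(1, 1)]),
    ((10, 10, 11, 11), [(10, 10)]),
])
```

Here the far cell is never opened: its minimum distance to the query point already exceeds the distance to the nearest neighbor found in the first cell.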

Other

Changes in Elasticsearch

Changes in 7.5:

  • Handle lower retaining seqno retention lease error #46420
  • [ILM] Add date setting to calculate index age #46561
  • Geo: fix indexing of west to east linestrings crossing the antimeridian #46601
  • SQL: Implement DATE_TRUNC function #46473
  • Deprecate _field_names disabling #42854
  • Update http-core and http-client dependencies #46549
  • Disable local build cache in CI #46505
  • Ensure rest api specs are always copied before using test classpath #46514
  • Refactor AllocatedPersistentTask#init(), move rollup logic out of ctor (Redux) #46444
  • Add retention to Snapshot Lifecycle Management #46407
  • Remove trailing comma from nodes lists #46484
  • Update mustache dependency to 0.9.6 #46243
  • Execute SnapshotsService Error Callback on Generic Thread #46277
  • Resolve the incorrect scroll_current when delete or close index #45226
  • [ML-DataFrame] improve error message for timeout case in stop #46131
  • Deprecate the "index.max_adjacency_matrix_filters" setting #46394

Changes in 7.4:

  • Upgrade to Gradle 5.6 #45005
  • Add support for tests.jvm.argline in testclusters #46540
  • Handle partial failure retrieving segments in SegmentCountStep #46556
  • Enforce realm name uniqueness #46253
  • Fallback to realm authc if ApiKey fails #46538
  • Fix highlighting for script_score query #46507
  • HLRC multisearchTemplate forgot params #46492
  • Fix SnapshotLifecycleMetadata xcontent serialization #46500
  • Fix the JVM we use for bwc nodes #46314
  • Ignore replication for noop updates #46458

Changes in 7.3:

  • SQL: Use null schema response #46386
  • SQL: fix scripting for grouped by datetime functions #46421
  • Update field-names-field.asciidoc #46430

Changes in 6.8:

  • Fix false positive out of sync warning in synced-flush #46576
  • Make reuse of sql test code explicit #45884
  • Add more meaningful keystore version mismatch errors #46291
  • SQL: Fix issue with common type resolution #46565
  • Fix class used to initialize logger in Watcher #46467
  • Always rebuild checkpoint tracker for old indices #46340

Changes in Elasticsearch Hadoop Plugin

Changes in 8.0:

  • BREAKING: Remove Scala 2.10 Support #1350
  • Upgrade to and fix compatibility with Gradle 5.5 #1333

Changes in 7.3:

  • [DOCS] Add 7.3.2 release notes #1352
  • [DOCS] Bump docs version from 7.3.1 to 7.3.2 #1353

Changes in 7.0:

  • [DOCS] Updates location of version attribute for Apache Hadoop Guide #1349

Changes in Elasticsearch SQL ODBC Driver

Changes in 7.5:

  • Add dropdown to select data encapsulation format #180

Changes in 7.3:

  • Bump version to 7.3.3 #183

Changes in Rally

Changes in 1.4.0:

  • Store track-related meta-data in results #771
  • Honor ingest-percentage for bulks #768
  • Remove merge times from command line report #767
  • Run a task completely even without time-periods #763

Changes in 1.3.0:

  • BREAKING: Remove MergeParts internal telemetry device #764

Changes in Rally Tracks

  • Adjust to new parameter source API #83