Hexagonal spatial analytics in Elasticsearch

blog-header-hexagonal-spatial-analytics.jpg

Elasticsearch 8.7.0 can do some very cool geo-spatial aggregations over a hexagonal grid, which has nice advantages over the existing rectangular grid we’ve been using for years. This feature has been introduced over the last few releases and is now really worth bragging about.

How did we get there? In 2018 Uber announced it had open sourced the H3 library, enabling hexagonal tiling of the planet for much better analytics of the company’s traffic and regional pricing models.

elastic hexagonal tiling

In late 2021, we evaluated the option of porting the Uber H3 library from C to Java for possible inclusion in Elasticsearch. This project went very well, and in early 2022 was released as a standalone artifact under the Apache2 license, as well as part of a new hexagonal aggregation feature within Elasticsearch and Kibana 8.1. While that was a fantastic addition to Elasticsearch, it did have one very clear limitation: aggregations only worked on point data, specifically fields mapped as geo_point. Now in Elasticsearch 8.7, we are pleased to announce support for geo_shape aggregations as well as many performance and memory improvements to our port of H3.

Boosting H3 on Java

First, let’s talk about how we improved our version of the H3 library. The original port aimed to be a faithful copy of the Uber C code, translated into Java. This worked well enough for the earlier release of hexagonal aggregations over geo_point data, but as we enhanced this to support the much more complex problem of aggregating geo_shape data, we realized the performance of the library needed to improve. 

We developed microbenchmarks in order to compare the performance of the java bindings provided by Uber’s H3 library with our Java implementation and we noticed that two functions were significantly slower: the method that resolves a latitude/longitude pair into an H3 cell and the function that returns the boundary of an H3 cell:

Benchmark                       Mode  Cnt    Score   Error  Units
H3Benchmark.h3BoundaryElastic  thrpt   25  522,524 ± 3,770  ops/s
H3Benchmark.h3BoundaryUber     thrpt   25  717,033 ± 1,687  ops/s
H3Benchmark.pointToH3Elastic   thrpt   25  100,912 ± 0,240  ops/s
H3Benchmark.pointToH3Uber      thrpt   25  166,608 ± 2,017  ops/s

When investigating these differences, we discovered that there were many parts of the library that needed to be optimized for both memory and performance:

  • We refactored the code to be more in line with Java standards and worked to minimize the object allocation since these functions will be called in hot loops.
  • We used a faster trigonometric math implementation based on the fdlibm package.
  • We used planar 3d math whenever possible to try to minimize the number of calls to trigonometric functions.

After these changes, the performance improvements were very satisfactory:

Benchmark                       Mode  Cnt     Score    Error  Units
H3Benchmark.h3BoundaryElastic  thrpt   25  1303,305 ± 19,108  ops/s
H3Benchmark.h3BoundaryUber     thrpt   25   716,137 ±  4,040  ops/s
H3Benchmark.pointToH3Elastic   thrpt   25   189,624 ±  1,400  ops/s
H3Benchmark.pointToH3Uber      thrpt   25   168,549 ±  0,479  ops/s

In addition to this, our needs were not always supported by existing functions in the H3 library, and so additional functions were written. For example, we added a function that given an H3 cell, it returns the H3 cells at the child level that intersect the provided cell but are not part of the children set. This allowed us to optimize hexagonal aggregations geometrically.

Elasticsearch geohex aggregations

Elasticsearch originally introduced a geohash based grid aggregation over geo_point data way back in 2014. Over the years, this has been enhanced in several ways: first, the addition of `z/x/y` tiling as a superior mechanism to geohash, and second, the introduction of geo_shape as a supported field type for geo-grid aggregations.

field type go-grid aggregations

Then in Elasticsearch 8.1, we added support for geo-grid tiling using H3 hexagonal grids over geo_point, and now finally in Elasticsearch 8.7 we’ve extended this support to geo_shape.

This development was challenging due to the different properties of the H3 grid in comparison to geotile or geohash grids. Geohash and geotile hierarchy has the property of geometric containment, which means that the children of a cell are fully contained by the cell. On the other hand, the Uber H3 hierarchy is only logical and the children of a cell are not fully contained, which makes navigating the hierarchy much more complicated.

geometric containment
Geometric containment, for example geotile. The big square fully contains the four children squares.
logical containment
Logical containment in Uber h3. The parent hexagon does not fully contain the seven hexagon children.

When performing aggregations over the grid at a specific resolution, what is done internally is to answer the question “which tiles at this resolution intersect a particular geometry?” With the geometric containment we see above, this can be implemented as a simple recursive algorithm: start at resolution 0, and if the tile intersects the geometry, recurse to the four child tiles (or 32 child tiles for geohash), and repeat the test. No child will be checked if its parent does not intersect, reducing the search space. Consider a search for the geotile containing the French city of “Albert” at precision 8. The following visualization shows the five steps from precision 4 to 8 of the search:

child tiles
Each geotile is divided into four child tiles, one of which contains the search point. It is then further divided, with the search recursing down to the desired precision.

For geohex aggregations, however, we need to be a lot cleverer. In order to navigate the hierarchy geometrically, we added a new function to the H3 API, which returns the cells that are not logical children of a cell but still intersect the cell. Then we will navigate those cells if we detect that their parent cells are not part of the solution.

logical children intersect
Green cells are logical children; yellow cells are not logical children, but still intersect the parent cell.

As with geotile and geohash aggregations, it is possible to traverse the hierarchy in a recursive way when asking the question “which tiles at this resolution intersect a particular geometry?” But we need to handle the case when a geometry intersects a child tile but not the parent tile (the light green parts in the image above), or conversely when a geometry intersects a parent tile, but not one of its children. These are the same concern and can be solved by using two recursion conditions evaluated at each parent cell:

  • Search the direct children if the parent intersects (same condition as with geotile and geohex). This will cover the area of the child cells that intersect the parent cell.
  • Also search the intersecting non-children (yellow cells in the image above) if the parent of those non-children does not intersect the geometry. This means we will only recurse down for the area of overlap between the yellow and blue above, and it completes the entire area covered by the parent cell.

To clarify this, consider again a search for the town of “Albert” in France, where we can see that this town’s location intersects the cell “811fb” but not any of its children:

inside parent cell
The town of “Albert” is inside parent cell “811fb” but not any of its children. Instead it is found in the triangular intersection between “811fb” and the child of a neighboring cell.

Repeatedly recursing down through either the intersecting children or the intersecting non-children will uniquely reach the cell that contains the point at the desired precision:

inside intersecting children
The last two recursive steps found the town of “Albert” inside intersecting children.

Kibana and geohex

Kibana announced support for geohex aggregations over geo_point for version 8.1 in March 2022. Now in version 8.7, we support geohex aggregations over geo_shape as well. Let’s take a look at how to access this feature in Kibana maps.

kibana map geo-shapes
Kibana map with geo_shapes added as a ‘documents’ layer

Assuming you have already loaded some spatial data containing geo_shape fields, it is possible to visualize the shapes themselves on the map by adding a Documents layer, as seen in the above screenshot. It is also possible to add a layer with the results of a tiled aggregation. This is achieved using the Clusters layer type, seen in the third row of layer types above.

hexagons add layer
Adding a ‘Hexagonal’ clusters layer in Kibana maps editor

After selecting the data view and geo_shape field name you need to choose the geo-grid aggregation type. By default, Kibana selects Clusters, which will use a rectangular tiling but display as circles sized to the document count. You should select Grids for a geotile aggregation or Hexagons for a geohex aggregation. Finally click Add Layer, choose a layer name, and save the layer.

hexagonal layer settings

A very useful feature added to Kibana maps recently is the ability to synchronize the panning and zooming of two map views within the same dashboard. This allows us to compare and contrast different aggregations visually. Particularly relevant to the new geohex aggregation is the ability to contrast it with the already existing geotile aggregation. Or in terms of the Kibana UI, this means to add two cluster layers with Grid and Hexagon selected respectively.

This can be achieved either by starting with a blank dashboard, and adding two maps to it, each with the relevant aggregation added, or by starting by creating a new map and adding it to an existing dashboard. One approach that is particularly flexible is to create a single map with both layers, add it to a dashboard, and then clone the map. In each copy, hide the layer you do not want to see with the hide layer icon.

hexagonal layers

Finally, use the synchronize map movement option to synchronize the maps.

synchronize map movement option

The result can be very striking in a demo.

elastic map demo

To see this in action, take a look at this short video that demonstrates this new feature and compares it with the geotile grid aggregation over both geo_point and geo_shape.

videoImage

Conclusions

Geo-grid aggregations have been a useful feature of Elasticsearch and Kibana maps for a long time. The addition of hexagonal grids over the past few releases adds the power of the popular Uber H3 tiling approach to Elasticsearch aggregations and to Kibana cluster layers. This aggregation mechanism has many advantages over rectangular tiles, in particular the fact that the area of each tile is approximately the same all over the planet, allowing for far more relevant statistical results when compared to equi-rectangular grids, which have extreme distortions away from the equator. Also, the distance between each hexagonal tile and its neighbors is the same in all directions, which is not true for rectangular grids.

We hope you find this feature as useful as we do, and we welcome any feedback from users.

Do you want to learn more? Take a look at our reference documentation, or watch a video demo: