April 25, 2017

Tribe Nodes & Cross-Cluster Search: The Future of Federated Search in Elasticsearch

Elasticsearch has a powerful _search API that can search across all indices on the local cluster. We recently released Elasticsearch 5.3.0, which includes a new feature called Cross-Cluster Search that lets users compose searches not just across local indices but also across clusters. This means that one can search against data that belongs to remote clusters. Historically, the Tribe Node was used when searches needed to span multiple clusters, but it works very differently. In this blog post, we will cover why we implemented Cross-Cluster Search, how it works and how it compares to the Tribe Node, and why we are convinced it is a step in the right direction for the future of federated search using Elasticsearch.

Keep it simple

When we sat down to design the next generation of the Tribe Node, we re-evaluated the problems that it was trying to solve. The goal was to make federated search possible without the limitations that the Tribe Node imposes, and without adding a dedicated API for it, so that the feature would also be easier to maintain. We realized that some of the features the Tribe Node offers besides federated search can easily be provided elsewhere. The Tribe Node supports many Elasticsearch APIs: for instance, retrieving the cluster state or nodes stats through a Tribe Node returns the information collected from all the remote clusters, merged into a single view. Merging information from different sources is not complicated, though, and is easily done on the client side by sending multiple requests to the different clusters. The hard problem that must be solved on the server side is federated search. It involves a distributed scatter and gather executed on nodes belonging to multiple clusters, as well as result merging and reduction that require internal knowledge. That is why we decided to focus on solving this specific problem in a sustainable and robust way by adding support for Cross-Cluster Search to the existing _search API.
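
To make the "merge on the client side" point concrete, here is a minimal Python sketch. It assumes each cluster has already been queried separately and returned a body with a `nodes` object keyed by node id, as the nodes stats API does. The function name `merge_node_stats`, the cluster labels, and the sample responses are all hypothetical and only serve as illustration.

```python
from typing import Dict, Mapping


def merge_node_stats(responses: Mapping[str, Mapping]) -> Dict[str, dict]:
    """Merge nodes-stats-like responses from several clusters into one view.

    `responses` maps a caller-chosen cluster label to a response body shaped
    like {"cluster_name": ..., "nodes": {node_id: {...}}}.
    """
    merged: Dict[str, dict] = {}
    for label, body in responses.items():
        for node_id, stats in body.get("nodes", {}).items():
            # Prefix node ids with the cluster label so ids cannot collide.
            merged[f"{label}:{node_id}"] = {"cluster": label, **stats}
    return merged


# Hypothetical, hand-written responses standing in for two real clusters.
responses = {
    "cluster_one": {"cluster_name": "one", "nodes": {"a1": {"name": "node-a"}}},
    "cluster_two": {"cluster_name": "two", "nodes": {"a1": {"name": "node-b"}}},
}
merged = merge_node_stats(responses)
```

Nothing here needs knowledge of Elasticsearch internals, which is why this kind of merging can live in the client rather than in a special node.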

Search API Detour

The _search API allows Elasticsearch to execute searches, queries, aggregations, suggestions, and more against multiple indices, each composed of one or more shards. When a client sends a search request to an Elasticsearch cluster, the node that receives the request acts as the coordinating node for the whole execution of the request. It identifies which indices, shards, and nodes the search has to be executed against. During execution, all data nodes holding a queried shard receive requests in parallel; each node then executes the query phase locally and sends its results back to the coordinating node. The coordinating node waits for all shards to respond so that it can reduce the results down to the top-N hits that need to be fetched from the shards. A second execution phase then fetches the top-N documents from the shards so that the results can be returned to the client.
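
The two phases can be illustrated with a toy Python model. This is a sketch of the scatter/gather idea only, not Elasticsearch code: the in-memory "shards", the function names, and the scores are invented for the example, and the shards are visited sequentially rather than in parallel.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical in-memory "shards": shard id -> {doc id: (score, source)}.
Shard = Dict[str, Tuple[float, dict]]


def query_phase(shard: Shard, size: int) -> List[Tuple[float, str]]:
    """Each shard returns only (score, doc_id) pairs for its local top hits."""
    scored = [(score, doc_id) for doc_id, (score, _) in shard.items()]
    return heapq.nlargest(size, scored)


def search(shards: Dict[str, Shard], size: int) -> List[dict]:
    # Scatter: run the query phase on every shard (sequential in this sketch).
    candidates = []
    for shard_id, shard in shards.items():
        for score, doc_id in query_phase(shard, size):
            candidates.append((score, shard_id, doc_id))
    # Gather: reduce all shard results down to the global top-N.
    top = heapq.nlargest(size, candidates)
    # Fetch phase: load full documents only for the global top-N.
    return [
        {"_id": doc_id, "_score": score, "_source": shards[shard_id][doc_id][1]}
        for score, shard_id, doc_id in top
    ]


shards = {
    "shard_0": {"a": (1.0, {"title": "a"}), "b": (3.0, {"title": "b"})},
    "shard_1": {"c": (2.0, {"title": "c"}), "d": (0.5, {"title": "d"})},
}
hits = search(shards, size=2)
```

Note that the coordinating side holds every shard's candidates until the reduce step runs, which matters for the memory discussion later in this post.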

How Cross-Cluster Search Works

Since Elasticsearch 5.3.0, it is possible to register remote clusters through the cluster update settings API under the search.remote namespace. Each cluster is identified by a cluster alias and a list of seed nodes that are used to discover other nodes belonging to the remote cluster, as follows:

PUT _cluster/settings
{
  "persistent": {
    "search": {
      "remote": {
        "cluster_one": {
          "seeds": ["remote_node_one:9300"]
        },
        "cluster_two": {
          "seeds": ["remote_node_two:9300"]
        }
      }
    }
  }
}

Once one or more remote clusters are registered, it is possible to execute search requests against their indices using the _search API of the cluster the remotes are configured on. Unlike local indices, remote indices must be prefixed with their configured cluster alias, separated by a colon (e.g. cluster_two:index_test).

Whenever a search request expands to an index on a remote cluster, the coordinating node resolves the shards of these indices on the remote cluster by sending one _search_shards request per cluster. Once the shards and remote data nodes are fetched, searches are executed just like any other search on the local cluster, as explained above, using the exact same code paths, which improves testability and robustness dramatically.
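
As a rough illustration of how index expressions map to clusters, the following Python sketch groups expressions by cluster alias, yielding one group per cluster that would need its own shard-resolution request. The helper `group_by_cluster` and the `LOCAL` key are hypothetical, and treating an unregistered prefix as part of a local index name is an assumption of this sketch rather than a statement about the actual implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

LOCAL = ""  # hypothetical key used here for indices on the local cluster


def group_by_cluster(
    expressions: Iterable[str], remotes: Iterable[str]
) -> Dict[str, List[str]]:
    """Group index expressions such as "cluster_two:index_test" by cluster alias."""
    known = set(remotes)
    groups: Dict[str, List[str]] = defaultdict(list)
    for expression in expressions:
        alias, sep, index = expression.partition(":")
        if sep and alias in known:
            groups[alias].append(index)
        else:
            # No registered alias before a colon: assume a local index.
            groups[LOCAL].append(expression)
    return dict(groups)


groups = group_by_cluster(
    ["index_test", "cluster_one:index_test", "cluster_two:logs", "cluster_two:metrics"],
    remotes=["cluster_one", "cluster_two"],
)
```

Here the search touches three groups: the local cluster plus two remotes, so two remote clusters would each be asked once for their shards.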

From a search execution perspective, there is no difference between local indices and indices that belong to remote clusters, as long as the coordinating node can reach some nodes belonging to the remote clusters. Finally, hits in the search response that belong to remote clusters have their index name prefixed with their cluster alias.

When a remote cluster is registered, Elasticsearch by default discovers up to 3 nodes per remote cluster through the seed nodes in the configuration. These are the nodes that the coordinating node will communicate with as part of a Cross-Cluster Search request. In contrast to nodes in the local cluster, where every node connects to every other node, Cross-Cluster Search nodes are only connected in a unidirectional fashion. It is possible to control how many nodes are discovered when registering the remote clusters, and to mark nodes belonging to remote clusters as gateways in order to control exactly which nodes receive connections from the outside world. If a node holds data that has to be accessed as part of a Cross-Cluster Search request, that node does not receive a direct connection from the remote coordinating node; instead, it receives one from a node marked as gateway in its own cluster, which acts as a proxy.

Additionally, the search.remote.connect setting controls which nodes can act as coordinating nodes for a Cross-Cluster Search request, that is, which nodes in a cluster can send requests to remote clusters. If a node that is not allowed to connect to remote clusters receives a search request that involves remote clusters, an error is returned.

What about the Tribe Node?

Searching against multiple clusters isn’t something new to Elasticsearch. In fact, users have been doing that up until now using a Tribe Node.

The Tribe Node is a separate node whose main job is to sniff the cluster states of the remote clusters and merge them all together. In order to do that, it joins all the remote clusters, which makes it a very special node: it doesn’t belong to a cluster of its own, yet it joins multiple clusters.

When a Tribe Node receives a search request, it knows right away which nodes to forward it to, because it holds the cluster state of all the registered remote clusters; it is in fact a node in all of those clusters. That means that the additional “find out where the remote shards are and which node they belong to” step required with Cross-Cluster Search is not required with a Tribe Node. On the other hand, Cross-Cluster Search doesn’t require additional nodes, given that any node can act as the coordinating node for a search request, regardless of whether the request spans multiple clusters. That also means that no additional nodes join the remote clusters, so no cluster state updates have to be sent to them. With a Tribe Node, those updates can potentially slow down the remote clusters, because the Tribe Node receives and has to acknowledge every single cluster state update from every remote cluster.

Furthermore, when merging cluster states from the remote clusters, the Tribe Node cannot keep indices that share the same name but belong to different clusters in its own cluster state. This is quite a big limitation, which Cross-Cluster Search addresses. Cross-Cluster Search also makes it possible to dynamically register, update, and remove remote clusters, which requires a node restart when using the Tribe Node.

Because the Tribe Node is such a special node, it also turned out to be very hard to maintain code-wise over time: it is the exception to the main assumption for an Elasticsearch node, namely that a node must belong to one and only one cluster.

Further Search Improvements

Elasticsearch’s retrieval capabilities are outstanding: even with a significant cluster size, users can search across as many shards as they like. But wait, that’s not necessarily the case, at least in the current state of affairs. Elasticsearch comes with a limitation in the form of a soft limit, action.search.shard_count.limit, which rejects search requests that expand to more than 1000 shards. Why would we want to limit this? The reason follows from the implementation detail mentioned in the search detour section above: we fan out to all shards and keep all shard results in a dense data structure until all shards have responded. If you are running large aggregations, the coordinating node has to hold a non-trivial amount of memory per search request for the entire duration of the initial phase.

Now that Cross-Cluster Search emphasizes searches across many, many shards, such a soft limit no longer provides a good user experience. In order to mitigate the impact of querying a large number of shards, the coordinating node now reduces aggregations in batches of 512 results down to a single aggregation object while it is waiting for more results to arrive. This upcoming improvement is the initial step toward eventually removing the soft limit altogether. The upcoming 5.4.0 release will also allow the top-N documents to be reduced in batches. With these two major memory consumers under control, we now default the action.search.shard_count.limit setting to unlimited. Users can still set the limit to protect their clusters from searches that hit too many shards, while everyone else gets a good experience by default.
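
The idea behind batched reduction can be sketched in a few lines of Python. This is an illustration of the technique under simplifying assumptions, not the actual implementation: the class `BatchedReducer` is hypothetical, and a `Counter` of term counts stands in for a real aggregation result. Only the default batch size of 512 comes from the description above.

```python
from collections import Counter
from typing import List


class BatchedReducer:
    """Keeps at most `batch_size` partial results buffered at any time."""

    def __init__(self, batch_size: int = 512) -> None:
        self.batch_size = batch_size
        self.buffer: List[Counter] = []
        self.num_reduces = 0

    def consume(self, shard_result: Counter) -> None:
        self.buffer.append(shard_result)
        if len(self.buffer) == self.batch_size:
            self._reduce()

    def _reduce(self) -> None:
        # Collapse the whole buffer into one partial result and keep only that.
        merged: Counter = Counter()
        for partial in self.buffer:
            merged.update(partial)
        self.buffer = [merged]
        self.num_reduces += 1

    def result(self) -> Counter:
        # Final reduce over whatever is still buffered.
        if len(self.buffer) != 1:
            self._reduce()
        return self.buffer[0]


# Seven shard results with a tiny batch size, to make the batching visible.
reducer = BatchedReducer(batch_size=3)
for _ in range(7):
    reducer.consume(Counter({"error": 1, "info": 2}))
total = reducer.result()
```

With this approach the memory held per request is bounded by the batch size rather than by the number of shards, at the cost of a few extra intermediate reduce steps.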

Conclusions

Cross-Cluster Search is the new way of executing federated search, and it will replace the Tribe Node in the future. It works very differently from the Tribe Node and is not subject to most of the drawbacks that the Tribe Node comes with. We invite you to try it out by downloading Elasticsearch 5.3.x (or deploying on Elastic Cloud) and to give us feedback.