Search Typeedit

There are different execution paths that can be done when executing a distributed search. The distributed search operation needs to be scattered to all the relevant shards and then all the results are gathered back. When doing scatter/gather type execution, there are several ways to do that, specifically with search engines.

One of the questions when executing a distributed search is how much results to retrieve from each shard. For example, if we have 10 shards, the 1st shard might hold the most relevant results from 0 till 10, with other shards results ranking below it. For this reason, when executing a request, we will need to get results from 0 till 10 from all shards, sort them, and then return the results if we want to ensure correct results.

Another question, which relates to the search engine, is the fact that each shard stands on its own. When a query is executed on a specific shard, it does not take into account term frequencies and other search engine information from the other shards. If we want to support accurate ranking, we would need to first gather the term frequencies from all shards to calculate global term frequencies, then execute the query on each shard using these global frequencies.

Also, because of the need to sort the results, getting back a large document set, or even scrolling it, while maintaining the correct sorting behavior can be a very expensive operation. For large result set scrolling, it is best to sort by _doc if the order in which documents are returned is not important.

Elasticsearch is very flexible and allows to control the type of search to execute on a per search request basis. The type can be configured by setting the search_type parameter in the query string. The types are:

Query Then Fetchedit

Parameter value: query_then_fetch.

The request is processed in two phases. In the first phase, the query is forwarded to all involved shards. Each shard executes the search and generates a sorted list of results, local to that shard. Each shard returns just enough information to the coordinating node to allow it merge and re-sort the shard level results into a globally sorted set of results, of maximum length size.

During the second phase, the coordinating node requests the document content (and highlighted snippets, if any) from only the relevant shards.

This is the default setting, if you do not specify a search_type in your request.

Dfs, Query Then Fetchedit

Parameter value: dfs_query_then_fetch.

Same as "Query Then Fetch", except for an initial scatter phase which goes and computes the distributed term frequencies for more accurate scoring.