﻿---
title: Sparse vector query
description: The sparse vector query executes a query consisting of sparse vectors, such as built by a learned sparse retrieval model. This can be achieved with one...
url: https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-sparse-vector-query
products:
  - Elasticsearch
---

# Sparse vector query
The sparse vector query executes a query consisting of sparse vectors, such as built by a learned sparse retrieval model. This can be achieved with one of two strategies:
- Using a natural language processing model to convert query text into a list of token-weight pairs
- Sending in precalculated token-weight pairs as query vectors

These token-weight pairs are then used in a query against a [sparse vector](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/sparse-vector) or a [semantic_text](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text) field with a compatible sparse inference model. At query time, query vectors are calculated using the same inference model that was used to create the tokens. When querying, the tokens in the query vector are ORed together with their respective weights, which means scoring is effectively a [dot product](/docs/reference/query-languages/query-dsl/query-dsl-script-score-query#vector-functions-dot-product) calculation between stored dimensions and query dimensions.
For example, a stored vector `{"feature_0": 0.12, "feature_1": 1.2, "feature_2": 3.0}` queried with the vector `{"feature_0": 2.5, "feature_2": 0.2}` gives the document a score of `_score = 0.12*2.5 + 3.0*0.2 = 0.9`.
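
The arithmetic in this example can be reproduced with a minimal Python sketch. The function name `sparse_dot_product` is hypothetical and only illustrates the scoring idea described here; it is not how Elasticsearch computes scores internally.

```python
import math


def sparse_dot_product(stored: dict[str, float], query: dict[str, float]) -> float:
    """Sum the weight products for tokens present in both vectors.

    Tokens that appear in only one of the two vectors contribute nothing.
    """
    return sum(
        weight * stored[token]
        for token, weight in query.items()
        if token in stored
    )


stored_vector = {"feature_0": 0.12, "feature_1": 1.2, "feature_2": 3.0}
query_vector = {"feature_0": 2.5, "feature_2": 0.2}

score = sparse_dot_product(stored_vector, query_vector)
# Compare with a tolerance, because floating-point sums are rarely exact.
print(math.isclose(score, 0.9))
```

Note that `feature_1` is stored but not queried, so it does not affect the score.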

## Example request using a natural language processing model

```json

{
   "query":{
      "sparse_vector": {
        "field": "ml.tokens",
        "inference_id": "the inference ID to produce the token weights",
        "query": "the query string"
      }
   }
}
```


## Example request using precomputed vectors

```json

{
   "query":{
      "sparse_vector": {
        "field": "ml.tokens",
        "query_vector": { "token1": 0.5, "token2": 0.3, "token3": 0.2 }
      }
   }
}
```


## Top level parameters for `sparse_vector`

<definitions>
  <definition term="field">
    (Required, string) The name of the field that contains the token-weight pairs to be searched against.
  </definition>
  <definition term="inference_id">
    (Optional, string) The [inference ID](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference) to use to convert the query text into token-weight pairs. It must be the same inference ID that was used to create the tokens from the input text. Only one of `inference_id` and `query_vector` is allowed. If `inference_id` is specified, `query` must also be specified. If all queried fields are of type [semantic_text](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text), the inference ID associated with the `semantic_text` field will be inferred. You can reference a `deployment_id` of a machine learning trained model deployment as an `inference_id`. For example, if you download and deploy the ELSER model in the Machine learning trained models UI in Kibana, you can use the `deployment_id` of that deployment as the `inference_id`.
  </definition>
  <definition term="query">
    (Optional, string) The query text you want to use for search. If `inference_id` is specified, `query` must also be specified. If `query_vector` is specified, `query` must not be specified.
  </definition>
  <definition term="query_vector">
    (Optional, dictionary) A dictionary of token-weight pairs representing the precomputed query vector to search. Searching using this query vector will bypass additional inference. Only one of `inference_id` and `query_vector` is allowed.
  </definition>
  <definition term="prune Elastic Stack: Generally available since 9.1, Elastic Stack: Preview in 9.0">
    (Optional, boolean) Whether to perform pruning, omitting the non-significant tokens from the query to improve query performance. If `prune` is `true` but `pruning_config` is not specified, pruning occurs with default values. Default: `false`.
  </definition>
  <definition term="pruning_config Elastic Stack: Generally available since 9.1, Elastic Stack: Preview in 9.0">
    (Optional, object) The pruning configuration, which controls how non-significant tokens are omitted from the query to improve query performance. This is only used if `prune` is set to `true`. If `prune` is set to `true` but `pruning_config` is not specified, default values are used.
    Parameters for `pruning_config` are:
    <definitions>
      <definition term="tokens_freq_ratio_threshold">
        (Optional, integer) Tokens whose frequency is more than `tokens_freq_ratio_threshold` times the average frequency of all tokens in the specified field are considered outliers and pruned. This value must be between 1 and 100. Default: `5`.
      </definition>
      <definition term="tokens_weight_threshold">
        (Optional, float) Tokens whose weight is less than `tokens_weight_threshold` are considered insignificant and pruned. This value must be between 0 and 1. Default: `0.4`.
      </definition>
      <definition term="only_score_pruned_tokens">
        (Optional, boolean) If `true`, only the pruned tokens are used for scoring and the non-pruned tokens are discarded. It is strongly recommended to set this to `false` for the main query, but it can be set to `true` for a rescore query to get more relevant results. Default: `false`.
      </definition>
    </definitions>
    <note>
      The default values for `tokens_freq_ratio_threshold` and `tokens_weight_threshold` were chosen because they provided the best results in tests using ELSERv2.
    </note>
  </definition>
</definitions>
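
The constraints among `inference_id`, `query`, and `query_vector` listed above can be checked on the client before a request is sent. The following Python sketch is one way to do that. The helper name `build_sparse_vector_query` is hypothetical, and the final check (that either `query` or `query_vector` is present) is an assumption inferred from the two strategies described on this page rather than a rule stated explicitly.

```python
def build_sparse_vector_query(field, query=None, inference_id=None, query_vector=None):
    """Build a search body for a sparse_vector query, enforcing the documented rules."""
    if inference_id is not None and query_vector is not None:
        raise ValueError("only one of inference_id and query_vector is allowed")
    if inference_id is not None and query is None:
        raise ValueError("query must be specified when inference_id is specified")
    if query_vector is not None and query is not None:
        raise ValueError("query must not be specified when query_vector is specified")
    if query is None and query_vector is None:
        # Assumption: a request needs either query text or a precomputed vector.
        raise ValueError("either query or query_vector must be specified")

    body = {"field": field}
    if inference_id is not None:
        body["inference_id"] = inference_id
    if query is not None:
        # With no inference_id, this relies on a semantic_text field
        # supplying the inference endpoint from its mapping.
        body["query"] = query
    if query_vector is not None:
        body["query_vector"] = query_vector
    return {"query": {"sparse_vector": body}}


precomputed = build_sparse_vector_query(
    "ml.tokens", query_vector={"token1": 0.5, "token2": 0.3, "token3": 0.2}
)
```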

When token pruning is applied, non-significant tokens are pruned from the query.
Non-significant tokens are tokens that meet both of the following criteria:
- The token appears much more frequently than most tokens, indicating that it is a very common word and may not benefit the overall search results much.
- The token's weight is so low that the token is likely not very relevant to the original term.

Both the token frequency threshold and weight threshold must show the token is non-significant in order for the token to be pruned.
This ensures that:
- The tokens that are kept are frequent enough and have significant scoring.
- Very infrequent tokens that may not have as high of a score are removed.
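
The two-criteria rule can be sketched in Python as follows. This is a simplified illustration of the description above, not the Elasticsearch implementation. The function name `split_pruned_tokens`, the frequency numbers, and the way the average frequency is passed in are all hypothetical.

```python
def split_pruned_tokens(
    query_tokens,
    token_frequency,
    average_frequency,
    tokens_freq_ratio_threshold=5,
    tokens_weight_threshold=0.4,
):
    """Split query tokens into (kept, pruned) using both pruning criteria."""
    kept, pruned = {}, {}
    for token, weight in query_tokens.items():
        frequency = token_frequency.get(token, 0)
        too_frequent = frequency > tokens_freq_ratio_threshold * average_frequency
        low_weight = weight < tokens_weight_threshold
        # A token is pruned only when BOTH criteria mark it as non-significant.
        if too_frequent and low_weight:
            pruned[token] = weight
        else:
            kept[token] = weight
    return kept, pruned


# Hypothetical token weights and frequencies, for illustration only.
query_tokens = {"jamaica": 2.1, "weather": 1.8, "the": 0.1}
token_frequency = {"jamaica": 12, "weather": 400, "the": 900}

kept, pruned = split_pruned_tokens(query_tokens, token_frequency, average_frequency=50)
```

In this sketch, `weather` is frequent but keeps its place because its weight is high, while `the` is both frequent and low-weight and is therefore pruned. With `only_score_pruned_tokens` set to `true`, scoring would use the `pruned` set instead of the `kept` set.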


## Example ELSER query

The following is an example of the `sparse_vector` query that references the ELSER model to perform semantic search. For a more detailed description of how to perform semantic search by using ELSER and the `sparse_vector` query, refer to [this tutorial](https://www.elastic.co/docs/solutions/search/semantic-search/semantic-search-elser-ingest-pipelines).
```json

{
   "query":{
      "sparse_vector": {
         "field": "ml.tokens",
         "inference_id": "my-elser-model",
         "query": "How is the weather in Jamaica?"
      }
   }
}
```

Multiple `sparse_vector` queries can be combined with each other or other query types. This can be achieved by wrapping them in [boolean query clauses](https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-bool-query) and using linear boosting:
```json

{
  "query": {
    "bool": {
      "should": [
        {
          "sparse_vector": {
            "field": "ml.inference.title_expanded.predicted_value",
            "inference_id": "my-elser-model",
            "query": "How is the weather in Jamaica?",
            "boost": 1
          }
        },
        {
          "sparse_vector": {
            "field": "ml.inference.description_expanded.predicted_value",
            "inference_id": "my-elser-model",
            "query": "How is the weather in Jamaica?",
            "boost": 1
          }
        },
        {
          "multi_match": {
            "query": "How is the weather in Jamaica?",
            "fields": [
              "title",
              "description"
            ],
            "boost": 4
          }
        }
      ]
    }
  }
}
```

This can also be achieved using [reciprocal rank fusion (RRF)](https://www.elastic.co/docs/reference/elasticsearch/rest-apis/reciprocal-rank-fusion), through an [`rrf` retriever](https://www.elastic.co/docs/reference/elasticsearch/rest-apis/retrievers/rrf-retriever) with multiple [`standard` retrievers](https://www.elastic.co/docs/reference/elasticsearch/rest-apis/retrievers/standard-retriever).
```json

{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": {
              "multi_match": {
                "query": "How is the weather in Jamaica?",
                "fields": [
                  "title",
                  "description"
                ]
              }
            }
          }
        },
        {
          "standard": {
            "query": {
              "sparse_vector": {
                "field": "ml.inference.title_expanded.predicted_value",
                "inference_id": "my-elser-model",
                "query": "How is the weather in Jamaica?",
                "boost": 1
              }
            }
          }
        },
        {
          "standard": {
            "query": {
              "sparse_vector": {
                "field": "ml.inference.description_expanded.predicted_value",
                "inference_id": "my-elser-model",
                "query": "How is the weather in Jamaica?",
                "boost": 1
              }
            }
          }
        }
      ],
      "rank_window_size": 10,
      "rank_constant": 20
    }
  }
}
```
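
To show how reciprocal rank fusion combines the result lists of the individual retrievers, here is a minimal Python sketch of the RRF formula, in which each document receives `1 / (rank_constant + rank)` from every list it appears in. The function name `rrf_scores` and the document IDs are hypothetical, and the sketch ignores details such as the ranking window.

```python
def rrf_scores(ranked_lists, rank_constant=20):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion."""
    scores = {}
    for ranking in ranked_lists:
        # Ranks start at 1 for the top hit of each list.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    return scores


# Hypothetical result lists from the three retrievers in the request above.
lexical_hits = ["a", "b", "c"]
title_hits = ["b", "a"]
description_hits = ["b", "c"]

fused = rrf_scores([lexical_hits, title_hits, description_hits], rank_constant=20)
ranking = sorted(fused, key=fused.get, reverse=True)
```

Document `b` ranks first after fusion because it appears near the top of all three lists, even though it is not the top lexical hit.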


## Example query on a `semantic_text` field

You can also run a `sparse_vector` query directly on a [`semantic_text`](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text) field. In this case Elasticsearch automatically uses the inference endpoint configured in the field mapping to expand the query into sparse tokens.
First, create an index with a `semantic_text` field:
```json

{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "content_semantic": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch"
      }
    }
  }
}
```

Index some example documents:
```json

{ "index": { "_index": "my-semantic-sparse-index", "_id": "1" } }
{ "title": "Best surfing spots", "content_semantic": "Hawaii has world-class surfing with warm water and consistent swells." }
{ "index": { "_index": "my-semantic-sparse-index", "_id": "2" } }
{ "title": "City breaks", "content_semantic": "Paris offers museums, cafés, and beautiful architecture." }
{ "index": { "_index": "my-semantic-sparse-index", "_id": "3" } }
{ "title": "Learning to surf", "content_semantic": "Beginners often start on longboards at gentle beach breaks." }
```
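
The action and document lines above follow the newline-delimited JSON layout used by bulk indexing. As a rough sketch, such a body could be assembled in Python like this. The helper name `to_bulk_ndjson` is hypothetical, and sending the body to the cluster is left out.

```python
import json


def to_bulk_ndjson(index_name, docs):
    """Serialize {doc_id: source} pairs as action/source line pairs."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # ensure_ascii=False keeps characters such as "é" readable in the body.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The body must end with a newline after the last line.
    return "\n".join(lines) + "\n"


docs = {
    "1": {
        "title": "Best surfing spots",
        "content_semantic": "Hawaii has world-class surfing with warm water and consistent swells.",
    },
    "2": {
        "title": "City breaks",
        "content_semantic": "Paris offers museums, cafés, and beautiful architecture.",
    },
}

bulk_body = to_bulk_ndjson("my-semantic-sparse-index", docs)
```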

Then query with `sparse_vector` against the `semantic_text` field:
```json

{
  "size": 3,
  "query": {
    "sparse_vector": {
      "field": "content_semantic",
      "query": "best places to surf as a beginner"
    }
  }
}
```


## Example ELSER query with pruning configuration and rescore

The following example extends the ELSER query shown earlier by adding a pruning configuration to the `sparse_vector` query. The pruning configuration identifies non-significant tokens to prune from the query in order to improve query performance.
Token pruning happens at the shard level. This usually results in the same tokens being labeled as insignificant across shards, but this is not guaranteed because it depends on the composition of each shard. Therefore, if you are running `sparse_vector` with a `pruning_config` on a multi-shard index, we strongly recommend adding a [rescore](/docs/reference/elasticsearch/rest-apis/rescore-search-results#rescore) phase that scores the tokens that were originally pruned from the query. This helps mitigate any shard-level inconsistency with pruned tokens and provides better relevance overall.
```json

{
   "query":{
      "sparse_vector":{
         "field": "ml.tokens",
         "inference_id": "my-elser-model",
         "query":"How is the weather in Jamaica?",
         "prune": true,
         "pruning_config": {
           "tokens_freq_ratio_threshold": 5,
           "tokens_weight_threshold": 0.4,
           "only_score_pruned_tokens": false
         }
      }
   },
   "rescore": {
      "window_size": 100,
      "query": {
         "rescore_query": {
            "sparse_vector": {
               "field": "ml.tokens",
               "inference_id": "my-elser-model",
               "query": "How is the weather in Jamaica?",
               "prune": true,
               "pruning_config": {
                   "tokens_freq_ratio_threshold": 5,
                   "tokens_weight_threshold": 0.4,
                   "only_score_pruned_tokens": true
               }
            }
         }
      }
   }
}
```
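
In the request above, the main query and the rescore query differ only in `only_score_pruned_tokens`. The following Python sketch derives both from a single definition so that the two stay in sync. The helper name `with_pruned_token_rescore` is hypothetical.

```python
import copy


def with_pruned_token_rescore(sparse_vector_query, window_size=100):
    """Build a search body whose rescore phase scores only the pruned tokens."""
    main = copy.deepcopy(sparse_vector_query)
    main["prune"] = True
    # The main query scores the tokens that survive pruning.
    main.setdefault("pruning_config", {})["only_score_pruned_tokens"] = False

    # The rescore query is identical except that it scores only pruned tokens.
    rescore = copy.deepcopy(main)
    rescore["pruning_config"]["only_score_pruned_tokens"] = True

    return {
        "query": {"sparse_vector": main},
        "rescore": {
            "window_size": window_size,
            "query": {"rescore_query": {"sparse_vector": rescore}},
        },
    }


search_body = with_pruned_token_rescore(
    {
        "field": "ml.tokens",
        "inference_id": "my-elser-model",
        "query": "How is the weather in Jamaica?",
        "pruning_config": {
            "tokens_freq_ratio_threshold": 5,
            "tokens_weight_threshold": 0.4,
        },
    }
)
```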

<note>
  When performing [cross-cluster search](https://www.elastic.co/docs/explore-analyze/cross-cluster-search), inference is performed on the local cluster.
</note>