---
title: Sparse vector field type
description: A sparse_vector field can index features and weights so that they can later be used to query documents in queries with a sparse_vector. This field can...
url: https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/sparse-vector
products:
  - Elasticsearch
applies_to:
  - Elastic Cloud Serverless: Generally available
  - Elastic Stack: Generally available
---

# Sparse vector field type
A `sparse_vector` field can index features and weights so that they can later be used to query documents in queries with a [`sparse_vector`](https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-sparse-vector-query). This field can also be used with a legacy [`text_expansion`](https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-text-expansion-query) query.
`sparse_vector` is the field type that should be used with [ELSER mappings](https://www.elastic.co/docs/solutions/search/semantic-search/semantic-search-elser-ingest-pipelines#elser-mappings).
```json

{
  "mappings": {
    "properties": {
      "text.tokens": {
        "type": "sparse_vector"
      }
    }
  }
}
```


## Token pruning

<applies-to>
  - Elastic Stack: Generally available since 9.1
</applies-to>

For newly created indices, token pruning is enabled by default with sensible default settings. You can control this behavior using the optional `index_options` parameter for the field:
```json

{
  "mappings": {
    "properties": {
      "text.tokens": {
        "type": "sparse_vector",
        "index_options": {
          "prune": true,
          "pruning_config": {
            "tokens_freq_ratio_threshold": 5,
            "tokens_weight_threshold": 0.4
          }
        }
      }
    }
  }
}
```

See [semantic search with ELSER](https://www.elastic.co/docs/solutions/search/semantic-search/semantic-search-elser-ingest-pipelines) for a complete example on adding documents to a `sparse_vector` mapped field using ELSER.

## Parameters for `sparse_vector` fields

The following parameters are accepted by `sparse_vector` fields:
<definitions>
  <definition term="index_options Elastic Stack: Generally available since 9.1">
    (Optional, object) Index options for the `sparse_vector` field that determine whether tokens are pruned and how token pruning is configured. If pruning options are not set in your [`sparse_vector` query](https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-sparse-vector-query), Elasticsearch will use the default options configured for the field, if any.
  </definition>
</definitions>

Parameters for `index_options` are:
<definitions>
  <definition term="prune Elastic Stack: Generally available since 9.1">
    (Optional, boolean) Whether to perform pruning, omitting non-significant tokens from the query to improve query performance. If `prune` is `true` but `pruning_config` is not specified, pruning will occur using default values. Default: `true`.
  </definition>
  <definition term="pruning_config Elastic Stack: Generally available since 9.1">
    (Optional, object) Pruning configuration. If enabled, non-significant tokens are omitted from the query to improve query performance. This is only used if `prune` is set to `true`. If `prune` is set to `true` but `pruning_config` is not specified, default values will be used. If `prune` is set to `false` but `pruning_config` is specified, an exception will occur.
    Parameters for `pruning_config` include:
    <definitions>
      <definition term="tokens_freq_ratio_threshold Elastic Stack: Generally available since 9.1">
        (Optional, integer) Tokens whose frequency is more than `tokens_freq_ratio_threshold` times the average frequency of all tokens in the specified field are considered outliers and pruned. This value must be between 1 and 100. Default: `5`.
      </definition>
      <definition term="tokens_weight_threshold Elastic Stack: Generally available since 9.1">
        (Optional, float) Tokens whose weight is less than `tokens_weight_threshold` are considered insignificant and pruned. This value must be between 0 and 1. Default: `0.4`.
      </definition>
    </definitions>
    <note>
      The default values for `tokens_freq_ratio_threshold` and `tokens_weight_threshold` were chosen based on tests with ELSERv2, where they gave the best results.
    </note>
  </definition>
</definitions>
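
The constraints described above can be checked on the client before a mapping is sent. The following is a minimal sketch, not part of Elasticsearch: `validate_index_options` is a hypothetical helper, and it assumes the documented ranges are inclusive.

```python
def validate_index_options(index_options: dict) -> None:
    """Raise ValueError if index_options violates the documented constraints.

    Hypothetical client-side helper; Elasticsearch performs its own validation.
    """
    prune = index_options.get("prune", True)  # documented default: true
    config = index_options.get("pruning_config")

    if config is None:
        return  # defaults apply when pruning_config is omitted
    if not prune:
        # Documented behavior: pruning_config with prune=false is an error.
        raise ValueError("pruning_config requires prune to be true")

    ratio = config.get("tokens_freq_ratio_threshold", 5)
    weight = config.get("tokens_weight_threshold", 0.4)

    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(ratio, bool) or not isinstance(ratio, int) or not 1 <= ratio <= 100:
        raise ValueError("tokens_freq_ratio_threshold must be an integer in [1, 100]")
    if isinstance(weight, bool) or not isinstance(weight, (int, float)) or not 0 <= weight <= 1:
        raise ValueError("tokens_weight_threshold must be a number in [0, 1]")


# Passes silently: matches the mapping example shown earlier on this page.
validate_index_options({
    "prune": True,
    "pruning_config": {
        "tokens_freq_ratio_threshold": 5,
        "tokens_weight_threshold": 0.4,
    },
})
```

Whether the boundary values themselves are accepted is an assumption here; the server-side validation is authoritative.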

When token pruning is applied, non-significant tokens are pruned from the query.
A token is considered non-significant if it meets both of the following criteria:
- The token appears much more frequently than most tokens, indicating that it is a very common word and may not benefit the overall search results much.
- The weight/score is so low that the token is likely not very relevant to the original term.

Both the token frequency threshold and the weight threshold must indicate that the token is non-significant for the token to be pruned.
This ensures that:
- Frequent tokens that still carry a significant weight are kept.
- Infrequent tokens are kept even when their weight is low, because they do not exceed the frequency threshold.
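
The two-criteria rule can be sketched as follows. This is a simplified illustration of the description above, not the Elasticsearch implementation: `is_pruned` and `prune_query_tokens` are hypothetical names, and the token frequencies and average frequency are made-up inputs that the caller supplies.

```python
def is_pruned(token_freq: float, avg_freq: float, weight: float,
              freq_ratio_threshold: int = 5,
              weight_threshold: float = 0.4) -> bool:
    """Return True only if BOTH criteria mark the token as non-significant."""
    too_frequent = token_freq > freq_ratio_threshold * avg_freq
    too_light = weight < weight_threshold
    return too_frequent and too_light


def prune_query_tokens(weights: dict, freqs: dict, avg_freq: float) -> dict:
    """Keep only the query tokens that survive pruning (default thresholds)."""
    return {
        token: weight
        for token, weight in weights.items()
        if not is_pruned(freqs[token], avg_freq, weight)
    }


# Hypothetical query tokens with their weights and field frequencies.
query_weights = {"the": 0.1, "carrots": 0.9, "tasty": 0.2}
token_freqs = {"the": 900, "carrots": 40, "tasty": 30}

kept = prune_query_tokens(query_weights, token_freqs, avg_freq=100)
print(kept)
```

With these inputs only `the` is removed: it is more than 5 times as frequent as the average and its weight is below 0.4. `tasty` has a low weight but is not frequent enough to be an outlier, so it is kept.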


## Accessing `sparse_vector` fields in search responses

<applies-to>
  - Elastic Cloud Serverless: Generally available
  - Elastic Stack: Generally available since 9.2
</applies-to>

By default, `sparse_vector` fields are **not included in `_source`** in responses from the `_search`, `_msearch`, `_get`, and `_mget` APIs.
This helps reduce response size and improve performance, especially in scenarios where vectors are used solely for similarity scoring and not required in the output.
To retrieve vector values explicitly, you can use:
- The `fields` option to request specific vector fields directly:

```json

{
  "fields": ["my_vector"]
}
```

- The `_source.exclude_vectors` flag to re-enable vector inclusion in `_source` responses:

```json

{
  "_source": {
    "exclude_vectors": false
  }
}
```

<tip>
  For more context about the decision to exclude vectors from `_source` by default, read the [blog post](https://www.elastic.co/search-labs/blog/elasticsearch-exclude-vectors-from-source).
</tip>


### Storage behavior and `_source`

By default, `sparse_vector` fields are not stored in `_source` on disk. This is also controlled by the index setting `index.mapping.exclude_source_vectors`.
This setting is enabled by default for newly created indices and can only be set at index creation time.
When enabled:
- `sparse_vector` fields are removed from `_source` and the rest of the `_source` is stored as usual.
- If a request includes `_source` and vector values are needed (e.g., during recovery or reindex), the vectors are rehydrated from their internal format.

This setting is compatible with synthetic `_source`, where the entire `_source` document is reconstructed from columnar storage. In full synthetic mode, no `_source` is stored on disk, and all fields — including vectors — are rebuilt when needed.

### Rehydration and precision

When vector values are rehydrated (e.g., for reindex, recovery, or explicit `_source` requests), they are restored from their internal format.
Internally, vectors are stored as floats with 9 significant bits for the precision, so the rehydrated values will have reduced precision.
This lossy representation is intended to save space while preserving search quality.
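
To get a feel for the size of this precision loss, the sketch below truncates a 32-bit float to 9 significant bits (the implicit leading bit plus 8 explicit mantissa bits). It assumes the reduction is a plain truncation of the low-order mantissa bits of an IEEE 754 single-precision value; the actual internal encoding is not specified on this page and may differ. `truncate_to_9_significant_bits` is a hypothetical name.

```python
import struct


def truncate_to_9_significant_bits(value: float) -> float:
    """Zero the 15 low-order mantissa bits of the float32 form of value.

    float32 has 23 explicit mantissa bits; keeping 8 of them plus the
    implicit leading bit leaves 9 significant bits (assumed encoding).
    """
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    bits &= 0xFFFFFFFF ^ ((1 << 15) - 1)  # clear the lowest 15 bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]


original = 1.2
rehydrated = truncate_to_9_significant_bits(original)
relative_error = abs(original - rehydrated) / original
print(rehydrated, relative_error)
```

Under this assumption `1.2` comes back as `1.19921875`, and the relative error for any normal value stays below 2^-8, or about 0.4%.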

### Storing original vectors in `_source`

If you want to preserve the original vector values exactly as they were provided, you can re-enable vector storage in `_source`:
```json

{
  "settings": {
    "index.mapping.exclude_source_vectors": false
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "sparse_vector"
      }
    }
  }
}
```

When this setting is disabled:
- `sparse_vector` fields are stored as part of the `_source`, exactly as indexed.
- The index will store both the original `_source` value and the internal representation used for vector search, resulting in increased storage usage.
- Vectors are once again returned in `_source` by default in all relevant APIs, with no need to use `exclude_vectors` or `fields`.

This configuration is appropriate when full source fidelity is required, such as for auditing or round-tripping exact input values.

## Multi-value sparse vectors

When passing in arrays of values for sparse vectors, the max value for each identically named feature is selected.
The paper [Adapting Learned Sparse Retrieval for Long Documents](https://arxiv.org/pdf/2305.18494.pdf) discusses this in more detail. In summary, research findings indicate that representation aggregation typically outperforms score aggregation.
If you need to keep overlapping feature names distinct, you should store them separately or use nested fields.
Below is an example of indexing a document with overlapping feature names. In this example, two categories exist: positive sentiment and negative sentiment. For retrieval purposes, however, we also want the overall impact rather than a specific sentiment. Here, `impact` is stored as a multi-value sparse vector, and only the max values of overlapping names are stored. More specifically, the final `GET` query returns a `_score` of ~1.2, which is `max(impact.delicious[0], impact.delicious[1])`. The score is approximate because of the relative error of about 0.4% described under "Important notes and limitations".
```json

{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "standard"
      },
      "impact": {
        "type": "sparse_vector"
      },
      "positive": {
        "type": "sparse_vector"
      },
      "negative": {
        "type": "sparse_vector"
      }
    }
  }
}


{
    "text": "I had some terribly delicious carrots.",
    "impact": [{"I": 0.55, "had": 0.4, "some": 0.28, "terribly": 0.01, "delicious": 1.2, "carrots": 0.8},
               {"I": 0.54, "had": 0.4, "some": 0.28, "terribly": 2.01, "delicious": 0.02, "carrots": 0.4}],
    "positive": {"I": 0.55, "had": 0.4, "some": 0.28, "terribly": 0.01, "delicious": 1.2, "carrots": 0.8},
    "negative": {"I": 0.54, "had": 0.4, "some": 0.28, "terribly": 2.01, "delicious": 0.02, "carrots": 0.4}
}


{
  "query": {
    "term": {
      "impact": {
         "value": "delicious"
      }
    }
  }
}
```
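
The max-value selection for the `impact` field can be reproduced outside Elasticsearch. The sketch below is illustrative only: `aggregate_multi_value` is a hypothetical helper that applies the rule described above to the two `impact` entries from the example document.

```python
def aggregate_multi_value(vectors: list) -> dict:
    """Combine several sparse vectors, keeping the max weight per feature."""
    merged = {}
    for vector in vectors:
        for feature, weight in vector.items():
            # Keep the larger of the stored weight and the new one.
            merged[feature] = max(weight, merged.get(feature, weight))
    return merged


impact = [
    {"I": 0.55, "had": 0.4, "some": 0.28, "terribly": 0.01, "delicious": 1.2, "carrots": 0.8},
    {"I": 0.54, "had": 0.4, "some": 0.28, "terribly": 2.01, "delicious": 0.02, "carrots": 0.4},
]

stored = aggregate_multi_value(impact)
print(stored["delicious"], stored["terribly"])
```

Here `delicious` resolves to 1.2 (from the first entry) and `terribly` to 2.01 (from the second), which is why the `term` query on `delicious` scores about 1.2.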


## Updating `sparse_vector` fields

When using the [Update API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-update) with the `doc` parameter, `sparse_vector` fields behave like nested objects and are **merged** rather than replaced. This means:
- Existing tokens in the sparse vector are preserved.
- New tokens are added.
- Tokens present in both the existing and new data will have their values updated.

This is different from primitive array fields (like `keyword`), which are replaced entirely during updates.

### Example of merging behavior

Original document:
```json

{
  "my_vector": {
    "token_a": 0.5,
    "token_b": 0.8
  }
}
```

Partial update:
```json

{
  "doc": {
    "my_vector": {
      "token_c": 0.3
    }
  }
}
```

Resulting document, with tokens merged rather than replaced:
```json
{
  "my_vector": {
    "token_a": 0.5,
    "token_b": 0.8,
    "token_c": 0.3
  }
}
```
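
The merge semantics can be modeled with a plain dictionary update. This is a sketch of the behavior described above, not of how Elasticsearch applies updates internally; `apply_partial_update` is a hypothetical name. Unlike the example above, it also covers a token that exists on both sides.

```python
def apply_partial_update(existing: dict, partial: dict) -> dict:
    """Model a `doc` update on a sparse_vector field: tokens are merged."""
    merged = dict(existing)  # existing tokens are preserved
    merged.update(partial)   # new tokens are added, shared tokens overwritten
    return merged


existing_vector = {"token_a": 0.5, "token_b": 0.8}
partial_vector = {"token_b": 0.9, "token_c": 0.3}

updated = apply_partial_update(existing_vector, partial_vector)
print(updated)
```

`token_a` is untouched, `token_b` takes the new value 0.9, and `token_c` is added.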


### Replacing the entire `sparse_vector` field

To replace the entire contents of a `sparse_vector` field, use a [script](https://www.elastic.co/docs/explore-analyze/scripting/modules-scripting-using) in your update request:
```json

{
  "script": {
    "source": "ctx._source.my_vector = params.new_vector",
    "params": {
      "new_vector": {
        "token_x": 1.0,
        "token_y": 0.6
      }
    }
  }
}
```

<note>
  This same merging behavior also applies to [`rank_features` fields](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/rank-features), because they are also object-like structures.
</note>


## Important notes and limitations

- `sparse_vector` fields cannot be included in indices that were **created** on Elasticsearch versions between 8.0 and 8.10.
- `sparse_vector` fields only support strictly positive values. Negative values will be rejected.
- `sparse_vector` fields do not support [analyzers](https://www.elastic.co/docs/manage-data/data-store/text-analysis), querying, sorting or aggregating. They may only be used within specialized queries. The recommended query for these fields is the [`sparse_vector`](https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-sparse-vector-query) query. They may also be used within legacy [`text_expansion`](https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-text-expansion-query) queries.
- `sparse_vector` fields only preserve 9 significant bits for the precision, which translates to a relative error of about 0.4%.