Optimize dense vector storage for semantic search
When scaling semantic search, the memory footprint of dense vector embeddings is a primary concern. You can reduce storage requirements by configuring a quantization strategy on your semantic_text fields using the index_options parameter.
This guide walks you through choosing a strategy and applying it to a semantic_text field mapping. For full details on all available quantization options and their parameters, refer to the dense_vector field type reference.
- You need a `semantic_text` field that uses an inference endpoint producing dense vector embeddings (such as E5, OpenAI embeddings, or Cohere).
- If you use a custom model, create the inference endpoint first using the Create inference API.
These `index_options` do not apply to sparse vector models like ELSER, which use a different internal representation.
To run the curl examples on this page, set the following environment variables:
export ELASTICSEARCH_URL="your-elasticsearch-url"
export API_KEY="your-api-key"
To generate API keys, search for API keys in the global search bar. Learn more about finding your endpoint and credentials.
Select a quantization strategy based on your dataset size and performance requirements:
| Strategy | Memory reduction | Best for | Trade-offs |
|---|---|---|---|
| `bbq_hnsw` | Up to 32x | Most production use cases (default for 384+ dimensions) | Minimal accuracy loss |
| `bbq_flat` | Up to 32x | Smaller datasets needing maximum accuracy | Slower queries (brute-force search) |
| `bbq_disk` | Up to 32x | Large datasets with constrained RAM | Slower queries (disk-based) |
| `int8_hnsw` | 4x | High accuracy retention | Lower compression than BBQ |
| `int4_hnsw` | 8x | Balance between compression and accuracy | Some accuracy loss |
For most use cases with dense vector embeddings from text models, we recommend Better Binary Quantization (BBQ). BBQ requires a minimum of 64 dimensions and works best with text embeddings.
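To get a feel for what these compression ratios mean at scale, here is a back-of-the-envelope estimate in Python. The ratios are the approximate figures from the table above; the sketch ignores HNSW graph overhead and per-vector correction metadata, and `bytes_per_vector` is a hypothetical helper, not an Elasticsearch API.

```python
# Rough per-vector storage estimate for each quantization strategy.
# Ratios are the advertised approximations (up to 32x for BBQ, 4x for
# int8, 8x for int4); real on-disk/RAM usage also includes graph
# structures and correction data not modeled here.

def bytes_per_vector(dims: int, strategy: str) -> float:
    """Approximate storage for one embedding vector, in bytes."""
    raw = dims * 4  # float32 baseline: 4 bytes per dimension
    ratios = {
        "none": 1,        # uncompressed float32
        "bbq_hnsw": 32,   # ~1 bit per dimension
        "bbq_flat": 32,
        "bbq_disk": 32,
        "int8_hnsw": 4,   # 1 byte per dimension
        "int4_hnsw": 8,   # half a byte per dimension
    }
    return raw / ratios[strategy]

# Example: 10 million 384-dimensional E5 vectors.
dims, num_docs = 384, 10_000_000
for s in ("none", "bbq_hnsw", "int8_hnsw", "int4_hnsw"):
    gb = bytes_per_vector(dims, s) * num_docs / 1e9
    print(f"{s:>10}: {gb:6.1f} GB")
```

For this example corpus, the float32 baseline is roughly 15 GB of vector data, which BBQ brings down to about 0.5 GB.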
Create an index with a semantic_text field and set the index_options to your chosen quantization strategy.
PUT semantic-embeddings-optimized
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_hnsw"
}
}
}
}
}
}
- Reference to a text embedding inference endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the Create inference API.
- Uses BBQ with HNSW indexing for up to 32x memory reduction.
Equivalent `curl` command
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-optimized" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey ${API_KEY}" \
-d '{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_hnsw"
}
}
}
}
}
}'
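If you manage several indices, it can help to build the mapping body in code rather than hand-editing JSON for each quantization strategy. A minimal Python sketch; `semantic_mapping` is a hypothetical helper, and actually sending the body (for example with the official elasticsearch-py client) is left to the caller.

```python
# Build the index-creation body for a semantic_text field with a chosen
# dense_vector quantization strategy. Pure dict construction; no
# Elasticsearch connection is made here.

def semantic_mapping(inference_id: str, quantization: str) -> dict:
    """Return a create-index body matching the examples on this page."""
    return {
        "mappings": {
            "properties": {
                "content": {
                    "type": "semantic_text",
                    "inference_id": inference_id,
                    "index_options": {
                        "dense_vector": {"type": quantization}
                    },
                }
            }
        }
    }

body = semantic_mapping(".multilingual-e5-small-elasticsearch", "bbq_hnsw")
```

You can then pass `body["mappings"]` to your client's create-index call, or serialize the whole body as the `PUT` payload shown above.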
Use bbq_flat for smaller datasets where you need maximum accuracy at the expense of speed:
PUT semantic-embeddings-flat
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_flat"
}
}
}
}
}
}
- Reference to a text embedding inference endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the Create inference API.
- BBQ without HNSW for smaller datasets. Uses brute-force search, so queries are slower but indexing is lighter.
Equivalent `curl` command
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-flat" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey ${API_KEY}" \
-d '{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_flat"
}
}
}
}
}
}'
For large datasets where RAM is constrained, use bbq_disk (DiskBBQ) to minimize memory usage:
PUT semantic-embeddings-disk
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_disk"
}
}
}
}
}
}
- Reference to a text embedding inference endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the Create inference API.
- DiskBBQ keeps vectors compressed on disk, dramatically reducing RAM requirements at the cost of slower queries.
Equivalent `curl` command
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-disk" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey ${API_KEY}" \
-d '{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_disk"
}
}
}
}
}
}'
PUT semantic-embeddings-int8
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "int8_hnsw"
}
}
}
}
}
}
- Reference to a text embedding inference endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the Create inference API.
- 8-bit integer quantization for ~4x memory reduction. For higher compression, use `"type": "int4_hnsw"` (~8x reduction).
Equivalent `curl` command
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-int8" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey ${API_KEY}" \
-d '{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "int8_hnsw"
}
}
}
}
}
}'
Example response
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "semantic-embeddings-optimized"
}
`"acknowledged": true` confirms the index was created successfully with your mapping configuration.
Confirm that the index_options are applied to your index:
GET semantic-embeddings-optimized/_mapping
curl -X GET "${ELASTICSEARCH_URL}/semantic-embeddings-optimized/_mapping" \
-H "Authorization: ApiKey ${API_KEY}"
The response includes the index_options you configured under the content field's mapping. If the index_options block is missing, check that you specified it correctly in the PUT request.
Example response
{
"semantic-embeddings-optimized": {
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_hnsw"
}
}
}
}
}
}
}
- The `index_options` block confirms your quantization strategy is applied. After indexing data, the mapping may also include auto-detected `model_settings` such as dimensions and similarity metric.
For HNSW-based strategies, you can tune graph parameters like m and ef_construction in the index_options. Refer to the dense_vector field type reference for the full list of tunable parameters.
PUT semantic-embeddings-custom
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_hnsw",
"m": 32,
"ef_construction": 200
}
}
}
}
}
}
- `m` controls graph connectivity. Higher values improve recall at the cost of memory. Default: `16`.
- `ef_construction` controls index build quality. Higher values improve quality but slow indexing. Default: `100`.
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-custom" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey ${API_KEY}" \
-d '{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_hnsw",
"m": 32,
"ef_construction": 200
}
}
}
}
}
}'
- Follow the Semantic search with `semantic_text` tutorial to set up an end-to-end semantic search workflow.
- Combine semantic search with keyword search using hybrid search.