Optimize dense vector storage for semantic search
When scaling semantic search, the memory footprint of dense vector embeddings is a primary concern. You can reduce storage requirements by configuring a quantization strategy on your semantic_text fields using the index_options parameter.
This guide walks you through choosing a strategy and applying it to a semantic_text field mapping. For full details on all available quantization options and their parameters, refer to the dense_vector field type reference.
- You need a `semantic_text` field that uses an inference endpoint producing dense vector embeddings (such as E5, OpenAI embeddings, or Cohere).
- If you use a custom model, create the inference endpoint first using the Create inference API.
These `index_options` do not apply to sparse vector models like ELSER, which use a different internal representation.
To run the curl examples on this page, set the following environment variables:
export ELASTICSEARCH_URL="your-elasticsearch-url"
export API_KEY="your-api-key"
To generate API keys, search for API keys in the global search bar. Learn more about finding your endpoint and credentials.
Select a quantization strategy based on your dataset size and performance requirements:
| Strategy | Memory reduction | Best for | Trade-offs |
|---|---|---|---|
| `bbq_hnsw` | Up to 32x | Most production use cases (default for 384+ dimensions) | Minimal accuracy loss |
| `bbq_flat` | Up to 32x | Smaller datasets needing maximum accuracy | Slower queries (brute-force search) |
| `bbq_disk` | Up to 32x | Large datasets with constrained RAM | Slower queries (disk-based) |
| `int8_hnsw` | 4x | High accuracy retention | Lower compression than BBQ |
| `int4_hnsw` | 8x | Balance between compression and accuracy | Some accuracy loss |
For most use cases with dense vector embeddings from text models, we recommend Better Binary Quantization (BBQ). BBQ requires a minimum of 64 dimensions and works best with text embeddings.
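To get a feel for what these compression ratios mean at scale, here is a back-of-the-envelope estimate in Python. The ratios are the approximate figures from the table above; the sketch ignores HNSW graph overhead and per-vector correction metadata, and `bytes_per_vector` is a hypothetical helper, not an Elasticsearch API.

```python
# Rough per-vector storage estimate for each quantization strategy.
# Ratios are the advertised approximations (up to 32x for BBQ, 4x for
# int8, 8x for int4); real on-disk/RAM usage also includes graph
# structures and correction data not modeled here.

def bytes_per_vector(dims: int, strategy: str) -> float:
    """Approximate storage for one embedding vector, in bytes."""
    raw = dims * 4  # float32 baseline: 4 bytes per dimension
    ratios = {
        "none": 1,        # uncompressed float32
        "bbq_hnsw": 32,   # ~1 bit per dimension
        "bbq_flat": 32,
        "bbq_disk": 32,
        "int8_hnsw": 4,   # 1 byte per dimension
        "int4_hnsw": 8,   # half a byte per dimension
    }
    return raw / ratios[strategy]

# Example: 10 million 384-dimensional E5 vectors.
dims, num_docs = 384, 10_000_000
for s in ("none", "bbq_hnsw", "int8_hnsw", "int4_hnsw"):
    gb = bytes_per_vector(dims, s) * num_docs / 1e9
    print(f"{s:>10}: {gb:6.1f} GB")
```

For this example corpus, the float32 baseline is roughly 15 GB of vector data, which BBQ brings down to about 0.5 GB.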
Create an index with a semantic_text field and set the index_options to your chosen quantization strategy.
PUT semantic-embeddings-optimized
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_hnsw"
}
}
}
}
}
}
- Reference to a text embedding inference endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the Create inference API.
- Uses BBQ with HNSW indexing for up to 32x memory reduction.
Equivalent `curl` command
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-optimized" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey ${API_KEY}" \
-d '{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_hnsw"
}
}
}
}
}
}'
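If you manage several indices, it can help to build the mapping body in code rather than hand-editing JSON for each quantization strategy. A minimal Python sketch; `semantic_mapping` is a hypothetical helper, and actually sending the body (for example with the official elasticsearch-py client) is left to the caller.

```python
# Build the index-creation body for a semantic_text field with a chosen
# dense_vector quantization strategy. Pure dict construction; no
# Elasticsearch connection is made here.

def semantic_mapping(inference_id: str, quantization: str) -> dict:
    """Return a create-index body matching the examples on this page."""
    return {
        "mappings": {
            "properties": {
                "content": {
                    "type": "semantic_text",
                    "inference_id": inference_id,
                    "index_options": {
                        "dense_vector": {"type": quantization}
                    },
                }
            }
        }
    }

body = semantic_mapping(".multilingual-e5-small-elasticsearch", "bbq_hnsw")
```

You can then pass `body["mappings"]` to your client's create-index call, or serialize the whole body as the `PUT` payload shown above.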
Use bbq_flat for smaller datasets where you need maximum accuracy at the expense of speed:
PUT semantic-embeddings-flat
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_flat"
}
}
}
}
}
}
- Reference to a text embedding inference endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the Create inference API.
- BBQ without HNSW for smaller datasets. Uses brute-force search, so queries are slower but indexing is lighter.
Equivalent `curl` command
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-flat" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey ${API_KEY}" \
-d '{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_flat"
}
}
}
}
}
}'
For large datasets where RAM is constrained, use bbq_disk (DiskBBQ) to minimize memory usage:
PUT semantic-embeddings-disk
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_disk"
}
}
}
}
}
}
- Reference to a text embedding inference endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the Create inference API.
- DiskBBQ keeps vectors compressed on disk, dramatically reducing RAM requirements at the cost of slower queries.
Equivalent `curl` command
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-disk" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey ${API_KEY}" \
-d '{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_disk"
}
}
}
}
}
}'
PUT semantic-embeddings-int8
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "int8_hnsw"
}
}
}
}
}
}
- Reference to a text embedding inference endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the Create inference API.
- 8-bit integer quantization for ~4x memory reduction. For higher compression, use `"type": "int4_hnsw"` (~8x reduction).
Equivalent `curl` command
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-int8" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey ${API_KEY}" \
-d '{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "int8_hnsw"
}
}
}
}
}
}'
Example response
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "semantic-embeddings-optimized"
}
`"acknowledged": true` confirms the index was created successfully with your mapping configuration.
Confirm that the index_options are applied to your index:
GET semantic-embeddings-optimized/_mapping
curl -X GET "${ELASTICSEARCH_URL}/semantic-embeddings-optimized/_mapping" \
-H "Authorization: ApiKey ${API_KEY}"
The response includes the index_options you configured under the content field's mapping. If the index_options block is missing, check that you specified it correctly in the PUT request.
Example response
{
"semantic-embeddings-optimized": {
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_hnsw"
}
}
}
}
}
}
}
- The `index_options` block confirms your quantization strategy is applied. After indexing data, the mapping may also include auto-detected `model_settings` such as dimensions and similarity metric.
For HNSW-based strategies, you can tune graph parameters like m and ef_construction in the index_options. Refer to the dense_vector field type reference for the full list of tunable parameters.
PUT semantic-embeddings-custom
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_hnsw",
"m": 32,
"ef_construction": 200
}
}
}
}
}
}
- `m` controls graph connectivity. Higher values improve recall at the cost of memory. Default: `16`.
- `ef_construction` controls index build quality. Higher values improve quality but slow indexing. Default: `100`.
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-custom" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey ${API_KEY}" \
-d '{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_hnsw",
"m": 32,
"ef_construction": 200
}
}
}
}
}
}'
- Follow the Semantic search with `semantic_text` tutorial to set up an end-to-end semantic search workflow.
- Combine semantic search with keyword search using hybrid search.