Optimize dense vector storage for semantic search

When scaling semantic search, the memory footprint of dense vector embeddings is a primary concern. You can reduce storage requirements by configuring a quantization strategy on your semantic_text fields using the index_options parameter.

This guide walks you through choosing a strategy and applying it to a semantic_text field mapping. For full details on all available quantization options and their parameters, refer to the dense_vector field type reference.

To follow this guide, you need:

  • A semantic_text field that uses an inference endpoint producing dense vector embeddings (such as E5, OpenAI embeddings, or Cohere).
  • If you use a custom model, create the inference endpoint first using the Create inference API.
Note

These index_options do not apply to sparse vector models like ELSER, which use a different internal representation.

Tip

To run the curl examples on this page, set the following environment variables:

export ELASTICSEARCH_URL="your-elasticsearch-url"
export API_KEY="your-api-key"

To generate API keys, search for API keys in the global search bar. Learn more about finding your endpoint and credentials.
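
Optionally, you can confirm that the variables are set correctly before running the examples. This is a quick sanity check, not part of the guide itself; a healthy deployment returns cluster metadata as JSON:

```shell
# Sanity check: a successful response returns the cluster name and version info.
curl -s -H "Authorization: ApiKey ${API_KEY}" "${ELASTICSEARCH_URL}/"
```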

Select a quantization strategy based on your dataset size and performance requirements:

| Strategy  | Memory reduction | Best for                                                 | Trade-offs                          |
|-----------|------------------|----------------------------------------------------------|-------------------------------------|
| bbq_hnsw  | Up to 32x        | Most production use cases (default for 384+ dimensions)  | Minimal accuracy loss               |
| bbq_flat  | Up to 32x        | Smaller datasets needing maximum accuracy                | Slower queries (brute-force search) |
| bbq_disk  | Up to 32x        | Large datasets with constrained RAM                      | Slower queries (disk-based)         |
| int8_hnsw | 4x               | High accuracy retention                                  | Lower compression than BBQ          |
| int4_hnsw | 8x               | Balance between compression and accuracy                 | Some accuracy loss                  |

For most use cases with dense vector embeddings from text models, we recommend Better Binary Quantization (BBQ). BBQ requires a minimum of 64 dimensions and works best with text embeddings.
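
To put these ratios in perspective, here is a rough back-of-the-envelope estimate. This is a sketch only: real index sizes also include HNSW graph structures and metadata, and exact compression varies by dataset.

```shell
# Estimate raw vs. BBQ-compressed vector storage for 1M documents at 384 dims.
# float32 uses 4 bytes per dimension; BBQ reduces vector storage roughly 32x.
num_vectors=1000000
dims=384
raw_bytes=$((num_vectors * dims * 4))
bbq_bytes=$((raw_bytes / 32))
echo "float32: $((raw_bytes / 1048576)) MiB"   # ~1464 MiB
echo "bbq:     $((bbq_bytes / 1048576)) MiB"   # ~45 MiB
```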

Create an index with a semantic_text field and set the index_options to your chosen quantization strategy.

PUT semantic-embeddings-optimized
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "bbq_hnsw"
          }
        }
      }
    }
  }
}
  1. The inference_id references a text embedding inference endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the Create inference API.
  2. "type": "bbq_hnsw" uses BBQ with HNSW indexing for up to 32x memory reduction.
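
The same request as a curl command, following the pattern used for the other curl examples on this page (using the environment variables set above):

```shell
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-optimized" \
     -H "Content-Type: application/json" \
     -H "Authorization: ApiKey ${API_KEY}" \
     -d '{
       "mappings": {
         "properties": {
           "content": {
             "type": "semantic_text",
             "inference_id": ".multilingual-e5-small-elasticsearch",
             "index_options": {
               "dense_vector": {
                 "type": "bbq_hnsw"
               }
             }
           }
         }
       }
     }'
```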

Use bbq_flat for smaller datasets where you need maximum accuracy at the expense of speed:

PUT semantic-embeddings-flat
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "bbq_flat"
          }
        }
      }
    }
  }
}
  1. The inference_id references a text embedding inference endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the Create inference API.
  2. "type": "bbq_flat" applies BBQ without HNSW. It uses brute-force search, so queries are slower but indexing is lighter.

For large datasets where RAM is constrained, use bbq_disk (DiskBBQ) to minimize memory usage:

PUT semantic-embeddings-disk
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "bbq_disk"
          }
        }
      }
    }
  }
}
  1. The inference_id references a text embedding inference endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the Create inference API.
  2. "type": "bbq_disk" (DiskBBQ) keeps vectors compressed on disk, dramatically reducing RAM requirements at the cost of slower queries.

If BBQ does not fit your model (it requires at least 64 dimensions) or you need to retain more accuracy, use scalar quantization with int8_hnsw:

PUT semantic-embeddings-int8
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "int8_hnsw"
          }
        }
      }
    }
  }
}
  1. The inference_id references a text embedding inference endpoint. This example uses the built-in E5 endpoint, which is automatically available. For custom models, you must create the endpoint first using the Create inference API.
  2. "type": "int8_hnsw" applies 8-bit integer quantization for ~4x memory reduction. For higher compression, use "type": "int4_hnsw" (~8x reduction).
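
The same curl pattern applies here as well; for instance, the int8 index as a curl command (using the environment variables set above):

```shell
curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-int8" \
     -H "Content-Type: application/json" \
     -H "Authorization: ApiKey ${API_KEY}" \
     -d '{
       "mappings": {
         "properties": {
           "content": {
             "type": "semantic_text",
             "inference_id": ".multilingual-e5-small-elasticsearch",
             "index_options": {
               "dense_vector": {
                 "type": "int8_hnsw"
               }
             }
           }
         }
       }
     }'
```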

Confirm that the index_options are applied to your index:

GET semantic-embeddings-optimized/_mapping

curl -X GET "${ELASTICSEARCH_URL}/semantic-embeddings-optimized/_mapping" \
     -H "Authorization: ApiKey ${API_KEY}"

The response includes the index_options you configured under the content field's mapping. If the index_options block is missing, check that you specified it correctly in the PUT request.
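
If you have jq installed, you can pull the configured options straight out of the mapping response. This assumes the standard GET _mapping response shape (index name at the top level); the exact structure may vary across versions:

```shell
# Extract the configured index_options for the content field.
curl -s -H "Authorization: ApiKey ${API_KEY}" \
     "${ELASTICSEARCH_URL}/semantic-embeddings-optimized/_mapping" \
  | jq '.["semantic-embeddings-optimized"].mappings.properties.content.index_options'
```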

For HNSW-based strategies, you can tune graph parameters like m and ef_construction in the index_options. Refer to the dense_vector field type reference for the full list of tunable parameters.

PUT semantic-embeddings-custom
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "bbq_hnsw",
            "m": 32,
            "ef_construction": 200
          }
        }
      }
    }
  }
}
  1. "m": 32 controls graph connectivity. Higher values improve recall at the cost of memory. Default: 16.
  2. "ef_construction": 200 controls index build quality. Higher values improve quality but slow indexing. Default: 100.

curl -X PUT "${ELASTICSEARCH_URL}/semantic-embeddings-custom" \
     -H "Content-Type: application/json" \
     -H "Authorization: ApiKey ${API_KEY}" \
     -d '{
       "mappings": {
         "properties": {
           "content": {
             "type": "semantic_text",
             "inference_id": ".multilingual-e5-small-elasticsearch",
             "index_options": {
               "dense_vector": {
                 "type": "bbq_hnsw",
                 "m": 32,
                 "ef_construction": 200
               }
             }
           }
         }
       }
     }'