Tutorial: semantic search with semantic_text
This functionality is in beta and is subject to change. The design and code are less mature than official GA features and are being provided as-is with no warranties. Beta features are not subject to the support SLA of official GA features.
This tutorial shows you how to use the semantic text feature to perform semantic search on your data.
Semantic text simplifies the inference workflow by performing inference at ingestion time and providing sensible default values automatically. You don't need to define model-related settings and parameters or create inference ingest pipelines.
The recommended way to use semantic search in the Elastic Stack is to follow the semantic_text workflow. When you need more control over indexing and query settings, you can still use the complete inference workflow (refer to this tutorial to review the process).
This tutorial uses the elser service for demonstration, but you can use any service and its supported models offered by the Inference API.
Requirements
To use the semantic_text field type, you must have an inference endpoint deployed in your cluster using the Create inference API.
Create the inference endpoint
Create an inference endpoint by using the Create inference API:
resp = client.inference.put( task_type="sparse_embedding", inference_id="my-elser-endpoint", inference_config={ "service": "elser", "service_settings": { "num_allocations": 1, "num_threads": 1 } }, ) print(resp)
const response = await client.inference.put({ task_type: "sparse_embedding", inference_id: "my-elser-endpoint", inference_config: { service: "elser", service_settings: { num_allocations: 1, num_threads: 1, }, }, }); console.log(response);
PUT _inference/sparse_embedding/my-elser-endpoint { "service": "elser", "service_settings": { "num_allocations": 1, "num_threads": 1 } }
The task type is sparse_embedding in the path, as the elser service creates sparse vector embeddings.
The inference_id in the path, my-elser-endpoint, is the unique identifier of the inference endpoint.
You might see a 502 Bad Gateway error in the response when using the Kibana Console. This error usually just reflects a timeout while the model downloads in the background. You can check the download progress in the Machine Learning UI. If you are using the Python client, you can set the timeout parameter to a higher value.
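Since the model download may finish after the initial request times out, one option is to poll until the endpoint is ready before moving on. The sketch below is a generic polling helper, not part of the tutorial or the Elasticsearch clients; you would supply your own `check` function (for example, one that queries the trained model statistics through your client) and the name `wait_until` is purely illustrative:

```python
import time

def wait_until(check, timeout=300.0, interval=5.0):
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False
```

For example, `wait_until(model_is_deployed, timeout=600)` would retry a hypothetical `model_is_deployed` check every five seconds for up to ten minutes.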
Create the index mapping
You must create the mapping of the destination index: the index that will contain the embeddings that the inference endpoint generates based on your input text. The destination index must have a field with the semantic_text field type to index the output of the used inference endpoint.
resp = client.indices.create( index="semantic-embeddings", mappings={ "properties": { "semantic_text": { "type": "semantic_text", "inference_id": "my-elser-endpoint" }, "content": { "type": "text", "copy_to": "semantic_text" } } }, ) print(resp)
const response = await client.indices.create({ index: "semantic-embeddings", mappings: { properties: { semantic_text: { type: "semantic_text", inference_id: "my-elser-endpoint", }, content: { type: "text", copy_to: "semantic_text", }, }, }, }); console.log(response);
PUT semantic-embeddings { "mappings": { "properties": { "semantic_text": { "type": "semantic_text", "inference_id": "my-elser-endpoint" }, "content": { "type": "text", "copy_to": "semantic_text" } } } }
The name of the field to contain the generated embeddings.
The field to contain the embeddings is a semantic_text field.
The inference_id is the inference endpoint you created in the previous step. It will be used to generate the embeddings based on the input text.
The field to store the text reindexed from a source index in the Reindex the data step.
The textual data stored in the content field will be copied to the semantic_text field and processed by the inference endpoint, because of the copy_to parameter set in the mapping.
Load data
In this step, you load the data that you will later use to create embeddings from.
Use the msmarco-passagetest2019-top1000 data set, which is a subset of the MS MARCO Passage Ranking data set. It consists of 200 queries, each accompanied by a list of relevant text passages. All unique passages, along with their IDs, have been extracted from that data set and compiled into a TSV file.
Download the file and upload it to your cluster using the Data Visualizer in the Machine Learning UI. Assign the name id to the first column and content to the second column. The index name is test-data. Once the upload is complete, you can see an index named test-data with 182469 documents.
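The Data Visualizer is the route this tutorial takes, but you could also parse the TSV yourself and index it with the Python client's bulk helpers. The sketch below shows only the parsing step, which runs without a cluster; `tsv_to_actions` is a hypothetical helper name, and it assumes the file has no header row, with the ID in the first column and the passage text in the second:

```python
import csv
import io

def tsv_to_actions(tsv_text, index="test-data"):
    """Turn `id<TAB>content` rows into bulk-index actions."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {"_index": index, "_id": row[0], "_source": {"id": row[0], "content": row[1]}}
        for row in reader
    ]

# Two made-up rows standing in for the real TSV file:
actions = tsv_to_actions("1\tfirst passage\n2\tsecond passage\n")
```

The resulting list could then be passed to `elasticsearch.helpers.bulk(client, actions)` against a running cluster.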
Reindex the data
Create the embeddings from the text by reindexing the data from the test-data index to the semantic-embeddings index. The data in the content field will be reindexed into the content field of the destination index. The content field data will be copied to the semantic_text field as a result of the copy_to parameter set in the index mapping creation step. The copied data will be processed by the inference endpoint associated with the semantic_text field.
resp = client.reindex( wait_for_completion=False, source={ "index": "test-data", "size": 10 }, dest={ "index": "semantic-embeddings" }, ) print(resp)
const response = await client.reindex({ wait_for_completion: "false", source: { index: "test-data", size: 10, }, dest: { index: "semantic-embeddings", }, }); console.log(response);
POST _reindex?wait_for_completion=false { "source": { "index": "test-data", "size": 10 }, "dest": { "index": "semantic-embeddings" } }
The default batch size for reindexing is 1000. Reducing size to a smaller number makes the reindexing process report progress more frequently, which enables you to follow it closely and detect errors early.
The call returns a task ID to monitor the progress:
resp = client.tasks.get( task_id="<task_id>", ) print(resp)
const response = await client.tasks.get({ task_id: "<task_id>", }); console.log(response);
GET _tasks/<task_id>
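The task response includes a completed flag and a status object with counters for processed documents. As a rough sketch, you could summarize progress from such a response; `reindex_progress` is a hypothetical helper, and the sample payload below is abbreviated and made up for illustration (field names follow the standard reindex task response):

```python
def reindex_progress(task_response):
    """Return (completed, fraction_done) from a _tasks/<task_id> response."""
    status = task_response["task"]["status"]
    done = status["created"] + status.get("updated", 0) + status.get("deleted", 0)
    total = status["total"]
    fraction = done / total if total else 1.0
    return task_response.get("completed", False), fraction

# Hypothetical, abbreviated task response for illustration:
sample = {
    "completed": False,
    "task": {"status": {"created": 500, "updated": 0, "deleted": 0, "total": 182469}},
}
completed, fraction = reindex_progress(sample)
print(f"completed={completed}, {fraction:.2%} done")
```

Combined with a polling loop, this lets a script wait for the reindex task instead of checking the task API by hand.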
It is recommended to cancel the reindexing process if you don't want to wait until it is fully complete, which might take a long time for an inference endpoint with few assigned resources:
resp = client.tasks.cancel( task_id="<task_id>", ) print(resp)
const response = await client.tasks.cancel({ task_id: "<task_id>", }); console.log(response);
POST _tasks/<task_id>/_cancel
Semantic search
After the data set has been enriched with the embeddings, you can query the data
using semantic search. Provide the semantic_text
field name and the query text
in a semantic
query type. The inference endpoint used to generate the embeddings
for the semantic_text
field will be used to process the query text.
resp = client.search( index="semantic-embeddings", query={ "semantic": { "field": "semantic_text", "query": "How to avoid muscle soreness while running?" } }, ) print(resp)
const response = await client.search({ index: "semantic-embeddings", query: { semantic: { field: "semantic_text", query: "How to avoid muscle soreness while running?", }, }, }); console.log(response);
GET semantic-embeddings/_search { "query": { "semantic": { "field": "semantic_text", "query": "How to avoid muscle soreness while running?" } } }
As a result, you receive the top 10 documents that are closest in meaning to the
query from the semantic-embeddings index:
"hits": [ { "_index": "semantic-embeddings", "_id": "6DdEuo8B0vYIvzmhoEtt", "_score": 24.972616, "_source": { "semantic_text": { "inference": { "inference_id": "my-elser-endpoint", "model_settings": { "task_type": "sparse_embedding" }, "chunks": [ { "text": "There are a few foods and food groups that will help to fight inflammation and delayed onset muscle soreness (both things that are inevitable after a long, hard workout) when you incorporate them into your postworkout eats, whether immediately after your run or at a meal later in the day. Advertisement. Advertisement.", "embeddings": { (...) } } ] } }, "id": 1713868, "content": "There are a few foods and food groups that will help to fight inflammation and delayed onset muscle soreness (both things that are inevitable after a long, hard workout) when you incorporate them into your postworkout eats, whether immediately after your run or at a meal later in the day. Advertisement. Advertisement." } }, { "_index": "semantic-embeddings", "_id": "-zdEuo8B0vYIvzmhplLX", "_score": 22.143118, "_source": { "semantic_text": { "inference": { "inference_id": "my-elser-endpoint", "model_settings": { "task_type": "sparse_embedding" }, "chunks": [ { "text": "During Your Workout. There are a few things you can do during your workout to help prevent muscle injury and soreness. According to personal trainer and writer for Iron Magazine, Marc David, doing warm-ups and cool-downs between sets can help keep muscle soreness to a minimum.", "embeddings": { (...) } } ] } }, "id": 3389244, "content": "During Your Workout. There are a few things you can do during your workout to help prevent muscle injury and soreness. According to personal trainer and writer for Iron Magazine, Marc David, doing warm-ups and cool-downs between sets can help keep muscle soreness to a minimum." 
} }, { "_index": "semantic-embeddings", "_id": "77JEuo8BdmhTuQdXtQWt", "_score": 21.506052, "_source": { "semantic_text": { "inference": { "inference_id": "my-elser-endpoint", "model_settings": { "task_type": "sparse_embedding" }, "chunks": [ { "text": "This is especially important if the soreness is due to a weightlifting routine. For this time period, do not exert more than around 50% of the level of effort (weight, distance and speed) that caused the muscle groups to be sore.", "embeddings": { (...) } } ] } }, "id": 363742, "content": "This is especially important if the soreness is due to a weightlifting routine. For this time period, do not exert more than around 50% of the level of effort (weight, distance and speed) that caused the muscle groups to be sore." } }, (...) ]
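In an application, you would typically pull just the score and the original text out of the response body rather than work with the full inference metadata. A minimal sketch, assuming the standard search response shape (`hits.hits`, each with `_score` and `_source`); `top_passages` is a hypothetical helper and the stand-in response below is abbreviated from the output above:

```python
def top_passages(search_response, limit=3):
    """Extract (score, content) pairs from a search response body."""
    return [
        (hit["_score"], hit["_source"]["content"])
        for hit in search_response["hits"]["hits"][:limit]
    ]

# Abbreviated stand-in for the response shown above:
response = {
    "hits": {
        "hits": [
            {"_score": 24.97, "_source": {"content": "There are a few foods..."}},
            {"_score": 22.14, "_source": {"content": "During Your Workout..."}},
        ]
    }
}
for score, content in top_passages(response):
    print(f"{score:.2f}  {content[:40]}")
```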