---
title: Elastic Inference Service
description: Use Elastic Inference Service (EIS) to run inference for search, embeddings, and chat without deploying models in your environment.
url: https://www.elastic.co/docs/explore-analyze/elastic-inference/eis
products:
  - Elastic Cloud Enterprise
  - Elastic Cloud on Kubernetes
  - Elastic Stack
applies_to:
  - Elastic Cloud Serverless: Generally available
  - Elastic Stack: Generally available
---

# Elastic Inference Service
Elastic Inference Service (EIS) enables you to leverage AI-powered search as a service without deploying a model in your environment.
With EIS, you don't need to add, configure, and scale machine learning nodes to provide the infrastructure and resources that machine learning inference requires.
Instead, you can use machine learning models for ingest, search, and chat independently of your Elasticsearch infrastructure.
<applies-to>Elastic Stack: Generally available since 9.3</applies-to> You can use EIS with your [self-managed](https://www.elastic.co/docs/deploy-manage/deploy/self-managed) cluster through Cloud Connect. For details, refer to [EIS for self-managed clusters](https://www.elastic.co/docs/explore-analyze/elastic-inference/connect-self-managed-cluster-to-eis).

## AI features powered by EIS

- Your Elastic deployment or project comes with [Elastic Managed LLMs](https://www.elastic.co/docs/reference/kibana/connectors-kibana/elastic-managed-llm) by default. These can be used in Agent Builder, the AI Assistant, Attack Discovery, Automatic Import, and Search Playground. For the list of available models, refer to [Supported models](#supported-models).
- You can use [ELSER](https://www.elastic.co/docs/explore-analyze/machine-learning/nlp/ml-nlp-elser) to perform semantic search as a service (ELSER on EIS). <applies-to>Elastic Stack: Generally available since 9.2, Elastic Stack: Preview in 9.1</applies-to> <applies-to>Elastic Cloud Serverless: Generally available</applies-to>
- You can use the [`jina-embeddings-v3`](/docs/explore-analyze/machine-learning/nlp/ml-nlp-jina#jina-embeddings-v3) multilingual dense vector embedding model to perform semantic search through the Elastic Inference Service. <applies-to>Elastic Stack: Preview since 9.3</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to>


## Supported models

This table lists the models supported by Elastic Inference Service.
<note>
  The **Inference Regions** column shows the regions where inference requests are processed and where data is sent.
</note>

**Scroll horizontally to view more information.**

| Author    | Name                     | ID                                 | Model Card                                                                                                                  | Provider Terms                                                                                                                                                                                                          | Input Modalities | Output Modalities | EOL Date   | Data Retention Period (Days) | Data Used To Train Models? | Inference Regions | Release Status      | Stack Version |
|-----------|--------------------------|------------------------------------|-----------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|-------------------|------------|------------------------------|----------------------------|-------------------|---------------------|---------------|
| Jina      | Embeddings v3            | jina-embeddings-v3                 | [jina-embeddings-v3](https://jina.ai/models/jina-embeddings-v3/)                                                            | [Elastic Terms](https://www.elastic.co/legal/terms-of-use)                                                                                                                                                              | Text             | Embedding         |            | 0                            | No                         | US                | Generally Available | 9.3           |
| Google    | Gemini Embedding 001     | google-gemini-embedding-001        | [Gemini Embedding 001](https://deepmind.google/research/publications/157741/)                                               | [Google terms](https://cloud.google.com/terms)                                                                                                                                                                          | Text             | Embedding         |            | 55                           | No                         | US                | Generally Available | 9.3           |
| OpenAI    | Text Embedding 003 Large | openai-text-embedding-3-large      | [Text Embedding 003 Large](https://platform.openai.com/docs/models/text-embedding-3-large)                                  | [OpenAI terms](https://openai.com/en-GB/policies/row-terms-of-use/)                                                                                                                                                     | Text             | Embedding         |            | Unknown                      | No                         | US                | Generally Available | 9.3           |
| OpenAI    | Text Embedding 003 Small | openai-text-embedding-3-small      | [Text Embedding 003 Small](https://platform.openai.com/docs/models/text-embedding-3-small)                                  | [OpenAI terms](https://openai.com/en-GB/policies/row-terms-of-use/)                                                                                                                                                     | Text             | Embedding         |            | Unknown                      | No                         | US                | Generally Available | 9.3           |
| Elastic   | ELSER v2                 | elser_model_2                      | [ELSER docs](https://www.elastic.co/docs/explore-analyze/machine-learning/nlp/ml-nlp-elser)                                 | [Elastic Terms](https://www.elastic.co/legal/terms-of-use)                                                                                                                                                              | Text             | Embedding         |            | 0                            | No                         | US                | Generally Available | 9.1           |
| Anthropic | Claude Sonnet 3.7        | anthropic-claude-3.7-sonnet        | [Claude 3.7 Sonnet System Card](https://assets.anthropic.com/m/785e231869ea8b3b/original/claude-3-7-sonnet-system-card.pdf) | [AWS terms](https://aws.amazon.com/service-terms/)                                                                                                                                                                      | Text             | Text              | 2026-04-28 | 0                            | No                         | US                | Generally Available | 9.2           |
| Anthropic | Claude Sonnet 4.5        | anthropic-claude-4.5-sonnet        | [Claude Sonnet 4.5 System Card](https://assets.anthropic.com/m/12f214efcc2f457a/original/Claude-Sonnet-4-5-System-Card.pdf) | [AWS terms](https://aws.amazon.com/service-terms/)                                                                                                                                                                      | Text             | Text              | 2026-09-29 | 0                            | No                         | US                | Generally Available | 9.2           |
| Google    | Gemini 2.5 Flash         | google-gemini-2.5-flash            | [Google Gemini 2.5 Flash](https://modelcards.withgoogle.com/assets/documents/gemini-2-flash.pdf)                            | [Google terms](https://cloud.google.com/terms)                                                                                                                                                                          | Text             | Text              | 2027-06-17 | 0                            | No                         | US                | Generally Available | 9.3           |
| Google    | Gemini 2.5 Pro           | google-gemini-2.5-pro              | [Google Gemini 2.5 Pro](https://modelcards.withgoogle.com/assets/documents/gemini-2.5-pro.pdf)                              | [Google terms](https://cloud.google.com/terms)                                                                                                                                                                          | Text             | Text              | 2027-06-17 | 0                            | No                         | US                | Generally Available | 9.3           |
| OpenAI    | GPT-4.1                  | openai-gpt-4.1                     | [OpenAI GPT 4.1](https://platform.openai.com/docs/models/gpt-4.1)                                                           | [Microsoft Terms](https://azure.microsoft.com/en-gb/support/legal/)                                                                                                                                                     | Text             | Text              | 2027-04-11 | 0                            | No                         | US                | Generally Available | 9.3           |
| OpenAI    | GPT-4.1 Mini             | openai-gpt-4.1-mini                | [OpenAI GPT 4.1 Mini](https://platform.openai.com/docs/models/gpt-4.1-mini)                                                 | [Microsoft Terms](https://azure.microsoft.com/en-gb/support/legal/)                                                                                                                                                     | Text             | Text              | 2027-04-11 | 0                            | No                         | US                | Generally Available | 9.3           |
| OpenAI    | GPT-5.2                  | openai-gpt-5.2                     | [OpenAI GPT 5.2](https://platform.openai.com/docs/models/gpt-5.2)                                                           | [Microsoft Terms](https://azure.microsoft.com/en-gb/support/legal/)                                                                                                                                                     | Text             | Text              | 2027-05-12 | 0                            | No                         | US                | Generally Available | 9.3           |
| OpenAI    | GPT-OSS 120B             | openai-gpt-oss-120b                | [OpenAI GPT-OSS-120B](http://gpt-oss-120b/)                                                                                 | [Google Terms](https://cloud.google.com/terms)<br>[Together AI Terms](https://www.together.ai/terms-of-service)<br>[DeepInfra Terms](https://deepinfra.com/terms)<br>[AWS Terms](https://aws.amazon.com/service-terms/) | Text             | Text              |            | 0                            | No                         | US                | Generally Available | 9.3           |
| Jina      | Reranker v2              | jina-reranker-v2-base-multilingual | [jina-reranker-v2-base-multilingual](https://jina.ai/models/jina-reranker-v2-base-multilingual/)                            | [Elastic Terms](https://www.elastic.co/legal/terms-of-use)                                                                                                                                                              | Text             | Text              |            | 0                            | No                         | US                | Generally Available | 9.3           |
| Jina      | Reranker v3              | jina-reranker-v3                   | [jina-reranker-v3](https://jina.ai/models/jina-reranker-v3/)                                                                | [Elastic Terms](https://www.elastic.co/legal/terms-of-use)                                                                                                                                                              | Text             | Text              |            | 0                            | No                         | US                | Generally Available | 9.3           |

<important>
  - The terms of use, uptime, and performance that apply to each AI model available with EIS are described in that AI model's Provider Terms and Model Card.
  - Prior to using the AI model with EIS, Customers are responsible for reviewing and agreeing to the chosen AI model's Provider Terms to understand the availability and data practices of the AI model's provider.
</important>


## Region and hosting

Elastic Inference Service is currently available in a single region: AWS `us-east-1`. All inference requests sent through EIS are routed to this region, regardless of where your Elasticsearch deployment or Serverless project is hosted.
Depending on the model being used, request processing may involve Elastic inference infrastructure and, in some cases, trusted third-party model providers. For example, ELSER requests are processed entirely within Elastic inference infrastructure in AWS `us-east-1`. Other models, such as large language models or third-party embedding models, may involve additional processing by their respective model providers, which can operate in different cloud platforms or regions.

## Rate limits

The service enforces rate limits on an ongoing basis, using a sliding window. Exceeding a limit results in HTTP 429 responses from the server until the window moves forward and part of the limit resets.

| Model                                                                                      | Requests/minute | Tokens/minute (ingest) | Tokens/minute (search) | Notes                                                                                                   |
|--------------------------------------------------------------------------------------------|-----------------|------------------------|------------------------|---------------------------------------------------------------------------------------------------------|
| Elastic Managed LLMs <applies-to>Elastic Stack: Generally available since 9.3</applies-to> | 2,000           | -                      | -                      | No rate limit on tokens                                                                                 |
| ELSER <applies-to>Elastic Stack: Generally available since 9.0</applies-to>                | 6,000           | 6,000,000              | 600,000                | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first. |
| Jina Embeddings v3 <applies-to>Elastic Stack: Generally available since 9.3</applies-to>   | 6,000           | 6,000,000              | 600,000                | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first. |
| Jina Reranker v2 <applies-to>Elastic Stack: Generally available since 9.3</applies-to>     | 50              | -                      | 500,000                | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first. |
| Jina Reranker v3 <applies-to>Elastic Stack: Generally available since 9.3</applies-to>     | 50              | -                      | 500,000                | Limits are applied to both requests per minute and tokens per minute, whichever limit is reached first. |
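A client that receives HTTP 429 responses typically needs to wait before retrying. The following is a minimal sketch of one common approach, exponential backoff with jitter, and is not an official EIS client. The `call_with_backoff` function, its parameters, and the `send` callable (which stands in for whatever code issues your inference request and returns a status code and body) are all assumptions made for illustration.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry while it reports HTTP 429.

    `send` is a hypothetical callable that performs one inference request
    and returns (status_code, body). The last response is returned once
    the request succeeds or the retries are exhausted.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Double the upper bound on each attempt, capped at max_delay,
        # and pick a random wait below it so clients do not retry in lockstep.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, delay))
    return status, body


# Usage with a fake sender that is rate limited twice, then succeeds.
responses = iter([(429, "rate limited"), (429, "rate limited"), (200, "ok")])
waits = []
result = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, len(waits))  # (200, 'ok') 2
```

Passing `sleep` as a parameter keeps the sketch testable without real delays. In production code you would leave it at the default `time.sleep`.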


## Pricing

All models on EIS incur a charge per million tokens. The pricing details are available on our [Pricing page](https://www.elastic.co/pricing/serverless-search).
This pricing model differs from that of the existing [Machine Learning Nodes](https://www.elastic.co/docs/explore-analyze/machine-learning/data-frame-analytics/ml-trained-models), which are billed by the VCUs consumed.

### Token-based billing

EIS is billed per million tokens used:
- For **chat** models, input and output tokens are billed. Longer conversations with extensive context or detailed responses will consume more tokens.
- For **embeddings** models, only input tokens are billed.
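The billing rules above can be sketched as a small calculation. This is an illustration only: the function names are hypothetical, and the price used in the example is a made-up placeholder, not an actual EIS price. Refer to the Pricing page for real rates.

```python
def billable_tokens(model_type: str, input_tokens: int, output_tokens: int = 0) -> int:
    """Return the number of tokens that count toward billing.

    Chat models bill input and output tokens; embeddings models bill
    input tokens only.
    """
    if model_type == "chat":
        return input_tokens + output_tokens
    if model_type == "embeddings":
        return input_tokens
    raise ValueError(f"unknown model type: {model_type!r}")


def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Estimate the charge for a token count at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


# HYPOTHETICAL price per million tokens, for illustration only.
EXAMPLE_PRICE_PER_MILLION = 2.0

chat_tokens = billable_tokens("chat", input_tokens=1_200_000, output_tokens=300_000)
print(chat_tokens)                                            # 1500000
print(estimate_cost(chat_tokens, EXAMPLE_PRICE_PER_MILLION))  # 3.0
```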

Tokens are the fundamental units that language models process for both input and output. Tokenizers convert text into numerical data by segmenting it into subword units. A token can be a complete word, part of a word, or a punctuation mark, depending on the model's trained tokenizer and the frequency patterns in its training data.
For example, the sentence `It was the best of times, it was the worst of times.` contains 52 characters but would tokenize into approximately 14 tokens with a typical word-based approach, though the exact count varies by tokenizer.
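The count in this example can be reproduced with a rough word-and-punctuation splitter. This sketch only approximates tokenization: real model tokenizers split text into subword units, so their counts will differ. The `approximate_tokens` helper is a name chosen for illustration.

```python
import re


def approximate_tokens(text: str) -> list:
    """Split text into words and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


sentence = "It was the best of times, it was the worst of times."
tokens = approximate_tokens(sentence)
print(len(sentence), len(tokens))  # 52 14
```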

### Monitor your token usage

To track your token consumption:
1. Navigate to [**Billing and subscriptions > Usage**](https://cloud.elastic.co/billing/usage) in the Elastic Cloud Console.
2. Look for line items where the **Billing dimension** is set to "Inference".


## Use cases


### ELSER through Elastic Inference Service (ELSER on EIS)

<applies-to>
  - Elastic Cloud Serverless: Generally available
  - Elastic Stack: Generally available since 9.2
  - Elastic Stack: Preview in 9.1
</applies-to>

ELSER on EIS enables you to use the ELSER model on GPUs, without having to manage your own ML nodes. We expect better performance for ingest throughput than ML nodes and equivalent performance for search latency. We will continue to benchmark, remove limitations and address concerns.

#### Using the ELSER on EIS endpoint

You can now use `semantic_text` with the new ELSER endpoint on EIS. To learn how to use the `.elser-2-elastic` inference endpoint, refer to [Using ELSER on EIS](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text-setup-configuration).

##### Get started with semantic search with ELSER on EIS

[Semantic Search with `semantic_text`](https://www.elastic.co/docs/solutions/search/semantic-search/semantic-search-semantic-text) has a detailed tutorial on using the `semantic_text` field and using the ELSER endpoint on EIS instead of the default endpoint. This is a great way to get started and try the new endpoint.
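As a rough sketch of what such a mapping might look like, the following Python snippet builds the body of a create-index request with a `semantic_text` field that points at the `.elser-2-elastic` endpoint. The field name `content` and the helper function are assumptions for illustration. Refer to the linked tutorial for the authoritative steps.

```python
import json

# Inference endpoint ID for ELSER on EIS, as named in this section.
ELSER_EIS_ENDPOINT = ".elser-2-elastic"


def semantic_text_mapping(field_name: str, inference_id: str) -> dict:
    """Build an index mapping with one semantic_text field."""
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": "semantic_text",
                    "inference_id": inference_id,
                }
            }
        }
    }


# "content" is a hypothetical field name.
mapping = semantic_text_mapping("content", ELSER_EIS_ENDPOINT)
print(json.dumps(mapping, indent=2))
```

The printed JSON can be used as the body of a create-index request in Console or passed to an Elasticsearch client.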

### `jina-embeddings-v3` on EIS

<applies-to>
  - Elastic Cloud Serverless: Preview
  - Elastic Stack: Preview since 9.3
</applies-to>

You can use the `jina-embeddings-v3` model through Elastic Inference Service. Running the model on EIS means that it runs on GPUs, and you don't need to manage infrastructure or model resources.

#### Get started with `jina-embeddings-v3` on EIS

Create an inference endpoint that references the `jina-embeddings-v3` model in the `model_id` field.
```json
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v3"
  }
}
```

The created inference endpoint uses the model for inference operations on the Elastic Inference Service. You can reference the `inference_id` of the endpoint in index mappings for the [`semantic_text`](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text) field type, in `text_embedding` inference tasks, or in search queries.
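The request body shown earlier is normally sent to the create inference endpoint API. The sketch below assembles the method, path, and body in Python without sending anything. It assumes the `PUT /_inference/{task_type}/{inference_id}` path shape of the inference API, and the endpoint ID `my-jina-embeddings-v3` is a hypothetical name chosen for illustration.

```python
import json
from typing import Tuple


def create_endpoint_request(
    inference_id: str,
    model_id: str,
    task_type: str = "text_embedding",
) -> Tuple[str, str, dict]:
    """Return (method, path, body) for creating an EIS-backed endpoint."""
    path = f"/_inference/{task_type}/{inference_id}"
    body = {
        "service": "elastic",
        "service_settings": {"model_id": model_id},
    }
    return "PUT", path, body


# "my-jina-embeddings-v3" is a hypothetical inference endpoint ID.
method, path, body = create_endpoint_request("my-jina-embeddings-v3", "jina-embeddings-v3")
print(method, path)
print(json.dumps(body, indent=2))
```

The ID you choose here is the `inference_id` that index mappings and queries reference afterwards.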