Perform chat completion inference on the service

Generally available; Added in 8.18.0

POST /_inference/chat_completion/{inference_id}/_stream

The chat completion inference API enables real-time responses for chat completion tasks by delivering answers incrementally as they are generated, reducing the time to first output. It only works with the chat_completion task type.

NOTE: The chat_completion task type is only available within the _stream API and only supports streaming. The Chat completion inference API and the Stream inference API differ in their response structure and capabilities. The Chat completion inference API provides more comprehensive customization options through more fields and function calling support. To determine whether a given inference service supports this task type, please see the page for that service.
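Because the response arrives as server-sent events, a client has to split the stream on `data:` lines and stop at the `[DONE]` sentinel. A minimal sketch of that parsing, using only the standard library (the field names are taken from the response examples later on this page; the endpoint URL and authentication are left out):

```python
import json

def parse_sse_line(line: str):
    """Return the decoded payload of a `data:` line, or None for other lines.

    The terminal `[DONE]` sentinel is returned as the string "[DONE]".
    """
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return "[DONE]"
    return json.loads(payload)

def content_deltas(lines):
    """Yield the content fragments from a sequence of SSE lines."""
    for line in lines:
        event = parse_sse_line(line)
        if event is None or event == "[DONE]":
            continue
        for choice in event["chat_completion"]["choices"]:
            delta = choice.get("delta", {})
            if delta.get("content"):
                yield delta["content"]
```

With an HTTP client that exposes the response line by line (for example `requests` with `stream=True` and `iter_lines`), feeding those lines through `content_deltas` and joining the fragments reconstructs the full answer.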

Path parameters

  • inference_id string Required

    The unique identifier of the inference endpoint.

Query parameters

Body Required (application/json)

  • messages array[object] Required

    A list of objects representing the conversation. Requests should generally only add new messages from the user (role user). The other message roles (assistant, system, or tool) should generally only be copied from the response to a previous completion request, such that the messages array is built up throughout a conversation.

    An object representing part of the conversation.

    • content string | array[object]

      The content of the message.

      String example:

      {
         "content": "Some string"
      }
      

      Text example:

      {
        "content": [
            {
             "text": "Some text",
             "type": "text"
            }
         ]
      }
      

      Image example:

      {
        "content": [
            {
             "image_url": {
               "url": "data:image/jpg;base64,..."
             },
             "type": "image_url"
            }
         ]
      }
      

      File example:

      {
        "content": [
            {
             "file": {
               "file_data": "data:application/pdf;base64,...",
               "filename": "somePDF"
             },
             "type": "file"
            }
         ]
      }
      
    • role string Required

      The role of the message author. Valid values are user, assistant, system, and tool.

    • tool_call_id string

      Only for tool role messages. The tool call that this message is responding to.

    • tool_calls array[object]

      Only for assistant role messages. The tool calls generated by the model. If it's specified, the content field is optional. Example:

      {
        "tool_calls": [
            {
                "id": "call_KcAjWtAww20AihPHphUh46Gd",
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "arguments": "{\"location\":\"Boston, MA\"}"
                }
            }
        ]
      }
      

      A tool call generated by the model.

      • id string Required

        The identifier of the tool call.

      • function object Required

        The function that the model called.

        • arguments string Required

          The arguments to call the function with in JSON format.

        • name string Required

          The name of the function to call.

      • type string Required

        The type of the tool call.

    • reasoning string

      Only for assistant role messages. The reasoning details generated by the model as plaintext. Currently supported only for elastic provider.

    • reasoning_details array[object]

      Only for assistant role messages. The reasoning details generated by the model as structured data. Currently supported only for elastic provider.

      Each entry is one of the reasoning detail types that can be returned by the model: a reasoning.text, reasoning.summary, or reasoning.encrypted object (see the request examples below for the shape of each).
  • model string

    The ID of the model to use. By default, the model ID is set to the value included when creating the inference endpoint.

  • max_completion_tokens number

    The upper bound limit for the number of tokens that can be generated for a completion request.

  • reasoning object

    The reasoning configuration for the completion request. This controls the model's reasoning process in one of three ways:

    • By specifying the model’s reasoning effort level with the effort field.
    • By setting a maximum number of reasoning tokens with the max_tokens field.
    • By enabling reasoning with default settings by setting enabled field to true.

    It also includes optional settings to control:

    • The level of detail in the summary returned in the response with the summary field.
    • Whether reasoning details are included in the response at all with the exclude field.

    Example (effort):

    {
       "reasoning": {
           "effort": "high",
           "summary": "concise",
           "exclude": false
       }
    }
    
    

    Example (max_tokens):

    {
       "reasoning": {
           "max_tokens": 100,
           "summary": "concise",
           "exclude": false
       }
    }
    

    Example (enabled):

    {
       "reasoning": {
           "enabled": true,
           "summary": "concise",
           "exclude": false
       }
    }
    

    Currently supported only for elastic provider.

    • effort string

      The level of effort the model should put into reasoning. This is a hint that guides the model in how much effort to put into reasoning, with xhigh being the most effort and none being no effort. It cannot be used together with max_tokens.

      Values are xhigh, high, medium, low, minimal, or none.

    • enabled boolean

      Whether to enable reasoning with default settings. This is a shortcut for enabling reasoning without having to specify the other parameters. If enabled is set to true, then reasoning at the medium effort level is enabled. Ignored if either effort or max_tokens is specified, in which case those parameters will control the reasoning process instead.

    • exclude boolean

      Whether to exclude reasoning information from the response. If true, the response will not include any reasoning details.

    • max_tokens number

      The maximum number of tokens the model can use for reasoning. It cannot be used together with effort.

    • summary string

      The level of detail included in the reasoning summary returned in the response. This is a hint on how much detail to include in the summary of the reasoning that is returned in the response, with auto being the default level of detail, concise being less detail, and detailed being more detail.

      Values are auto, concise, or detailed.

  • stop array[string]

    A sequence of strings to control when the model should stop generating additional tokens.

  • temperature number

    The sampling temperature to use.

  • tool_choice string | object

    Controls which tool is called by the model. String representation: one of auto, none, or required. auto allows the model to choose between calling tools and generating a message; none prevents the model from calling any tools; required forces the model to call one or more tools. Example (object representation):

    {
      "tool_choice": {
          "type": "function",
          "function": {
              "name": "get_current_weather"
          }
      }
    }
    
  • tools array[object]

    A list of tools that the model can call. Example:

    {
      "tools": [
          {
              "type": "function",
              "function": {
                  "name": "get_price_of_item",
                  "description": "Get the current price of an item",
                  "parameters": {
                      "type": "object",
                      "properties": {
                          "item": {
                              "id": "12345"
                          },
                          "unit": {
                              "type": "currency"
                          }
                      }
                  }
              }
          }
      ]
    }
    

    A list of tools that the model can call.

    • type string Required

      The type of tool.

    • function object Required

      The function definition.

      • description string

        A description of what the function does. This is used by the model to choose when and how to call the function.

      • name string Required

        The name of the function.

      • parameters object

        The parameters the function accepts. This should be formatted as a JSON object.

      • strict boolean

        Whether to enable schema adherence when generating the function call.

  • top_p number

    Nucleus sampling, an alternative to sampling with temperature.
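Putting the body fields above together, the following sketch assembles a request body as plain Python dicts (no client library; the helper name `chat_request` is illustrative, not part of any API):

```python
def chat_request(user_text, tools=None, tool_choice=None, **options):
    """Build a chat_completion request body from a single user message.

    `options` carries optional top-level fields such as model,
    temperature, top_p, stop, or max_completion_tokens.
    """
    body = {
        "messages": [
            {
                "role": "user",
                "content": [{"type": "text", "text": user_text}],
            }
        ]
    }
    if tools:
        body["tools"] = tools
    if tool_choice:
        body["tool_choice"] = tool_choice
    body.update(options)
    return body
```

The resulting dict can be passed as the request body to `POST /_inference/chat_completion/{inference_id}/_stream`, or as the `chat_completion_request` argument of a client method.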

Responses

  • 200 application/json
Console:

POST _inference/chat_completion/openai-completion/_stream
{
  "model": "gpt-4o",
  "messages": [
      {
          "role": "user",
          "content": "What is Elastic?"
      }
  ]
}

Python:

resp = client.inference.chat_completion_unified(
    inference_id="openai-completion",
    chat_completion_request={
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": "What is Elastic?"
            }
        ]
    },
)

JavaScript:

const response = await client.inference.chatCompletionUnified({
  inference_id: "openai-completion",
  chat_completion_request: {
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: "What is Elastic?",
      },
    ],
  },
});

Ruby:

response = client.inference.chat_completion_unified(
  inference_id: "openai-completion",
  body: {
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": "What is Elastic?"
      }
    ]
  }
)

PHP:

$resp = $client->inference()->chatCompletionUnified([
    "inference_id" => "openai-completion",
    "body" => [
        "model" => "gpt-4o",
        "messages" => array(
            [
                "role" => "user",
                "content" => "What is Elastic?",
            ],
        ),
    ],
]);

curl:

curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"model":"gpt-4o","messages":[{"role":"user","content":"What is Elastic?"}]}' "$ELASTICSEARCH_URL/_inference/chat_completion/openai-completion/_stream"

Java:

client.inference().chatCompletionUnified(c -> c
    .inferenceId("openai-completion")
    .chatCompletionRequest(ch -> ch
        .messages(m -> m
            .content(co -> co
                .string("What is Elastic?")
            )
            .role("user")
        )
        .model("gpt-4o")
    )
);
Request examples
Run `POST _inference/chat_completion/openai-completion/_stream` to perform a chat completion on the example question with streaming.
{
  "model": "gpt-4o",
  "messages": [
      {
          "role": "user",
          "content": "What is Elastic?"
      }
  ]
}
Run `POST _inference/chat_completion/openai-completion/_stream` to perform a chat completion using an Assistant message with `tool_calls`.
{
  "messages": [
      {
          "role": "assistant",
          "content": "Let's find out what the weather is",
          "tool_calls": [ 
              {
                  "id": "call_KcAjWtAww20AihPHphUh46Gd",
                  "type": "function",
                  "function": {
                      "name": "get_current_weather",
                      "arguments": "{\"location\":\"Boston, MA\"}"
                  }
              }
          ]
      },
      { 
          "role": "tool",
          "content": "The weather is cold",
          "tool_call_id": "call_KcAjWtAww20AihPHphUh46Gd"
      }
  ]
}
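The example above shows the tool-call round trip: the assistant message carrying `tool_calls` is copied back verbatim, followed by a `tool` message whose `tool_call_id` matches the call it answers. A minimal sketch of building those follow-up messages, where `run_tool` stands in for whatever local function actually executes the call (an assumption here, not part of the API):

```python
import json

def answer_tool_calls(assistant_message, run_tool):
    """Return the messages to append after an assistant tool-call message.

    The assistant message is kept as-is; each tool call gets a matching
    tool-role reply whose tool_call_id echoes the call's id.
    """
    messages = [assistant_message]
    for call in assistant_message.get("tool_calls", []):
        args = json.loads(call["function"]["arguments"])
        result = run_tool(call["function"]["name"], args)
        messages.append({
            "role": "tool",
            "content": result,
            "tool_call_id": call["id"],
        })
    return messages
```

The returned list slots directly into the `messages` array of the next request, exactly as in the example payload above.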
Run `POST _inference/chat_completion/openai-completion/_stream` to perform a chat completion using a User message with `tools` and `tool_choice`.
{
  "messages": [
      {
          "role": "user",
          "content": [
              {
                  "type": "text",
                  "text": "What's the price of a scarf?"
              }
          ]
      }
  ],
  "tools": [
      {
          "type": "function",
          "function": {
              "name": "get_current_price",
              "description": "Get the current price of a item",
              "parameters": {
                  "type": "object",
                  "properties": {
                      "item": {
                          "id": "123"
                      }
                  }
              }
          }
      }
  ],
  "tool_choice": {
      "type": "function",
      "function": {
          "name": "get_current_price"
      }
  }
}
Run `POST _inference/chat_completion/reasoning-chat-completion/_stream` to perform a chat completion task, using both an `effort`-based reasoning configuration and the reasoning generated by the model in a previous step.
{
  "messages": [{
      "role": "user",
      "content": [{
          "type": "text",
          "text": "Barber shaves all those, who do not shave themselves. Who shaves the barber?"
        }
      ]
    }, {
      "role": "assistant",
      "content": [{
          "type": "text",
          "text": "This is the barber paradox. Such a barber cannot logically exist."
        }
      ],
      "reasoning": "If the barber shaves himself, he should not; if he does not, he should.",
      "reasoning_details": [{
          "type": "reasoning.encrypted",
          "data": "[REDACTED]"
        }, {
          "type": "reasoning.summary",
          "summary": "Barber shaving himself creates contradiction"
        }, {
          "type": "reasoning.text",
          "text": "If the barber shaves himself, he should not; if he does not, he should.",
          "signature": "sig_123"
        }
      ]
    }, {
      "role": "user",
      "content": [{
          "type": "text",
          "text": "What if there are 2 barbers?"
        }
      ]
    }
  ],
  "reasoning": {
    "effort": "high",
    "summary": "detailed",
    "exclude": false
  }
}
Run `POST _inference/chat_completion/reasoning-chat-completion/_stream` to perform a chat completion task, using both a `max_tokens`-based reasoning configuration and the reasoning generated by the model in a previous step.
{
  "messages": [{
      "role": "user",
      "content": [{
          "type": "text",
          "text": "Barber shaves all those, who do not shave themselves. Who shaves the barber?"
        }
      ]
    }, {
      "role": "assistant",
      "content": [{
          "type": "text",
          "text": "This is the barber paradox. Such a barber cannot logically exist."
        }
      ],
      "reasoning": "If the barber shaves himself, he should not; if he does not, he should.",
      "reasoning_details": [{
          "type": "reasoning.encrypted",
          "data": "[REDACTED]"
        }, {
          "type": "reasoning.summary",
          "summary": "Barber shaving himself creates contradiction"
        }, {
          "type": "reasoning.text",
          "text": "If the barber shaves himself, he should not; if he does not, he should.",
          "signature": "sig_123"
        }
      ]
    }, {
      "role": "user",
      "content": [{
          "type": "text",
          "text": "What if there are 2 barbers?"
        }
      ]
    }
  ],
  "reasoning": {
    "max_tokens": 100,
    "summary": "detailed",
    "exclude": false
  }
}
Run `POST _inference/chat_completion/reasoning-chat-completion/_stream` to perform a chat completion task, using both an `enabled`-based reasoning configuration and the reasoning generated by the model in a previous step.
{
  "messages": [{
      "role": "user",
      "content": [{
          "type": "text",
          "text": "Barber shaves all those, who do not shave themselves. Who shaves the barber?"
        }
      ]
    }, {
      "role": "assistant",
      "content": [{
          "type": "text",
          "text": "This is the barber paradox. Such a barber cannot logically exist."
        }
      ],
      "reasoning": "If the barber shaves himself, he should not; if he does not, he should.",
      "reasoning_details": [{
          "type": "reasoning.encrypted",
          "data": "[REDACTED]"
        }, {
          "type": "reasoning.summary",
          "summary": "Barber shaving himself creates contradiction"
        }, {
          "type": "reasoning.text",
          "text": "If the barber shaves himself, he should not; if he does not, he should.",
          "signature": "sig_123"
        }
      ]
    }, {
      "role": "user",
      "content": [{
          "type": "text",
          "text": "What if there are 2 barbers?"
        }
      ]
    }
  ],
  "reasoning": {
    "enabled": true,
    "summary": "detailed",
    "exclude": false
  }
}
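The three request shapes above differ only in the `reasoning` object, and its fields are mutually constrained: `effort` and `max_tokens` cannot be combined, and `enabled` is ignored when either of them is set. A small helper (hypothetical, not part of any Elastic client) can enforce those documented constraints before sending:

```python
def build_reasoning(effort=None, max_tokens=None, enabled=None,
                    summary=None, exclude=None):
    """Build a `reasoning` object, enforcing the documented constraints."""
    if effort is not None and max_tokens is not None:
        raise ValueError("effort and max_tokens cannot be used together")
    if effort is not None and effort not in (
            "xhigh", "high", "medium", "low", "minimal", "none"):
        raise ValueError(f"invalid effort level: {effort!r}")
    if summary is not None and summary not in ("auto", "concise", "detailed"):
        raise ValueError(f"invalid summary level: {summary!r}")
    reasoning = {}
    if effort is not None:
        reasoning["effort"] = effort
    if max_tokens is not None:
        reasoning["max_tokens"] = max_tokens
    if enabled is not None and not reasoning:
        # enabled would be ignored if effort or max_tokens were set
        reasoning["enabled"] = enabled
    if summary is not None:
        reasoning["summary"] = summary
    if exclude is not None:
        reasoning["exclude"] = exclude
    return reasoning
```

For example, `build_reasoning(effort="high", summary="detailed", exclude=False)` reproduces the `reasoning` object of the first example above.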
Response examples (200)
A successful response when performing a chat completion task on the example question with streaming.
event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":"Elastic"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":" is"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}

(...)

event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk","usage":{"completion_tokens":28,"prompt_tokens":16,"total_tokens":44}}} 

event: message
data: [DONE]
A successful response when performing a chat completion task with response-level reasoning data.
event: message
data: {"chat_completion":{"id":"chatcmpl-910TWsy2VPnSfBbv5UztnSdYUJA10","choices":[{"delta":{"content":"With two barbers, the paradox disappears.","role":"assistant"},"index":0,"reasoning":"The contradiction only occurs when a barber must determine whether to shave himself.","reasoning_details":[{"type":"reasoning.encrypted","data":"[REDACTED]"},{"type":"reasoning.summary","summary":"Two barbers can shave each other."},{"type":"reasoning.text","text":"The contradiction only occurs when a barber must determine whether to shave himself.","signature":"sig_example"}]}],"model":"openai-gpt-oss-120b","object":"chat.completion.chunk"}}

event: message
data: {"chat_completion":{"id":"chatcmpl-910TWsy2VPnSfBbv5UztnSdYUJA10","choices":[{"delta":{"summary":"Each barber can shave the other, so neither needs to shave himself.","role":"assistant"},"index":0,"reasoning":"With two barbers, they can shave each other.","reasoning_details":[{"type":"reasoning.encrypted","data":"[REDACTED]"},{"type":"reasoning.summary","summary":"avoiding the self-reference paradox"},{"type":"reasoning.text","format":"some_text_reasoning_detail_format","text":"With two barbers, they can shave each other.","signature":"sig_example"}]}],"model":"openai-gpt-oss-120b","object":"chat.completion.chunk"}}

(...)

event: message
data: {"chat_completion":{"id":"chatcmpl-910TWsy2VPnSfBbv5UztnSdYUJA10","choices":[],"model":"openai-gpt-oss-120b","object":"chat.completion.chunk","usage":{"completion_tokens":28,"prompt_tokens":16,"total_tokens":44,"completion_tokens_details":{"reasoning_tokens":10}}}}

event: message
data: [DONE]
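When reasoning data is present, each decoded chunk can carry `reasoning` (plaintext) and `reasoning_details` (structured) alongside the answer deltas, as in the example above. A sketch of separating the two when consuming such chunks (plain dicts as decoded from the `data:` lines; not part of any client library):

```python
def collect_reasoning(chunks):
    """Split decoded chunks into plaintext reasoning and structured details.

    Returns (reasoning_text, reasoning_details) accumulated across all
    choices of all chunks; the answer content deltas are left untouched.
    """
    reasoning_text, details = [], []
    for chunk in chunks:
        for choice in chunk["chat_completion"].get("choices", []):
            if choice.get("reasoning"):
                reasoning_text.append(choice["reasoning"])
            details.extend(choice.get("reasoning_details", []))
    return "".join(reasoning_text), details
```

Chunks whose `choices` array is empty (such as the final usage-only chunk) pass through harmlessly.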