The chat completion inference API enables real-time responses for chat completion tasks by delivering answers incrementally, reducing perceived latency while the model computes. It works only with the chat_completion task type.
NOTE: The chat_completion task type is available only within the _stream API and supports only streaming.
The Chat completion inference API and the Stream inference API differ in their response structure and capabilities. The Chat completion inference API offers more comprehensive customization, with additional request fields and support for function calling.
To determine whether a given inference service supports this task type, refer to the documentation for that service.
Query parameters
- `timeout`
  (Optional) Specifies the amount of time to wait for the inference request to complete.
Body
- `messages`
  (Required, array of objects) A list of objects representing the conversation. Requests should generally only add new messages from the user (role `user`). The other message roles (`assistant`, `system`, or `tool`) should generally only be copied from the response to a previous completion request, such that the messages array is built up throughout a conversation.
- `model`
  The ID of the model to use. By default, the model ID is set to the value included when creating the inference endpoint.
- `max_completion_tokens`
  The upper bound limit for the number of tokens that can be generated for a completion request.
- `reasoning`
  The reasoning configuration for the completion request. This controls the model’s reasoning process in one of three ways:
  - By specifying the model’s reasoning effort level with the `effort` field.
  - By setting a maximum number of reasoning tokens with the `max_tokens` field.
  - By enabling reasoning with default settings by setting the `enabled` field to `true`.
  It also includes optional settings to control:
  - The level of detail in the summary returned in the response with the `summary` field.
  - Whether reasoning details are included in the response at all with the `exclude` field.
  Example (effort):
  { "reasoning": { "effort": "high", "summary": "concise", "exclude": false } }
  Example (max_tokens):
  { "reasoning": { "max_tokens": 100, "summary": "concise", "exclude": false } }
  Example (enabled):
  { "reasoning": { "enabled": true, "summary": "concise", "exclude": false } }
  Currently supported only for the `elastic` provider.
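The description above presents the three mode fields as alternatives, so a request should set exactly one of them. A minimal client-side sketch (a hypothetical helper, not part of any Elastic client library) that validates a `reasoning` object under that assumption before sending:

```python
# Hypothetical validator for the `reasoning` request field.
# Assumes exactly one of `effort`, `max_tokens`, or `enabled` selects the
# reasoning mode, with `summary` and `exclude` as optional extras.
def validate_reasoning(reasoning: dict) -> dict:
    modes = [k for k in ("effort", "max_tokens", "enabled") if k in reasoning]
    if len(modes) != 1:
        raise ValueError(f"exactly one reasoning mode required, got {modes}")
    unknown = set(reasoning) - {"effort", "max_tokens", "enabled", "summary", "exclude"}
    if unknown:
        raise ValueError(f"unknown reasoning keys: {sorted(unknown)}")
    return reasoning

validate_reasoning({"effort": "high", "summary": "concise", "exclude": False})
```

The check fails fast on a payload that mixes modes (for example, both `effort` and `enabled`), rather than letting the server reject it mid-stream.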
- `stop`
  A sequence of strings to control when the model should stop generating additional tokens.
- `temperature`
  The sampling temperature to use.
- `tool_choice`
  (string | object) Controls which tool is called by the model. String representation: one of `auto`, `none`, or `required`. `auto` allows the model to choose between calling tools and generating a message. `none` causes the model to not call any tools. `required` forces the model to call one or more tools.
  Example (object representation):
  { "tool_choice": { "type": "function", "function": { "name": "get_current_weather" } } }
- `tools`
  A list of tools that the model can call.
  Example:
  { "tools": [ { "type": "function", "function": { "name": "get_price_of_item", "description": "Get the current price of an item", "parameters": { "type": "object", "properties": { "item": { "id": "12345" }, "unit": { "type": "currency" } } } } } ] }
- `top_p`
  Nucleus sampling, an alternative to sampling with temperature.
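Putting the optional sampling controls together, a request body might look like the sketch below. Field names follow the OpenAI-compatible schema described above; the values are purely illustrative, not recommendations.

```python
# Illustrative request body exercising the optional sampling controls.
# Values are examples only; tune them for your own workload.
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is Elastic?"}],
    "max_completion_tokens": 200,  # upper bound on generated tokens
    "temperature": 0.2,            # sampling temperature
    "top_p": 0.9,                  # nucleus sampling
    "stop": ["\n\n"],              # stop generating at a blank line
}
```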
POST _inference/chat_completion/openai-completion/_stream
{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "What is Elastic?"
}
]
}
resp = client.inference.chat_completion_unified(
inference_id="openai-completion",
chat_completion_request={
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "What is Elastic?"
}
]
},
)
const response = await client.inference.chatCompletionUnified({
inference_id: "openai-completion",
chat_completion_request: {
model: "gpt-4o",
messages: [
{
role: "user",
content: "What is Elastic?",
},
],
},
});
response = client.inference.chat_completion_unified(
inference_id: "openai-completion",
body: {
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "What is Elastic?"
}
]
}
)
$resp = $client->inference()->chatCompletionUnified([
"inference_id" => "openai-completion",
"body" => [
"model" => "gpt-4o",
"messages" => array(
[
"role" => "user",
"content" => "What is Elastic?",
],
),
],
]);
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"model":"gpt-4o","messages":[{"role":"user","content":"What is Elastic?"}]}' "$ELASTICSEARCH_URL/_inference/chat_completion/openai-completion/_stream"
client.inference().chatCompletionUnified(c -> c
.inferenceId("openai-completion")
.chatCompletionRequest(ch -> ch
.messages(m -> m
.content(co -> co
.string("What is Elastic?")
)
.role("user")
)
.model("gpt-4o")
)
);
{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "What is Elastic?"
}
]
}
{
"messages": [
{
"role": "assistant",
"content": "Let's find out what the weather is",
"tool_calls": [
{
"id": "call_KcAjWtAww20AihPHphUh46Gd",
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"location\":\"Boston, MA\"}"
}
}
]
},
{
"role": "tool",
"content": "The weather is cold",
"tool_call_id": "call_KcAjWtAww20AihPHphUh46Gd"
}
]
}
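The request above shows the second round of a tool-call exchange: the assistant message with `tool_calls` is copied from the previous response, and each tool result is appended as a `tool` role message echoing the call's `id`. A sketch of that client-side step, using a hypothetical local `get_current_weather` function (the dispatch helper is illustrative, not part of any client library):

```python
import json

# Hypothetical local implementation of the tool the model may call.
def get_current_weather(location: str) -> str:
    return "The weather is cold"

# Given an assistant message containing tool_calls, run each tool locally
# and append the matching `tool` role messages for the follow-up request.
def run_tool_calls(assistant_message: dict, registry: dict) -> list:
    messages = [assistant_message]
    for call in assistant_message.get("tool_calls", []):
        fn = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        messages.append({
            "role": "tool",
            "content": fn(**args),
            "tool_call_id": call["id"],  # must echo the id from the tool call
        })
    return messages

assistant = {
    "role": "assistant",
    "content": "Let's find out what the weather is",
    "tool_calls": [{
        "id": "call_KcAjWtAww20AihPHphUh46Gd",
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "arguments": "{\"location\":\"Boston, MA\"}",
        },
    }],
}
messages = run_tool_calls(assistant, {"get_current_weather": get_current_weather})
```

The resulting `messages` list reproduces the shape of the example request above and can be sent back for the model to finish its answer.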
{
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's the price of a scarf?"
}
]
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_current_price",
"description": "Get the current price of a item",
"parameters": {
"type": "object",
"properties": {
"item": {
"id": "123"
}
}
}
}
}
],
"tool_choice": {
"type": "function",
"function": {
"name": "get_current_price"
}
}
}
{
"messages": [{
"role": "user",
"content": [{
"type": "text",
"text": "Barber shaves all those, who do not shave themselves. Who shaves the barber?"
}
]
}, {
"role": "assistant",
"content": [{
"type": "text",
"text": "This is the barber paradox. Such a barber cannot logically exist."
}
],
"reasoning": "If the barber shaves himself, he should not; if he does not, he should.",
"reasoning_details": [{
"type": "reasoning.encrypted",
"data": "[REDACTED]"
}, {
"type": "reasoning.summary",
"summary": "Barber shaving himself creates contradiction"
}, {
"type": "reasoning.text",
"text": "If the barber shaves himself, he should not; if he does not, he should.",
"signature": "sig_123"
}
]
}, {
"role": "user",
"content": [{
"type": "text",
"text": "What if there are 2 barbers?"
}
]
}
],
"reasoning": {
"effort": "high",
"summary": "detailed",
"exclude": false
}
}
{
"messages": [{
"role": "user",
"content": [{
"type": "text",
"text": "Barber shaves all those, who do not shave themselves. Who shaves the barber?"
}
]
}, {
"role": "assistant",
"content": [{
"type": "text",
"text": "This is the barber paradox. Such a barber cannot logically exist."
}
],
"reasoning": "If the barber shaves himself, he should not; if he does not, he should.",
"reasoning_details": [{
"type": "reasoning.encrypted",
"data": "[REDACTED]"
}, {
"type": "reasoning.summary",
"summary": "Barber shaving himself creates contradiction"
}, {
"type": "reasoning.text",
"text": "If the barber shaves himself, he should not; if he does not, he should.",
"signature": "sig_123"
}
]
}, {
"role": "user",
"content": [{
"type": "text",
"text": "What if there are 2 barbers?"
}
]
}
],
"reasoning": {
"max_tokens": 100,
"summary": "detailed",
"exclude": false
}
}
{
"messages": [{
"role": "user",
"content": [{
"type": "text",
"text": "Barber shaves all those, who do not shave themselves. Who shaves the barber?"
}
]
}, {
"role": "assistant",
"content": [{
"type": "text",
"text": "This is the barber paradox. Such a barber cannot logically exist."
}
],
"reasoning": "If the barber shaves himself, he should not; if he does not, he should.",
"reasoning_details": [{
"type": "reasoning.encrypted",
"data": "[REDACTED]"
}, {
"type": "reasoning.summary",
"summary": "Barber shaving himself creates contradiction"
}, {
"type": "reasoning.text",
"text": "If the barber shaves himself, he should not; if he does not, he should.",
"signature": "sig_123"
}
]
}, {
"role": "user",
"content": [{
"type": "text",
"text": "What if there are 2 barbers?"
}
]
}
],
"reasoning": {
"enabled": true,
"summary": "detailed",
"exclude": false
}
}
event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}
event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":Elastic"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}
event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":" is"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}
(...)
event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk","usage":{"completion_tokens":28,"prompt_tokens":16,"total_tokens":44}}}
event: message
data: [DONE]
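A stream like the one above can be reassembled client-side without any client library. A minimal sketch (stdlib only) that reads the `data:` lines of the server-sent-event stream, stops at the `[DONE]` sentinel, and concatenates the delta content:

```python
import json

def collect_content(sse_text: str) -> str:
    """Concatenate delta content from a chat_completion SSE stream."""
    parts = []
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue  # skip "event: message" and blank lines
        payload = line[len("data: "):]
        if payload == "[DONE]":  # terminal sentinel, not JSON
            break
        chunk = json.loads(payload)
        for choice in chunk["chat_completion"].get("choices", []):
            parts.append(choice["delta"].get("content", ""))
    return "".join(parts)

# Two content chunks followed by the terminal sentinel, as in the example.
stream = (
    'event: message\n'
    'data: {"chat_completion":{"id":"x","choices":[{"delta":{"content":"Elastic","role":"assistant"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}\n'
    'event: message\n'
    'data: {"chat_completion":{"id":"x","choices":[{"delta":{"content":" is"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}\n'
    'event: message\n'
    'data: [DONE]\n'
)
```

Note that the final usage chunk carries an empty `choices` array, so the loop above simply contributes nothing for it.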
event: message
data: {"chat_completion":{"id":"chatcmpl-910TWsy2VPnSfBbv5UztnSdYUJA10","choices":[{"delta":{"content":"With two barbers, the paradox disappears.","role":"assistant"},"index":0,"reasoning":"The contradiction only occurs when a barber must determine whether to shave himself.","reasoning_details":[{"type":"reasoning.encrypted","data":"[REDACTED]"},{"type":"reasoning.summary","summary":"Two barbers can shave each other."},{"type":"reasoning.text","text":"The contradiction only occurs when a barber must determine whether to shave himself.","signature":"sig_example"}]}],"model":"openai-gpt-oss-120b","object":"chat.completion.chunk"}}
event: message
data: {"chat_completion":{"id":"chatcmpl-910TWsy2VPnSfBbv5UztnSdYUJA10","choices":[{"delta":{"summary":"Each barber can shave the other, so neither needs to shave himself.","role":"assistant"},"index":0,"reasoning":"With two barbers, they can shave each other.","reasoning_details":[{"type":"reasoning.encrypted","data":"[REDACTED]"},{"type":"reasoning.summary","summary":"avoiding the self-reference paradox"},{"type":"reasoning.text","format":"some_text_reasoning_detail_format","text":"With two barbers, they can shave each other.","signature":"sig_example"}]}],"model":"openai-gpt-oss-120b","object":"chat.completion.chunk"}}
(...)
event: message
data: {"chat_completion":{"id":"chatcmpl-910TWsy2VPnSfBbv5UztnSdYUJA10","choices":[],"model":"openai-gpt-oss-120b","object":"chat.completion.chunk","usage":{"completion_tokens":28,"prompt_tokens":16,"total_tokens":44,"completion_tokens_details":{"reasoning_tokens":10}}}}
event: message
data: [DONE]