Perform inference on the service using the Unified Schema
Added in 8.18.0
Path parameters
- task_type (string, Required): The task type. Values are sparse_embedding, text_embedding, rerank, or completion.
- inference_id (string, Required): The inference Id.
Query parameters
- timeout (string): Specifies the amount of time to wait for the inference request to complete.
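The path and query parameters combine into the request URL. A minimal sketch of that construction, assuming a local Elasticsearch host, a completion task, and a hypothetical inference endpoint ID (none of these values come from the reference above):

```python
from urllib.parse import urlencode

# Hypothetical host and identifiers, for illustration only.
ES_URL = "http://localhost:9200"
task_type = "completion"               # one of the documented task types
inference_id = "my-inference-endpoint"  # hypothetical inference Id

# Optional query parameters; "timeout" takes a duration string such as "30s".
params = urlencode({"timeout": "30s"})

url = f"{ES_URL}/_inference/{task_type}/{inference_id}/_unified?{params}"
print(url)
```
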
Body
- messages (array[object], Required): A list of objects representing the conversation.
- model (string): The ID of the model to use.
- max_completion_tokens (number): The upper bound limit for the number of tokens that can be generated for a completion request.
- stop (array[string]): A sequence of strings to control when the model should stop generating additional tokens.
- temperature (number): The sampling temperature to use.
- tools (array[object]): A list of tools that the model can call.
- top_p (number): Nucleus sampling, an alternative to sampling with temperature.
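The body fields above can be assembled into a JSON payload. A minimal sketch, where the model ID, message content, and sampling values are illustrative assumptions rather than values taken from this reference:

```python
import json

# A minimal request body for the unified endpoint. The model ID and
# message content below are hypothetical examples.
body = {
    "messages": [
        {"role": "user", "content": "Summarize this text in one sentence."}
    ],
    "model": "gpt-4o",             # hypothetical model ID
    "max_completion_tokens": 256,  # cap on generated tokens
    "temperature": 0.2,            # sampling temperature
    "top_p": 0.9,                  # nucleus sampling threshold
    "stop": ["\n\n"],              # stop generating at a blank line
}

payload = json.dumps(body)
```

Only `messages` is required; the remaining fields are optional tuning parameters.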
POST /_inference/{task_type}/{inference_id}/_unified
curl \
 --request POST "http://api.example.com/_inference/{task_type}/{inference_id}/_unified" \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"messages":[{"content":"string","role":"string","tool_call_id":"string","tool_calls":[{"id":"string","function":{"arguments":"string","name":"string"},"type":"string"}]}],"model":"string","max_completion_tokens":42.0,"stop":["string"],"temperature":42.0,"tool_choice":"string","tools":[{"type":"string","function":{"description":"string","name":"string","parameters":{},"strict":true}}],"top_p":42.0}'
Request examples
{
"messages": [
{
"": "string",
"role": "string",
"tool_call_id": "string",
"tool_calls": [
{
"id": "string",
"function": {
"arguments": "string",
"name": "string"
},
"type": "string"
}
]
}
],
"model": "string",
"max_completion_tokens": 42.0,
"stop": [
"string"
],
"temperature": 42.0,
"": "string",
"tools": [
{
"type": "string",
"function": {
"description": "string",
"name": "string",
"parameters": {},
"strict": true
}
}
],
"top_p": 42.0
}
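The schema-generated placeholders above map onto a request like the following sketch for a completion task with tool calling. The function name, description, and parameter schema are illustrative assumptions, not values from a real deployment:

```json
{
  "messages": [
    {
      "role": "user",
      "content": "What is the weather like in Boston today?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": { "type": "string" }
          },
          "required": ["location"]
        },
        "strict": true
      }
    }
  ],
  "temperature": 0.7
}
```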
Response examples (200)
{}