Azure OpenAI inference integration

Creates an inference endpoint to perform an inference task with the azureopenai service.
Request

PUT /_inference/<task_type>/<inference_id>
Path parameters

- <inference_id> - (Required, string) The unique identifier of the inference endpoint.

- <task_type> - (Required, string) The type of the inference task that the model will perform.
  Available task types:
  - completion
  - text_embedding
Request body

- chunking_settings - (Optional, object) Chunking configuration object. Refer to Configuring chunking to learn more about chunking.

  - max_chunk_size - (Optional, integer) Specifies the maximum size of a chunk in words. Defaults to 250. This value cannot be higher than 300 or lower than 20 (for the sentence strategy) or 10 (for the word strategy).

  - overlap - (Optional, integer) Only for the word chunking strategy. Specifies the number of overlapping words for chunks. Defaults to 100. This value cannot be higher than half of max_chunk_size.

  - sentence_overlap - (Optional, integer) Only for the sentence chunking strategy. Specifies the number of overlapping sentences for chunks. It can be either 1 or 0. Defaults to 1.

  - strategy - (Optional, string) Specifies the chunking strategy. It can be either sentence or word.
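The limits above can be sketched as a small validator. This is a hypothetical helper for illustration only, not part of the Elasticsearch client; the bounds are taken directly from the parameter descriptions above.

```python
# Hypothetical validator for the chunking_settings limits described above.
# Not part of any Elastic client; for illustration only.

def validate_chunking_settings(settings: dict) -> None:
    strategy = settings.get("strategy", "sentence")
    if strategy not in ("sentence", "word"):
        raise ValueError("strategy must be 'sentence' or 'word'")

    # max_chunk_size: 20-300 for sentence, 10-300 for word, default 250
    max_chunk_size = settings.get("max_chunk_size", 250)
    lower = 20 if strategy == "sentence" else 10
    if not lower <= max_chunk_size <= 300:
        raise ValueError(
            f"max_chunk_size must be between {lower} and 300 for {strategy}"
        )

    if strategy == "word":
        # overlap cannot exceed half of max_chunk_size; default 100
        overlap = settings.get("overlap", 100)
        if overlap > max_chunk_size // 2:
            raise ValueError("overlap cannot exceed half of max_chunk_size")
    else:
        # sentence_overlap is 0 or 1; default 1
        sentence_overlap = settings.get("sentence_overlap", 1)
        if sentence_overlap not in (0, 1):
            raise ValueError("sentence_overlap must be 0 or 1")


# A word-strategy configuration within the documented limits:
validate_chunking_settings(
    {"strategy": "word", "max_chunk_size": 200, "overlap": 100}
)
```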
- service - (Required, string) The type of service supported for the specified task type. In this case, azureopenai.

- service_settings - (Required, object) Settings used to install the inference model. These settings are specific to the azureopenai service.

  - api_key or entra_id - (Required, string) You must provide either an API key or an Entra ID. If you do not provide either, or provide both, you will receive an error when trying to create your model. See the Azure OpenAI Authentication documentation for more details on these authentication types.

    You need to provide the API key only once, during the inference model creation. The Get inference API does not retrieve your API key. After creating the inference model, you cannot change the associated API key. If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.
  - resource_name - (Required, string) The name of your Azure OpenAI resource. You can find this in the list of resources in the Azure Portal for your subscription.

  - deployment_id - (Required, string) The deployment name of your deployed models. Your Azure OpenAI deployments can be found through the Azure OpenAI Studio portal that is linked to your subscription.

  - api_version - (Required, string) The Azure API version ID to use. We recommend using the latest supported non-preview version.

  - rate_limit - (Optional, object) The azureopenai service sets a default number of requests allowed per minute depending on the task type. For text_embedding it is set to 1440. For completion it is set to 120. This helps to minimize the number of rate limit errors returned from Azure. To modify this, set the requests_per_minute setting of this object in your service settings:

    "rate_limit": {
        "requests_per_minute": <<number_of_requests>>
    }

    More information about the rate limits for Azure can be found in the Quota limits docs and How to change the quotas.
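For instance, a text_embedding endpoint with a lowered allowance might be created as follows (a sketch; the requests_per_minute value of 300 is illustrative, and the placeholders must be replaced with your own values):

```console
PUT _inference/text_embedding/azure_openai_embeddings
{
  "service": "azureopenai",
  "service_settings": {
    "api_key": "<api_key>",
    "resource_name": "<resource_name>",
    "deployment_id": "<deployment_id>",
    "api_version": "2024-02-01",
    "rate_limit": {
      "requests_per_minute": 300
    }
  }
}
```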
- task_settings - (Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified.

  task_settings for the completion task type:

  - user - (Optional, string) Specifies the user issuing the request, which can be used for abuse detection.

  task_settings for the text_embedding task type:

  - user - (Optional, string) Specifies the user issuing the request, which can be used for abuse detection.
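A completion endpoint with a user task setting might look like this (a sketch; the user value is illustrative, and the placeholders must be replaced with your own values):

```console
PUT _inference/completion/azure_openai_completion
{
  "service": "azureopenai",
  "service_settings": {
    "api_key": "<api_key>",
    "resource_name": "<resource_name>",
    "deployment_id": "<deployment_id>",
    "api_version": "2024-02-01"
  },
  "task_settings": {
    "user": "example-user-id"
  }
}
```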
Azure OpenAI service example

The following example shows how to create an inference endpoint called
azure_openai_embeddings to perform a text_embedding task type.
Note that we do not specify a model here, as it is defined already via our Azure OpenAI deployment.
The list of embedding models that you can choose from in your deployment can be found in the Azure models documentation.
resp = client.inference.put(
task_type="text_embedding",
inference_id="azure_openai_embeddings",
inference_config={
"service": "azureopenai",
"service_settings": {
"api_key": "<api_key>",
"resource_name": "<resource_name>",
"deployment_id": "<deployment_id>",
"api_version": "2024-02-01"
}
},
)
print(resp)
const response = await client.inference.put({
task_type: "text_embedding",
inference_id: "azure_openai_embeddings",
inference_config: {
service: "azureopenai",
service_settings: {
api_key: "<api_key>",
resource_name: "<resource_name>",
deployment_id: "<deployment_id>",
api_version: "2024-02-01",
},
},
});
console.log(response);
PUT _inference/text_embedding/azure_openai_embeddings
{
"service": "azureopenai",
"service_settings": {
"api_key": "<api_key>",
"resource_name": "<resource_name>",
"deployment_id": "<deployment_id>",
"api_version": "2024-02-01"
}
}
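Once created, the endpoint can be called with the Perform inference API (a sketch; the input text is illustrative):

```console
POST _inference/text_embedding/azure_openai_embeddings
{
  "input": "The quick brown fox jumped over the lazy dog"
}
```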
The next example shows how to create an inference endpoint called
azure_openai_completion to perform a completion task type.
resp = client.inference.put(
task_type="completion",
inference_id="azure_openai_completion",
inference_config={
"service": "azureopenai",
"service_settings": {
"api_key": "<api_key>",
"resource_name": "<resource_name>",
"deployment_id": "<deployment_id>",
"api_version": "2024-02-01"
}
},
)
print(resp)
const response = await client.inference.put({
task_type: "completion",
inference_id: "azure_openai_completion",
inference_config: {
service: "azureopenai",
service_settings: {
api_key: "<api_key>",
resource_name: "<resource_name>",
deployment_id: "<deployment_id>",
api_version: "2024-02-01",
},
},
});
console.log(response);
PUT _inference/completion/azure_openai_completion
{
"service": "azureopenai",
"service_settings": {
"api_key": "<api_key>",
"resource_name": "<resource_name>",
"deployment_id": "<deployment_id>",
"api_version": "2024-02-01"
}
}
The list of chat completion models that you can choose from in your Azure OpenAI deployment can be found at the following places: