Amazon Bedrock inference integration
Creates an inference endpoint to perform an inference task with the amazonbedrock service.
Request
PUT /_inference/<task_type>/<inference_id>
Path parameters

<inference_id>
(Required, string) The unique identifier of the inference endpoint.

<task_type>
(Required, string) The type of the inference task that the model will perform.
Available task types:
- completion
- text_embedding
Request body

chunking_settings
(Optional, object) Chunking configuration object. Refer to Configuring chunking to learn more about chunking. A usage sketch appears after the first service example below.
- max_chunk_size
  (Optional, integer) Specifies the maximum size of a chunk in words. Defaults to 250. This value cannot be higher than 300 or lower than 20 (for the sentence strategy) or 10 (for the word strategy).
- overlap
  (Optional, integer) Only for the word chunking strategy. Specifies the number of overlapping words for chunks. Defaults to 100. This value cannot be higher than half of max_chunk_size.
- sentence_overlap
  (Optional, integer) Only for the sentence chunking strategy. Specifies the number of overlapping sentences for chunks. It can be either 1 or 0. Defaults to 1.
- strategy
  (Optional, string) Specifies the chunking strategy. It can be either sentence or word.
service
(Required, string) The type of service supported for the specified task type. In this case, amazonbedrock.

service_settings
(Required, object) Settings used to install the inference model. These settings are specific to the amazonbedrock service.
- access_key
  (Required, string) A valid AWS access key that has permissions to use Amazon Bedrock and access to models for inference requests.
- secret_key
  (Required, string) A valid AWS secret key that is paired with the access_key. To create or manage access and secret keys, see Managing access keys for IAM users in the AWS documentation.

You need to provide the access and secret keys only once, during the inference model creation. The Get inference API does not retrieve your access or secret keys. After creating the inference model, you cannot change the associated key pairs. If you want to use a different access and secret key pair, delete the inference model and recreate it with the same name and the updated keys.
- provider
  (Required, string) The model provider for your deployment. Note that some providers may support only certain task types. Supported providers include:
  - amazontitan - available for the text_embedding and completion task types
  - anthropic - available for the completion task type only
  - ai21labs - available for the completion task type only
  - cohere - available for the text_embedding and completion task types
  - meta - available for the completion task type only
  - mistral - available for the completion task type only
- model
  (Required, string) The base model ID, or the ARN of a custom model based on a foundational model. The base model IDs can be found in the Amazon Bedrock model IDs documentation. Note that the model ID must be available for the chosen provider, and your IAM user must have access to the model.
- region
  (Required, string) The region that your model or ARN is deployed in. The list of available regions per model can be found in the Model support by AWS Region documentation.
- rate_limit
  (Optional, object) By default, the amazonbedrock service sets the number of requests allowed per minute to 240. This helps to minimize the number of rate limit errors returned from Amazon Bedrock. To modify this, set the requests_per_minute setting of this object in your service settings (see the sketch after this list):

  "rate_limit": {
      "requests_per_minute": <<number_of_requests>>
  }
- task_settings
  (Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified.

  task_settings for the completion task type
  - max_new_tokens
    (Optional, integer) Sets the maximum number of output tokens to be generated. Defaults to 64.
  - temperature
    (Optional, float) A number between 0.0 and 1.0 that controls the apparent creativity of the results. At temperature 0.0 the model is most deterministic, at temperature 1.0 most random. Should not be used if top_p or top_k is specified.
  - top_p
    (Optional, float) Alternative to temperature. A number in the range of 0.0 to 1.0, to eliminate low-probability tokens. Top-p uses nucleus sampling to select the top tokens whose sum of likelihoods does not exceed a certain value, ensuring both variety and coherence. Should not be used if temperature is specified.
  - top_k
    (Optional, float) Only available for the anthropic, cohere, and mistral providers. Alternative to temperature. Limits samples to the top-K most likely words, balancing coherence and variability. Should not be used if temperature is specified.
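As a sketch of where these optional objects sit in the request, the following request creates a completion endpoint with a lowered rate limit and custom task settings. The endpoint name my_bedrock_completion, the placeholder credentials, and the parameter values are illustrative, not defaults:

PUT _inference/completion/my_bedrock_completion
{
  "service": "amazonbedrock",
  "service_settings": {
    "access_key": "<aws_access_key>",
    "secret_key": "<aws_secret_key>",
    "region": "us-east-1",
    "provider": "amazontitan",
    "model": "amazon.titan-text-premier-v1:0",
    "rate_limit": {
      "requests_per_minute": 120
    }
  },
  "task_settings": {
    "max_new_tokens": 256,
    "temperature": 0.2
  }
}

Note that rate_limit is nested inside service_settings, while task_settings is a top-level sibling of it.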
Amazon Bedrock service example

The following example shows how to create an inference endpoint called amazon_bedrock_embeddings to perform a text_embedding task type.
Choose chat completion and embeddings models that you have access to from the Amazon Bedrock base models.
resp = client.inference.put(
task_type="text_embedding",
inference_id="amazon_bedrock_embeddings",
inference_config={
"service": "amazonbedrock",
"service_settings": {
"access_key": "<aws_access_key>",
"secret_key": "<aws_secret_key>",
"region": "us-east-1",
"provider": "amazontitan",
"model": "amazon.titan-embed-text-v2:0"
}
},
)
print(resp)
const response = await client.inference.put({
task_type: "text_embedding",
inference_id: "amazon_bedrock_embeddings",
inference_config: {
service: "amazonbedrock",
service_settings: {
access_key: "<aws_access_key>",
secret_key: "<aws_secret_key>",
region: "us-east-1",
provider: "amazontitan",
model: "amazon.titan-embed-text-v2:0",
},
},
});
console.log(response);
PUT _inference/text_embedding/amazon_bedrock_embeddings
{
"service": "amazonbedrock",
"service_settings": {
"access_key": "<aws_access_key>",
"secret_key": "<aws_secret_key>",
"region": "us-east-1",
"provider": "amazontitan",
"model": "amazon.titan-embed-text-v2:0"
}
}
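If you want the endpoint to chunk long input, you can add the chunking_settings object described in the request body section. The following is a minimal variant of the request above; the chunking values are illustrative, not required:

PUT _inference/text_embedding/amazon_bedrock_embeddings
{
  "service": "amazonbedrock",
  "service_settings": {
    "access_key": "<aws_access_key>",
    "secret_key": "<aws_secret_key>",
    "region": "us-east-1",
    "provider": "amazontitan",
    "model": "amazon.titan-embed-text-v2:0"
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}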
The next example shows how to create an inference endpoint called amazon_bedrock_completion to perform a completion task type.
resp = client.inference.put(
task_type="completion",
inference_id="amazon_bedrock_completion",
inference_config={
"service": "amazonbedrock",
"service_settings": {
"access_key": "<aws_access_key>",
"secret_key": "<aws_secret_key>",
"region": "us-east-1",
"provider": "amazontitan",
"model": "amazon.titan-text-premier-v1:0"
}
},
)
print(resp)
const response = await client.inference.put({
task_type: "completion",
inference_id: "amazon_bedrock_completion",
inference_config: {
service: "amazonbedrock",
service_settings: {
access_key: "<aws_access_key>",
secret_key: "<aws_secret_key>",
region: "us-east-1",
provider: "amazontitan",
model: "amazon.titan-text-premier-v1:0",
},
},
});
console.log(response);
PUT _inference/completion/amazon_bedrock_completion
{
"service": "amazonbedrock",
"service_settings": {
"access_key": "<aws_access_key>",
"secret_key": "<aws_secret_key>",
"region": "us-east-1",
"provider": "amazontitan",
"model": "amazon.titan-text-premier-v1:0"
}
}
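After the endpoint is created, you can call it with the Perform inference API. A minimal sketch; the input text is an illustrative placeholder:

POST _inference/completion/amazon_bedrock_completion
{
  "input": "What is Elastic?"
}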