VoyageAI inference integration
Creates an inference endpoint to perform an inference task with the voyageai service.
Request
PUT /_inference/<task_type>/<inference_id>
Path parameters
- <inference_id> (Required, string) The unique identifier of the inference endpoint.
- <task_type> (Required, string) The type of the inference task that the model will perform. Available task types:
  - text_embedding
  - rerank
Request body

- chunking_settings (Optional, object) Chunking configuration object. Refer to Configuring chunking to learn more about chunking.
  - max_chunk_size (Optional, integer) Specifies the maximum size of a chunk in words. Defaults to 250. This value cannot be higher than 300 or lower than 20 (for the sentence strategy) or 10 (for the word strategy).
  - overlap (Optional, integer) Only for the word chunking strategy. Specifies the number of overlapping words for chunks. Defaults to 100. This value cannot be higher than half of max_chunk_size.
  - sentence_overlap (Optional, integer) Only for the sentence chunking strategy. Specifies the number of overlapping sentences for chunks. It can be either 1 or 0. Defaults to 1.
  - strategy (Optional, string) Specifies the chunking strategy. It can be either sentence or word.
- service (Required, string) The type of service supported for the specified task type. In this case, voyageai.
- service_settings (Required, object) Settings used to install the inference model. These settings are specific to the voyageai service.
  - dimensions (Optional, integer) The number of dimensions the resulting output embeddings should have. This setting maps to output_dimension in the VoyageAI documentation. Only for the text_embedding task type.
  - embedding_type (Optional, string) The data type for the embeddings to be returned. This setting maps to output_dtype in the VoyageAI documentation. Permitted values: float, int8, bit. int8 is a synonym of byte in the VoyageAI documentation; bit is a synonym of binary in the VoyageAI documentation. Only for the text_embedding task type.
  - model_id (Required, string) The name of the model to use for the inference task. Refer to the VoyageAI documentation for the list of available text embedding and rerank models.
  - rate_limit (Optional, object) This setting helps to minimize the number of rate limit errors returned from VoyageAI. The voyageai service sets a default number of requests allowed per minute depending on the task type. For both text_embedding and rerank, it is set to 2000. To modify this, set the requests_per_minute setting of this object in your service settings:

    "rate_limit": {
        "requests_per_minute": <<number_of_requests>>
    }

    More information about the rate limits for VoyageAI can be found in your Account limits.
- task_settings (Optional, object) Settings to configure the inference task. These settings are specific to the <task_type> you specified.

  task_settings for the text_embedding task type:
  - input_type (Optional, string) Type of the input text. Permitted values: ingest (maps to document in the VoyageAI documentation), search (maps to query in the VoyageAI documentation).
  - truncation (Optional, boolean) Whether to truncate the input texts to fit within the context length. Defaults to false.

  task_settings for the rerank task type:
  - return_documents (Optional, boolean) Whether to return the source documents in the response. Defaults to false.
  - top_k (Optional, integer) The number of most relevant documents to return. If not specified, the reranking results of all documents will be returned.
  - truncation (Optional, boolean) Whether to truncate the input texts to fit within the context length. Defaults to false.
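The parameters above can be combined into a single request body. As an illustrative sketch (the helper function below and its defaults are not part of any official client library), one way to assemble a text_embedding endpoint definition and check the documented chunking bounds before sending it:

```python
# Illustrative helper: build a body for PUT _inference/text_embedding/<id>
# using the voyageai service, validating the documented chunking limits.
def build_voyageai_endpoint_body(model_id, dimensions=None,
                                 strategy="sentence", max_chunk_size=250,
                                 input_type=None):
    # Documented bounds: max_chunk_size cannot be higher than 300, or lower
    # than 20 (sentence strategy) or 10 (word strategy).
    lower = 20 if strategy == "sentence" else 10
    if not lower <= max_chunk_size <= 300:
        raise ValueError("max_chunk_size out of documented range")

    body = {
        "service": "voyageai",
        "service_settings": {"model_id": model_id},
        "chunking_settings": {"strategy": strategy,
                              "max_chunk_size": max_chunk_size},
    }
    if dimensions is not None:  # maps to output_dimension in VoyageAI
        body["service_settings"]["dimensions"] = dimensions
    if input_type is not None:  # ingest -> document, search -> query
        body["task_settings"] = {"input_type": input_type}
    return body

body = build_voyageai_endpoint_body("voyage-3-large", dimensions=512,
                                    input_type="ingest")
```

The resulting dictionary has the same shape as the request bodies in the examples that follow.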
VoyageAI service example
The following example shows how to create an inference endpoint called voyageai-embeddings to perform a text_embedding task type.
The embeddings created by requests to this endpoint will have 512 dimensions.
resp = client.inference.put(
task_type="text_embedding",
inference_id="voyageai-embeddings",
inference_config={
"service": "voyageai",
"service_settings": {
"model_id": "voyage-3-large",
"dimensions": 512
}
},
)
print(resp)
PUT _inference/text_embedding/voyageai-embeddings
{
"service": "voyageai",
"service_settings": {
"model_id": "voyage-3-large",
"dimensions": 512
}
}
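Because this endpoint produces 512-dimensional vectors, the embedding_type setting mainly trades precision for storage. As a rough sketch (assuming the usual widths: 32-bit floats, one byte per dimension for int8, and one bit per dimension for bit), the per-vector sizes work out as:

```python
# Approximate per-vector storage for a 512-dimensional embedding under each
# permitted embedding_type value. float is assumed to be 32-bit; int8 is one
# byte per dimension (byte in the VoyageAI docs); bit packs one bit per
# dimension (binary in the VoyageAI docs).
dimensions = 512
bytes_per_vector = {
    "float": dimensions * 4,   # 2048 bytes
    "int8": dimensions * 1,    # 512 bytes
    "bit": dimensions // 8,    # 64 bytes
}
```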
The next example shows how to create an inference endpoint called voyageai-rerank to perform a rerank task type.
resp = client.inference.put(
task_type="rerank",
inference_id="voyageai-rerank",
inference_config={
"service": "voyageai",
"service_settings": {
"model_id": "rerank-2"
}
},
)
print(resp)
PUT _inference/rerank/voyageai-rerank
{
"service": "voyageai",
"service_settings": {
"model_id": "rerank-2"
}
}
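Once created, a rerank endpoint is invoked with a query and a list of documents, and the rerank task_settings described above (return_documents, top_k) can also be supplied per request. The helper below is an illustrative sketch, not part of elasticsearch-py; it only assembles a body of the shape accepted by POST _inference/rerank/voyageai-rerank:

```python
# Illustrative helper: build a request body for the rerank endpoint created
# above. task_settings here override the endpoint's defaults for one request.
def build_rerank_body(query, documents, top_k=None, return_documents=False):
    body = {"query": query, "input": list(documents)}
    task_settings = {"return_documents": return_documents}
    if top_k is not None:
        task_settings["top_k"] = top_k
    body["task_settings"] = task_settings
    return body

body = build_rerank_body(
    "What is Elasticsearch?",
    ["Elasticsearch is a distributed search engine.",
     "VoyageAI provides embedding and reranking models."],
    top_k=1,
    return_documents=True,
)
```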