Get an inference endpoint | Elasticsearch API documentation (v9)

Get an inference endpoint

Generally available; Added in 8.11.0

GET /_inference/{task_type}/{inference_id}

All methods and paths for this operation:

GET /_inference

GET /_inference/{inference_id}

GET /_inference/{task_type}/_all

GET /_inference/{task_type}/{inference_id}

This API requires the monitor_inference cluster privilege (the built-in inference_admin and inference_user roles grant this privilege).

Path parameters

task_type string

The task type of the endpoint to return

Values are sparse_embedding, text_embedding, rerank, completion, or chat_completion.

inference_id string Required

The inference ID of the endpoint to return. Using _all or * returns all endpoints with the specified task_type if one is specified, or all endpoints for all task types if no task_type is specified.
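
The four path forms listed above differ only in which path parameters are supplied. As a minimal sketch, the selection could look like the following; `inference_get_path` is a hypothetical helper written for this illustration and is not part of any Elasticsearch client library.

```python
from typing import Optional
from urllib.parse import quote

# Task types listed in this reference.
TASK_TYPES = {
    "sparse_embedding",
    "text_embedding",
    "rerank",
    "completion",
    "chat_completion",
}


def inference_get_path(
    task_type: Optional[str] = None,
    inference_id: Optional[str] = None,
) -> str:
    """Hypothetical helper: build the GET path for the supplied parameters."""
    if task_type is not None and task_type not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    if task_type is None and inference_id is None:
        # No filters: every endpoint of every task type.
        return "/_inference"
    if task_type is None:
        # ID only; keep '*' literal so the wildcard form still works.
        return f"/_inference/{quote(inference_id, safe='*')}"
    # Task type given: fall back to _all when no ID is supplied.
    target = inference_id if inference_id is not None else "_all"
    return f"/_inference/{task_type}/{quote(target, safe='*')}"
```

For example, `inference_get_path("sparse_embedding", "my-elser-model")` yields the path used in the request examples on this page.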

Responses

200 application/json
- endpoints array[object] Required
  
  Represents an inference endpoint as returned by the GET API
  
  chunking_settings object
  
  The chunking configuration object. Applies only to the sparse_embedding and text_embedding task types. Not applicable to the rerank, completion, or chat_completion task types.
  
  max_chunk_size number
  
  The maximum size of a chunk in words. This value cannot be lower than 20 (for sentence strategy) or 10 (for word strategy). This value should not exceed the window size for the associated model.
  
  Default value is 250.
  
  overlap number
  
  The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.
  
  Default value is 100.
  
  sentence_overlap number
  
  The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.
  
  Default value is 1.
  
  separator_group string
  
  Only applicable to the recursive strategy, which requires either separator_group or separators.
  
  Sets a predefined list of separators in the saved chunking settings based on the selected text type. Values can be markdown or plaintext.
  
  Using this parameter is an alternative to manually specifying a custom separators list.
  
  separators array[string]
  
  Only applicable to the recursive strategy, which requires either separators or separator_group.
  
  A list of strings used as possible split points when chunking text.
  
  Each string can be a plain string or a regular expression (regex) pattern. The system tries each separator in order to split the text, starting from the first item in the list.
  
  After splitting, it attempts to recombine smaller pieces into larger chunks that stay within the max_chunk_size limit, to reduce the total number of chunks generated.
  
  strategy string
  
  The chunking strategy: sentence, word, none or recursive.
  
  If strategy is set to recursive, you must also specify:
  
  - max_chunk_size
  - either separators or separator_group
  
  Learn more about different chunking strategies in the linked documentation.
  
  Default value is sentence.
  
  service string Required
  
  The service type
  
  service_settings object Required
  
  Settings specific to the service
  
  task_settings object
  
  Task settings specific to the service and task type
  
  inference_id string Required
  
  The inference ID
  
  task_type string Required
  
  The task type
  
  Values are sparse_embedding, text_embedding, rerank, completion, or chat_completion.
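
The constraints stated for chunking_settings can be collected into a small client-side check. This is only a sketch of the rules as written in this reference: `check_chunking_settings` is a hypothetical name, and the server remains the authority on what it accepts or returns.

```python
from typing import Any, Dict, List


def check_chunking_settings(settings: Dict[str, Any]) -> List[str]:
    """Return violations of the documented constraints (empty list if none)."""
    strategy = settings.get("strategy", "sentence")  # documented default
    if strategy not in {"sentence", "word", "none", "recursive"}:
        return [f"unknown strategy: {strategy!r}"]

    problems: List[str] = []
    max_size = settings.get("max_chunk_size", 250)  # documented default

    if strategy == "sentence":
        if max_size < 20:
            problems.append("max_chunk_size must be at least 20 for sentence")
        # Only check sentence_overlap when it is explicitly present.
        if settings.get("sentence_overlap", 1) not in (0, 1):
            problems.append("sentence_overlap must be 0 or 1")
    elif strategy == "word":
        if max_size < 10:
            problems.append("max_chunk_size must be at least 10 for word")
        if "overlap" in settings and settings["overlap"] > max_size / 2:
            problems.append("overlap must not exceed half of max_chunk_size")
    elif strategy == "recursive":
        if "max_chunk_size" not in settings:
            problems.append("recursive requires max_chunk_size")
        if not (settings.get("separators") or settings.get("separator_group")):
            problems.append("recursive requires separators or separator_group")
        group = settings.get("separator_group")
        if group is not None and group not in ("markdown", "plaintext"):
            problems.append("separator_group must be markdown or plaintext")
    return problems
```

An empty result means none of the rules described above were violated; it does not guarantee that the settings suit the associated model's window size.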

GET /_inference/{task_type}/{inference_id}

Console

GET _inference/sparse_embedding/my-elser-model

Python

resp = client.inference.get(
    task_type="sparse_embedding",
    inference_id="my-elser-model",
)

JavaScript

const response = await client.inference.get({
  task_type: "sparse_embedding",
  inference_id: "my-elser-model",
});

Ruby

response = client.inference.get(
  task_type: "sparse_embedding",
  inference_id: "my-elser-model"
)

PHP

$resp = $client->inference()->get([
    "task_type" => "sparse_embedding",
    "inference_id" => "my-elser-model",
]);

curl

curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_inference/sparse_embedding/my-elser-model"

Java

client.inference().get(g -> g
    .inferenceId("my-elser-model")
    .taskType(TaskType.SparseEmbedding)
);
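
Once a request like the ones above returns a 200 response, the decoded body can be mapped onto the documented fields. The sketch below assumes the body has already been decoded into a Python dict. `InferenceEndpoint` and `parse_endpoints` are hypothetical names, and the sample body is illustrative only, not real server output.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class InferenceEndpoint:
    # Required fields in the documented response.
    inference_id: str
    task_type: str
    service: str
    service_settings: Dict[str, Any]
    # Optional fields; absent keys become None.
    task_settings: Optional[Dict[str, Any]] = None
    chunking_settings: Optional[Dict[str, Any]] = None


def parse_endpoints(body: Dict[str, Any]) -> List[InferenceEndpoint]:
    """Convert a decoded response body into typed records."""
    return [
        InferenceEndpoint(
            inference_id=item["inference_id"],
            task_type=item["task_type"],
            service=item["service"],
            service_settings=item["service_settings"],
            task_settings=item.get("task_settings"),
            chunking_settings=item.get("chunking_settings"),
        )
        for item in body["endpoints"]
    ]


# Illustrative body only; the field values are made up for this example.
sample_body = {
    "endpoints": [
        {
            "inference_id": "my-elser-model",
            "task_type": "sparse_embedding",
            "service": "elasticsearch",
            "service_settings": {},
            "chunking_settings": {"strategy": "sentence", "max_chunk_size": 250},
        }
    ]
}

endpoints = parse_endpoints(sample_body)
```

A missing required key raises `KeyError`, which makes an unexpected response shape visible early instead of silently producing incomplete records.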