Get an inference endpoint

Generally available; Added in 8.11.0

GET /_inference/{task_type}/_all

Path parameters

  • task_type string Required

    The task type of the endpoints to return.

    Values are sparse_embedding, text_embedding, rerank, completion, or chat_completion.
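
Because task_type only accepts the values listed above, a client may want to check it before sending the request. The following is a minimal sketch of that check; `ALLOWED_TASK_TYPES` and `inference_all_path` are hypothetical names chosen for illustration and are not part of any official client library.

```python
from urllib.parse import quote

# Task types copied from the path parameter description above.
ALLOWED_TASK_TYPES = frozenset(
    {"sparse_embedding", "text_embedding", "rerank", "completion", "chat_completion"}
)


def inference_all_path(task_type: str) -> str:
    """Return the request path for GET /_inference/{task_type}/_all.

    Hypothetical helper: raises ValueError for a task type that is not listed.
    """
    if task_type not in ALLOWED_TASK_TYPES:
        allowed = ", ".join(sorted(ALLOWED_TASK_TYPES))
        raise ValueError(f"unknown task_type {task_type!r}; expected one of: {allowed}")
    # Percent-encode defensively, even though the allowed values are URL-safe.
    return f"/_inference/{quote(task_type, safe='')}/_all"


print(inference_all_path("text_embedding"))  # prints /_inference/text_embedding/_all
```

Rejecting an unknown value locally gives a clearer error message than waiting for the server to refuse the request.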

Responses

  • 200 application/json
    • endpoints array[object] Required

      Represents an inference endpoint as returned by the GET API

      • chunking_settings object

        Chunking configuration object

        • max_chunk_size number

          The maximum size of a chunk, in words. The value cannot be higher than 300. It cannot be lower than 20 for the sentence strategy or lower than 10 for the word strategy.

          Default value is 250.0.

        • overlap number

          The number of overlapping words between chunks. Applies only to the word chunking strategy. The value cannot be higher than half of max_chunk_size.

          Default value is 100.0.

        • sentence_overlap number

          The number of overlapping sentences between chunks. Applies only to the sentence chunking strategy. The value must be either 0 or 1.

          Default value is 1.0.

        • strategy string

          The chunking strategy: sentence or word.

          Default value is sentence.

      • service string Required

        The service type

      • service_settings object Required

        Settings specific to the service

      • task_settings object

        Task settings specific to the service and task type

      • inference_id string Required

        The inference ID

      • task_type string Required

        The task type

        Values are sparse_embedding, text_embedding, rerank, completion, or chat_completion.
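
The limits described for chunking_settings above can be expressed as a small checker. This is a sketch under stated assumptions, not the server's validation logic: `check_chunking_settings` is a hypothetical function, it applies the documented defaults when a key is absent, and it ignores overlap settings that do not apply to the selected strategy.

```python
def check_chunking_settings(settings: dict) -> list[str]:
    """Return a list of constraint violations; an empty list means none were found.

    Hypothetical helper based only on the limits documented above.
    """
    strategy = settings.get("strategy", "sentence")  # documented default
    if strategy not in ("sentence", "word"):
        return [f"strategy must be 'sentence' or 'word', got {strategy!r}"]

    problems = []
    max_chunk_size = settings.get("max_chunk_size", 250)  # documented default
    lower = 20 if strategy == "sentence" else 10
    if not lower <= max_chunk_size <= 300:
        problems.append(
            f"max_chunk_size must be between {lower} and 300 for the "
            f"{strategy} strategy, got {max_chunk_size}"
        )

    if strategy == "word":
        # overlap applies only to the word strategy.
        overlap = settings.get("overlap", 100)  # documented default
        if overlap > max_chunk_size / 2:
            problems.append(
                f"overlap must not exceed half of max_chunk_size, got {overlap}"
            )
    else:
        # sentence_overlap applies only to the sentence strategy.
        sentence_overlap = settings.get("sentence_overlap", 1)  # documented default
        if sentence_overlap not in (0, 1):
            problems.append(f"sentence_overlap must be 0 or 1, got {sentence_overlap}")

    return problems


print(check_chunking_settings({}))  # prints []
print(check_chunking_settings({"strategy": "word", "max_chunk_size": 120, "overlap": 80}))
```

The second call reports one violation, because an overlap of 80 is more than half of a max_chunk_size of 120. The annotation `list[str]` assumes Python 3.9 or later.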

GET /_inference/{task_type}/_all
curl \
 --request GET 'http://api.example.com/_inference/{task_type}/_all'
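
The same request can be sketched in Python with only the standard library. The helper names (`build_request`, `parse_endpoints`, `fetch_endpoints`) are hypothetical, the sample body is made up to match the response shape described above, and any authentication headers a deployment requires are omitted because they are not covered here.

```python
import json
import urllib.request

# Attributes marked Required in the response description above.
REQUIRED_KEYS = ("inference_id", "task_type", "service", "service_settings")


def build_request(base_url: str, task_type: str) -> urllib.request.Request:
    """Build the GET request for /_inference/{task_type}/_all."""
    url = f"{base_url.rstrip('/')}/_inference/{task_type}/_all"
    return urllib.request.Request(url, method="GET", headers={"Accept": "application/json"})


def parse_endpoints(body: str) -> list[dict]:
    """Extract the endpoints array and check each entry for the required attributes."""
    endpoints = json.loads(body)["endpoints"]
    for endpoint in endpoints:
        missing = [key for key in REQUIRED_KEYS if key not in endpoint]
        if missing:
            raise ValueError(f"endpoint is missing required attributes: {missing}")
    return endpoints


def fetch_endpoints(base_url: str, task_type: str) -> list[dict]:
    """Send the request and return the parsed endpoints (needs a reachable server)."""
    with urllib.request.urlopen(build_request(base_url, task_type)) as response:
        return parse_endpoints(response.read().decode("utf-8"))


# Made-up sample body for illustration; the values are not from a real cluster.
SAMPLE_BODY = """
{
  "endpoints": [
    {
      "inference_id": "my-endpoint",
      "task_type": "text_embedding",
      "service": "example-service",
      "service_settings": {},
      "chunking_settings": {"strategy": "sentence", "max_chunk_size": 250, "sentence_overlap": 1}
    }
  ]
}
"""

for endpoint in parse_endpoints(SAMPLE_BODY):
    print(endpoint["inference_id"], endpoint["task_type"], endpoint["service"])
```

Keeping the parsing separate from the network call lets the response handling be exercised against a saved body without a running server. To query a real cluster, call `fetch_endpoints` with that cluster's base URL.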