Create a DeepSeek inference endpoint

Generally available; Added in 8.19.0

PUT /_inference/{task_type}/{deepseek_inference_id}

Create an inference endpoint to perform an inference task with the deepseek service.

Required authorization

  • Cluster privileges: manage_inference

Path parameters

  • task_type string

    The type of the inference task that the model will perform.

    Values are completion or chat_completion.

  • deepseek_inference_id string Required

    The unique identifier of the inference endpoint.
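
As an illustration of how the two path parameters combine, here is a minimal Python sketch that builds the request path and rejects task types other than the two listed above. The helper name `build_inference_path` and the sample identifier `my-deepseek-endpoint` are assumptions made for this example; they are not part of the API.

```python
from urllib.parse import quote

# Task types listed above for the deepseek service.
ALLOWED_TASK_TYPES = ("completion", "chat_completion")


def build_inference_path(task_type: str, inference_id: str) -> str:
    """Return the path for PUT /_inference/{task_type}/{deepseek_inference_id}."""
    if task_type not in ALLOWED_TASK_TYPES:
        raise ValueError(
            f"task_type must be one of {ALLOWED_TASK_TYPES}, got {task_type!r}"
        )
    if not inference_id:
        raise ValueError("deepseek_inference_id is required")
    # Percent-encode the identifier so it stays a single path segment.
    return f"/_inference/{task_type}/{quote(inference_id, safe='')}"


# Hypothetical identifier, chosen only for illustration.
print(build_inference_path("chat_completion", "my-deepseek-endpoint"))
```

Validating on the client side is optional; the server remains the authority on which task types it accepts.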

Query parameters

Body Required (application/json)

  • chunking_settings object

    The chunking configuration object.

    • max_chunk_size number

      The maximum size of a chunk in words. The value must be at most 300 and at least 20 for the sentence strategy or 10 for the word strategy.

      Default value is 250.

    • overlap number

      The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

      Default value is 100.

    • sentence_overlap number

      The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

      Default value is 1.

    • strategy string

      The chunking strategy: sentence or word.

      Default value is sentence.

  • service string Required

    The type of service supported for the specified task type. In this case, deepseek.

    Value is deepseek.

  • service_settings object Required

    Settings used to install the inference model. These settings are specific to the deepseek service.

    • api_key string Required

      A valid API key for your DeepSeek account. You can find or create your DeepSeek API keys on the DeepSeek API key page.

      IMPORTANT: You need to provide the API key only once, during the inference model creation. The get inference endpoint API does not retrieve your API key.

    • model_id string Required

      For a completion or chat_completion task, the name of the model to use for the inference task.

      For the available completion and chat_completion models, refer to the DeepSeek Models & Pricing docs.

    • url string

      The URL endpoint to use for the requests. Defaults to https://api.deepseek.com/chat/completions.
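
The body parameters above can be assembled and checked before the request is sent. The following Python sketch is one possible way to do that, not the service's own validation: the function names are hypothetical, `deepseek-chat` and `YOUR_DEEPSEEK_API_KEY` are placeholder values, and the checks cover only the limits stated in the parameter descriptions. Optional values are checked only when the caller supplies them.

```python
import json
from typing import Optional


def validate_chunking_settings(settings: dict) -> None:
    """Raise ValueError when a value breaks a limit stated in the parameter list."""
    strategy = settings.get("strategy", "sentence")
    if strategy not in ("sentence", "word"):
        raise ValueError("strategy must be 'sentence' or 'word'")

    max_chunk_size = settings.get("max_chunk_size", 250)
    minimum = 20 if strategy == "sentence" else 10
    if not minimum <= max_chunk_size <= 300:
        raise ValueError(
            f"max_chunk_size must be between {minimum} and 300 "
            f"for the {strategy} strategy"
        )

    overlap = settings.get("overlap")
    sentence_overlap = settings.get("sentence_overlap")
    if strategy == "word":
        # overlap applies only to the word strategy.
        if overlap is not None and overlap > max_chunk_size / 2:
            raise ValueError("overlap cannot exceed half of max_chunk_size")
    elif sentence_overlap is not None and sentence_overlap not in (0, 1):
        # sentence_overlap applies only to the sentence strategy.
        raise ValueError("sentence_overlap must be 0 or 1")


def build_request_body(
    api_key: str,
    model_id: str,
    url: Optional[str] = None,
    chunking_settings: Optional[dict] = None,
) -> dict:
    """Assemble the JSON body with the required and optional fields."""
    if not api_key or not model_id:
        raise ValueError("api_key and model_id are required")
    service_settings = {"api_key": api_key, "model_id": model_id}
    if url is not None:
        service_settings["url"] = url
    body = {"service": "deepseek", "service_settings": service_settings}
    if chunking_settings is not None:
        validate_chunking_settings(chunking_settings)
        body["chunking_settings"] = chunking_settings
    return body


# Placeholder credentials and model name, for illustration only.
body = build_request_body(
    api_key="YOUR_DEEPSEEK_API_KEY",
    model_id="deepseek-chat",
    chunking_settings={"strategy": "word", "max_chunk_size": 200, "overlap": 50},
)
print(json.dumps(body, indent=2))
```

Omitting `url` leaves the documented default in effect, so the sketch only includes it when a value is passed.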

Responses

  • 200 application/json
    • chunking_settings object

      Chunking configuration object

      • max_chunk_size number

        The maximum size of a chunk in words. The value must be at most 300 and at least 20 for the sentence strategy or 10 for the word strategy.

        Default value is 250.

      • overlap number

        The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

        Default value is 100.

      • sentence_overlap number

        The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

        Default value is 1.

      • strategy string

        The chunking strategy: sentence or word.

        Default value is sentence.

    • service string Required

      The service type

    • service_settings object Required

      Settings specific to the service

    • task_settings object

      Task settings specific to the service and task type

    • inference_id string Required

      The inference ID

    • task_type string Required

      The task type

      Values are completion or chat_completion.
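
A client may want to confirm that a 200 response carries the attributes marked Required above. The Python sketch below shows one way to check this; the helper name and the sample response are hypothetical, and the sample is shaped after the attribute list rather than copied from a real server reply.

```python
# Response attributes marked Required in the list above.
REQUIRED_RESPONSE_FIELDS = ("inference_id", "task_type", "service", "service_settings")


def missing_response_fields(response: dict) -> list:
    """Return the names of required attributes absent from a parsed response."""
    return [name for name in REQUIRED_RESPONSE_FIELDS if name not in response]


# Hypothetical parsed response, for illustration only.
sample_response = {
    "inference_id": "my-deepseek-endpoint",
    "task_type": "chat_completion",
    "service": "deepseek",
    "service_settings": {"model_id": "deepseek-chat"},
}

missing = missing_response_fields(sample_response)
if missing:
    raise RuntimeError(f"response is missing required attributes: {missing}")
print("all required attributes present")
```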

PUT /_inference/{task_type}/{deepseek_inference_id}
curl \
 --request PUT 'http://api.example.com/_inference/{task_type}/{deepseek_inference_id}' \
 --header "Authorization: ApiKey $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{
   "chunking_settings": {
     "max_chunk_size": 250,
     "overlap": 100,
     "sentence_overlap": 1,
     "strategy": "sentence"
   },
   "service": "deepseek",
   "service_settings": {
     "api_key": "string",
     "model_id": "string",
     "url": "string"
   }
 }'
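
For readers who prefer Python to curl, the same request can be sketched with the standard library. This is an illustrative equivalent under stated assumptions: `build_put_request` is a hypothetical helper, `http://api.example.com` is the placeholder host from the example above, and `ES_API_KEY`, `YOUR_DEEPSEEK_API_KEY`, `deepseek-chat`, and `my-deepseek-endpoint` are placeholder values. It also assumes Elasticsearch API key authentication, which uses the `ApiKey` scheme in the Authorization header. The request is built but not sent.

```python
import json
import urllib.request


def build_put_request(
    base_url: str,
    task_type: str,
    inference_id: str,
    es_api_key: str,
    body: dict,
) -> urllib.request.Request:
    """Build, without sending, the PUT request that creates the endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_inference/{task_type}/{inference_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Elasticsearch API key authentication uses the ApiKey scheme.
            "Authorization": f"ApiKey {es_api_key}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


# Placeholder host, identifiers, and keys, for illustration only.
request = build_put_request(
    base_url="http://api.example.com",
    task_type="chat_completion",
    inference_id="my-deepseek-endpoint",
    es_api_key="ES_API_KEY",
    body={
        "service": "deepseek",
        "service_settings": {
            "api_key": "YOUR_DEEPSEEK_API_KEY",
            "model_id": "deepseek-chat",
        },
    },
)
print(request.get_method(), request.full_url)

# To send it against a real cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```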