Start a trained model deployment
Added in 8.0.0
It allocates the model to every machine learning node.
Path parameters
- model_id (string, Required): The unique identifier of the trained model. Currently, only PyTorch models are supported.
Query parameters
- cache_size (number | string): The inference cache size (in memory outside the JVM heap) per node for the model. The default value is the same size as model_size_bytes. To disable the cache, 0b can be provided.
- deployment_id (string): A unique identifier for the deployment of the model.
- number_of_allocations (number): The number of model allocations on each node where the model is deployed. All allocations on a node share the same copy of the model in memory but use a separate set of threads to evaluate the model. Increasing this value generally increases throughput. If this setting is greater than the number of hardware threads, it is automatically reduced to a value less than the number of hardware threads. See the example request below.
- priority (string): The deployment priority. Values are normal or low.
- queue_capacity (number): The number of inference requests that are allowed in the queue. After the number of requests exceeds this value, new requests are rejected with a 429 error.
- threads_per_allocation (number): The number of threads used by each model allocation during inference. Increasing this value generally increases inference speed. Because the inference process is compute-bound, any value greater than the number of available hardware threads on the machine does not increase inference speed. If this setting is greater than the number of hardware threads, it is automatically reduced to a value less than the number of hardware threads.
- timeout (string): The amount of time to wait for the model to deploy.
- wait_for (string): The allocation status to wait for before returning. Values are started, starting, or fully_allocated.
curl \
--request POST http://api.example.com/_ml/trained_models/{model_id}/deployment/_start \
--header "Authorization: $API_KEY"
{
"assignment": {
"adaptive_allocations": {
"enabled": true,
"min_number_of_allocations": 42.0,
"max_number_of_allocations": 42.0
},
"assignment_state": "started",
"max_assigned_allocations": 42.0,
"reason": "string",
"routing_table": {
"*": {
"reason": "string",
"routing_state": "failed",
"current_allocations": 42.0,
"target_allocations": 42.0
}
},
"": "string",
"task_parameters": {
"": 42.0,
"model_id": "string",
"deployment_id": "string",
"number_of_allocations": 42.0,
"priority": "normal",
"queue_capacity": 42.0,
"threads_per_allocation": 42.0
}
}
}
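If the call should not return until every allocation is running, the wait_for and timeout query parameters can be combined; the model ID, host, and values below are illustrative:

curl \
 --request POST "http://api.example.com/_ml/trained_models/my_model/deployment/_start?wait_for=fully_allocated&timeout=5m" \
 --header "Authorization: $API_KEY"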