Create trained model vocabulary API

Creates a trained model vocabulary. This is supported only for natural language processing (NLP) models.


Request

PUT _ml/trained_models/<model_id>/vocabulary/
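
As a rough illustration of this request, the following Python sketch builds the PUT request with the standard library but does not send it. The helper name build_vocabulary_request, the base URL http://localhost:9200, the model ID my-nlp-model, and the placeholder tokens are all hypothetical. A real cluster will usually also require credentials for a user with the manage_ml privilege, and the sketch does not set those up.

```python
import json
import urllib.parse
import urllib.request


def build_vocabulary_request(base_url: str, model_id: str, body: dict) -> urllib.request.Request:
    """Hypothetical helper: build, but do not send, a create vocabulary request."""
    # Percent-encode the model ID so it cannot break out of its path segment.
    encoded_id = urllib.parse.quote(model_id, safe="")
    url = f"{base_url.rstrip('/')}/_ml/trained_models/{encoded_id}/vocabulary/"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # JSON request body
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed local cluster, model ID, and placeholder tokens; no authentication is configured here.
req = build_vocabulary_request(
    "http://localhost:9200",
    "my-nlp-model",
    {"vocabulary": ["[PAD]", "[UNK]", "hello"]},
)
print(req.get_method(), req.full_url)
```

Percent-encoding the model ID is only a defensive choice. Sending the request would then be a matter of passing it to urllib.request.urlopen.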


Prerequisites

Requires the manage_ml cluster privilege. This privilege is included in the machine_learning_admin built-in role.


Description

The vocabulary is stored in the index specified by inference_config.*.vocabulary in the trained model definition.
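
The * in inference_config.*.vocabulary is a wildcard for the task-type object inside inference_config. The Python sketch below assumes that layout and looks up the vocabulary settings in a trained model configuration held as a dict. The function name find_vocabulary_config, the sample configuration, its ner task key, and the index field with the value my-vocabulary-index are illustrative assumptions, not output copied from a real cluster.

```python
from typing import Optional, Tuple


def find_vocabulary_config(model_config: dict) -> Optional[Tuple[str, dict]]:
    """Hypothetical helper: return (task_type, vocabulary_settings) for the first task with a vocabulary."""
    # The '*' in inference_config.*.vocabulary is taken to be the task-type key, e.g. "ner".
    for task_type, settings in model_config.get("inference_config", {}).items():
        if isinstance(settings, dict) and "vocabulary" in settings:
            return task_type, settings["vocabulary"]
    return None  # no task object carries vocabulary settings


# Hypothetical, heavily abbreviated trained model configuration.
sample_config = {
    "model_id": "my-nlp-model",
    "inference_config": {
        "ner": {"vocabulary": {"index": "my-vocabulary-index"}},  # assumed layout
    },
}

print(find_vocabulary_config(sample_config))
```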

Path parameters

<model_id>
(Required, string) The unique identifier of the trained model.

Request body

vocabulary
(array) The model vocabulary. Must not be empty.
(Optional, array) The model merges used in byte-pair encoding. Each merge must be a space-delimited pair of sub-tokens, and the merges must be listed in order of preference. Example: ["f o", "fo o"]. Required for RoBERTa and BART style models.
(Optional, array) Vocabulary value scores used by SentencePiece tokenization. Must have the same length as vocabulary. Required for models that use unigram SentencePiece tokenization, such as XLM-RoBERTa and T5.
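
The constraints listed above can be checked on the client before calling the API. This is a minimal Python sketch that covers only the rules stated in this section: the vocabulary must not be empty, each merge must be a space-delimited pair of sub-tokens, and the scores must match the vocabulary in length. The function name check_vocabulary_body is hypothetical. Because this section does not name the request-body fields for the merges and scores arrays, merges and scores are simply local parameter names chosen for the sketch. Elasticsearch may enforce further rules that the sketch does not replicate, and it does not check whether a given model type (such as RoBERTa or BART) actually needs merges.

```python
from typing import List, Optional


def check_vocabulary_body(
    vocabulary: List[str],
    merges: Optional[List[str]] = None,  # local name, assumed for this sketch
    scores: Optional[List[float]] = None,  # local name, assumed for this sketch
) -> List[str]:
    """Hypothetical helper: return the problems found; an empty list means all checks passed."""
    problems = []
    # The vocabulary array must not be empty.
    if not vocabulary:
        problems.append("vocabulary must not be empty")
    # Each merge must be exactly two non-empty sub-tokens separated by a single space.
    for merge in merges or []:
        parts = merge.split(" ")
        if len(parts) != 2 or not all(parts):
            problems.append(f"merge {merge!r} is not a space-delimited sub-token pair")
    # Scores, when given, must line up one-to-one with the vocabulary.
    if scores is not None and len(scores) != len(vocabulary):
        problems.append("scores must have the same length as vocabulary")
    return problems


print(check_vocabulary_body(["f", "o", "fo", "foo"], merges=["f o", "fo o"]))
print(check_vocabulary_body([], scores=[0.5]))
```

Returning a list of problems rather than raising on the first one lets a caller report every violation in a single pass.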


Examples

The following example shows how to create a model vocabulary for a previously stored trained model configuration.

PUT _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/vocabulary
{
  "vocabulary": [
    ...
  ]
}

The API returns the following results:

{
    "acknowledged": true
}