Get anomaly detection jobs Added in 7.7.0

GET /_cat/ml/anomaly_detectors

Get configuration and usage information for anomaly detection jobs. This API returns a maximum of 10,000 jobs. If the Elasticsearch security features are enabled, you must have monitor_ml, monitor, manage_ml, or manage cluster privileges to use this API.

IMPORTANT: CAT APIs are only intended for human consumption using the Kibana console or command line. They are not intended for use by applications. For application consumption, use the get anomaly detection job statistics API.

Query parameters

  • allow_no_match boolean

    Specifies what to do when the request:

    • Contains wildcard expressions and there are no jobs that match.
    • Contains the _all string or no identifiers and there are no matches.
    • Contains wildcard expressions and there are only partial matches.

    If true, the API returns an empty jobs array when there are no matches and the subset of results when there are partial matches. If false, the API returns a 404 status code when there are no matches or only partial matches.

  • bytes string

    The unit used to display byte values.

    Values are b, kb, mb, gb, tb, or pb.

  • h string | array[string]

    Comma-separated list of column names to display.

    Supported values include:

    • assignment_explanation (or ae): For open anomaly detection jobs only, contains messages relating to the selection of a node to run the job.
    • buckets.count (or bc, bucketsCount): The number of bucket results produced by the job.
    • buckets.time.exp_avg (or btea, bucketsTimeExpAvg): Exponential moving average of all bucket processing times, in milliseconds.
    • buckets.time.exp_avg_hour (or bteah, bucketsTimeExpAvgHour): Exponentially-weighted moving average of bucket processing times calculated in a 1 hour time window, in milliseconds.
    • buckets.time.max (or btmax, bucketsTimeMax): Maximum among all bucket processing times, in milliseconds.
    • buckets.time.min (or btmin, bucketsTimeMin): Minimum among all bucket processing times, in milliseconds.
    • buckets.time.total (or btt, bucketsTimeTotal): Sum of all bucket processing times, in milliseconds.
    • data.buckets (or db, dataBuckets): The number of buckets processed.
    • data.earliest_record (or der, dataEarliestRecord): The timestamp of the earliest chronologically input document.
    • data.empty_buckets (or deb, dataEmptyBuckets): The number of buckets which did not contain any data.
    • data.input_bytes (or dib, dataInputBytes): The number of bytes of input data posted to the anomaly detection job.
    • data.input_fields (or dif, dataInputFields): The total number of fields in input documents posted to the anomaly detection job. This count includes fields that are not used in the analysis. However, be aware that if you are using a datafeed, it extracts only the required fields from the documents it retrieves before posting them to the job.
    • data.input_records (or dir, dataInputRecords): The number of input documents posted to the anomaly detection job.
    • data.invalid_dates (or did, dataInvalidDates): The number of input documents with either a missing date field or a date that could not be parsed.
    • data.last (or dl, dataLast): The timestamp at which data was last analyzed, according to server time.
    • data.last_empty_bucket (or dleb, dataLastEmptyBucket): The timestamp of the last bucket that did not contain any data.
    • data.last_sparse_bucket (or dlsb, dataLastSparseBucket): The timestamp of the last bucket that was considered sparse.
    • data.latest_record (or dlr, dataLatestRecord): The timestamp of the latest chronologically input document.
    • data.missing_fields (or dmf, dataMissingFields): The number of input documents that are missing a field that the anomaly detection job is configured to analyze. Input documents with missing fields are still processed because it is possible that not all fields are missing.
    • data.out_of_order_timestamps (or doot, dataOutOfOrderTimestamps): The number of input documents that have a timestamp chronologically preceding the start of the current anomaly detection bucket offset by the latency window. This information is applicable only when you provide data to the anomaly detection job by using the post data API. These out of order documents are discarded, since jobs require time series data to be in ascending chronological order.
    • data.processed_fields (or dpf, dataProcessedFields): The total number of fields in all the documents that have been processed by the anomaly detection job. Only fields that are specified in the detector configuration object contribute to this count. The timestamp is not included in this count.
    • data.processed_records (or dpr, dataProcessedRecords): The number of input documents that have been processed by the anomaly detection job. This value includes documents with missing fields, since they are nonetheless analyzed. If you use datafeeds and have aggregations in your search query, the processed record count is the number of aggregation results processed, not the number of Elasticsearch documents.
    • data.sparse_buckets (or dsb, dataSparseBuckets): The number of buckets that contained few data points compared to the expected number of data points.
    • forecasts.memory.avg (or fmavg, forecastsMemoryAvg): The average memory usage in bytes for forecasts related to the anomaly detection job.
    • forecasts.memory.max (or fmmax, forecastsMemoryMax): The maximum memory usage in bytes for forecasts related to the anomaly detection job.
    • forecasts.memory.min (or fmmin, forecastsMemoryMin): The minimum memory usage in bytes for forecasts related to the anomaly detection job.
    • forecasts.memory.total (or fmt, forecastsMemoryTotal): The total memory usage in bytes for forecasts related to the anomaly detection job.
    • forecasts.records.avg (or fravg, forecastsRecordsAvg): The average number of model_forecast documents written for forecasts related to the anomaly detection job.
    • forecasts.records.max (or frmax, forecastsRecordsMax): The maximum number of model_forecast documents written for forecasts related to the anomaly detection job.
    • forecasts.records.min (or frmin, forecastsRecordsMin): The minimum number of model_forecast documents written for forecasts related to the anomaly detection job.
    • forecasts.records.total (or frt, forecastsRecordsTotal): The total number of model_forecast documents written for forecasts related to the anomaly detection job.
    • forecasts.time.avg (or ftavg, forecastsTimeAvg): The average runtime in milliseconds for forecasts related to the anomaly detection job.
    • forecasts.time.max (or ftmax, forecastsTimeMax): The maximum runtime in milliseconds for forecasts related to the anomaly detection job.
    • forecasts.time.min (or ftmin, forecastsTimeMin): The minimum runtime in milliseconds for forecasts related to the anomaly detection job.
    • forecasts.time.total (or ftt, forecastsTimeTotal): The total runtime in milliseconds for forecasts related to the anomaly detection job.
    • forecasts.total (or ft, forecastsTotal): The number of individual forecasts currently available for the job.
    • id: Identifier for the anomaly detection job.
    • model.bucket_allocation_failures (or mbaf, modelBucketAllocationFailures): The number of buckets for which new entities in incoming data were not processed due to insufficient model memory.
    • model.by_fields (or mbf, modelByFields): The number of by field values that were analyzed by the models. This value is cumulative for all detectors in the job.
    • model.bytes (or mb, modelBytes): The number of bytes of memory used by the models. This is the maximum value since the last time the model was persisted. If the job is closed, this value indicates the latest size.
    • model.bytes_exceeded (or mbe, modelBytesExceeded): The number of bytes over the high limit for memory usage at the last allocation failure.
    • model.categorization_status (or mcs, modelCategorizationStatus): The status of categorization for the job: ok or warn. If ok, categorization is performing acceptably well (or not being used at all). If warn, categorization is detecting a distribution of categories that suggests the input data is inappropriate for categorization. Problems could be that there is only one category, more than 90% of categories are rare, the number of categories is greater than 50% of the number of categorized documents, there are no frequently matched categories, or more than 50% of categories are dead.
    • model.categorized_doc_count (or mcdc, modelCategorizedDocCount): The number of documents that have had a field categorized.
    • model.dead_category_count (or mdcc, modelDeadCategoryCount): The number of categories created by categorization that will never be assigned again because another category’s definition makes it a superset of the dead category. Dead categories are a side effect of the way categorization has no prior training.
    • model.failed_category_count (or mdcc, modelFailedCategoryCount): The number of times that categorization wanted to create a new category but couldn’t because the job had hit its model memory limit. This count does not track which specific categories failed to be created. Therefore, you cannot use this value to determine the number of unique categories that were missed.
    • model.frequent_category_count (or mfcc, modelFrequentCategoryCount): The number of categories that match more than 1% of categorized documents.
    • model.log_time (or mlt, modelLogTime): The timestamp when the model stats were gathered, according to server time.
    • model.memory_limit (or mml, modelMemoryLimit): The upper limit for model memory usage, checked on increasing values.
    • model.memory_status (or mms, modelMemoryStatus): The status of the mathematical models: ok, soft_limit, or hard_limit. If ok, the models stayed below the configured value. If soft_limit, the models used more than 60% of the configured memory limit and older unused models will be pruned to free up space. Additionally, in categorization jobs no further category examples will be stored. If hard_limit, the models used more space than the configured memory limit. As a result, not all incoming data was processed.
    • model.over_fields (or mof, modelOverFields): The number of over field values that were analyzed by the models. This value is cumulative for all detectors in the job.
    • model.partition_fields (or mpf, modelPartitionFields): The number of partition field values that were analyzed by the models. This value is cumulative for all detectors in the job.
    • model.rare_category_count (or mrcc, modelRareCategoryCount): The number of categories that match just one categorized document.
    • model.timestamp (or mt, modelTimestamp): The timestamp of the last record when the model stats were gathered.
    • model.total_category_count (or mtcc, modelTotalCategoryCount): The number of categories created by categorization.
    • node.address (or na, nodeAddress): The network address of the node that runs the job. This information is available only for open jobs.
    • node.ephemeral_id (or ne, nodeEphemeralId): The ephemeral ID of the node that runs the job. This information is available only for open jobs.
    • node.id (or ni, nodeId): The unique identifier of the node that runs the job. This information is available only for open jobs.
    • node.name (or nn, nodeName): The name of the node that runs the job. This information is available only for open jobs.
    • opened_time (or ot): For open jobs only, the elapsed time for which the job has been open.
    • state (or s): The status of the anomaly detection job: closed, closing, failed, opened, or opening. If closed, the job finished successfully with its model state persisted. The job must be opened before it can accept further data. If closing, the job close action is in progress and has not yet completed. A closing job cannot accept further data. If failed, the job did not finish successfully due to an error. This situation can occur due to invalid input data, a fatal error occurring during the analysis, or an external interaction such as the process being killed by the Linux out of memory (OOM) killer. If the job had irrevocably failed, it must be force closed and then deleted. If the datafeed can be corrected, the job can be closed and then re-opened. If opened, the job is available to receive and process data. If opening, the job open action is in progress and has not yet completed.
  • s string | array[string]

    Comma-separated list of column names or column aliases used to sort the response.

    Supported values include:

    • assignment_explanation (or ae): For open anomaly detection jobs only, contains messages relating to the selection of a node to run the job.
    • buckets.count (or bc, bucketsCount): The number of bucket results produced by the job.
    • buckets.time.exp_avg (or btea, bucketsTimeExpAvg): Exponential moving average of all bucket processing times, in milliseconds.
    • buckets.time.exp_avg_hour (or bteah, bucketsTimeExpAvgHour): Exponentially-weighted moving average of bucket processing times calculated in a 1 hour time window, in milliseconds.
    • buckets.time.max (or btmax, bucketsTimeMax): Maximum among all bucket processing times, in milliseconds.
    • buckets.time.min (or btmin, bucketsTimeMin): Minimum among all bucket processing times, in milliseconds.
    • buckets.time.total (or btt, bucketsTimeTotal): Sum of all bucket processing times, in milliseconds.
    • data.buckets (or db, dataBuckets): The number of buckets processed.
    • data.earliest_record (or der, dataEarliestRecord): The timestamp of the earliest chronologically input document.
    • data.empty_buckets (or deb, dataEmptyBuckets): The number of buckets which did not contain any data.
    • data.input_bytes (or dib, dataInputBytes): The number of bytes of input data posted to the anomaly detection job.
    • data.input_fields (or dif, dataInputFields): The total number of fields in input documents posted to the anomaly detection job. This count includes fields that are not used in the analysis. However, be aware that if you are using a datafeed, it extracts only the required fields from the documents it retrieves before posting them to the job.
    • data.input_records (or dir, dataInputRecords): The number of input documents posted to the anomaly detection job.
    • data.invalid_dates (or did, dataInvalidDates): The number of input documents with either a missing date field or a date that could not be parsed.
    • data.last (or dl, dataLast): The timestamp at which data was last analyzed, according to server time.
    • data.last_empty_bucket (or dleb, dataLastEmptyBucket): The timestamp of the last bucket that did not contain any data.
    • data.last_sparse_bucket (or dlsb, dataLastSparseBucket): The timestamp of the last bucket that was considered sparse.
    • data.latest_record (or dlr, dataLatestRecord): The timestamp of the latest chronologically input document.
    • data.missing_fields (or dmf, dataMissingFields): The number of input documents that are missing a field that the anomaly detection job is configured to analyze. Input documents with missing fields are still processed because it is possible that not all fields are missing.
    • data.out_of_order_timestamps (or doot, dataOutOfOrderTimestamps): The number of input documents that have a timestamp chronologically preceding the start of the current anomaly detection bucket offset by the latency window. This information is applicable only when you provide data to the anomaly detection job by using the post data API. These out of order documents are discarded, since jobs require time series data to be in ascending chronological order.
    • data.processed_fields (or dpf, dataProcessedFields): The total number of fields in all the documents that have been processed by the anomaly detection job. Only fields that are specified in the detector configuration object contribute to this count. The timestamp is not included in this count.
    • data.processed_records (or dpr, dataProcessedRecords): The number of input documents that have been processed by the anomaly detection job. This value includes documents with missing fields, since they are nonetheless analyzed. If you use datafeeds and have aggregations in your search query, the processed record count is the number of aggregation results processed, not the number of Elasticsearch documents.
    • data.sparse_buckets (or dsb, dataSparseBuckets): The number of buckets that contained few data points compared to the expected number of data points.
    • forecasts.memory.avg (or fmavg, forecastsMemoryAvg): The average memory usage in bytes for forecasts related to the anomaly detection job.
    • forecasts.memory.max (or fmmax, forecastsMemoryMax): The maximum memory usage in bytes for forecasts related to the anomaly detection job.
    • forecasts.memory.min (or fmmin, forecastsMemoryMin): The minimum memory usage in bytes for forecasts related to the anomaly detection job.
    • forecasts.memory.total (or fmt, forecastsMemoryTotal): The total memory usage in bytes for forecasts related to the anomaly detection job.
    • forecasts.records.avg (or fravg, forecastsRecordsAvg): The average number of model_forecast documents written for forecasts related to the anomaly detection job.
    • forecasts.records.max (or frmax, forecastsRecordsMax): The maximum number of model_forecast documents written for forecasts related to the anomaly detection job.
    • forecasts.records.min (or frmin, forecastsRecordsMin): The minimum number of model_forecast documents written for forecasts related to the anomaly detection job.
    • forecasts.records.total (or frt, forecastsRecordsTotal): The total number of model_forecast documents written for forecasts related to the anomaly detection job.
    • forecasts.time.avg (or ftavg, forecastsTimeAvg): The average runtime in milliseconds for forecasts related to the anomaly detection job.
    • forecasts.time.max (or ftmax, forecastsTimeMax): The maximum runtime in milliseconds for forecasts related to the anomaly detection job.
    • forecasts.time.min (or ftmin, forecastsTimeMin): The minimum runtime in milliseconds for forecasts related to the anomaly detection job.
    • forecasts.time.total (or ftt, forecastsTimeTotal): The total runtime in milliseconds for forecasts related to the anomaly detection job.
    • forecasts.total (or ft, forecastsTotal): The number of individual forecasts currently available for the job.
    • id: Identifier for the anomaly detection job.
    • model.bucket_allocation_failures (or mbaf, modelBucketAllocationFailures): The number of buckets for which new entities in incoming data were not processed due to insufficient model memory.
    • model.by_fields (or mbf, modelByFields): The number of by field values that were analyzed by the models. This value is cumulative for all detectors in the job.
    • model.bytes (or mb, modelBytes): The number of bytes of memory used by the models. This is the maximum value since the last time the model was persisted. If the job is closed, this value indicates the latest size.
    • model.bytes_exceeded (or mbe, modelBytesExceeded): The number of bytes over the high limit for memory usage at the last allocation failure.
    • model.categorization_status (or mcs, modelCategorizationStatus): The status of categorization for the job: ok or warn. If ok, categorization is performing acceptably well (or not being used at all). If warn, categorization is detecting a distribution of categories that suggests the input data is inappropriate for categorization. Problems could be that there is only one category, more than 90% of categories are rare, the number of categories is greater than 50% of the number of categorized documents, there are no frequently matched categories, or more than 50% of categories are dead.
    • model.categorized_doc_count (or mcdc, modelCategorizedDocCount): The number of documents that have had a field categorized.
    • model.dead_category_count (or mdcc, modelDeadCategoryCount): The number of categories created by categorization that will never be assigned again because another category’s definition makes it a superset of the dead category. Dead categories are a side effect of the way categorization has no prior training.
    • model.failed_category_count (or mdcc, modelFailedCategoryCount): The number of times that categorization wanted to create a new category but couldn’t because the job had hit its model memory limit. This count does not track which specific categories failed to be created. Therefore, you cannot use this value to determine the number of unique categories that were missed.
    • model.frequent_category_count (or mfcc, modelFrequentCategoryCount): The number of categories that match more than 1% of categorized documents.
    • model.log_time (or mlt, modelLogTime): The timestamp when the model stats were gathered, according to server time.
    • model.memory_limit (or mml, modelMemoryLimit): The upper limit for model memory usage, checked on increasing values.
    • model.memory_status (or mms, modelMemoryStatus): The status of the mathematical models: ok, soft_limit, or hard_limit. If ok, the models stayed below the configured value. If soft_limit, the models used more than 60% of the configured memory limit and older unused models will be pruned to free up space. Additionally, in categorization jobs no further category examples will be stored. If hard_limit, the models used more space than the configured memory limit. As a result, not all incoming data was processed.
    • model.over_fields (or mof, modelOverFields): The number of over field values that were analyzed by the models. This value is cumulative for all detectors in the job.
    • model.partition_fields (or mpf, modelPartitionFields): The number of partition field values that were analyzed by the models. This value is cumulative for all detectors in the job.
    • model.rare_category_count (or mrcc, modelRareCategoryCount): The number of categories that match just one categorized document.
    • model.timestamp (or mt, modelTimestamp): The timestamp of the last record when the model stats were gathered.
    • model.total_category_count (or mtcc, modelTotalCategoryCount): The number of categories created by categorization.
    • node.address (or na, nodeAddress): The network address of the node that runs the job. This information is available only for open jobs.
    • node.ephemeral_id (or ne, nodeEphemeralId): The ephemeral ID of the node that runs the job. This information is available only for open jobs.
    • node.id (or ni, nodeId): The unique identifier of the node that runs the job. This information is available only for open jobs.
    • node.name (or nn, nodeName): The name of the node that runs the job. This information is available only for open jobs.
    • opened_time (or ot): For open jobs only, the elapsed time for which the job has been open.
    • state (or s): The status of the anomaly detection job: closed, closing, failed, opened, or opening. If closed, the job finished successfully with its model state persisted. The job must be opened before it can accept further data. If closing, the job close action is in progress and has not yet completed. A closing job cannot accept further data. If failed, the job did not finish successfully due to an error. This situation can occur due to invalid input data, a fatal error occurring during the analysis, or an external interaction such as the process being killed by the Linux out of memory (OOM) killer. If the job had irrevocably failed, it must be force closed and then deleted. If the datafeed can be corrected, the job can be closed and then re-opened. If opened, the job is available to receive and process data. If opening, the job open action is in progress and has not yet completed.
  • time string

    The unit used to display time values.

    Values are nanos, micros, ms, s, m, h, or d.

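For example, the h, s, bytes, and time parameters can be combined to choose, sort, and format the columns. The following request is an illustrative sketch only; the selected columns and the sort order are arbitrary choices, not recommendations:

curl \
 --request GET 'http://api.example.com/_cat/ml/anomaly_detectors?h=id,state,model.bytes&s=model.bytes:desc&bytes=mb&v=true' \
 --header "Authorization: $API_KEY"
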
Responses

  • 200 application/json
    • id string
    • state string

      Values are closing, closed, opened, failed, or opening.

    • For open jobs only, the amount of time the job has been opened.

    • For open anomaly detection jobs only, contains messages relating to the selection of a node to run the job.

    • The number of input documents that have been processed by the anomaly detection job. This value includes documents with missing fields, since they are nonetheless analyzed. If you use datafeeds and have aggregations in your search query, the processed_record_count is the number of aggregation results processed, not the number of Elasticsearch documents.

    • The total number of fields in all the documents that have been processed by the anomaly detection job. Only fields that are specified in the detector configuration object contribute to this count. The timestamp is not included in this count.

    • The number of input documents posted to the anomaly detection job.

    • The total number of fields in input documents posted to the anomaly detection job. This count includes fields that are not used in the analysis. However, be aware that if you are using a datafeed, it extracts only the required fields from the documents it retrieves before posting them to the job.

    • The number of input documents with either a missing date field or a date that could not be parsed.

    • The number of input documents that are missing a field that the anomaly detection job is configured to analyze. Input documents with missing fields are still processed because it is possible that not all fields are missing. If you are using datafeeds or posting data to the job in JSON format, a high missing_field_count is often not an indication of data issues. It is not necessarily a cause for concern.

    • The number of input documents that have a timestamp chronologically preceding the start of the current anomaly detection bucket offset by the latency window. This information is applicable only when you provide data to the anomaly detection job by using the post data API. These out of order documents are discarded, since jobs require time series data to be in ascending chronological order.

    • The number of buckets which did not contain any data. If your data contains many empty buckets, consider increasing your bucket_span or using functions that are tolerant to gaps in data such as mean, non_null_sum or non_zero_count.

    • The number of buckets that contained few data points compared to the expected number of data points. If your data contains many sparse buckets, consider using a longer bucket_span.

    • The total number of buckets processed.

    • The timestamp of the earliest chronologically input document.

    • The timestamp of the latest chronologically input document.

    • The timestamp at which data was last analyzed, according to server time.

    • The timestamp of the last bucket that did not contain any data.

    • The timestamp of the last bucket that was considered sparse.

    • Values are ok, soft_limit, or hard_limit.

    • The upper limit for model memory usage, checked on increasing values.

    • The number of by field values that were analyzed by the models. This value is cumulative for all detectors in the job.

    • The number of over field values that were analyzed by the models. This value is cumulative for all detectors in the job.

    • The number of partition field values that were analyzed by the models. This value is cumulative for all detectors in the job.

    • The number of buckets for which new entities in incoming data were not processed due to insufficient model memory. This situation is also signified by a hard_limit: memory_status property value.

    • Values are ok or warn.

    • The number of documents that have had a field categorized.

    • The number of categories created by categorization.

    • The number of categories that match more than 1% of categorized documents.

    • The number of categories that match just one categorized document.

    • The number of categories created by categorization that will never be assigned again because another category’s definition makes it a superset of the dead category. Dead categories are a side effect of the way categorization has no prior training.

    • The number of times that categorization wanted to create a new category but couldn’t because the job had hit its model_memory_limit. This count does not track which specific categories failed to be created. Therefore you cannot use this value to determine the number of unique categories that were missed.

    • The timestamp when the model stats were gathered, according to server time.

    • The timestamp of the last record when the model stats were gathered.

    • The number of individual forecasts currently available for the job. A value of one or more indicates that forecasts exist.

    • The minimum memory usage in bytes for forecasts related to the anomaly detection job.

    • The maximum memory usage in bytes for forecasts related to the anomaly detection job.

    • The average memory usage in bytes for forecasts related to the anomaly detection job.

    • The total memory usage in bytes for forecasts related to the anomaly detection job.

    • The minimum number of model_forecast documents written for forecasts related to the anomaly detection job.

    • The maximum number of model_forecast documents written for forecasts related to the anomaly detection job.

    • The average number of model_forecast documents written for forecasts related to the anomaly detection job.

    • The total number of model_forecast documents written for forecasts related to the anomaly detection job.

    • The minimum runtime in milliseconds for forecasts related to the anomaly detection job.

    • The maximum runtime in milliseconds for forecasts related to the anomaly detection job.

    • The average runtime in milliseconds for forecasts related to the anomaly detection job.

    • The total runtime in milliseconds for forecasts related to the anomaly detection job.

    • node.id string
    • The name of the assigned node.

    • The network address of the assigned node.

    • The number of bucket results produced by the job.

    • The sum of all bucket processing times, in milliseconds.

    • The minimum of all bucket processing times, in milliseconds.

    • The maximum of all bucket processing times, in milliseconds.

    • The exponential moving average of all bucket processing times, in milliseconds.

    • The exponential moving average of bucket processing times calculated in a one hour time window, in milliseconds.

GET /_cat/ml/anomaly_detectors
curl \
 --request GET 'http://api.example.com/_cat/ml/anomaly_detectors' \
 --header "Authorization: $API_KEY"
Response examples (200)
A successful response from `GET _cat/ml/anomaly_detectors?h=id,s,dpr,mb&v=true&format=json`.
[
  {
    "id": "high_sum_total_sales",
    "s": "closed",
    "dpr": "14022",
    "mb": "1.5mb"
  },
  {
    "id": "low_request_rate",
    "s": "closed",
    "dpr": "1216",
    "mb": "40.5kb"
  },
  {
    "id": "response_code_rates",
    "s": "closed",
    "dpr": "28146",
    "mb": "132.7kb"
  },
  {
    "id": "url_scanning",
    "s": "closed",
    "dpr": "28146",
    "mb": "501.6kb"
  }
]

Get snapshot information Added in 2.1.0

GET /_cat/snapshots

Get information about the snapshots stored in one or more repositories. A snapshot is a backup of an index or running Elasticsearch cluster. IMPORTANT: cat APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the get snapshot API.

Query parameters

  • ignore_unavailable boolean

    If true, the response does not include information from unavailable snapshots.

  • h string | array[string]

    List of columns to appear in the response. Supports simple wildcards.

  • s string | array[string]

    List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.

  • master_timeout string

    Period to wait for a connection to the master node.

  • time string

    Unit used to display time values.

    Values are nanos, micros, ms, s, m, h, or d.

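As noted for the s parameter, sorting defaults to ascending and can be reversed with a :desc suffix. The following request is an illustrative sketch; the repository name repo1 is a placeholder:

curl \
 --request GET 'http://api.example.com/_cat/snapshots/repo1?v=true&s=start_epoch:desc&time=s' \
 --header "Authorization: $API_KEY"
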
Responses

  • 200 application/json
    • id string

      The unique identifier for the snapshot.

    • repository string

      The repository name.

    • status string

      The state of the snapshot process. Returned values include:

      • FAILED: The snapshot process failed.
      • INCOMPATIBLE: The snapshot process is incompatible with the current cluster version.
      • IN_PROGRESS: The snapshot process started but has not completed.
      • PARTIAL: The snapshot process completed with a partial success.
      • SUCCESS: The snapshot process completed with a full success.

    • start_epoch number | string

      Some APIs also return values such as numbers (notably epoch timestamps) as strings. This union type captures that behavior while keeping the semantics of the field type.

      Depending on the target language, code generators can keep the union or remove it and leniently parse strings to the target type.

    • start_time string | object

      A time of day, expressed either as hh:mm, noon, midnight, or an hour/minutes structure.

    • end_epoch number | string

      Some APIs also return values such as numbers (notably epoch timestamps) as strings. This union type captures that behavior while keeping the semantics of the field type.

      Depending on the target language, code generators can keep the union or remove it and leniently parse strings to the target type.

    • end_time string

      Time of day, expressed as HH:MM:SS

    • duration string

      A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

    • indices string

      The number of indices in the snapshot.

    • successful_shards string

      The number of successful shards in the snapshot.

    • failed_shards string

      The number of failed shards in the snapshot.

    • total_shards string

      The total number of shards in the snapshot.

    • reason string

      The reason for any snapshot failures.

GET /_cat/snapshots
curl \
 --request GET 'http://api.example.com/_cat/snapshots' \
 --header "Authorization: $API_KEY"
Response examples (200)
A successful response from `GET /_cat/snapshots/repo1?v=true&s=id&format=json`.
[
  {
    "id": "snap1",
    "repository": "repo1",
    "status": "FAILED",
    "start_epoch": "1445616705",
    "start_time": "18:11:45",
    "end_epoch": "1445616978",
    "end_time": "18:16:18",
    "duration": "4.6m",
    "indices": "1",
    "successful_shards": "4",
    "failed_shards": "1",
    "total_shards": "5"
  },
  {
    "id": "snap2",
    "repository": "repo1",
    "status": "SUCCESS",
    "start_epoch": "1445634298",
    "start_time": "23:04:58",
    "end_epoch": "1445634672",
    "end_time": "23:11:12",
    "duration": "6.2m",
    "indices": "2",
    "successful_shards": "10",
    "failed_shards": "0",
    "total_shards": "10"
  }
]

Get the cluster health Added in 8.7.0

GET /_health_report/{feature}

Get a report with the health status of an Elasticsearch cluster. The report contains a list of indicators that compose Elasticsearch functionality.

Each indicator has a health status of: green, unknown, yellow or red. The indicator will provide an explanation and metadata describing the reason for its current health status.

The cluster’s status is controlled by the worst indicator status.

In the event that an indicator’s status is non-green, a list of impacts may be present in the indicator result which detail the functionalities that are negatively affected by the health issue. Each impact carries with it a severity level, an area of the system that is affected, and a simple description of the impact on the system.

Some health indicators can determine the root cause of a health problem and prescribe a set of steps that can be performed in order to improve the health of the system. The root cause and remediation steps are encapsulated in a diagnosis. A diagnosis contains a cause detailing a root cause analysis, an action containing a brief description of the steps to take to fix the problem, the list of affected resources (if applicable), and a detailed step-by-step troubleshooting guide to fix the diagnosed problem.

NOTE: The health indicators perform root cause analysis of non-green health statuses. This can be computationally expensive when called frequently. When setting up automated polling of the API for health status, set verbose to false to disable the more expensive analysis logic.

Path parameters

  • feature string | array[string] Required

    A feature of the cluster, as returned by the top-level health report API.

Query parameters

  • timeout string

    Explicit operation timeout.

  • verbose boolean

    Opt-in for more information about the health of the system.

  • size number

    Limit the number of affected resources the health report API returns.

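For the automated polling scenario described in the note above, the verbose output can be disabled. The following request is an illustrative sketch; the shards_availability feature name is only an example of a valid indicator:

curl \
 --request GET 'http://api.example.com/_health_report/shards_availability?verbose=false' \
 --header "Authorization: $API_KEY"
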
Responses

GET /_health_report/{feature}
curl \
 --request GET 'http://api.example.com/_health_report/{feature}' \
 --header "Authorization: $API_KEY"

Claim a connector sync job Technical preview

PUT /_connector/_sync_job/{connector_sync_job_id}/_claim

This action updates the job status to in_progress and sets the last_seen and started_at timestamps to the current time. Additionally, it can set the sync_cursor property for the sync job.

This API is not intended for direct connector management by users. It supports the implementation of services that utilize the connector protocol to communicate with Elasticsearch.

To sync data using self-managed connectors, you need to deploy the Elastic connector service on your own infrastructure. This service runs automatically on Elastic Cloud for Elastic managed connectors.

Path parameters

  • connector_sync_job_id string Required

    The unique identifier of the connector sync job.

application/json

Body Required

  • sync_cursor object

    The cursor object from the last incremental sync job. This should reference the sync_cursor field in the connector state for which the job runs.

  • worker_hostname string Required

    The host name of the current system that will run the job.

Responses

PUT /_connector/_sync_job/{connector_sync_job_id}/_claim
curl \
 --request PUT 'http://api.example.com/_connector/_sync_job/{connector_sync_job_id}/_claim' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"sync_cursor":{},"worker_hostname":"string"}'

Update the connector name and description

PUT /_connector/{connector_id}/_name

Update the name and description fields in the connector document.

Path parameters

  • connector_id string Required

    The unique identifier of the connector to be updated

application/json

Body Required

Responses

  • 200 application/json
    • result string Required

      Values are created, updated, deleted, not_found, or noop.

PUT /_connector/{connector_id}/_name
curl \
 --request PUT 'http://api.example.com/_connector/{connector_id}/_name' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '"{\n    \"name\": \"Custom connector\",\n    \"description\": \"This is my customized connector\"\n}"'
Request example
{
    "name": "Custom connector",
    "description": "This is my customized connector"
}
Response examples (200)
{
  "result": "updated"
}

Pause a follower Added in 6.5.0

POST /{index}/_ccr/pause_follow

Pause a cross-cluster replication follower index. The follower index will not fetch any additional operations from the leader index. You can resume following with the resume follower API. You can pause and resume a follower index to change the configuration of the following task.

Path parameters

  • index string Required

    The name of the follower index.

Query parameters

  • master_timeout string

    The period to wait for a connection to the master node. If the master node is not available before the timeout expires, the request fails and returns an error. It can also be set to -1 to indicate that the request should never timeout.

Responses

  • 200 application/json
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

POST /{index}/_ccr/pause_follow
curl \
 --request POST 'http://api.example.com/{index}/_ccr/pause_follow' \
 --header "Authorization: $API_KEY"
Response examples (200)
A successful response from `POST /follower_index/_ccr/pause_follow`, which pauses a follower index.
{
  "acknowledged" : true
}
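
To resume replication later, as mentioned above, use the resume follower API on the same index. The following request is an illustrative sketch; follower_index is a placeholder:

curl \
 --request POST 'http://api.example.com/follower_index/_ccr/resume_follow' \
 --header "Authorization: $API_KEY"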

Create a new document in the index Added in 5.0.0

PUT /{index}/_create/{id}

You can index a new JSON document with the /<target>/_doc/ or /<target>/_create/<_id> APIs. Using _create guarantees that the document is indexed only if it does not already exist. It returns a 409 response when a document with the same ID already exists in the index. To update an existing document, you must use the /<target>/_doc/ API.

If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:

  • To add a document using the PUT /<target>/_create/<_id> or POST /<target>/_create/<_id> request formats, you must have the create_doc, create, index, or write index privilege.
  • To automatically create a data stream or index with this API request, you must have the auto_configure, create_index, or manage index privilege.

Automatic data stream creation requires a matching index template with data stream enabled.

Automatically create data streams and indices

If the request's target doesn't exist and matches an index template with a data_stream definition, the index operation automatically creates the data stream.

If the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.

NOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.

If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed.

Automatic index creation is controlled by the action.auto_create_index setting. If it is true, any index can be created automatically. You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to false to turn off automatic index creation entirely. Specify a comma-separated list of patterns you want to allow or prefix each pattern with + or - to indicate whether it should be allowed or blocked. When a list is specified, the default behaviour is to disallow.

NOTE: The action.auto_create_index setting affects the automatic creation of indices only. It does not affect the creation of data streams.

Routing

By default, shard placement — or routing — is controlled by using a hash of the document's ID value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter.

When setting up explicit mapping, you can also use the _routing field to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.

NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.

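For example, a create request can route the document by a custom value instead of its ID. The following request is an illustrative sketch; my-index-000001, the document ID, and the routing value user1 are placeholders:

curl \
 --request PUT 'http://api.example.com/my-index-000001/_create/1?routing=user1' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"user":{"id":"user1"},"message":"routed by user ID"}'
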
Distributed

The index operation is directed to the primary shard based on its route and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.

Active shards

To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation. If the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs. By default, write operations only wait for the primary shards to be active before proceeding (that is to say wait_for_active_shards is 1). This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards. To alter this behavior per operation, use the wait_for_active_shards request parameter.

Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas+1). Specifying a negative value or a number greater than the number of shard copies will throw an error.

For example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding. This means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data. If wait_for_active_shards is set on the request to 3 (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding. This requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard. However, if you set wait_for_active_shards to all (or to 4, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index. The operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.

It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts. After the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary. The _shards section of the API response reveals the number of shard copies on which replication succeeded and failed.

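Both controls can be sketched as follows; my-index-000001 and the values shown are placeholders. The first request sets the dynamic index-level default, and the second overrides it for a single operation:

curl \
 --request PUT 'http://api.example.com/my-index-000001/_settings' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"index.write.wait_for_active_shards":"2"}'

curl \
 --request PUT 'http://api.example.com/my-index-000001/_create/1?wait_for_active_shards=all' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"message":"example document"}'
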
External documentation

Path parameters

  • index string Required

    The name of the data stream or index to target. If the target doesn't exist and matches the name or wildcard (*) pattern of an index template with a data_stream definition, this request creates the data stream. If the target doesn't exist and doesn’t match a data stream template, this request creates the index.

  • id string Required

    A unique identifier for the document. To automatically generate a document ID, use the POST /<target>/_doc/ request format.

Query parameters

  • if_primary_term number

    Only perform the operation if the document has this primary term.

  • if_seq_no number

    Only perform the operation if the document has this sequence number.

  • include_source_on_error boolean

    If true, the document source is included in the error message in case of parsing errors.

  • op_type string

    Set to create to only index the document if it does not already exist (put if absent). If a document with the specified _id already exists, the indexing operation will fail. The behavior is the same as using the <index>/_create endpoint. If a document ID is specified, this parameter defaults to index. Otherwise, it defaults to create. If the request targets a data stream, an op_type of create is required.

    Supported values include:

    • index: Overwrite any documents that already exist.
    • create: Only index documents that do not already exist.

    Values are index or create.

  • pipeline string

    The ID of the pipeline to use to preprocess incoming documents. If the index has a default ingest pipeline specified, setting the value to _none turns off the default ingest pipeline for this request. If a final pipeline is configured, it will always run regardless of the value of this parameter.

  • refresh string

    If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, it waits for a refresh to make this operation visible to search. If false, it does nothing with refreshes.

    Values are true, false, or wait_for.

  • require_alias boolean

    If true, the destination must be an index alias.

  • require_data_stream boolean

    If true, the request's actions must target a data stream (existing or to be created).

  • routing string

    A custom value that is used to route operations to a specific shard.

  • timeout string

    The period the request waits for the following operations: automatic index creation, dynamic mapping updates, waiting for active shards. Elasticsearch waits for at least the specified timeout period before failing. The actual wait time could be longer, particularly when multiple waits occur.

    This parameter is useful for situations where the primary shard assigned to perform the operation might not be available when the operation runs. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the operation will wait on the primary shard to become available for at least 1 minute before failing and responding with an error. The actual wait time could be longer, particularly when multiple waits occur.

  • version number

    The explicit version number for concurrency control. It must be a non-negative long number.

  • version_type string

    The version type.

    Supported values include:

    • internal: Use internal versioning that starts at 1 and increments with each update or delete.
    • external: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.
    • external_gte: Only index the document if the specified version is equal or higher than the version of the stored document or if there is no existing document. NOTE: The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.
    • force: This option is deprecated because it can cause primary and replica shards to diverge.

    Values are internal, external, external_gte, or force.

  • wait_for_active_shards number | string

    The number of shard copies that must be active before proceeding with the operation. You can set it to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The default value of 1 means it waits for each primary shard to be active.

application/json

Body Required

object object

Responses

PUT /{index}/_create/{id}
curl \
 --request PUT 'http://api.example.com/{index}/_create/{id}' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '"{\n  \"@timestamp\": \"2099-11-15T13:12:00\",\n  \"message\": \"GET /search HTTP/1.1 200 1070000\",\n  \"user\": {\n    \"id\": \"kimchy\"\n  }\n}"'
Request example
Run `PUT my-index-000001/_create/1` to index a document into the `my-index-000001` index if no document with that ID exists.
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000",
  "user": {
    "id": "kimchy"
  }
}
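
As noted in the path parameters, a document ID can be generated automatically by using the POST /<target>/_doc/ request format instead. The following request is an illustrative sketch; my-index-000001 and the body are placeholders:

curl \
 --request POST 'http://api.example.com/my-index-000001/_doc/' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"@timestamp":"2099-11-15T13:12:00","message":"GET /search HTTP/1.1 200 1070000"}'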

Run an async ES|QL query Added in 8.13.0

POST /_query/async

Asynchronously run an ES|QL (Elasticsearch query language) query, monitor its progress, and retrieve results when they become available.

The API accepts the same parameters and request body as the synchronous query API, along with additional async related properties.

External documentation

Query parameters

  • allow_partial_results boolean

    If true, partial results will be returned if there are shard failures, but the query can continue to execute on other clusters and shards. If false, the query will fail if there are any failures.

    To override the default behavior, you can set the esql.query.allow_partial_results cluster setting to false (see the sketch after this parameter list).

  • delimiter string

    The character to use between values within a CSV row. It is valid only for the CSV format.

  • drop_null_columns boolean

    Indicates whether columns that are entirely null will be removed from the columns and values portion of the results. If true, the response will include an extra section under the name all_columns which has the name of all the columns.

  • format string

    A short version of the Accept header, for example json or yaml.

    Values are csv, json, tsv, txt, yaml, cbor, smile, or arrow.

  • keep_alive string

    The period for which the query and its results are stored in the cluster. The default period is five days. When this period expires, the query and its results are deleted, even if the query is still ongoing. If the keep_on_completion parameter is false, Elasticsearch only stores async queries that do not complete within the period set by the wait_for_completion_timeout parameter, regardless of this value.

  • keep_on_completion boolean

    Indicates whether the query and its results are stored in the cluster. If false, the query and its results are stored in the cluster only if the request does not complete during the period set by the wait_for_completion_timeout parameter.

  • wait_for_completion_timeout string

    The period to wait for the request to finish. By default, the request waits for 1 second for the query results. If the query completes during this period, results are returned. Otherwise, a query ID is returned that can later be used to retrieve the results.

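The following request is an illustrative sketch of changing the esql.query.allow_partial_results default mentioned above. It uses the cluster update settings API; the choice of a persistent setting is an assumption, a transient setting works as well:

curl \
 --request PUT 'http://api.example.com/_cluster/settings' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"persistent":{"esql.query.allow_partial_results":false}}'
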
application/json

Body Required

Responses

  • 200 application/json
    • took number

      Time in milliseconds.

    • is_partial boolean
    • all_columns array[object]
    • columns array[object] Required
    • values array[array] Required

      A field value.

    • _clusters object
    • profile object

      Profiling information. Present if profile was true in the request. The contents of this field are currently unstable.

    • id string
    • is_running boolean Required
POST /_query/async
curl \
 --request POST 'http://api.example.com/_query/async' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '"{\n  \"query\": \"\"\"\n    FROM library,remote-*:library\n    | EVAL year = DATE_TRUNC(1 YEARS, release_date)\n    | STATS MAX(page_count) BY year\n    | SORT year\n    | LIMIT 5\n  \"\"\",\n  \"wait_for_completion_timeout\": \"2s\",\n  \"include_ccs_metadata\": true\n}"'
Request example
{
  "query": """
    FROM library,remote-*:library
    | EVAL year = DATE_TRUNC(1 YEARS, release_date)
    | STATS MAX(page_count) BY year
    | SORT year
    | LIMIT 5
  """,
  "wait_for_completion_timeout": "2s",
  "include_ccs_metadata": true
}
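
If the query does not complete within the wait_for_completion_timeout period, the response contains an id that can be used to fetch the results later. The following request is an illustrative sketch of retrieving them with the get async query results endpoint; the query ID shown is a placeholder:

curl \
 --request GET 'http://api.example.com/_query/async/FmNJRUZ1YWZCU3dHY1BIOUhaenVSRkEaaXFlZ3h4c1RTWFNocDdnY2FSaERnUTozNDE=?wait_for_completion_timeout=30s' \
 --header "Authorization: $API_KEY"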

Get tokens from text analysis

GET /_analyze

The analyze API performs analysis on a text string and returns the resulting tokens.

Generating an excessive amount of tokens may cause a node to run out of memory. The index.analyze.max_token_count setting enables you to limit the number of tokens that can be produced. If more tokens than this limit are generated, an error occurs. The _analyze endpoint without a specified index will always use 10000 as its limit.

External documentation

Query parameters

  • index string

    Index used to derive the analyzer. If specified, the analyzer or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.

application/json

Body

Responses

GET /_analyze
curl \
 --request GET 'http://api.example.com/_analyze' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{
  "analyzer": "standard",
  "text": "this is a test"
}'
You can apply any of the built-in analyzers to the text string without specifying an index.
{
  "analyzer": "standard",
  "text": "this is a test"
}
If the text parameter is provided as an array of strings, it is analyzed as a multi-value field.
{
  "analyzer": "standard",
  "text": [
    "this is a test",
    "the second text"
  ]
}
You can test a custom transient analyzer built from tokenizers, token filters, and char filters. Token filters use the filter parameter.
{
  "tokenizer": "keyword",
  "filter": [
    "lowercase"
  ],
  "char_filter": [
    "html_strip"
  ],
  "text": "this is a <b>test</b>"
}
Custom tokenizers, token filters, and character filters can be specified in the request body.
{
  "tokenizer": "whitespace",
  "filter": [
    "lowercase",
    {
      "type": "stop",
      "stopwords": [
        "a",
        "is",
        "this"
      ]
    }
  ],
  "text": "this is a test"
}
Run `GET /analyze_sample/_analyze` to analyze the text using the default index analyzer associated with the `analyze_sample` index. Alternatively, the analyzer can be derived from a field mapping, as in the following example.
{
  "field": "obj1.field1",
  "text": "this is a test"
}
Run `GET /analyze_sample/_analyze` and supply a normalizer for a keyword field if there is a normalizer associated with the specified index.
{
  "normalizer": "my_normalizer",
  "text": "BaR"
}
If you want to get more advanced details, set `explain` to `true`. It will output all token attributes for each token. You can filter the token attributes you want to output by setting the `attributes` option. NOTE: The format of the additional detail information is labelled as experimental in Lucene and it may change in the future.
{
  "tokenizer": "standard",
  "filter": [
    "snowball"
  ],
  "text": "detailed output",
  "explain": true,
  "attributes": [
    "keyword"
  ]
}
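The explain request above can also be sent with curl, following the same conventions as the earlier example (the host and API key are placeholders):

GET /_analyze
curl \
 --request GET 'http://api.example.com/_analyze' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"tokenizer":"standard","filter":["snowball"],"text":"detailed output","explain":true,"attributes":["keyword"]}'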
Response examples (200)
A successful response for an analysis with `explain` set to `true`.
{
  "detail": {
    "custom_analyzer": true,
    "charfilters": [],
    "tokenizer": {
      "name": "standard",
      "tokens": [
        {
          "token": "detailed",
          "start_offset": 0,
          "end_offset": 8,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "output",
          "start_offset": 9,
          "end_offset": 15,
          "type": "<ALPHANUM>",
          "position": 1
        }
      ]
    },
    "tokenfilters": [
      {
        "name": "snowball",
        "tokens": [
          {
            "token": "detail",
            "start_offset": 0,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 0,
            "keyword": false
          },
          {
            "token": "output",
            "start_offset": 9,
            "end_offset": 15,
            "type": "<ALPHANUM>",
            "position": 1,
            "keyword": false
          }
        ]
      }
    ]
  }
}

Delete an alias

DELETE /{index}/_aliases/{name}

Removes a data stream or index from an alias.

Path parameters

  • index string | array[string] Required

    Comma-separated list of data streams or indices used to limit the request. Supports wildcards (*).

  • name string | array[string] Required

    Comma-separated list of aliases to remove. Supports wildcards (*). To remove all aliases, use * or _all.

Query parameters

  • Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.

  • timeout string

    Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

DELETE /{index}/_aliases/{name}
curl \
 --request DELETE 'http://api.example.com/{index}/_aliases/{name}' \
 --header "Authorization: $API_KEY"
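For illustration, a concrete request that removes a hypothetical alias my-alias from the index my-index-000001 looks like this:

curl \
 --request DELETE 'http://api.example.com/my-index-000001/_aliases/my-alias' \
 --header "Authorization: $API_KEY"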

Force a merge Added in 2.1.0

POST /_forcemerge

Perform the force merge operation on the shards of one or more indices. For data streams, the API forces a merge on the shards of the stream's backing indices.

Merging reduces the number of segments in each shard by merging some of them together and also frees up the space used by deleted documents. Merging normally happens automatically, but sometimes it is useful to trigger a merge manually.

WARNING: We recommend force merging only a read-only index (meaning the index is no longer receiving writes). When documents are updated or deleted, the old version is not immediately removed but instead soft-deleted and marked with a "tombstone". These soft-deleted documents are automatically cleaned up during regular segment merges. But force merge can cause very large (greater than 5 GB) segments to be produced, which are not eligible for regular merges. So the number of soft-deleted documents can then grow rapidly, resulting in higher disk usage and worse search performance. If you regularly force merge an index receiving writes, this can also make snapshots more expensive, since the new documents can't be backed up incrementally.

Blocks during a force merge

Calls to this API block until the merge is complete (unless the request contains wait_for_completion=false). If the client connection is lost before completion, the force merge process will continue in the background. Any new requests to force merge the same indices will also block until the ongoing force merge is complete.

Running force merge asynchronously

If the request contains wait_for_completion=false, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to get the status of the task. However, you cannot cancel this task because the force merge task is not cancelable. Elasticsearch creates a record of this task as a document at _tasks/<task_id>. When you are done with a task, you should delete the task document so Elasticsearch can reclaim the space.

Force merging multiple indices

You can force merge multiple indices with a single request by targeting:

  • One or more data streams that contain multiple backing indices
  • Multiple indices
  • One or more aliases
  • All data streams and indices in a cluster

Each targeted shard is force-merged separately using the force_merge threadpool. By default, each node only has a single force_merge thread, which means that the shards on that node are force-merged one at a time. If you expand the force_merge threadpool on a node, it will force merge its shards in parallel.

Force merge temporarily increases the storage used by the shard being merged: if the max_num_segments parameter is set to 1, the merge may require free space of up to triple the shard's size in order to rewrite all segments into a new one.

Data streams and time-based indices

Force-merging is useful for managing a data stream's older backing indices and other time-based indices, particularly after a rollover. In these cases, each index only receives indexing traffic for a certain period of time. Once an index receives no more writes, its shards can be force-merged to a single segment. This can be a good idea because single-segment shards can sometimes use simpler and more efficient data structures to perform searches. For example:

POST /.ds-my-data-stream-2099.03.07-000001/_forcemerge?max_num_segments=1
External documentation

Query parameters

  • Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)

  • expand_wildcards string | array[string]

    Whether to expand wildcard expression to concrete indices that are open, closed or both.

    Supported values include:

    • all: Match any data stream or index, including hidden ones.
    • open: Match open, non-hidden indices. Also matches any non-hidden data stream.
    • closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
    • hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
    • none: Wildcard expressions are not accepted.
  • flush boolean

    Specify whether the index should be flushed after performing the operation (default: true)

  • Whether specified concrete indices should be ignored when unavailable (missing or closed)

  • The number of segments the index should be merged into (default: dynamic)

  • Specify whether the operation should only expunge deleted documents

  • Should the request wait until the force merge is completed.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • _shards object
      Hide _shards attributes Show _shards attributes object
    • task string

      task contains a task ID that is returned when wait_for_completion=false. You can use the task ID to get the status of the task at _tasks/<task_id>.

POST /_forcemerge
curl \
 --request POST 'http://api.example.com/_forcemerge' \
 --header "Authorization: $API_KEY"
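As a sketch of the asynchronous flow described above, you can submit the merge with wait_for_completion=false and then poll the task API with the returned task ID (the index name and task ID below are placeholders):

curl \
 --request POST 'http://api.example.com/my-index-000001/_forcemerge?max_num_segments=1&wait_for_completion=false' \
 --header "Authorization: $API_KEY"

curl \
 --request GET 'http://api.example.com/_tasks/<task_id>' \
 --header "Authorization: $API_KEY"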

Get index templates

GET /_template

Get information about one or more index templates.

IMPORTANT: This documentation is about legacy index templates, which are deprecated and will be replaced by the composable templates introduced in Elasticsearch 7.8.

External documentation

Query parameters

  • If true, returns settings in flat format.

  • local boolean

    If true, the request retrieves information from the local node only.

  • Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.

Responses

GET /_template
curl \
 --request GET 'http://api.example.com/_template' \
 --header "Authorization: $API_KEY"
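For example, to return settings in flat format as described by the flat_settings query parameter:

curl \
 --request GET 'http://api.example.com/_template?flat_settings=true' \
 --header "Authorization: $API_KEY"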

Reload search analyzers Added in 7.3.0

POST /{index}/_reload_search_analyzers

Reload an index's search analyzers and their resources. For data streams, the API reloads search analyzers and resources for the stream's backing indices.

IMPORTANT: After reloading the search analyzers you should clear the request cache to make sure it doesn't contain responses derived from the previous versions of the analyzer.

You can use the reload search analyzers API to pick up changes to synonym files used in the synonym_graph or synonym token filter of a search analyzer. To be eligible, the token filter must have an updateable flag of true and only be used in search analyzers.

NOTE: This API does not perform a reload for each shard of an index. Instead, it performs a reload for each node containing index shards. As a result, the total shard count returned by the API can differ from the number of index shards. Because reloading affects every node with an index shard, it is important to update the synonym file on every data node in the cluster, including nodes that don't contain a shard replica, before using this API. This ensures the synonym file is updated everywhere in the cluster in case shards are relocated in the future.

External documentation

Path parameters

  • index string | array[string] Required

    A comma-separated list of index names to reload analyzers for

Query parameters

  • Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes _all string or when no indices have been specified)

  • expand_wildcards string | array[string]

    Whether to expand wildcard expression to concrete indices that are open, closed or both.

    Supported values include:

    • all: Match any data stream or index, including hidden ones.
    • open: Match open, non-hidden indices. Also matches any non-hidden data stream.
    • closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
    • hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
    • none: Wildcard expressions are not accepted.
  • Whether specified concrete indices should be ignored when unavailable (missing or closed)

  • resource string

    Changed resource to reload analyzers from if applicable

Responses

POST /{index}/_reload_search_analyzers
curl \
 --request POST 'http://api.example.com/{index}/_reload_search_analyzers' \
 --header "Authorization: $API_KEY"
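For illustration, a concrete request for a hypothetical index my-index-000001 looks like this:

curl \
 --request POST 'http://api.example.com/my-index-000001/_reload_search_analyzers' \
 --header "Authorization: $API_KEY"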




Resolve the cluster Added in 8.13.0

GET /_resolve/cluster/{name}

Resolve the specified index expressions to return information about each cluster, including the local "querying" cluster, if included. If no index expression is provided, the API will return information about all the remote clusters that are configured on the querying cluster.

This endpoint is useful before doing a cross-cluster search in order to determine which remote clusters should be included in a search.

You use the same index expression with this endpoint as you would for cross-cluster search. Index and cluster exclusions are also supported with this endpoint.

For each cluster in the index expression, information is returned about:

  • Whether the querying ("local") cluster is currently connected to each remote cluster specified in the index expression. Note that this endpoint actively attempts to contact the remote clusters, unlike the remote/info endpoint.
  • Whether each remote cluster is configured with skip_unavailable as true or false.
  • Whether there are any indices, aliases, or data streams on that cluster that match the index expression.
  • Whether the search is likely to have errors returned when you do the cross-cluster search (including any authorization errors if you do not have permission to query the index).
  • Cluster version information, including the Elasticsearch server version.

For example, GET /_resolve/cluster/my-index-*,cluster*:my-index-* returns information about the local cluster and all remotely configured clusters that start with the alias cluster*. Each cluster returns information about whether it has any indices, aliases or data streams that match my-index-*.

Note on backwards compatibility

The ability to query without an index expression was added in version 8.18, so when querying remote clusters older than that, the local cluster will send the index expression dummy* to those remote clusters. Thus, if any errors occur, you may see a reference to that index expression even though you didn't request it. If it causes a problem, you can instead include an index expression like *:* to bypass the issue.

You may want to exclude a cluster or index from a search when:

  • A remote cluster is not currently connected and is configured with skip_unavailable=false. Running a cross-cluster search under those conditions will cause the entire search to fail.
  • A cluster has no matching indices, aliases or data streams for the index expression (or your user does not have permissions to search them). For example, suppose your index expression is logs*,remote1:logs* and the remote1 cluster has no indices, aliases or data streams that match logs*. In that case, that cluster will return no results if you include it in a cross-cluster search.
  • The index expression (combined with any query parameters you specify) will likely cause an exception to be thrown when you do the search. In these cases, the "error" field in the _resolve/cluster response will be present. (This is also where security/permission errors will be shown.)
  • A remote cluster is an older version that does not support the feature you want to use in your search.

Test availability of remote clusters

The remote/info endpoint is commonly used to test whether the "local" cluster (the cluster being queried) is connected to its remote clusters, but it does not necessarily reflect whether the remote cluster is available or not. The remote cluster may be available, while the local cluster is not currently connected to it.

You can use the _resolve/cluster API to attempt to reconnect to remote clusters. For example with GET _resolve/cluster or GET _resolve/cluster/*:*. The connected field in the response will indicate whether it was successful. If a connection was (re-)established, this will also cause the remote/info endpoint to now indicate a connected status.
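For example, a minimal connectivity check against all configured remote clusters can be issued with curl (the host and API key are placeholders, as in the other examples):

curl \
 --request GET 'http://api.example.com/_resolve/cluster/*:*' \
 --header "Authorization: $API_KEY"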

Path parameters

  • name string | array[string] Required

    A comma-separated list of names or index patterns for the indices, aliases, and data streams to resolve. Resources on remote clusters can be specified using the <cluster>:<name> syntax. Index and cluster exclusions (e.g., -cluster1:*) are also supported. If no index expression is specified, information about all remote clusters configured on the local cluster is returned without doing any index matching

Query parameters

  • If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar. NOTE: This option is only supported when specifying an index expression. You will get an error if you specify index options to the _resolve/cluster API endpoint that takes no index expression.

  • expand_wildcards string | array[string]

    Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as open,hidden. Valid values are: all, open, closed, hidden, none. NOTE: This option is only supported when specifying an index expression. You will get an error if you specify index options to the _resolve/cluster API endpoint that takes no index expression.

    Supported values include:

    • all: Match any data stream or index, including hidden ones.
    • open: Match open, non-hidden indices. Also matches any non-hidden data stream.
    • closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
    • hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
    • none: Wildcard expressions are not accepted.
  • ignore_throttled boolean Deprecated

    If true, concrete, expanded, or aliased indices are ignored when frozen. NOTE: This option is only supported when specifying an index expression. You will get an error if you specify index options to the _resolve/cluster API endpoint that takes no index expression.

  • If false, the request returns an error if it targets a missing or closed index. NOTE: This option is only supported when specifying an index expression. You will get an error if you specify index options to the _resolve/cluster API endpoint that takes no index expression.

  • timeout string

    The maximum time to wait for remote clusters to respond. If a remote cluster does not respond within this timeout period, the API response will show the cluster as not connected and include an error message that the request timed out.

    The default timeout is unset and the query can take as long as the networking layer is configured to wait for remote clusters that are not responding (typically 30 seconds).

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • * object Additional properties
      Hide * attributes Show * attributes object
GET /_resolve/cluster/{name}
curl \
 --request GET 'http://api.example.com/_resolve/cluster/{name}' \
 --header "Authorization: $API_KEY"
Response examples (200)
A successful response from `GET /_resolve/cluster/my-index*,clust*:my-index*`. Each cluster has its own response section. The cluster you sent the request to is labelled as "(local)".
{
  "(local)": {
    "connected": true,
    "skip_unavailable": false,
    "matching_indices": true,
    "version": {
      "number": "8.13.0",
      "build_flavor": "default",
      "minimum_wire_compatibility_version": "7.17.0",
      "minimum_index_compatibility_version": "7.0.0"
    }
  },
  "cluster_one": {
    "connected": true,
    "skip_unavailable": true,
    "matching_indices": true,
    "version": {
      "number": "8.13.0",
      "build_flavor": "default",
      "minimum_wire_compatibility_version": "7.17.0",
      "minimum_index_compatibility_version": "7.0.0"
    }
  },
  "cluster_two": {
    "connected": true,
    "skip_unavailable": false,
    "matching_indices": true,
    "version": {
      "number": "8.13.0",
      "build_flavor": "default",
      "minimum_wire_compatibility_version": "7.17.0",
      "minimum_index_compatibility_version": "7.0.0"
    }
  }
}
A successful response from `GET /_resolve/cluster/not-present,clust*:my-index*,oldcluster:*?ignore_unavailable=false&timeout=5s`. This type of request can be used to identify potential problems with your cross-cluster search. Note also that a `timeout` of 5 seconds is sent, which sets the maximum time the query will wait for remote clusters to respond. The local cluster has no index called `not_present`. Searching with `ignore_unavailable=false` would return a "no such index" error. The `cluster_one` remote cluster has no indices that match the pattern `my-index*`. There may be no indices that match the pattern or the index could be closed. The `cluster_two` remote cluster is not connected (the attempt to connect failed). Since this cluster is marked as `skip_unavailable=false`, you should probably exclude this cluster from the search by adding `-cluster_two:*` to the search index expression. For `cluster_three`, the error message indicates that this remote cluster did not respond within the 5-second timeout window specified, so it is also marked as not connected. The `oldcluster` remote cluster shows that it has matching indices, but no version information is included. This indicates that the cluster version predates the introduction of the `_resolve/cluster` API, so you may want to exclude it from your cross-cluster search.
{
  "(local)": {
    "connected": true,
    "skip_unavailable": false,
    "error": "no such index [not_present]"
  },
  "cluster_one": {
    "connected": true,
    "skip_unavailable": true,
    "matching_indices": false,
    "version": {
      "number": "8.13.0",
      "build_flavor": "default",
      "minimum_wire_compatibility_version": "7.17.0",
      "minimum_index_compatibility_version": "7.0.0"
    }
  },
  "cluster_two": {
    "connected": false,
    "skip_unavailable": false
  },
  "cluster_three": {
    "connected": false,
    "skip_unavailable": false,
    "error": "Request timed out before receiving a response from the remote cluster"
  },
  "oldcluster": {
    "connected": true,
    "skip_unavailable": false,
    "matching_indices": true
  }
}

Delete an inference endpoint Added in 8.11.0

DELETE /_inference/{task_type}/{inference_id}

Path parameters

  • task_type string Required

    The task type

    Values are sparse_embedding, text_embedding, rerank, completion, or chat_completion.

  • inference_id string Required

    The inference identifier.

Query parameters

  • dry_run boolean

    When true, the endpoint is not deleted and a list of ingest processors which reference this endpoint is returned.

  • force boolean

    When true, the inference endpoint is forcefully deleted even if it is still being used by ingest processors or semantic text fields.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

    • pipelines array[string] Required
DELETE /_inference/{task_type}/{inference_id}
curl \
 --request DELETE 'http://api.example.com/_inference/{task_type}/{inference_id}' \
 --header "Authorization: $API_KEY"
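For example, a dry run against a hypothetical inference endpoint lists the ingest processors that still reference it without deleting anything:

curl \
 --request DELETE 'http://api.example.com/_inference/sparse_embedding/my-elser-endpoint?dry_run=true' \
 --header "Authorization: $API_KEY"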

Get calendar configuration info

POST /_ml/calendars/{calendar_id}

Path parameters

  • calendar_id string Required

    A string that uniquely identifies a calendar. You can get information for multiple calendars by using a comma-separated list of ids or a wildcard expression. You can get information for all calendars by using _all or * or by omitting the calendar identifier.

Query parameters

  • from number

    Skips the specified number of calendars. This parameter is supported only when you omit the calendar identifier.

  • size number

    Specifies the maximum number of calendars to obtain. This parameter is supported only when you omit the calendar identifier.

application/json

Body

  • page object
    Hide page attributes Show page attributes object
    • from number

      Skips the specified number of items.

    • size number

      Specifies the maximum number of items to obtain.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • calendars array[object] Required
      Hide calendars attributes Show calendars attributes object
    • count number Required
POST /_ml/calendars/{calendar_id}
curl \
 --request POST 'http://api.example.com/_ml/calendars/{calendar_id}' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"page":{"from":42.0,"size":42.0}}'
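To page through all calendars, as described for the _all identifier above, a request might look like this (the page values are illustrative):

curl \
 --request POST 'http://api.example.com/_ml/calendars/_all' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"page":{"from":0,"size":100}}'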




Delete events from a calendar Added in 6.2.0

DELETE /_ml/calendars/{calendar_id}/events/{event_id}

Path parameters

  • calendar_id string Required

    A string that uniquely identifies a calendar.

  • event_id string Required

    Identifier for the scheduled event. You can obtain this identifier by using the get calendar events API.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

DELETE /_ml/calendars/{calendar_id}/events/{event_id}
curl \
 --request DELETE 'http://api.example.com/_ml/calendars/{calendar_id}/events/{event_id}' \
 --header "Authorization: $API_KEY"
Response examples (200)
A successful response when deleting a calendar event.
{
  "acknowledged": true
}

Delete an anomaly detection job Added in 5.4.0

DELETE /_ml/anomaly_detectors/{job_id}

All job configuration, model state and results are deleted. It is not currently possible to delete multiple jobs using wildcards or a comma-separated list. If you delete a job that has a datafeed, the request first tries to delete the datafeed. This behavior is equivalent to calling the delete datafeed API with the same timeout and force parameters as the delete job request.

Path parameters

  • job_id string Required

    Identifier for the anomaly detection job.

Query parameters

  • force boolean

    Use to forcefully delete an opened job; this method is quicker than closing and deleting the job.

  • Specifies whether annotations that have been added by the user should be deleted along with any auto-generated annotations when the job is reset.

  • Specifies whether the request should return immediately or wait until the job deletion completes.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

DELETE /_ml/anomaly_detectors/{job_id}
curl \
 --request DELETE 'http://api.example.com/_ml/anomaly_detectors/{job_id}' \
 --header "Authorization: $API_KEY"
Response examples (200)
A successful response when deleting an anomaly detection job.
{
  "acknowledged": true
}
A successful response when deleting an anomaly detection job asynchronously. When the `wait_for_completion` query parameter is set to `false`, the response contains an identifier for the job deletion task.
{
  "task": "oTUltX4IQMOUUVeiohTt8A:39"
}
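The asynchronous variant shown in the second response example corresponds to a request like the following, where the job identifier is hypothetical:

curl \
 --request DELETE 'http://api.example.com/_ml/anomaly_detectors/total-requests?wait_for_completion=false' \
 --header "Authorization: $API_KEY"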

Get anomaly detection job results for categories Added in 5.4.0

POST /_ml/anomaly_detectors/{job_id}/results/categories

Path parameters

  • job_id string Required

    Identifier for the anomaly detection job.

Query parameters

  • from number

    Skips the specified number of categories.

  • Only return categories for the specified partition.

  • size number

    Specifies the maximum number of categories to obtain.

application/json

Body

  • page object
    Hide page attributes Show page attributes object
    • from number

      Skips the specified number of items.

    • size number

      Specifies the maximum number of items to obtain.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • categories array[object] Required
      Hide categories attributes Show categories attributes object
      • category_id number Required
      • examples array[string] Required

        A list of examples of actual values that matched the category.

      • job_id string Required
      • max_matching_length number Required
      • If per-partition categorization is enabled, this property identifies the field used to segment the categorization. It is not present when per-partition categorization is disabled.

      • If per-partition categorization is enabled, this property identifies the value of the partition_field_name for the category. It is not present when per-partition categorization is disabled.

      • regex string Required

        A regular expression that is used to search for values that match the category.

      • terms string Required

        A space-separated list of the common tokens that are matched in values of the category.

      • The number of messages that have been matched by this category. This is only guaranteed to have the latest accurate count after a job _flush or _close

      • A list of category_id entries that this current category encompasses. Any new message that is processed by the categorizer will match against this category and not any of the categories in this list. This is only guaranteed to have the latest accurate list of categories after a job _flush or _close

      • p string
      • result_type string Required
      • mlcategory string Required
    • count number Required
POST /_ml/anomaly_detectors/{job_id}/results/categories
curl \
 --request POST 'http://api.example.com/_ml/anomaly_detectors/{job_id}/results/categories' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"page":{"from":42.0,"size":42.0}}'
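For example, to fetch the first 50 categories for a hypothetical job named esxi_log:

curl \
 --request POST 'http://api.example.com/_ml/anomaly_detectors/esxi_log/results/categories' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"page":{"from":0,"size":50}}'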

Stop datafeeds Added in 5.4.0

POST /_ml/datafeeds/{datafeed_id}/_stop

A datafeed that is stopped ceases to retrieve data from Elasticsearch. A datafeed can be started and stopped multiple times throughout its lifecycle.

Path parameters

  • datafeed_id string Required

    Identifier for the datafeed. You can stop multiple datafeeds in a single API request by using a comma-separated list of datafeeds or a wildcard expression. You can stop all datafeeds by using _all or by specifying * as the identifier.

Query parameters

  • Specifies what to do when the request:

    • Contains wildcard expressions and there are no datafeeds that match.
    • Contains the _all string or no identifiers and there are no matches.
    • Contains wildcard expressions and there are only partial matches.

    If true, the API returns an empty datafeeds array when there are no matches and the subset of results when there are partial matches. If false, the API returns a 404 status code when there are no matches or only partial matches.

  • force boolean

    If true, the datafeed is stopped forcefully.

  • timeout string

    Specifies the amount of time to wait until a datafeed stops.

application/json

Body

  • Refer to the description for the allow_no_match query parameter.

  • force boolean

    Refer to the description for the force query parameter.

  • timeout string

    A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
POST /_ml/datafeeds/{datafeed_id}/_stop
curl \
 --request POST 'http://api.example.com/_ml/datafeeds/{datafeed_id}/_stop' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"allow_no_match":true,"force":true,"timeout":"string"}'
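A concrete request that stops a single, hypothetical datafeed and waits up to 30 seconds for it to stop might look like this:

curl \
 --request POST 'http://api.example.com/_ml/datafeeds/datafeed-total-requests/_stop' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"timeout":"30s"}'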

Get data frame analytics job configuration info Added in 7.3.0

GET /_ml/data_frame/analytics/{id}

You can get information for multiple data frame analytics jobs in a single API request by using a comma-separated list of data frame analytics jobs or a wildcard expression.

Path parameters

  • id string Required

    Identifier for the data frame analytics job. If you do not specify this option, the API returns information for the first hundred data frame analytics jobs.

Query parameters

  • Specifies what to do when the request:

    1. Contains wildcard expressions and there are no data frame analytics jobs that match.
    2. Contains the _all string or no identifiers and there are no matches.
    3. Contains wildcard expressions and there are only partial matches.

    The default value returns an empty data_frame_analytics array when there are no matches and the subset of results when there are partial matches. If this parameter is false, the request returns a 404 status code when there are no matches or only partial matches.

  • from number

    Skips the specified number of data frame analytics jobs.

  • size number

    Specifies the maximum number of data frame analytics jobs to obtain.

  • Indicates if certain fields should be removed from the configuration on retrieval. This allows the configuration to be in an acceptable format to be retrieved and then added to another cluster.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • count number Required
    • data_frame_analytics array[object] Required

      An array of data frame analytics job resources, which are sorted by the id value in ascending order.

      Hide data_frame_analytics attributes Show data_frame_analytics attributes object
      • analysis object Required
        Hide analysis attributes Show analysis attributes object
        • Hide classification attributes Show classification attributes object
          • alpha number

            Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This parameter affects loss calculations by acting as a multiplier of the tree depth. Higher alpha values result in shallower trees and faster training times. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to zero.

          • dependent_variable string Required

            Defines which field of the document is to be predicted. It must match one of the fields in the index being used to train. If this field is missing from a document, then that document will not be used for training, but a prediction with the trained model will be generated for it. It is also known as continuous target variable. For classification analysis, the data type of the field must be numeric (integer, short, long, byte), categorical (ip or keyword), or boolean. There must be no more than 30 different values in this field. For regression analysis, the data type of the field must be numeric.

          • Advanced configuration option. Controls the fraction of data that is used to compute the derivatives of the loss function for tree training. A small value results in the use of a small fraction of the data. If this value is set to be less than 1, accuracy typically improves. However, too small a value may result in poor convergence for the ensemble and so require more trees. By default, this value is calculated during hyperparameter optimization. It must be greater than zero and less than or equal to 1.

          • Advanced configuration option. Specifies whether the training process should finish if it is not finding any better performing models. If disabled, the training process can take significantly longer and the chance of finding a better performing model is unremarkable.

          • eta number

            Advanced configuration option. The shrinkage applied to the weights. Smaller values result in larger forests which have a better generalization error. However, larger forests cause slower training. By default, this value is calculated during hyperparameter optimization. It must be a value between 0.001 and 1.

          • Advanced configuration option. Specifies the rate at which eta increases for each new tree that is added to the forest. For example, a rate of 1.05 increases eta by 5% for each extra tree. By default, this value is calculated during hyperparameter optimization. It must be between 0.5 and 2.

          • Advanced configuration option. Defines the fraction of features that will be used when selecting a random bag for each candidate split. By default, this value is calculated during hyperparameter optimization.

          • feature_processors array[object]

            Advanced configuration option. A collection of feature preprocessors that modify one or more included fields. The analysis uses the resulting one or more features instead of the original document field. However, these features are ephemeral; they are not stored in the destination index. Multiple feature_processors entries can refer to the same document fields. Automatic categorical feature encoding still occurs for the fields that are unprocessed by a custom processor or that have categorical values. Use this property only if you want to override the automatic feature encoding of the specified fields.

          • gamma number

            Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies a linear penalty associated with the size of individual trees in the forest. A high gamma value causes training to prefer small trees. A small gamma value results in larger individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

          • lambda number

            Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies an L2 regularization term which applies to leaf weights of the individual trees in the forest. A high lambda value causes training to favor small leaf weights. This behavior makes the prediction function smoother at the expense of potentially not being able to capture relevant relationships between the features and the dependent variable. A small lambda value results in large individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

          • Advanced configuration option. A multiplier responsible for determining the maximum number of hyperparameter optimization steps in the Bayesian optimization procedure. The maximum number of steps is determined based on the number of undefined hyperparameters times the maximum optimization rounds per hyperparameter. By default, this value is calculated during hyperparameter optimization.

          • Advanced configuration option. Defines the maximum number of decision trees in the forest. The maximum value is 2000. By default, this value is calculated during hyperparameter optimization.

          • Advanced configuration option. Specifies the maximum number of feature importance values per document to return. By default, no feature importance calculation occurs.

          • Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

          • Defines the seed for the random generator that is used to pick training data. By default, it is randomly generated. Set it to a specific value to use the same training data each time you start a job (assuming other related parameters such as source and analyzed_fields are the same).

          • Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This soft limit combines with the soft_tree_depth_tolerance to penalize trees that exceed the specified depth; the regularized loss increases quickly beyond this depth. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.

          • Advanced configuration option. This option controls how quickly the regularized loss increases when the tree depth exceeds soft_tree_depth_limit. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.01.

          • Defines the number of categories for which the predicted probabilities are reported. It must be non-negative or -1. If it is -1 or greater than the total number of categories, probabilities are reported for all categories; if you have a large number of categories, there could be a significant effect on the size of your destination index. NOTE: To use the AUC ROC evaluation method, num_top_classes must be set to -1 or a value greater than or equal to the total number of categories.

        • Hide outlier_detection attributes Show outlier_detection attributes object
          • Specifies whether the feature influence calculation is enabled.

          • The minimum outlier score that a document needs to have in order to calculate its feature influence score. Value range: 0-1.

          • method string

            The method that outlier detection uses. Available methods are lof, ldof, distance_kth_nn, distance_knn, and ensemble. The default value is ensemble, which means that outlier detection uses an ensemble of different methods and normalises and combines their individual outlier scores to obtain the overall outlier score.

          • Defines the value for how many nearest neighbors each method of outlier detection uses to calculate its outlier score. When the value is not set, different values are used for different ensemble members. This default behavior helps improve the diversity in the ensemble; only override it if you are confident that the value you choose is appropriate for the data set.

          • The proportion of the data set that is assumed to be outlying prior to outlier detection. For example, 0.05 means it is assumed that 5% of values are real outliers and 95% are inliers.

          • If true, the following operation is performed on the columns before computing outlier scores: (x_i - mean(x_i)) / sd(x_i).

        • Hide regression attributes Show regression attributes object
          • alpha number

            Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This parameter affects loss calculations by acting as a multiplier of the tree depth. Higher alpha values result in shallower trees and faster training times. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to zero.

          • dependent_variable string Required

            Defines which field of the document is to be predicted. It must match one of the fields in the index being used to train. If this field is missing from a document, then that document will not be used for training, but a prediction with the trained model will be generated for it. It is also known as continuous target variable. For classification analysis, the data type of the field must be numeric (integer, short, long, byte), categorical (ip or keyword), or boolean. There must be no more than 30 different values in this field. For regression analysis, the data type of the field must be numeric.

          • Advanced configuration option. Controls the fraction of data that is used to compute the derivatives of the loss function for tree training. A small value results in the use of a small fraction of the data. If this value is set to be less than 1, accuracy typically improves. However, too small a value may result in poor convergence for the ensemble and so require more trees. By default, this value is calculated during hyperparameter optimization. It must be greater than zero and less than or equal to 1.

          • Advanced configuration option. Specifies whether the training process should finish if it is not finding any better performing models. If disabled, the training process can take significantly longer and the chance of finding a better performing model is unremarkable.

          • eta number

            Advanced configuration option. The shrinkage applied to the weights. Smaller values result in larger forests which have a better generalization error. However, larger forests cause slower training. By default, this value is calculated during hyperparameter optimization. It must be a value between 0.001 and 1.

          • Advanced configuration option. Specifies the rate at which eta increases for each new tree that is added to the forest. For example, a rate of 1.05 increases eta by 5% for each extra tree. By default, this value is calculated during hyperparameter optimization. It must be between 0.5 and 2.

          • Advanced configuration option. Defines the fraction of features that will be used when selecting a random bag for each candidate split. By default, this value is calculated during hyperparameter optimization.

          • feature_processors array[object]

            Advanced configuration option. A collection of feature preprocessors that modify one or more included fields. The analysis uses the resulting one or more features instead of the original document field. However, these features are ephemeral; they are not stored in the destination index. Multiple feature_processors entries can refer to the same document fields. Automatic categorical feature encoding still occurs for the fields that are unprocessed by a custom processor or that have categorical values. Use this property only if you want to override the automatic feature encoding of the specified fields.

          • gamma number

            Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies a linear penalty associated with the size of individual trees in the forest. A high gamma value causes training to prefer small trees. A small gamma value results in larger individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

          • lambda number

            Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies an L2 regularization term which applies to leaf weights of the individual trees in the forest. A high lambda value causes training to favor small leaf weights. This behavior makes the prediction function smoother at the expense of potentially not being able to capture relevant relationships between the features and the dependent variable. A small lambda value results in large individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

          • Advanced configuration option. A multiplier responsible for determining the maximum number of hyperparameter optimization steps in the Bayesian optimization procedure. The maximum number of steps is determined based on the number of undefined hyperparameters times the maximum optimization rounds per hyperparameter. By default, this value is calculated during hyperparameter optimization.

          • Advanced configuration option. Defines the maximum number of decision trees in the forest. The maximum value is 2000. By default, this value is calculated during hyperparameter optimization.

          • Advanced configuration option. Specifies the maximum number of feature importance values per document to return. By default, no feature importance calculation occurs.

          • Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

          • Defines the seed for the random generator that is used to pick training data. By default, it is randomly generated. Set it to a specific value to use the same training data each time you start a job (assuming other related parameters such as source and analyzed_fields are the same).

          • Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This soft limit combines with the soft_tree_depth_tolerance to penalize trees that exceed the specified depth; the regularized loss increases quickly beyond this depth. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.

          • Advanced configuration option. This option controls how quickly the regularized loss increases when the tree depth exceeds soft_tree_depth_limit. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.01.

          • The loss function used during regression. Available options are mse (mean squared error), msle (mean squared logarithmic error), huber (Pseudo-Huber loss).

          • A positive number that is used as a parameter to the loss_function.

      • Hide analyzed_fields attributes Show analyzed_fields attributes object
        • includes array[string]

          An array of strings that defines the fields that will be included in the analysis.

        • excludes array[string]

          An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes; these fields are excluded from the analysis automatically.

      • Hide authorization attributes Show authorization attributes object
        • api_key object
          Hide api_key attributes Show api_key attributes object
          • id string Required

            The identifier for the API key.

          • name string Required

            The name of the API key.

        • roles array[string]

          If a user ID was used for the most recent update to the job, its roles at the time of the update are listed in the response.

        • If a service account was used for the most recent update to the job, the account name is listed in the response.

      • Time unit for milliseconds

      • dest object Required
        Hide dest attributes Show dest attributes object
        • index string Required
        • Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

      • id string Required
      • source object Required
        Hide source attributes Show source attributes object
        • index string | array[string] Required
        • Hide runtime_mappings attribute Show runtime_mappings attribute object
          • * object Additional properties
            Hide * attributes Show * attributes object
            • fields object

              For type composite

            • fetch_fields array[object]

              For type lookup

            • format string

              A custom format for date type runtime fields.

            • Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

            • Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

            • script object
            • type string Required

              Values are boolean, composite, date, double, geo_point, geo_shape, ip, keyword, long, or lookup.

        • _source object
          Hide _source attributes Show _source attributes object
          • includes array[string]

            An array of strings that defines the fields that will be included in the analysis.

          • excludes array[string]

            An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes; these fields are excluded from the analysis automatically.

        • query object

          The Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that are supported by Elasticsearch can be used, as this object is passed verbatim to Elasticsearch. By default, this property has the following value: {"match_all": {}}.

          Query DSL
      • version string
      • _meta object
        Hide _meta attribute Show _meta attribute object
        • * object Additional properties
GET /_ml/data_frame/analytics/{id}
curl \
 --request GET 'http://api.example.com/_ml/data_frame/analytics/{id}' \
 --header "Authorization: $API_KEY"
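For example, to list the first 20 data frame analytics jobs that match a hypothetical wildcard pattern:

curl \
 --request GET 'http://api.example.com/_ml/data_frame/analytics/weblog-*?from=0&size=20' \
 --header "Authorization: $API_KEY"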

Clear trained model deployment cache Added in 8.5.0

POST /_ml/trained_models/{model_id}/deployment/cache/_clear

Cache will be cleared on all nodes where the trained model is assigned. A trained model deployment may have an inference cache enabled. As requests are handled by each allocated node, their responses may be cached on that individual node. Calling this API clears the caches without restarting the deployment.

Path parameters

  • model_id string Required

    The unique identifier of the trained model.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
POST /_ml/trained_models/{model_id}/deployment/cache/_clear
curl \
 --request POST 'http://api.example.com/_ml/trained_models/{model_id}/deployment/cache/_clear' \
 --header "Authorization: $API_KEY"
Response examples (200)
A successful response when clearing the inference cache.
{
  "cleared": true
}

Evaluate a trained model Added in 8.3.0

POST /_ml/trained_models/{model_id}/_infer

Path parameters

  • model_id string Required

    The unique identifier of the trained model.

Query parameters

  • timeout string

    Controls the amount of time to wait for inference results.

application/json

Body Required

  • docs array[object] Required

    An array of objects to pass to the model for inference. The objects should contain fields matching your configured trained model input. Typically, for NLP models, the field name is text_field. Currently, for NLP models, only a single value is allowed.

    Hide docs attribute Show docs attribute object
    • * object Additional properties
  • Hide inference_config attributes Show inference_config attributes object
    • Hide regression attributes Show regression attributes object
    • Hide classification attributes Show classification attributes object
      • Specifies the number of top class predictions to return. Defaults to 0.

      • Specifies the maximum number of feature importance values per document.

      • Specifies the type of the predicted field to write. Acceptable values are: string, number, boolean. When boolean is provided 1.0 is transformed to true and 0.0 to false.

      • The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

      • Specifies the field to which the top classes are written. Defaults to top_classes.

    • Hide text_classification attributes Show text_classification attributes object
      • Specifies the number of top class predictions to return. Defaults to 0.

      • Hide tokenization attributes Show tokenization attributes object
        • truncate string

          Values are first, second, or none.

        • span number

          Span options to apply

      • The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

      • Classification labels to apply other than the stored labels. Must have the same dimensions as the default configured labels.

    • Hide zero_shot_classification attributes Show zero_shot_classification attributes object
      • Hide tokenization attributes Show tokenization attributes object
        • truncate string

          Values are first, second, or none.

        • span number

          Span options to apply

      • The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

      • Update the configured multi label option. Indicates if more than one true label exists. Defaults to the configured value.

      • labels array[string] Required

        The labels to predict.

    • Hide fill_mask attributes Show fill_mask attributes object
      • Specifies the number of top class predictions to return. Defaults to 0.

      • Hide tokenization attributes Show tokenization attributes object
        • truncate string

          Values are first, second, or none.

        • span number

          Span options to apply

      • The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

    • ner object
      Hide ner attributes Show ner attributes object
      • Hide tokenization attributes Show tokenization attributes object
        • truncate string

          Values are first, second, or none.

        • span number

          Span options to apply

      • The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

    • Hide pass_through attributes Show pass_through attributes object
      • Hide tokenization attributes Show tokenization attributes object
        • truncate string

          Values are first, second, or none.

        • span number

          Span options to apply

      • The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

    • Hide text_embedding attributes Show text_embedding attributes object
      • Hide tokenization attributes Show tokenization attributes object
        • truncate string

          Values are first, second, or none.

        • span number

          Span options to apply

      • The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

    • Hide text_expansion attributes Show text_expansion attributes object
      • Hide tokenization attributes Show tokenization attributes object
        • truncate string

          Values are first, second, or none.

        • span number

          Span options to apply

      • The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

    • Hide question_answering attributes Show question_answering attributes object
      • question string Required

        The question to answer given the inference context

      • Specifies the number of top class predictions to return. Defaults to 0.

      • Hide tokenization attributes Show tokenization attributes object
        • truncate string

          Values are first, second, or none.

        • span number

          Span options to apply

      • The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

      • The maximum answer length to consider for extraction

Responses

POST /_ml/trained_models/{model_id}/_infer
curl \
 --request POST 'http://api.example.com/_ml/trained_models/{model_id}/_infer' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"docs":[{"additionalProperty1":{},"additionalProperty2":{}}],"inference_config":{"regression":{"results_field":"string","num_top_feature_importance_values":42.0},"classification":{"num_top_classes":42.0,"num_top_feature_importance_values":42.0,"prediction_field_type":"string","results_field":"string","top_classes_results_field":"string"},"text_classification":{"num_top_classes":42.0,"tokenization":{"truncate":"first","span":42.0},"results_field":"string","classification_labels":["string"]},"zero_shot_classification":{"tokenization":{"truncate":"first","span":42.0},"results_field":"string","multi_label":true,"labels":["string"]},"fill_mask":{"num_top_classes":42.0,"tokenization":{"truncate":"first","span":42.0},"results_field":"string"},"ner":{"tokenization":{"truncate":"first","span":42.0},"results_field":"string"},"pass_through":{"tokenization":{"truncate":"first","span":42.0},"results_field":"string"},"text_embedding":{"tokenization":{"truncate":"first","span":42.0},"results_field":"string"},"text_expansion":{"tokenization":{"truncate":"first","span":42.0},"results_field":"string"},"question_answering":{"question":"string","num_top_classes":42.0,"tokenization":{"truncate":"first","span":42.0},"results_field":"string","max_answer_length":42.0}}}'


























































Get the shutdown status Added in 7.13.0

GET /_nodes/{node_id}/shutdown

Get information about nodes that are ready to be shut down, have shut down preparations still in progress, or have stalled. The API returns status information for each part of the shutdown process.

NOTE: This feature is designed for indirect use by Elasticsearch Service, Elastic Cloud Enterprise, and Elastic Cloud on Kubernetes. Direct use is not supported.

If the operator privileges feature is enabled, you must be an operator to use this API.

Path parameters

  • node_id string | array[string] Required

    The node or nodes for which to retrieve the shutdown status.

Query parameters

  • Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.

    Values are nanos, micros, ms, s, m, h, or d.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • nodes array[object] Required
      Hide nodes attributes Show nodes attributes object
      • node_id string Required
      • type string Required

        Values are remove or restart.

      • reason string Required
      • Time unit for milliseconds

      • status string Required

        Values are not_started, in_progress, stalled, or complete.

      • shard_migration object Required
        Hide shard_migration attribute Show shard_migration attribute object
        • status string Required

          Values are not_started, in_progress, stalled, or complete.

      • persistent_tasks object Required
        Hide persistent_tasks attribute Show persistent_tasks attribute object
        • status string Required

          Values are not_started, in_progress, stalled, or complete.

      • plugins object Required
        Hide plugins attribute Show plugins attribute object
        • status string Required

          Values are not_started, in_progress, stalled, or complete.

GET /_nodes/{node_id}/shutdown
curl \
 --request GET 'http://api.example.com/_nodes/{node_id}/shutdown' \
 --header "Authorization: $API_KEY"
Response examples (200)
Get the status of shutdown preparations with `GET /_nodes/USpTGYaBSIKbgSUJR2Z9lg/shutdown`. The response shows information about the shutdown preparations, including the status of shard migration, task migration, and plugin cleanup.
{
    "nodes": [
        {
            "node_id": "USpTGYaBSIKbgSUJR2Z9lg",
            "type": "RESTART",
            "reason": "Demonstrating how the node shutdown API works",
            "shutdown_startedmillis": 1624406108685,
            "allocation_delay": "10m",
            "status": "COMPLETE",
            "shard_migration": {
                "status": "COMPLETE",
                "shard_migrations_remaining": 0,
                "explanation": "no shard relocation is necessary for a node restart"
            },
            "persistent_tasks": {
                "status": "COMPLETE"
            },
            "plugins": {
                "status": "COMPLETE"
            }
        }
    ]
}
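
Because node_id accepts either a single value or an array, the status of several nodes can be requested in one call by separating the IDs with commas. A sketch, reusing the node ID from the example above plus a second, made-up ID:

GET /_nodes/USpTGYaBSIKbgSUJR2Z9lg,node-2-id/shutdown
curl \
 --request GET 'http://api.example.com/_nodes/USpTGYaBSIKbgSUJR2Z9lg,node-2-id/shutdown' \
 --header "Authorization: $API_KEY"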




Cancel node shutdown preparations Added in 7.13.0

DELETE /_nodes/{node_id}/shutdown

Remove a node from the shutdown list so it can resume normal operations. You must explicitly clear the shutdown request when a node rejoins the cluster or when a node has permanently left the cluster. Shutdown requests are never removed automatically by Elasticsearch.

NOTE: This feature is designed for indirect use by Elastic Cloud, Elastic Cloud Enterprise, and Elastic Cloud on Kubernetes. Direct use is not supported.

If the operator privileges feature is enabled, you must be an operator to use this API.

Path parameters

  • node_id string Required

    The node ID of the node to be removed from the shutdown state.

Query parameters

  • Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.

    Values are nanos, micros, ms, s, m, h, or d.

  • timeout string

    Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.

    Values are nanos, micros, ms, s, m, h, or d.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

DELETE /_nodes/{node_id}/shutdown
curl \
 --request DELETE 'http://api.example.com/_nodes/{node_id}/shutdown' \
 --header "Authorization: $API_KEY"
Response examples (200)
A successful response from `DELETE /_nodes/USpTGYaBSIKbgSUJR2Z9lg/shutdown`.
{
    "acknowledged": true
}

















Get a query ruleset Added in 8.10.0

GET /_query_rules/{ruleset_id}

Get details about a query ruleset.

Path parameters

  • ruleset_id string Required

    The unique identifier of the query ruleset

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • ruleset_id string Required
    • rules array[object] Required

      Rules associated with the query ruleset.

      Hide rules attributes Show rules attributes object
      • rule_id string Required
      • type string Required

        Values are pinned or exclude.

      • criteria object | array[object] Required

        The criteria that must be met for the rule to be applied. If multiple criteria are specified for a rule, all criteria must be met for the rule to be applied.

        One of:
        Hide attributes Show attributes
        • type string Required

          Values are global, exact, exact_fuzzy, fuzzy, prefix, suffix, contains, lt, lte, gt, gte, or always.

        • metadata string

          The metadata field to match against. This metadata will be used to match against match_criteria sent in the rule. It is required for all criteria types except always.

        • values array[object]

          The values to match against the metadata field. Only one value must match for the criteria to be met. It is required for all criteria types except always.

      • actions object Required
        Hide actions attributes Show actions attributes object
        • ids array[string]

          The unique document IDs of the documents to apply the rule to. Only one of ids or docs may be specified and at least one must be specified.

        • docs array[object]

          The documents to apply the rule to. Only one of ids or docs may be specified and at least one must be specified. There is a maximum value of 100 documents in a rule. You can specify the following attributes for each document:

          • _index: The index of the document to pin.
          • _id: The unique document ID.
          Hide docs attributes Show docs attributes object
      • priority number
GET /_query_rules/{ruleset_id}
curl \
 --request GET 'http://api.example.com/_query_rules/{ruleset_id}' \
 --header "Authorization: $API_KEY"
Response examples (200)
A successful response from `GET _query_rules/my-ruleset/`.
{
    "ruleset_id": "my-ruleset",
    "rules": [
        {
            "rule_id": "my-rule1",
            "type": "pinned",
            "criteria": [
                {
                    "type": "contains",
                    "metadata": "query_string",
                    "values": [ "pugs", "puggles" ]
                }
            ],
            "actions": {
                "ids": [
                    "id1",
                    "id2"
                ]
            }
        },
        {
            "rule_id": "my-rule2",
            "type": "pinned",
            "criteria": [
                {
                    "type": "fuzzy",
                    "metadata": "query_string",
                    "values": [ "rescue dogs" ]
                }
            ],
            "actions": {
                "docs": [
                    {
                        "_index": "index1",
                        "_id": "id3"
                    },
                    {
                        "_index": "index2",
                        "_id": "id4"
                    }
                ]
            }
        }
    ]
}
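
Because every criterion in a rule must match before the rule is applied, a rule can be narrowed by combining criteria. The sketch below is illustrative only; the metadata keys query_string and user_country are assumptions rather than fields defined by this API:

{
  "rule_id": "my-rule3",
  "type": "pinned",
  "criteria": [
    { "type": "exact", "metadata": "query_string", "values": [ "adopt a dog" ] },
    { "type": "exact", "metadata": "user_country", "values": [ "us" ] }
  ],
  "actions": {
    "ids": [ "id5" ]
  }
}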






















































































Create or update a script or search template

PUT /_scripts/{id}/{context}

Creates or updates a stored script or search template.

External documentation

Path parameters

  • id string Required

    The identifier for the stored script or search template. It must be unique within the cluster.

  • context string Required

    The context in which the script or search template should run. To prevent errors, the API immediately compiles the script or template in this context.

Query parameters

  • context string

    The context in which the script or search template should run. To prevent errors, the API immediately compiles the script or template in this context. If you specify both this and the <context> path parameter, the API uses the request path parameter.

  • The period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error. It can also be set to -1 to indicate that the request should never timeout.

  • timeout string

    The period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error. It can also be set to -1 to indicate that the request should never timeout.

application/json

Body Required

  • script object Required
    Hide script attributes Show script attributes object
    • lang string Required

      Any of:

      Values are painless, expression, mustache, or java.

    • options object
      Hide options attribute Show options attribute object
      • * string Additional properties
    • source string | object Required

      One of:

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

PUT /_scripts/{id}/{context}
curl \
 --request PUT 'http://api.example.com/_scripts/{id}/{context}' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '"{\n  \"script\": {\n    \"lang\": \"mustache\",\n    \"source\": {\n      \"query\": {\n        \"match\": {\n          \"message\": \"{{query_string}}\"\n        }\n      },\n      \"from\": \"{{from}}\",\n      \"size\": \"{{size}}\"\n    }\n  }\n}"'
Request examples
Run `PUT _scripts/my-search-template` to create a search template.
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "message": "{{query_string}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    }
  }
}
Run `PUT _scripts/my-stored-script` to create a stored script.
{
  "script": {
    "lang": "painless",
    "source": "Math.log(_score * 2) + params['my_modifier']"
  }
}
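
After the template is stored, it is typically run with the search template API, passing values for the mustache parameters. A usage sketch, assuming a hypothetical index named my-index:

GET /my-index/_search/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 0,
    "size": 10
  }
}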

Create or update a script or search template

POST /_scripts/{id}/{context}

Creates or updates a stored script or search template.

External documentation

Path parameters

  • id string Required

    The identifier for the stored script or search template. It must be unique within the cluster.

  • context string Required

    The context in which the script or search template should run. To prevent errors, the API immediately compiles the script or template in this context.

Query parameters

  • context string

    The context in which the script or search template should run. To prevent errors, the API immediately compiles the script or template in this context. If you specify both this and the <context> path parameter, the API uses the request path parameter.

  • The period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error. It can also be set to -1 to indicate that the request should never timeout.

  • timeout string

    The period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error. It can also be set to -1 to indicate that the request should never timeout.

application/json

Body Required

  • script object Required
    Hide script attributes Show script attributes object
    • lang string Required

      Any of:

      Values are painless, expression, mustache, or java.

    • options object
      Hide options attribute Show options attribute object
      • * string Additional properties
    • source string | object Required

      One of:

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

POST /_scripts/{id}/{context}
curl \
 --request POST 'http://api.example.com/_scripts/{id}/{context}' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '"{\n  \"script\": {\n    \"lang\": \"mustache\",\n    \"source\": {\n      \"query\": {\n        \"match\": {\n          \"message\": \"{{query_string}}\"\n        }\n      },\n      \"from\": \"{{from}}\",\n      \"size\": \"{{size}}\"\n    }\n  }\n}"'
Request examples
Run `PUT _scripts/my-search-template` to create a search template.
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "message": "{{query_string}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    }
  }
}
Run `PUT _scripts/my-stored-script` to create a stored script.
{
  "script": {
    "lang": "painless",
    "source": "Math.log(_score * 2) + params['my_modifier']"
  }
}
















































































































































































































































































































Bulk delete roles Added in 8.15.0

DELETE /_security/role

The role management APIs are generally the preferred way to manage roles, rather than using file-based role management. The bulk delete roles API cannot delete roles that are defined in roles files.

Query parameters

  • refresh string

    If true (the default), refresh the affected shards to make this operation visible to search. If wait_for, wait for a refresh to make this operation visible to search. If false, do nothing with refreshes.

    Values are true, false, or wait_for.

application/json

Body Required

  • names array[string] Required

    An array of role names to delete

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • deleted array[string]

      Array of deleted roles

    • not_found array[string]

      Array of roles that could not be found

    • errors object
      Hide errors attributes Show errors attributes object
      • count number Required

        The number of errors

      • details object Required

        Details about the errors, keyed by role name

        Hide details attribute Show details attribute object
        • * object
          Hide * attributes Show * attributes object
DELETE /_security/role
curl \
 --request DELETE 'http://api.example.com/_security/role' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '"{\n  \"names\": [\"my_admin_role\", \"my_user_role\"]\n}"'
Request example
Run `DELETE /_security/role` to delete the `my_admin_role` and `my_user_role` roles.
{
  "names": ["my_admin_role", "my_user_role"]
}
Response examples (200)
A successful response from `DELETE /_security/role`.
{
  "deleted": [
      "my_admin_role",
      "my_user_role"
  ]
}
A partially successful response from `DELETE /_security/role`. If a role cannot be found, it appears in the `not_found` list in the response.
{
  "deleted": [
      "my_admin_role"
  ],
  "not_found": [
      "not_an_existing_role"
  ]
}
A partially successful response from `DELETE /_security/role`. If part of a request fails or is invalid, the response includes `errors`.
{
  "deleted": [
      "my_admin_role"
  ],
  "errors": {
      "count": 1,
      "details": {
          "superuser": {
              "type": "illegal_argument_exception",
              "reason": "role [superuser] is reserved and cannot be deleted"
          }
      }
  }
}
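
If the deletion needs to be visible to searches of the security index before the call returns, the refresh query parameter described above can be set to wait_for. A sketch:

DELETE /_security/role?refresh=wait_for
curl \
 --request DELETE 'http://api.example.com/_security/role?refresh=wait_for' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"names": ["my_admin_role", "my_user_role"]}'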








































































































Get role mappings Added in 5.5.0

GET /_security/role_mapping/{name}

Role mappings define which roles are assigned to each user. The role mapping APIs are generally the preferred way to manage role mappings rather than using role mapping files. The get role mappings API cannot retrieve role mappings that are defined in role mapping files.

External documentation

Path parameters

  • name string | array[string] Required

    The distinct name that identifies the role mapping. The name is used solely as an identifier to facilitate interaction via the API; it does not affect the behavior of the mapping in any way. You can specify multiple mapping names as a comma-separated list. If you do not specify this parameter, the API returns information about all role mappings.

Responses

GET /_security/role_mapping/{name}
curl \
 --request GET 'http://api.example.com/_security/role_mapping/{name}' \
 --header "Authorization: $API_KEY"
Response examples (200)
A successful response from `GET /_security/role_mapping/mapping1`.
{
  "mapping1": {
    "enabled": true,
    "roles": [
      "user"
    ],
    "rules": {
      "field": {
        "username": "*"
      }
    },
    "metadata": {}
  }
}
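
As noted above, omitting the name path parameter returns every role mapping. A sketch of that request:

GET /_security/role_mapping
curl \
 --request GET 'http://api.example.com/_security/role_mapping' \
 --header "Authorization: $API_KEY"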




























































































































































































































Create SAML service provider metadata Added in 7.11.0

GET /_security/saml/metadata/{realm_name}

Generate SAML metadata for a SAML 2.0 Service Provider.

The SAML 2.0 specification provides a mechanism for Service Providers to describe their capabilities and configuration using a metadata file. This API generates Service Provider metadata based on the configuration of a SAML realm in Elasticsearch.

Path parameters

  • realm_name string Required

    The name of the SAML realm in Elasticsearch.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • metadata string Required

      An XML string that contains a SAML Service Provider's metadata for the realm.

GET /_security/saml/metadata/{realm_name}
curl \
 --request GET 'http://api.example.com/_security/saml/metadata/{realm_name}' \
 --header "Authorization: $API_KEY"
Response examples (200)
An illustrative response from `GET /_security/saml/metadata/{realm_name}`; the metadata value is a single XML string containing the Service Provider metadata for the realm, truncated here for readability.
{
  "metadata": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><md:EntityDescriptor ...> ... </md:EntityDescriptor>"
}













































Create a snapshot Added in 0.0.0

POST /_snapshot/{repository}/{snapshot}

Take a snapshot of a cluster or of data streams and indices.

External documentation

Path parameters

  • repository string Required

    The name of the repository for the snapshot.

  • snapshot string Required

    The name of the snapshot. It supports date math. It must be unique in the repository.

Query parameters

  • The period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.

  • If true, the request returns a response when the snapshot is complete. If false, the request returns a response when the snapshot initializes.

application/json

Body

  • expand_wildcards string | array[string]
  • feature_states array[string]

    The feature states to include in the snapshot. Each feature state includes one or more system indices containing related data. You can view a list of eligible features using the get features API.

    If include_global_state is true, all current feature states are included by default. If include_global_state is false, no feature states are included by default.

    Note that specifying an empty array will result in the default behavior. To exclude all feature states, regardless of the include_global_state value, specify an array with only the value none (["none"]).

  • If true, the request ignores data streams and indices in indices that are missing or closed. If false, the request returns an error for any data stream or index that is missing or closed.

  • If true, the current cluster state is included in the snapshot. The cluster state includes persistent cluster settings, composable index templates, legacy index templates, ingest pipelines, and ILM policies. It also includes data stored in system indices, such as Watches and task records (configurable via feature_states).

  • indices string | array[string]
  • metadata object
    Hide metadata attribute Show metadata attribute object
    • * object Additional properties
  • partial boolean

    If true, it enables you to restore a partial snapshot of indices with unavailable shards. Only shards that were successfully included in the snapshot will be restored. All missing shards will be recreated as empty.

    If false, the entire restore operation will fail if one or more indices included in the snapshot do not have all primary shards available.

Responses

POST /_snapshot/{repository}/{snapshot}
curl \
 --request POST 'http://api.example.com/_snapshot/{repository}/{snapshot}' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '"{\n  \"indices\": \"index_1,index_2\",\n  \"ignore_unavailable\": true,\n  \"include_global_state\": false,\n  \"metadata\": {\n    \"taken_by\": \"user123\",\n    \"taken_because\": \"backup before upgrading\"\n  }\n}"'
Request example
Run `PUT /_snapshot/my_repository/snapshot_2?wait_for_completion=true` to take a snapshot of `index_1` and `index_2`.
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "user123",
    "taken_because": "backup before upgrading"
  }
}
Response examples (200)
A successful response from `PUT /_snapshot/my_repository/snapshot_2?wait_for_completion=true`.
{
  "snapshot": {
    "snapshot": "snapshot_2",
    "uuid": "vdRctLCxSketdKb54xw67g",
    "repository": "my_repository",
    "version_id": <version_id>,
    "version": <version>,
    "indices": [],
    "data_streams": [],
    "feature_states": [],
    "include_global_state": false,
    "metadata": {
      "taken_by": "user123",
      "taken_because": "backup before upgrading"
    },
    "state": "SUCCESS",
    "start_time": "2020-06-25T14:00:28.850Z",
    "start_time_in_millis": 1593093628850,
    "end_time": "2020-06-25T14:00:28.850Z",
    "end_time_in_millis": 1593094752018,
    "duration_in_millis": 0,
    "failures": [],
    "shards": {
      "total": 0,
      "failed": 0,
      "successful": 0
    }
  }
}
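
Because the snapshot name supports date math, a daily snapshot can embed the current date in its name; the date math expression must be URL-encoded when it appears in the request path. A sketch (the repository and snapshot naming here are illustrative, and the expanded name assumes the default date format):

PUT /_snapshot/my_repository/%3Cdaily-snap-%7Bnow%2Fd%7D%3E?wait_for_completion=true

The encoded path segment is <daily-snap-{now/d}>, which resolves to a name such as daily-snap-2024.06.25 on the day the snapshot is taken.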




















Get snapshot repository information

GET /_snapshot

Query parameters

  • local boolean

    If true, the request gets information from the local node only. If false, the request gets information from the master node.

  • The period to wait for the master node. If the master node is not available before the timeout expires, the request fails and returns an error. To indicate that the request should never timeout, set it to -1.

Responses

GET /_snapshot
curl \
 --request GET 'http://api.example.com/_snapshot' \
 --header "Authorization: $API_KEY"
Response examples (200)
A successful response from `GET /_snapshot/my_repository`.
{
  "my_repository" : {
    "type" : "fs",
    "uuid" : "0JLknrXbSUiVPuLakHjBrQ",
    "settings" : {
      "location" : "my_backup_location"
    }
  }
}

























































Start snapshot lifecycle management Added in 7.6.0

POST /_slm/start

Snapshot lifecycle management (SLM) starts automatically when a cluster is formed. Manually starting SLM is necessary only if it has been stopped using the stop SLM API.

Query parameters

  • The period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error. To indicate that the request should never timeout, set it to -1.

  • timeout string

    The period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error. To indicate that the request should never timeout, set it to -1.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

POST /_slm/start
curl \
 --request POST 'http://api.example.com/_slm/start' \
 --header "Authorization: $API_KEY"
Response examples (200)
A successful response from `POST _slm/start`.
{
  "acknowledged": true
}





Clear an SQL search cursor

POST /_sql/close

application/json

Body Required

  • cursor string Required

    Cursor to clear.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
POST /_sql/close
curl \
 --request POST 'http://api.example.com/_sql/close' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '"{\n  \"cursor\": \"sDXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWYUpOYklQMHhRUEtld3RsNnFtYU1hQQ==:BAFmBGRhdGUBZgVsaWtlcwFzB21lc3NhZ2UBZgR1c2Vy9f///w8=\"\n}"'
Request example
Run `POST _sql/close` to clear an SQL search cursor.
{
  "cursor": "sDXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWYUpOYklQMHhRUEtld3RsNnFtYU1hQQ==:BAFmBGRhdGUBZgVsaWtlcwFzB21lc3NhZ2UBZgR1c2Vy9f///w8="
}




















Translate SQL into Elasticsearch queries Added in 6.3.0

GET /_sql/translate

Translate an SQL search into a search API request containing Query DSL. It accepts the same request body parameters as the SQL search API, excluding cursor.

application/json

Body Required

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • size number
    • _source boolean | object

      Defines how to fetch a source. Fetching can be disabled entirely, or the source can be filtered.

      One of:
    • fields array[object]
      Hide fields attributes Show fields attributes object
      • field string Required

        Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

      • format string

        The format in which the values are returned.

    • query object

      An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

      External documentation
    • sort string | object | array[string | object]

      One of:

      Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.

GET /_sql/translate
curl \
 --request GET 'http://api.example.com/_sql/translate' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '"{\n  \"query\": \"SELECT * FROM library ORDER BY page_count DESC\",\n  \"fetch_size\": 10\n}"'
Request example
{
  "query": "SELECT * FROM library ORDER BY page_count DESC",
  "fetch_size": 10
}
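
The response is the Query DSL search request that the SQL query translates to. Its exact contents depend on the index mapping, but for the request above it would have roughly this shape (an illustrative sketch, not verbatim output):

{
  "size": 10,
  "_source": false,
  "fields": [
    { "field": "page_count" }
  ],
  "sort": [
    {
      "page_count": {
        "order": "desc"
      }
    }
  ]
}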








































































































































Usage