Update the connector service type Beta; Added in 8.12.0

PUT /_connector/{connector_id}/_service_type

Path parameters

  • connector_id string Required

    The unique identifier of the connector to be updated

application/json

Body Required

  • service_type string Required

    The service type of the connector.

Responses

  • 200 application/json
    • result string Required

      Values are created, updated, deleted, not_found, or noop.

PUT /_connector/{connector_id}/_service_type
Console:

PUT _connector/my-connector/_service_type
{
    "service_type": "sharepoint_online"
}

Python:

resp = client.connector.update_service_type(
    connector_id="my-connector",
    service_type="sharepoint_online",
)

JavaScript:

const response = await client.connector.updateServiceType({
  connector_id: "my-connector",
  service_type: "sharepoint_online",
});

Ruby:

response = client.connector.update_service_type(
  connector_id: "my-connector",
  body: {
    "service_type": "sharepoint_online"
  }
)

PHP:

$resp = $client->connector()->updateServiceType([
    "connector_id" => "my-connector",
    "body" => [
        "service_type" => "sharepoint_online",
    ],
]);

curl:

curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"service_type":"sharepoint_online"}' "$ELASTICSEARCH_URL/_connector/my-connector/_service_type"

Java:

client.connector().updateServiceType(u -> u
    .connectorId("my-connector")
    .serviceType("sharepoint_online")
);
Request example
{
    "service_type": "sharepoint_online"
}
Response examples (200)
{
  "result": "updated"
}
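
Putting the pieces together, a minimal Python sketch (the connection details and connector ID are placeholders) that applies the update and checks the result value described above:

from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200", api_key="...")  # placeholder connection

resp = client.connector.update_service_type(
    connector_id="my-connector",
    service_type="sharepoint_online",
)
print(resp["result"])  # one of: created, updated, deleted, not_found, noop

# Optional follow-up: fetch the connector and confirm the new service type.
connector = client.connector.get(connector_id="my-connector")
print(connector["service_type"])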

Import a dangling index Generally available; Added in 7.9.0

POST /_dangling/{index_uuid}

If Elasticsearch encounters index data that is absent from the current cluster state, those indices are considered to be dangling. For example, this can happen if you delete more than cluster.indices.tombstones.size indices while an Elasticsearch node is offline.

Required authorization

  • Cluster privileges: manage

Path parameters

  • index_uuid string Required

    The UUID of the index to import. Use the get dangling indices API to locate the UUID, as in the sketch below.
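
A short Python sketch (connection details are placeholders) that calls the get dangling indices API to list each dangling index with its UUID before importing:

from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200", api_key="...")  # placeholder connection

# Each entry reports the index name, its UUID, and the nodes where the
# dangling shard data was found.
resp = client.dangling_indices.list_dangling_indices()
for idx in resp["dangling_indices"]:
    print(idx["index_name"], idx["index_uuid"], idx["node_ids"])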

Query parameters

  • accept_data_loss boolean Required

    This parameter must be set to true to import a dangling index. Because Elasticsearch cannot know where the dangling index data came from or determine which shard copies are fresh and which are stale, it cannot guarantee that the imported data represents the latest state of the index when it was last in the cluster.

  • master_timeout string

    Specify timeout for connection to master

    Values are -1 or 0.

  • timeout string

    Explicit operation timeout

    Values are -1 or 0.

Responses

  • 200 application/json
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

POST /_dangling/{index_uuid}
Console:

POST /_dangling/zmM4e0JtBkeUjiHD-MihPQ?accept_data_loss=true

Python:

resp = client.dangling_indices.import_dangling_index(
    index_uuid="zmM4e0JtBkeUjiHD-MihPQ",
    accept_data_loss=True,
)

JavaScript:

const response = await client.danglingIndices.importDanglingIndex({
  index_uuid: "zmM4e0JtBkeUjiHD-MihPQ",
  accept_data_loss: "true",
});

Ruby:

response = client.dangling_indices.import_dangling_index(
  index_uuid: "zmM4e0JtBkeUjiHD-MihPQ",
  accept_data_loss: "true"
)

PHP:

$resp = $client->danglingIndices()->importDanglingIndex([
    "index_uuid" => "zmM4e0JtBkeUjiHD-MihPQ",
    "accept_data_loss" => "true",
]);

curl:

curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_dangling/zmM4e0JtBkeUjiHD-MihPQ?accept_data_loss=true"

Java:

client.danglingIndices().importDanglingIndex(i -> i
    .acceptDataLoss(true)
    .indexUuid("zmM4e0JtBkeUjiHD-MihPQ")
);
Response examples (200)
A successful response from `POST /_dangling/zmM4e0JtBkeUjiHD-MihPQ?accept_data_loss=true`.
{
  "acknowledged": true
}

Open a closed index Generally available

POST /{index}/_open

For data streams, the API opens any closed backing indices.

A closed index is blocked for read/write operations and does not allow all operations that opened indices allow. It is not possible to index documents or to search for documents in a closed index. This allows closed indices to not have to maintain internal data structures for indexing or searching documents, resulting in a smaller overhead on the cluster.

When opening or closing an index, the master is responsible for restarting the index shards to reflect the new state of the index. The shards will then go through the normal recovery process. The data of opened or closed indices is automatically replicated by the cluster to ensure that enough shard copies are safely kept around at all times.

You can open and close multiple indices. An error is thrown if the request explicitly refers to a missing index. This behavior can be turned off by using the ignore_unavailable=true parameter.

By default, you must explicitly name the indices you are opening or closing. To open or close indices with _all, *, or other wildcard expressions, change the action.destructive_requires_name setting to false. This setting can also be changed with the cluster update settings API.
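
As an illustration (assuming a local cluster and sufficient privileges), the safeguard can be relaxed through the cluster update settings API:

from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # placeholder connection

# Allow _open and _close to target wildcard expressions such as "logs-*".
client.cluster.put_settings(
    persistent={"action.destructive_requires_name": False}
)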

Closed indices consume a significant amount of disk space, which can cause problems in managed environments. Closing indices can be turned off with the cluster settings API by setting cluster.indices.close.enable to false.

Because opening or closing an index allocates its shards, the wait_for_active_shards setting on index creation applies to the _open and _close index actions as well.
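
For example, a brief sketch (the index name is a placeholder) that reopens an index and waits for every shard copy to become active:

from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # placeholder connection

resp = client.indices.open(
    index="my-index",  # placeholder index name
    wait_for_active_shards="all",  # block until all shard copies are active
)
print(resp["acknowledged"], resp["shards_acknowledged"])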

Required authorization

  • Index privileges: manage

Path parameters

  • index string | array[string] Required

    Comma-separated list of data streams, indices, and aliases used to limit the request. Supports wildcards (*). By default, you must explicitly name the indices you are using to limit the request. To limit a request using _all, *, or other wildcard expressions, change the action.destructive_requires_name setting to false. You can update this setting in the elasticsearch.yml file or using the cluster update settings API.

Query parameters

  • allow_no_indices boolean

    If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices.

  • expand_wildcards string | array[string]

    Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as open,hidden.

    Supported values include:

    • all: Match any data stream or index, including hidden ones.
    • open: Match open, non-hidden indices. Also matches any non-hidden data stream.
    • closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
    • hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
    • none: Wildcard expressions are not accepted.

    Values are all, open, closed, hidden, or none.

  • ignore_unavailable boolean

    If false, the request returns an error if it targets a missing or closed index.

  • master_timeout string

    Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.

    Values are -1 or 0.

  • timeout string

    Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.

    Values are -1 or 0.

  • wait_for_active_shards number | string

    The number of shard copies that must be active before proceeding with the operation. Set to all or any positive integer up to the total number of shards in the index (number_of_replicas+1).

    Values are all or index-setting.

Responses

  • 200 application/json
    • acknowledged boolean Required
    • shards_acknowledged boolean Required
POST /{index}/_open
Console:

POST /.ds-my-data-stream-2099.03.07-000001/_open/

Python:

resp = client.indices.open(
    index=".ds-my-data-stream-2099.03.07-000001",
)

JavaScript:

const response = await client.indices.open({
  index: ".ds-my-data-stream-2099.03.07-000001",
});

Ruby:

response = client.indices.open(
  index: ".ds-my-data-stream-2099.03.07-000001"
)

PHP:

$resp = $client->indices()->open([
    "index" => ".ds-my-data-stream-2099.03.07-000001",
]);

curl:

curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/.ds-my-data-stream-2099.03.07-000001/_open/"

Java:

client.indices().open(o -> o
    .index(".ds-my-data-stream-2099.03.07-000001")
);
Response examples (200)
A successful response for opening an index.
{
  "acknowledged" : true,
  "shards_acknowledged" : true
}

Inference

Inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.
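
To make the overview concrete, here is a hedged Python sketch; it assumes a recent elasticsearch-py client (8.12 or later) that exposes inference.put, the built-in ELSER service, and a placeholder endpoint name. Treat it as an illustration rather than a definitive call:

from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200", api_key="...")  # placeholder connection

# Create a sparse-embedding inference endpoint backed by the built-in ELSER model.
# The endpoint name and allocation settings are illustrative assumptions.
client.inference.put(
    task_type="sparse_embedding",
    inference_id="my-elser-endpoint",
    inference_config={
        "service": "elser",
        "service_settings": {"num_allocations": 1, "num_threads": 1},
    },
)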

Create an anomaly detection job Generally available; Added in 5.4.0

PUT /_ml/anomaly_detectors/{job_id}

If you include a datafeed_config, you must have read index privileges on the source index. If you include a datafeed_config but do not provide a query, the datafeed uses {"match_all": {"boost": 1}}.

Required authorization

  • Index privileges: read
  • Cluster privileges: manage_ml

Path parameters

  • job_id string Required

    The identifier for the anomaly detection job. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start and end with alphanumeric characters.

Query parameters

  • allow_no_indices boolean

    If true, wildcard indices expressions that resolve into no concrete indices are ignored. This includes the _all string or when no indices are specified.

  • expand_wildcards string | array[string]

    Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values.

    Supported values include:

    • all: Match any data stream or index, including hidden ones.
    • open: Match open, non-hidden indices. Also matches any non-hidden data stream.
    • closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
    • hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
    • none: Wildcard expressions are not accepted.

    Values are all, open, closed, hidden, or none.

  • ignore_throttled boolean Deprecated

    If true, concrete, expanded or aliased indices are ignored when frozen.

  • ignore_unavailable boolean

    If true, unavailable indices (missing or closed) are ignored.

application/json

Body Required

  • allow_lazy_open boolean

    Advanced configuration option. Specifies whether this job can open when there is insufficient machine learning node capacity for it to be immediately assigned to a node. By default, if a machine learning node with capacity to run the job cannot immediately be found, the open anomaly detection jobs API returns an error. However, this is also subject to the cluster-wide xpack.ml.max_lazy_ml_nodes setting. If this option is set to true, the open anomaly detection jobs API does not return an error and the job waits in the opening state until sufficient machine learning node capacity is available.

    Default value is false.

  • analysis_config object Required

    Specifies how to analyze the data. After you create a job, you cannot change the analysis configuration; all the properties are informational.

    • bucket_span string

      The size of the interval that the analysis is aggregated into, typically between 5m and 1h. This value should be either a whole number of days or equate to a whole number of buckets in one day. If the anomaly detection job uses a datafeed with aggregations, this value must also be divisible by the interval of the date histogram aggregation.

    • categorization_analyzer string | object

      If categorization_field_name is specified, you can also define the analyzer that is used to interpret the categorization field. This property cannot be used at the same time as categorization_filters. The categorization analyzer specifies how the categorization_field is interpreted by the categorization process. The categorization_analyzer field can be specified either as a string or as an object. If it is a string, it must refer to a built-in analyzer or one added by another plugin.


    • categorization_field_name string

      If this property is specified, the values of the specified field will be categorized. The resulting categories must be used in a detector by setting by_field_name, over_field_name, or partition_field_name to the keyword mlcategory.

    • categorization_filters array[string]

      If categorization_field_name is specified, you can also define optional filters. This property expects an array of regular expressions. The expressions are used to filter out matching sequences from the categorization field values. You can use this functionality to fine tune the categorization by excluding sequences from consideration when categories are defined. For example, you can exclude SQL statements that appear in your log files. This property cannot be used at the same time as categorization_analyzer. If you only want to define simple regular expression filters that are applied prior to tokenization, setting this property is the easiest method. If you also want to customize the tokenizer or post-tokenization filtering, use the categorization_analyzer property instead and include the filters as pattern_replace character filters. The effect is exactly the same.

    • detectors array[object] Required

      Detector configuration objects specify which data fields a job analyzes. They also specify which analytical functions are used. You can specify multiple detectors for a job. If the detectors array does not contain at least one detector, no analysis can occur and an error is returned.

      • by_field_name string

        The field used to split the data. In particular, this property is used for analyzing the splits with respect to their own history. It is used for finding unusual values in the context of the split.

      • custom_rules array[object]

        Custom rules enable you to customize the way detectors operate. For example, a rule may dictate conditions under which results should be skipped. Kibana refers to custom rules as job rules.

        • actions array[string]

          The set of actions to be triggered when the rule applies. If more than one action is specified the effects of all actions are combined.

          Supported values include:

          • skip_result: The result will not be created. Unless you also specify skip_model_update, the model will be updated as usual with the corresponding series value.
          • skip_model_update: The value for that series will not be used to update the model. Unless you also specify skip_result, the results will be created as usual. This action is suitable when certain values are expected to be consistently anomalous and they affect the model in a way that negatively impacts the rest of the results.

          Values are skip_result or skip_model_update. Default value is ["skip_result"].

        • conditions array[object]

          An array of numeric conditions when the rule applies. A rule must either have a non-empty scope or at least one condition. Multiple conditions are combined together with a logical AND.

        • scope object

          A scope of series where the rule applies. A rule must either have a non-empty scope or at least one condition. By default, the scope includes all series. Scoping is allowed for any of the fields that are also specified in by_field_name, over_field_name, or partition_field_name.

      • detector_description string

        A description of the detector.

      • detector_index number

        A unique identifier for the detector. This identifier is based on the order of the detectors in the analysis_config, starting at zero. If you specify a value for this property, it is ignored.

      • exclude_frequent string

        If set, frequent entities are excluded from influencing the anomaly results. Entities can be considered frequent over time or frequent in a population. If you are working with both over and by fields, you can set exclude_frequent to all for both fields, or to by or over for those specific fields.

        Values are all, none, by, or over.

      • field_name string

        The field that the detector uses in the function. If you use an event rate function such as count or rare, do not specify this field. The field_name cannot contain double quotes or backslashes.

      • function string

        The analysis function that is used. For example, count, rare, mean, min, max, or sum.

      • over_field_name string

        The field used to split the data. In particular, this property is used for analyzing the splits with respect to the history of all splits. It is used for finding unusual values in the population of all splits.

      • partition_field_name string

        The field used to segment the analysis. When you use this property, you have completely independent baselines for each value of this field.

      • use_null boolean

        Defines whether a new series is used as the null series when there is no value for the by or partition fields.

        Default value is false.

    • influencers array[string]

      A comma separated list of influencer field names. Typically these can be the by, over, or partition fields that are used in the detector configuration. You might also want to use a field name that is not specifically named in a detector, but is available as part of the input data. When you use multiple detectors, the use of influencers is recommended as it aggregates results for each influencer entity.

    • latency string

      The size of the window in which to expect data that is out of time order. If you specify a non-zero value, it must be greater than or equal to one second. NOTE: Latency is applicable only when you send data by using the post data API.

    • model_prune_window string

      Advanced configuration option. Affects the pruning of models that have not been updated for the given time duration. The value must be set to a multiple of the bucket_span. If set too low, important information may be removed from the model. For jobs created in 8.1 and later, the default value is the greater of 30d or 20 times bucket_span.

    • multivariate_by_fields boolean

      This functionality is reserved for internal use. It is not supported for use in customer environments and is not subject to the support SLA of official GA features. If set to true, the analysis will automatically find correlations between metrics for a given by field value and report anomalies when those correlations cease to hold. For example, suppose CPU and memory usage on host A is usually highly correlated with the same metrics on host B. Perhaps this correlation occurs because they are running a load-balanced application. If you enable this property, anomalies will be reported when, for example, CPU usage on host A is high and the value of CPU usage on host B is low. That is to say, you’ll see an anomaly when the CPU of host A is unusual given the CPU of host B. To use the multivariate_by_fields property, you must also specify by_field_name in your detector.

    • per_partition_categorization object

      Settings related to how categorization interacts with partition fields.

      • enabled boolean

        To enable this setting, you must also set the partition_field_name property to the same value in every detector that uses the keyword mlcategory. Otherwise, job creation fails.

      • stop_on_warn boolean

        This setting can be set to true only if per-partition categorization is enabled. If true, both categorization and subsequent anomaly detection stops for partitions where the categorization status changes to warn. This setting makes it viable to have a job where it is expected that categorization works well for some partitions but not others; you do not pay the cost of bad categorization forever in the partitions where it works badly.

    • summary_count_field_name string

      If this property is specified, the data that is fed to the job is expected to be pre-summarized. This property value is the name of the field that contains the count of raw data points that have been summarized. The same summary_count_field_name applies to all detectors in the job. NOTE: The summary_count_field_name property cannot be used with the metric function.

  • analysis_limits object

    Limits can be applied for the resources required to hold the mathematical models in memory. These limits are approximate and can be set per job. They do not control the memory used by other processes, for example the Elasticsearch Java processes.

    • categorization_examples_limit number

      The maximum number of examples stored per category in memory and in the results data store. If you increase this value, more examples are available, however it requires that you have more storage available. If you set this value to 0, no examples are stored. NOTE: The categorization_examples_limit applies only to analysis that uses categorization.

      Default value is 4.

    • model_memory_limit number | string

      The approximate maximum amount of memory resources that are required for analytical processing. Once this limit is approached, data pruning becomes more aggressive. Upon exceeding this limit, new entities are not modeled. If the xpack.ml.max_model_memory_limit setting has a value greater than 0 and less than 1024mb, that value is used instead of the default. The default value is relatively small to ensure that high resource usage is a conscious decision. If you have jobs that are expected to analyze high cardinality fields, you will likely need to use a higher value. If you specify a number instead of a string, the units are assumed to be MiB. Specifying a string is recommended for clarity. If you specify a byte size unit of b or kb and the number does not equate to a discrete number of megabytes, it is rounded down to the closest MiB. The minimum valid value is 1 MiB. If you specify a value less than 1 MiB, an error occurs. If you specify a value for the xpack.ml.max_model_memory_limit setting, an error occurs when you try to create jobs that have model_memory_limit values greater than that setting value.


  • background_persist_interval string

    Advanced configuration option. The time between each periodic persistence of the model. The default value is a randomized value between 3 and 4 hours, which avoids all jobs persisting at exactly the same time. The smallest allowed value is 1 hour. For very large models (several GB), persistence could take 10-20 minutes, so do not set the background_persist_interval value too low.

  • custom_settings object

    Advanced configuration option. Contains custom meta data about the job.

  • daily_model_snapshot_retention_after_days number

    Advanced configuration option, which affects the automatic removal of old model snapshots for this job. It specifies a period of time (in days) after which only the first snapshot per day is retained. This period is relative to the timestamp of the most recent snapshot for this job. Valid values range from 0 to model_snapshot_retention_days.

    Default value is 1.

  • data_description object Required

    Defines the format of the input data when you send data to the job by using the post data API. Note that when you configure a datafeed, these properties are automatically set. When data is received via the post data API, it is not stored in Elasticsearch. Only the results for anomaly detection are retained.

    • format string

      Only JSON format is supported at this time.

    • time_field string

      The name of the field that contains the timestamp.

    • time_format string

      The time format, which can be epoch, epoch_ms, or a custom pattern. The value epoch refers to UNIX or Epoch time (the number of seconds since 1 Jan 1970). The value epoch_ms indicates that time is measured in milliseconds since the epoch. The epoch and epoch_ms time formats accept either integer or real values. Custom patterns must conform to the Java DateTimeFormatter class. When you use date-time formatting patterns, it is recommended that you provide the full date, time and time zone. For example: yyyy-MM-dd'T'HH:mm:ssX. If the pattern that you specify is not sufficient to produce a complete timestamp, job creation fails.

      Default value is epoch.

    • field_delimiter string
  • datafeed_config object

    Defines a datafeed for the anomaly detection job. If Elasticsearch security features are enabled, your datafeed remembers which roles the user who created it had at the time of creation and runs the query using those same roles. If you provide secondary authorization headers, those credentials are used instead.

    • aggregations object

      If set, the datafeed performs aggregation searches. Support for aggregations is limited and should be used only with low cardinality data.

    • chunking_config object

      Datafeeds might be required to search over long time periods, for several months or years. This search is split into time chunks in order to ensure the load on Elasticsearch is managed. Chunking configuration controls how the size of these time chunks is calculated and is an advanced configuration option.

      • mode string Required

        If the mode is auto, the chunk size is dynamically calculated; this is the recommended value when the datafeed does not use aggregations. If the mode is manual, chunking is applied according to the specified time_span; use this mode when the datafeed uses aggregations. If the mode is off, no chunking is applied.

        Values are auto, manual, or off.

      • time_span string

        The time span that each search will be querying. This setting is applicable only when the mode is set to manual.

    • datafeed_id string

      A numerical character string that uniquely identifies the datafeed. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start and end with alphanumeric characters. The default value is the job identifier.

    • delayed_data_check_config object

      Specifies whether the datafeed checks for missing data and the size of the window. The datafeed can optionally search over indices that have already been read in an effort to determine whether any data has subsequently been added to the index. If missing data is found, it is a good indication that the query_delay option is set too low and the data is being indexed after the datafeed has passed that moment in time. This check runs only on real-time datafeeds.

      • check_window string

        The window of time that is searched for late data. This window of time ends with the latest finalized bucket. It defaults to null, which causes an appropriate check_window to be calculated when the real-time datafeed runs. In particular, the default check_window span calculation is based on the maximum of 2h or 8 * bucket_span.

      • enabled boolean Required

        Specifies whether the datafeed periodically checks for delayed data.

    • frequency string

      The interval at which scheduled queries are made while the datafeed runs in real time. The default value is either the bucket span for short bucket spans, or, for longer bucket spans, a sensible fraction of the bucket span. For example: 150s. When frequency is shorter than the bucket span, interim results for the last (partial) bucket are written then eventually overwritten by the full bucket results. If the datafeed uses aggregations, this value must be divisible by the interval of the date histogram aggregation.

    • indices string | array[string]

      An array of index names. Wildcards are supported. If any indices are in remote clusters, the machine learning nodes must have the remote_cluster_client role.

    • indices_options object

      Specifies index expansion options that are used during search.

      • allow_no_indices boolean

        If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

      • expand_wildcards string | array[string]

        Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as open,hidden.

        Supported values include:

        • all: Match any data stream or index, including hidden ones.
        • open: Match open, non-hidden indices. Also matches any non-hidden data stream.
        • closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
        • hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
        • none: Wildcard expressions are not accepted.
      • ignore_unavailable boolean

        If true, missing or closed indices are not included in the response.

        Default value is false.

      • ignore_throttled boolean

        If true, concrete, expanded or aliased indices are ignored when frozen.

        Default value is true.

    • job_id string
    • max_empty_searches number

      If a real-time datafeed has never seen any data (including during any initial training period) then it will automatically stop itself and close its associated job after this many real-time searches that return no documents. In other words, it will stop after frequency times max_empty_searches of real-time operation. If not set then a datafeed with no end time that sees no data will remain started until it is explicitly stopped.

    • query object

      The Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that are supported by Elasticsearch can be used, as this object is passed verbatim to Elasticsearch.

      External documentation
    • query_delay string

      The number of seconds behind real time that data is queried. For example, if data from 10:04 a.m. might not be searchable in Elasticsearch until 10:06 a.m., set this property to 120 seconds. The default value is randomly selected between 60s and 120s. This randomness improves the query performance when there are multiple jobs running on the same node.

    • runtime_mappings object

      Specifies runtime fields for the datafeed search.

      • * object Additional properties
        • fields object

          For type composite

          • * object Additional properties
        • fetch_fields array[object]

          For type lookup

        • format string

          A custom format for date type runtime fields.

        • input_field string

          For type lookup

        • target_field string

          For type lookup

        • target_index string

          For type lookup

        • script object

          Painless script executed at query time.

        • type string Required

          Field type, which can be: boolean, composite, date, double, geo_point, geo_shape, ip, keyword, long, or lookup.

          Values are boolean, composite, date, double, geo_point, geo_shape, ip, keyword, long, or lookup.

    • script_fields object

      Specifies scripts that evaluate custom expressions and return script fields to the datafeed. The detector configuration objects in a job can contain functions that use these script fields.

      • * object Additional properties
        • script object Required
          • source string

            The script source.

          • params object

            Specifies any named parameters that are passed into the script as variables. Use parameters instead of hard-coded values to decrease compile time.

          • options object
        • ignore_failure boolean
    • scroll_size number

      The size parameter that is used in Elasticsearch searches when the datafeed does not use aggregations. The maximum value is the value of index.max_result_window, which is 10,000 by default.

      Default value is 1000.

  • description string

    A description of the job.

  • job_id string

    The identifier for the anomaly detection job. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start and end with alphanumeric characters.

  • groups array[string]

    A list of job groups. A job can belong to no groups or many.

  • model_plot_config object

    This advanced configuration option stores model information along with the results. It provides a more detailed view into anomaly detection. If you enable model plot it can add considerable overhead to the performance of the system; it is not feasible for jobs with many entities. Model plot provides a simplified and indicative view of the model and its bounds. It does not display complex features such as multivariate correlations or multimodal data. As such, anomalies may occasionally be reported which cannot be seen in the model plot. Model plot config can be configured when the job is created or updated later. It must be disabled if performance issues are experienced.

    • annotations_enabled boolean Generally available; Added in 7.9.0

      If true, enables calculation and storage of the model change annotations for each entity that is being analyzed.

      Default value is true.

    • enabled boolean

      If true, enables calculation and storage of the model bounds for each entity that is being analyzed.

      Default value is false.

    • terms string

      Limits data collection to this comma separated list of partition or by field values. If terms are not specified or it is an empty string, no filtering is applied. Wildcards are not supported. Only the specified terms can be viewed when using the Single Metric Viewer.

  • model_snapshot_retention_days number

    Advanced configuration option, which affects the automatic removal of old model snapshots for this job. It specifies the maximum period of time (in days) that snapshots are retained. This period is relative to the timestamp of the most recent snapshot for this job. By default, snapshots ten days older than the newest snapshot are deleted.

    Default value is 10.

  • renormalization_window_days number

    Advanced configuration option. The period over which adjustments to the score are applied, as new data is seen. The default value is the longer of 30 days or 100 bucket spans.

  • results_index_name string

    A text string that affects the name of the machine learning results index. By default, the job generates an index named .ml-anomalies-shared.

  • results_retention_days number

    Advanced configuration option. The period of time (in days) that results are retained. Age is calculated relative to the timestamp of the latest bucket result. If this property has a non-null value, once per day at 00:30 (server time), results that are the specified number of days older than the latest bucket result are deleted from Elasticsearch. The default value is null, which means all results are retained. Annotations generated by the system also count as results for retention purposes; they are deleted after the same number of days as results. Annotations added by users are retained forever.
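
Before the full request example at the end of this section, here is a minimal sketch (the job identifier and field names are placeholders) showing only the two required body objects, analysis_config and data_description:

from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # placeholder connection

# Only analysis_config (with at least one detector) and data_description
# are required; every other body field falls back to its default.
client.ml.put_job(
    job_id="minimal-job",
    analysis_config={
        "bucket_span": "15m",
        "detectors": [{"function": "count"}],
    },
    data_description={"time_field": "timestamp"},
)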

Responses

  • 200 application/json
    • allow_lazy_open boolean Required
    • analysis_config object Required
      • bucket_span string Required

        The size of the interval that the analysis is aggregated into, typically between 5m and 1h.

      • categorization_analyzer string | object

        If categorization_field_name is specified, you can also define the analyzer that is used to interpret the categorization field. This property cannot be used at the same time as categorization_filters. The categorization analyzer specifies how the categorization_field is interpreted by the categorization process.


      • categorization_field_name string

        If this property is specified, the values of the specified field will be categorized. The resulting categories must be used in a detector by setting by_field_name, over_field_name, or partition_field_name to the keyword mlcategory.

      • categorization_filters array[string]

        If categorization_field_name is specified, you can also define optional filters. This property expects an array of regular expressions. The expressions are used to filter out matching sequences from the categorization field values.

      • detectors array[object] Required

        An array of detector configuration objects. Detector configuration objects specify which data fields a job analyzes. They also specify which analytical functions are used. You can specify multiple detectors for a job.

        • by_field_name string

          The field used to split the data. In particular, this property is used for analyzing the splits with respect to their own history. It is used for finding unusual values in the context of the split.

        • custom_rules array[object]

          An array of custom rule objects, which enable you to customize the way detectors operate. For example, a rule may dictate to the detector conditions under which results should be skipped. Kibana refers to custom rules as job rules.

        • detector_description string

          A description of the detector.

        • detector_index number

          A unique identifier for the detector. This identifier is based on the order of the detectors in the analysis_config, starting at zero.

        • exclude_frequent string

          Contains one of the following values: all, none, by, or over. If set, frequent entities are excluded from influencing the anomaly results. Entities can be considered frequent over time or frequent in a population. If you are working with both over and by fields, then you can set exclude_frequent to all for both fields, or to by or over for those specific fields.

          Values are all, none, by, or over.

        • field_name string

          The field that the detector uses in the function. If you use an event rate function such as count or rare, do not specify this field.

        • function string Required

          The analysis function that is used. For example, count, rare, mean, min, max, and sum.

        • over_field_name string

          The field used to split the data. In particular, this property is used for analyzing the splits with respect to the history of all splits. It is used for finding unusual values in the population of all splits.

        • partition_field_name string

          The field used to segment the analysis. When you use this property, you have completely independent baselines for each value of this field.

        • use_null boolean

          Defines whether a new series is used as the null series when there is no value for the by or partition fields.

          Default value is false.

      • influencers array[string] Required

        A comma separated list of influencer field names. Typically these can be the by, over, or partition fields that are used in the detector configuration. You might also want to use a field name that is not specifically named in a detector, but is available as part of the input data. When you use multiple detectors, the use of influencers is recommended as it aggregates results for each influencer entity.

      • model_prune_window string

        Advanced configuration option. Affects the pruning of models that have not been updated for the given time duration. The value must be set to a multiple of the bucket_span. If set too low, important information may be removed from the model. Typically, set to 30d or longer. If not set, model pruning only occurs if the model memory status reaches the soft limit or the hard limit. For jobs created in 8.1 and later, the default value is the greater of 30d or 20 times bucket_span.

      • latency string

        The size of the window in which to expect data that is out of time order. Defaults to no latency. If you specify a non-zero value, it must be greater than or equal to one second.

      • multivariate_by_fields boolean

        This functionality is reserved for internal use. It is not supported for use in customer environments and is not subject to the support SLA of official GA features. If set to true, the analysis will automatically find correlations between metrics for a given by field value and report anomalies when those correlations cease to hold.

      • per_partition_categorization object

        Settings related to how categorization interacts with partition fields.

        • enabled boolean

          To enable this setting, you must also set the partition_field_name property to the same value in every detector that uses the keyword mlcategory. Otherwise, job creation fails.

        • stop_on_warn boolean

          This setting can be set to true only if per-partition categorization is enabled. If true, both categorization and subsequent anomaly detection stops for partitions where the categorization status changes to warn. This setting makes it viable to have a job where it is expected that categorization works well for some partitions but not others; you do not pay the cost of bad categorization forever in the partitions where it works badly.

      • summary_count_field_name string

        If this property is specified, the data that is fed to the job is expected to be pre-summarized. This property value is the name of the field that contains the count of raw data points that have been summarized. The same summary_count_field_name applies to all detectors in the job.

    • analysis_limits object Required
      • categorization_examples_limit number

        The maximum number of examples stored per category in memory and in the results data store. If you increase this value, more examples are available, however it requires that you have more storage available. If you set this value to 0, no examples are stored. NOTE: The categorization_examples_limit applies only to analysis that uses categorization.

        Default value is 4.

      • model_memory_limit number | string

        The approximate maximum amount of memory resources that are required for analytical processing. Once this limit is approached, data pruning becomes more aggressive. Upon exceeding this limit, new entities are not modeled. If the xpack.ml.max_model_memory_limit setting has a value greater than 0 and less than 1024mb, that value is used instead of the default. The default value is relatively small to ensure that high resource usage is a conscious decision. If you have jobs that are expected to analyze high cardinality fields, you will likely need to use a higher value. If you specify a number instead of a string, the units are assumed to be MiB. Specifying a string is recommended for clarity. If you specify a byte size unit of b or kb and the number does not equate to a discrete number of megabytes, it is rounded down to the closest MiB. The minimum valid value is 1 MiB. If you specify a value less than 1 MiB, an error occurs. If you specify a value for the xpack.ml.max_model_memory_limit setting, an error occurs when you try to create jobs that have model_memory_limit values greater than that setting value.


    • background_persist_interval string

      A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

    • create_time string | number

    • custom_settings object

      Custom metadata about the job

    • daily_model_snapshot_retention_after_days number Required
    • data_description object Required
      • format string

        Only JSON format is supported at this time.

      • time_field string

        The name of the field that contains the timestamp.

      • time_format string

        The time format, which can be epoch, epoch_ms, or a custom pattern. The value epoch refers to UNIX or Epoch time (the number of seconds since 1 Jan 1970). The value epoch_ms indicates that time is measured in milliseconds since the epoch. The epoch and epoch_ms time formats accept either integer or real values. Custom patterns must conform to the Java DateTimeFormatter class. When you use date-time formatting patterns, it is recommended that you provide the full date, time and time zone. For example: yyyy-MM-dd'T'HH:mm:ssX. If the pattern that you specify is not sufficient to produce a complete timestamp, job creation fails.

        Default value is epoch.

      • field_delimiter string
    • datafeed_config object
      • aggregations object
      • authorization object

        The security privileges that the datafeed uses to run its queries. If Elastic Stack security features were disabled at the time of the most recent update to the datafeed, this property is omitted.

        • api_key object

          If an API key was used for the most recent update to the datafeed, its name and identifier are listed in the response.

        • roles array[string]

          If a user ID was used for the most recent update to the datafeed, its roles at the time of the update are listed in the response.

        • service_account string

          If a service account was used for the most recent update to the datafeed, the account name is listed in the response.

      • chunking_config object
        • mode string Required

          If the mode is auto, the chunk size is dynamically calculated; this is the recommended value when the datafeed does not use aggregations. If the mode is manual, chunking is applied according to the specified time_span; use this mode when the datafeed uses aggregations. If the mode is off, no chunking is applied.

          Values are auto, manual, or off.

        • time_span string

          The time span that each search will be querying. This setting is applicable only when the mode is set to manual.

      • datafeed_id string Required
      • frequency string

        A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

      • indices array[string] Required
      • indexes array[string]
      • job_id string Required
      • max_empty_searches number
      • query_delay string

        A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

      • script_fields object
        • * object Additional properties
          • script object Required
          • ignore_failure boolean
      • scroll_size number
      • delayed_data_check_config object Required
        • check_window string

          The window of time that is searched for late data. This window of time ends with the latest finalized bucket. It defaults to null, which causes an appropriate check_window to be calculated when the real-time datafeed runs. In particular, the default check_window span calculation is based on the maximum of 2h or 8 * bucket_span.

        • enabled boolean Required

          Specifies whether the datafeed periodically checks for delayed data.

      • runtime_mappings object
        • * object Additional properties
          • fields object

            For type composite

          • fetch_fields array[object]

            For type lookup

          • format string

            A custom format for date type runtime fields.

      • indices_options object

        Controls how to deal with unavailable concrete indices (closed or missing), how wildcard expressions are expanded to actual indices (all, closed or open indices) and how to deal with wildcard expressions that resolve to no indices.

        • allow_no_indices boolean

          If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

        • expand_wildcards string | array[string]

          Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as open,hidden.

          Supported values include:

          • all: Match any data stream or index, including hidden ones.
          • open: Match open, non-hidden indices. Also matches any non-hidden data stream.
          • closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
          • hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
          • none: Wildcard expressions are not accepted.
        • ignore_unavailable boolean

          If true, missing or closed indices are not included in the response.

          Default value is false.

        • ignore_throttled boolean

          If true, concrete, expanded or aliased indices are ignored when frozen.

          Default value is true.

      • query object Required

        The Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that are supported by Elasticsearch can be used, as this object is passed verbatim to Elasticsearch. By default, this property has the following value: {"match_all": {"boost": 1}}.

        Query DSL
    • description string
    • groups array[string]
    • job_id string Required
    • job_type string Required
    • job_version string Required
    • model_plot_config object
      Hide model_plot_config attributes Show model_plot_config attributes object
      • annotations_enabled boolean Generally available; Added in 7.9.0

        If true, enables calculation and storage of the model change annotations for each entity that is being analyzed.

        Default value is true.

      • enabled boolean

        If true, enables calculation and storage of the model bounds for each entity that is being analyzed.

        Default value is false.

      • terms string

        Limits data collection to this comma-separated list of partition or by field values. If terms is not specified or is an empty string, no filtering is applied. Wildcards are not supported. Only the specified terms can be viewed when using the Single Metric Viewer.

    • model_snapshot_id string
    • model_snapshot_retention_days number Required
    • renormalization_window_days number
    • results_index_name string Required
    • results_retention_days number
PUT /_ml/anomaly_detectors/{job_id}
PUT /_ml/anomaly_detectors/job-01
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "Sum of bytes",
        "function": "sum",
        "field_name": "bytes"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  },
  "analysis_limits": {
    "model_memory_limit": "11MB"
  },
  "model_plot_config": {
    "enabled": true,
    "annotations_enabled": true
  },
  "results_index_name": "test-job1",
  "datafeed_config": {
    "indices": [
      "kibana_sample_data_logs"
    ],
    "query": {
      "bool": {
        "must": [
          {
            "match_all": {}
          }
        ]
      }
    },
    "runtime_mappings": {
      "hour_of_day": {
        "type": "long",
        "script": {
          "source": "emit(doc['timestamp'].value.getHour());"
        }
      }
    },
    "datafeed_id": "datafeed-test-job1"
  }
}
resp = client.ml.put_job(
    job_id="job-01",
    analysis_config={
        "bucket_span": "15m",
        "detectors": [
            {
                "detector_description": "Sum of bytes",
                "function": "sum",
                "field_name": "bytes"
            }
        ]
    },
    data_description={
        "time_field": "timestamp",
        "time_format": "epoch_ms"
    },
    analysis_limits={
        "model_memory_limit": "11MB"
    },
    model_plot_config={
        "enabled": True,
        "annotations_enabled": True
    },
    results_index_name="test-job1",
    datafeed_config={
        "indices": [
            "kibana_sample_data_logs"
        ],
        "query": {
            "bool": {
                "must": [
                    {
                        "match_all": {}
                    }
                ]
            }
        },
        "runtime_mappings": {
            "hour_of_day": {
                "type": "long",
                "script": {
                    "source": "emit(doc['timestamp'].value.getHour());"
                }
            }
        },
        "datafeed_id": "datafeed-test-job1"
    },
)
const response = await client.ml.putJob({
  job_id: "job-01",
  analysis_config: {
    bucket_span: "15m",
    detectors: [
      {
        detector_description: "Sum of bytes",
        function: "sum",
        field_name: "bytes",
      },
    ],
  },
  data_description: {
    time_field: "timestamp",
    time_format: "epoch_ms",
  },
  analysis_limits: {
    model_memory_limit: "11MB",
  },
  model_plot_config: {
    enabled: true,
    annotations_enabled: true,
  },
  results_index_name: "test-job1",
  datafeed_config: {
    indices: ["kibana_sample_data_logs"],
    query: {
      bool: {
        must: [
          {
            match_all: {},
          },
        ],
      },
    },
    runtime_mappings: {
      hour_of_day: {
        type: "long",
        script: {
          source: "emit(doc['timestamp'].value.getHour());",
        },
      },
    },
    datafeed_id: "datafeed-test-job1",
  },
});
response = client.ml.put_job(
  job_id: "job-01",
  body: {
    "analysis_config": {
      "bucket_span": "15m",
      "detectors": [
        {
          "detector_description": "Sum of bytes",
          "function": "sum",
          "field_name": "bytes"
        }
      ]
    },
    "data_description": {
      "time_field": "timestamp",
      "time_format": "epoch_ms"
    },
    "analysis_limits": {
      "model_memory_limit": "11MB"
    },
    "model_plot_config": {
      "enabled": true,
      "annotations_enabled": true
    },
    "results_index_name": "test-job1",
    "datafeed_config": {
      "indices": [
        "kibana_sample_data_logs"
      ],
      "query": {
        "bool": {
          "must": [
            {
              "match_all": {}
            }
          ]
        }
      },
      "runtime_mappings": {
        "hour_of_day": {
          "type": "long",
          "script": {
            "source": "emit(doc['timestamp'].value.getHour());"
          }
        }
      },
      "datafeed_id": "datafeed-test-job1"
    }
  }
)
$resp = $client->ml()->putJob([
    "job_id" => "job-01",
    "body" => [
        "analysis_config" => [
            "bucket_span" => "15m",
            "detectors" => array(
                [
                    "detector_description" => "Sum of bytes",
                    "function" => "sum",
                    "field_name" => "bytes",
                ],
            ),
        ],
        "data_description" => [
            "time_field" => "timestamp",
            "time_format" => "epoch_ms",
        ],
        "analysis_limits" => [
            "model_memory_limit" => "11MB",
        ],
        "model_plot_config" => [
            "enabled" => true,
            "annotations_enabled" => true,
        ],
        "results_index_name" => "test-job1",
        "datafeed_config" => [
            "indices" => array(
                "kibana_sample_data_logs",
            ),
            "query" => [
                "bool" => [
                    "must" => array(
                        [
                            "match_all" => new ArrayObject([]),
                        ],
                    ),
                ],
            ],
            "runtime_mappings" => [
                "hour_of_day" => [
                    "type" => "long",
                    "script" => [
                        "source" => "emit(doc['timestamp'].value.getHour());",
                    ],
                ],
            ],
            "datafeed_id" => "datafeed-test-job1",
        ],
    ],
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"analysis_config":{"bucket_span":"15m","detectors":[{"detector_description":"Sum of bytes","function":"sum","field_name":"bytes"}]},"data_description":{"time_field":"timestamp","time_format":"epoch_ms"},"analysis_limits":{"model_memory_limit":"11MB"},"model_plot_config":{"enabled":true,"annotations_enabled":true},"results_index_name":"test-job1","datafeed_config":{"indices":["kibana_sample_data_logs"],"query":{"bool":{"must":[{"match_all":{}}]}},"runtime_mappings":{"hour_of_day":{"type":"long","script":{"source":"emit(doc['"'"'timestamp'"'"'].value.getHour());"}}},"datafeed_id":"datafeed-test-job1"}}' "$ELASTICSEARCH_URL/_ml/anomaly_detectors/job-01"
client.ml().putJob(p -> p
    .analysisConfig(a -> a
        .bucketSpan(b -> b
            .time("15m")
        )
        .detectors(d -> d
            .detectorDescription("Sum of bytes")
            .fieldName("bytes")
            .function("sum")
        )
    )
    .analysisLimits(an -> an
        .modelMemoryLimit("11MB")
    )
    .dataDescription(d -> d
        .timeField("timestamp")
        .timeFormat("epoch_ms")
    )
    .datafeedConfig(d -> d
        .datafeedId("datafeed-test-job1")
        .indices("kibana_sample_data_logs")
        .query(q -> q
            .bool(b -> b
                .must(m -> m
                    .matchAll(ma -> ma)
                )
            )
        )
        .runtimeMappings("hour_of_day", r -> r
            .script(s -> s
                .source(so -> so
                    .scriptString("emit(doc['timestamp'].value.getHour());")
                )
            )
            .type(RuntimeFieldType.Long)
        )
    )
    .jobId("job-01")
    .modelPlotConfig(m -> m
        .annotationsEnabled(true)
        .enabled(true)
    )
    .resultsIndexName("test-job1")
);
Request example
A request to create an anomaly detection job and datafeed.
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "Sum of bytes",
        "function": "sum",
        "field_name": "bytes"
      }
    ]
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  },
  "analysis_limits": {
    "model_memory_limit": "11MB"
  },
  "model_plot_config": {
    "enabled": true,
    "annotations_enabled": true
  },
  "results_index_name": "test-job1",
  "datafeed_config": {
    "indices": [
      "kibana_sample_data_logs"
    ],
    "query": {
      "bool": {
        "must": [
          {
            "match_all": {}
          }
        ]
      }
    },
    "runtime_mappings": {
      "hour_of_day": {
        "type": "long",
        "script": {
          "source": "emit(doc['timestamp'].value.getHour());"
        }
      }
    },
    "datafeed_id": "datafeed-test-job1"
  }
}
Response examples (200)
A successful response when creating an anomaly detection job and datafeed.
{
  "job_id": "test-job1",
  "job_type": "anomaly_detector",
  "job_version": "8.4.0",
  "create_time": 1656087283340,
  "datafeed_config": {
    "datafeed_id": "datafeed-test-job1",
    "job_id": "test-job1",
    "authorization": {
      "roles": [
        "superuser"
      ]
    },
    "query_delay": "61499ms",
    "chunking_config": {
      "mode": "auto"
    },
    "indices_options": {
      "expand_wildcards": [
        "open"
      ],
      "ignore_unavailable": false,
      "allow_no_indices": true,
      "ignore_throttled": true
    },
    "query": {
      "bool": {
        "must": [
          {
            "match_all": {}
          }
        ]
      }
    },
    "indices": [
      "kibana_sample_data_logs"
    ],
    "scroll_size": 1000,
    "delayed_data_check_config": {
      "enabled": true
    },
    "runtime_mappings": {
      "hour_of_day": {
        "type": "long",
        "script": {
          "source": "emit(doc['timestamp'].value.getHour());"
        }
      }
    }
  },
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "Sum of bytes",
        "function": "sum",
        "field_name": "bytes",
        "detector_index": 0
      }
    ],
    "influencers": [],
    "model_prune_window": "30d"
  },
  "analysis_limits": {
    "model_memory_limit": "11mb",
    "categorization_examples_limit": 4
  },
  "data_description": {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  },
  "model_plot_config": {
    "enabled": true,
    "annotations_enabled": true
  },
  "model_snapshot_retention_days": 10,
  "daily_model_snapshot_retention_after_days": 1,
  "results_index_name": "custom-test-job1",
  "allow_lazy_open": false
}
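
A hedged follow-up sketch in Python (assuming a configured Elasticsearch client; the open job and start datafeed endpoints come from the broader ML API): once the request above succeeds, the job can be opened and its datafeed started under the IDs used in the example.
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200", api_key="...")  # placeholder endpoint and key

# The IDs match the request example above.
client.ml.open_job(job_id="job-01")
client.ml.start_datafeed(datafeed_id="datafeed-test-job1")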

Get data frame analytics job stats Generally available; Added in 7.3.0

GET /_ml/data_frame/analytics/{id}/_stats

All methods and paths for this operation:

GET /_ml/data_frame/analytics/_stats

GET /_ml/data_frame/analytics/{id}/_stats

Required authorization

  • Cluster privileges: monitor_ml

Path parameters

  • id string Required

    Identifier for the data frame analytics job. If you do not specify this option, the API returns information for the first hundred data frame analytics jobs.

Query parameters

  • allow_no_match boolean

    Specifies what to do when the request:

    1. Contains wildcard expressions and there are no data frame analytics jobs that match.
    2. Contains the _all string or no identifiers and there are no matches.
    3. Contains wildcard expressions and there are only partial matches.

    The default value returns an empty data_frame_analytics array when there are no matches and the subset of results when there are partial matches. If this parameter is false, the request returns a 404 status code when there are no matches or only partial matches.

  • from number

    Skips the specified number of data frame analytics jobs.

  • size number

    Specifies the maximum number of data frame analytics jobs to obtain.

  • verbose boolean

    Defines whether the stats response should be verbose.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • count number Required
    • data_frame_analytics array[object] Required

      An array of objects that contain usage information for data frame analytics jobs, which are sorted by the id value in ascending order.

      Hide data_frame_analytics attributes Show data_frame_analytics attributes object
      • analysis_stats object

        An object containing information about the analysis job.

        Hide analysis_stats attributes Show analysis_stats attributes object
        • classification_stats object

          An object containing information about the classification analysis job.

        • outlier_detection_stats object

          An object containing information about the outlier detection job.

        • regression_stats object

          An object containing information about the regression analysis.

      • assignment_explanation string

        For running jobs only, contains messages relating to the selection of a node to run the job.

      • data_counts object Required

        An object that provides counts for the quantity of documents skipped, used in training, or available for testing.

        Hide data_counts attributes Show data_counts attributes object
        • skipped_docs_count number Required

          The number of documents that are skipped during the analysis because they contained values that are not supported by the analysis. For example, outlier detection does not support missing fields so it skips documents with missing fields. Likewise, all types of analysis skip documents that contain arrays with more than one element.

        • test_docs_count number Required

          The number of documents that are not used for training the model and can be used for testing.

        • training_docs_count number Required

          The number of documents that are used for training the model.

      • id string Required

        The unique identifier of the data frame analytics job.

      • memory_usage object Required

        An object describing memory usage of the analytics. It is present only after the job is started and memory usage is reported.

        Hide memory_usage attributes Show memory_usage attributes object
        • memory_reestimate_bytes number

          This value is present when the status is hard_limit and it is a new estimate of how much memory the job needs.

        • peak_usage_bytes number Required

          The number of bytes used at the highest peak of memory usage.

        • status string Required

          The memory usage status.

      • node object

        Contains properties for the node that runs the job. This information is available only for running jobs.

        Hide node attributes Show node attributes object
        • attributes object Required

          Lists node attributes.

          Hide attributes attribute Show attributes attribute object
          • * string Additional properties
        • ephemeral_id string Required

          The ephemeral ID of the node.

        • id string

          The unique identifier of the node.

        • name string Required

          The name of the node.

        • transport_address string Required

          The host and port where transport HTTP connections are accepted.

      • progress array[object] Required

        The progress report of the data frame analytics job by phase.

        Hide progress attributes Show progress attributes object
        • phase string Required

          Defines the phase of the data frame analytics job.

        • progress_percent number Required

          The progress that the data frame analytics job has made, expressed as a percentage.

      • state string Required

        The status of the data frame analytics job, which can be one of the following values: failed, started, starting, stopping, stopped.

        Values are started, stopped, starting, stopping, or failed.

GET /_ml/data_frame/analytics/{id}/_stats
GET _ml/data_frame/analytics/weblog-outliers/_stats
resp = client.ml.get_data_frame_analytics_stats(
    id="weblog-outliers",
)
const response = await client.ml.getDataFrameAnalyticsStats({
  id: "weblog-outliers",
});
response = client.ml.get_data_frame_analytics_stats(
  id: "weblog-outliers"
)
$resp = $client->ml()->getDataFrameAnalyticsStats([
    "id" => "weblog-outliers",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_ml/data_frame/analytics/weblog-outliers/_stats"
client.ml().getDataFrameAnalyticsStats(g -> g
    .id("weblog-outliers")
);
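
As a rough sketch (assuming a configured Elasticsearch client named client; the wildcard job ID is illustrative), you can combine a wildcard identifier with allow_no_match and walk the sorted results:
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200", api_key="...")  # placeholder endpoint and key

# allow_no_match=True returns an empty data_frame_analytics array
# instead of a 404 when the wildcard matches nothing.
resp = client.ml.get_data_frame_analytics_stats(
    id="weblog-*",
    allow_no_match=True,
    from_=0,
    size=100,
)

for job in resp["data_frame_analytics"]:
    # Jobs are sorted by id in ascending order.
    phases = ", ".join(
        f"{p['phase']}={p['progress_percent']}%" for p in job["progress"]
    )
    print(job["id"], job["state"], phases)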

Open a point in time Generally available; Added in 7.10.0

POST /{index}/_pit

By default, a search request runs against the most recent visible data of the target indices, which is called the point in time. An Elasticsearch PIT (point in time) is a lightweight view into the state of the data as it existed when the PIT was initiated. In some cases, it is preferable to perform multiple search requests using the same point in time. For example, if refreshes happen between search_after requests, the results of those requests might not be consistent because changes that happen between searches are visible only to the more recent point in time.

A point in time must be opened explicitly before being used in search requests.

A subsequent search request with the pit parameter must not specify index, routing, or preference values as these parameters are copied from the point in time.

Just like regular searches, you can use from and size to page through point in time search results, up to the first 10,000 hits. If you want to retrieve more hits, use PIT with search_after.
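
A minimal sketch of that pattern in Python (assuming a configured Elasticsearch client; the index name and sort fields are illustrative):
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200", api_key="...")  # placeholder endpoint and key

# Open the PIT once; the search requests below must not specify an index.
pit = client.open_point_in_time(index="my-index-000001", keep_alive="1m")
pit_id = pit["id"]

search_after = None
while True:
    kwargs = {}
    if search_after is not None:
        kwargs["search_after"] = search_after
    resp = client.search(
        size=1000,
        sort=[{"@timestamp": "asc"}, {"_shard_doc": "asc"}],  # _shard_doc as tiebreaker
        pit={"id": pit_id, "keep_alive": "1m"},
        **kwargs,
    )
    hits = resp["hits"]["hits"]
    if not hits:
        break
    pit_id = resp.get("pit_id", pit_id)  # always carry forward the latest PIT id
    search_after = hits[-1]["sort"]

client.close_point_in_time(id=pit_id)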

IMPORTANT: The open point in time request and each subsequent search request can return different identifiers; always use the most recently received ID for the next search request.

When a PIT that contains shard failures is used in a search request, the missing shards are always reported in the search response as a NoShardAvailableActionException exception. To get rid of these exceptions, create a new PIT so that the shards missing from the previous PIT can be handled, assuming they become available in the meantime.

Keeping point in time alive

The keep_alive parameter, which is passed to an open point in time request and to each subsequent search request, extends the time to live of the corresponding point in time. The value does not need to be long enough to process all data; it just needs to be long enough for the next request.

Normally, the background merge process optimizes the index by merging together smaller segments to create new, bigger segments. Once the smaller segments are no longer needed they are deleted. However, an open point in time prevents the old segments from being deleted since they are still in use.

TIP: Keeping older segments alive means that more disk space and file handles are needed. Ensure that you have configured your nodes to have ample free file handles.

Additionally, if a segment contains deleted or updated documents then the point in time must keep track of whether each document in the segment was live at the time of the initial search request. Ensure that your nodes have sufficient heap space if you have many open point-in-times on an index that is subject to ongoing deletes or updates. Note that a point-in-time doesn't prevent its associated indices from being deleted. You can check how many point-in-times (that is, search contexts) are open with the nodes stats API.
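
A hedged sketch of that check in Python (assuming a configured Elasticsearch client; open_contexts counts the open search contexts, which include PITs):
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200", api_key="...")  # placeholder endpoint and key

stats = client.nodes.stats(metric="indices", index_metric="search")
for node_id, node in stats["nodes"].items():
    print(node["name"], node["indices"]["search"]["open_contexts"])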

Required authorization

  • Index privileges: read

Path parameters

  • index string | array[string] Required

    A comma-separated list of index names to open point in time; use _all or empty string to perform the operation on all indices

Query parameters

  • keep_alive string Required

    Extend the length of time that the point in time persists.

    Values are -1 or 0.

  • ignore_unavailable boolean

    If false, the request returns an error if it targets a missing or closed index.

  • preference string

    The node or shard the operation should be performed on. By default, it is random.

  • routing string

    A custom value that is used to route operations to a specific shard.

  • expand_wildcards string | array[string]

    The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. It supports comma-separated values, such as open,hidden.

    Supported values include:

    • all: Match any data stream or index, including hidden ones.
    • open: Match open, non-hidden indices. Also matches any non-hidden data stream.
    • closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
    • hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
    • none: Wildcard expressions are not accepted.

    Values are all, open, closed, hidden, or none.

  • allow_partial_search_results boolean

    Indicates whether the point in time tolerates unavailable shards or shard failures when it is initially created. If false, the open point in time request throws an exception when a shard is missing or unavailable. If true, the point in time contains all the shards that are available at the time of the request.

  • max_concurrent_shard_requests number

    Maximum number of concurrent shard requests that each sub-search request executes per node.

application/json

Body

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • _shards object Required

      Shards used to create the PIT

      Hide _shards attributes Show _shards attributes object
      • failed number Required

        The number of shards the operation or search attempted to run on but failed.

      • successful number Required

        The number of shards the operation or search succeeded on.

      • total number Required

        The number of shards the operation or search will run on overall.

      • failures array[object]
        Hide failures attributes Show failures attributes object
        • index string
        • node string
        • reason object Required

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

        • shard number
        • status string
        • primary boolean
      • skipped number
    • id string Required
POST /{index}/_pit
POST /my-index-000001/_pit?keep_alive=1m&allow_partial_search_results=true
resp = client.open_point_in_time(
    index="my-index-000001",
    keep_alive="1m",
    allow_partial_search_results=True,
)
const response = await client.openPointInTime({
  index: "my-index-000001",
  keep_alive: "1m",
  allow_partial_search_results: "true",
});
response = client.open_point_in_time(
  index: "my-index-000001",
  keep_alive: "1m",
  allow_partial_search_results: "true"
)
$resp = $client->openPointInTime([
    "index" => "my-index-000001",
    "keep_alive" => "1m",
    "allow_partial_search_results" => "true",
]);
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/my-index-000001/_pit?keep_alive=1m&allow_partial_search_results=true"
client.openPointInTime(o -> o
    .allowPartialSearchResults(true)
    .index("my-index-000001")
    .keepAlive(k -> k
        .time("1m")
    )
);
Response examples (200)
A successful response from `POST /my-index-000001/_pit?keep_alive=1m&allow_partial_search_results=true`. It includes a summary of the total number of shards, as well as the number of successful shards when creating the PIT.
{
  "id": "46ToAwMDaWR5BXV1aWQyKwZub2RlXzMAAAAAAAAAACoBYwADaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQADaWR5BXV1aWQyKgZub2RlXzIAAAAAAAAAAAwBYgACBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA=",
  "_shards": {
    "total": 10,
    "successful": 10,
    "skipped": 0,
    "failed": 0
  }
}

Get the snapshot lifecycle management status Generally available; Added in 7.6.0

GET /_slm/status

Required authorization

  • Cluster privileges: read_slm

Query parameters

  • master_timeout string

    The period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error. To indicate that the request should never time out, set it to -1.

    Values are -1 or 0.

  • timeout string

    The period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error. To indicate that the request should never time out, set it to -1.

    Values are -1 or 0.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • operation_mode string Required

      Values are RUNNING, STOPPING, or STOPPED.

GET /_slm/status
GET _slm/status
resp = client.slm.get_status()
const response = await client.slm.getStatus();
response = client.slm.get_status
$resp = $client->slm()->getStatus();
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_slm/status"
client.slm().getStatus(g -> g);
Response examples (200)
A successful response from `GET _slm/status`.
{
  "operation_mode": "RUNNING"
}
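
A small sketch (assuming a configured Elasticsearch client) that gates work on the reported mode:
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200", api_key="...")  # placeholder endpoint and key

resp = client.slm.get_status()
if resp["operation_mode"] != "RUNNING":
    # STOPPING or STOPPED: scheduled snapshot policies will not run.
    print(f"SLM is {resp['operation_mode']}; snapshot policies are paused")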

Cancel a task Technical preview; Added in 2.3.0

POST /_tasks/{task_id}/_cancel

All methods and paths for this operation:

POST /_tasks/_cancel

POST /_tasks/{task_id}/_cancel

WARNING: The task management API is new and should still be considered a beta feature. The API may change in ways that are not backwards compatible.

A task may continue to run for some time after it has been cancelled because it may not be able to safely stop its current activity straight away. It is also possible that Elasticsearch must complete its work on other tasks before it can process the cancellation. The get task information API will continue to list these cancelled tasks until they complete. The cancelled flag in the response indicates that the cancellation command has been processed and the task will stop as soon as possible.

To troubleshoot why a cancelled task does not complete promptly, use the get task information API with the ?detailed parameter to identify the other tasks the system is running. You can also use the node hot threads API to obtain detailed information about the work the system is doing instead of completing the cancelled task.
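
As an illustrative sketch (assuming a configured Elasticsearch client; the task identifier is hypothetical), you can cancel a task and then poll the task list with detailed to watch for the cancelled flag:
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200", api_key="...")  # placeholder endpoint and key

task_id = "oTUltX4IQMOUUVeiohTt8A:12345"  # hypothetical task identifier

client.tasks.cancel(task_id=task_id)

# detailed=True adds each task's description and status, which helps show
# what the node is working on instead of completing the cancellation.
tasks = client.tasks.list(detailed=True)
for node in tasks["nodes"].values():
    for tid, task in node["tasks"].items():
        print(tid, task["action"], task.get("cancelled", False))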

Required authorization

  • Cluster privileges: manage

Path parameters

  • task_id string | number Required

    The task identifier.

Query parameters

  • actions string | array[string]

    A comma-separated list or wildcard expression of actions that is used to limit the request.

  • nodes array[string]

    A comma-separated list of node IDs or names that is used to limit the request.

  • parent_task_id string

    A parent task ID that is used to limit the tasks.

  • wait_for_completion boolean

    If true, the request blocks until all found tasks are complete.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • node_failures array[object]

      Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

      Hide node_failures attributes Show node_failures attributes object
      • type string Required

        The type of error

      • reason string | null

        A human-readable explanation of the error, in English.

      • stack_trace string

        The server stack trace. Present only if the error_trace=true parameter was sent with the request.

      • caused_by object

        Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

      • root_cause array[object]

        Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

      • suppressed array[object]

        Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

    • task_failures array[object]
      Hide task_failures attributes Show task_failures attributes object
      • task_id number Required
      • node_id string Required
      • status string Required
      • reason object Required

        Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

        Hide reason attributes Show reason attributes object
        • type string Required

          The type of error

        • reason string | null

          A human-readable explanation of the error, in English.

        • stack_trace string

          The server stack trace. Present only if the error_trace=true parameter was sent with the request.

        • caused_by object

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

        • root_cause array[object]

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

        • suppressed array[object]

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

    • nodes object

      Task information grouped by node, if group_by was set to node (the default).

      Hide nodes attribute Show nodes attribute object
      • * object Additional properties
        Hide * attributes Show * attributes object
        • name string
        • transport_address string
        • host string
        • ip string
        • roles array[string]
        • attributes object
          Hide attributes attribute Show attributes attribute object
          • * string Additional properties
        • tasks object Required
          Hide tasks attribute Show tasks attribute object
          • * object Additional properties
            Hide * attributes Show * attributes object
            • action string Required
            • cancelled boolean
            • cancellable boolean Required
            • description string

              Human readable text that identifies the particular request that the task is performing. For example, it might identify the search request being performed by a search task. Other kinds of tasks have different descriptions, like _reindex which has the source and the destination, or _bulk which just has the number of requests and the destination indices. Many requests will have only an empty description because more detailed information about the request is not easily available or particularly helpful in identifying the request.

            • headers object Required
              Hide headers attribute Show headers attribute object
              • * string Additional properties
            • id number Required
            • node string Required
            • running_time string

              A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

            • running_time_in_nanos number

              The running time of the task, in nanoseconds.

            • start_time_in_millis number

              The start time of the task, in milliseconds since the epoch.
            • status object

              The internal status of the task, which varies from task to task. The format also varies. While the goal is to keep the status for a particular task consistent from version to version, this is not always possible because sometimes the implementation changes. Fields might be removed from the status for a particular request so any parsing you do of the status might break in minor releases.

            • type string Required
            • parent_task_id string
    • tasks array[object] | object

      Either a flat list of tasks if group_by was set to none, or grouped by parents if group_by was set to parents.

POST /_tasks/{task_id}/_cancel
POST _tasks/<task_id>/_cancel
resp = client.tasks.cancel(
    task_id="<task_id>",
)
const response = await client.tasks.cancel({
  task_id: "<task_id>",
});
response = client.tasks.cancel(
  task_id: "<task_id>"
)
$resp = $client->tasks()->cancel([
    "task_id" => "<task_id>",
]);
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_tasks/<task_id>/_cancel"
client.tasks().cancel(c -> c
    .taskId("<task_id>")
);
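
A hedged variant (assuming a configured Elasticsearch client; the node IDs and action pattern are illustrative) that cancels matching tasks across nodes and blocks until they stop:
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200", api_key="...")  # placeholder endpoint and key

resp = client.tasks.cancel(
    actions="*reindex",
    nodes=["nodeId1", "nodeId2"],
    wait_for_completion=True,  # block until the matched tasks have stopped
)
print(resp.get("node_failures", []), resp.get("task_failures", []))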