Create or update an autoscaling policy
Added in 7.11.0
NOTE: This feature is designed for indirect use by Elasticsearch Service, Elastic Cloud Enterprise, and Elastic Cloud on Kubernetes. Direct use is not supported.
Path parameters
-
name
string Required The name of the autoscaling policy.
Query parameters
-
master_timeout
string Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
-
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
Body
Required
-
roles
array[string] Required
-
deciders
object Required Decider settings.
curl \
--request PUT 'http://api.example.com/_autoscaling/policy/{name}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"roles\": [],\n \"deciders\": {\n \"fixed\": {\n }\n }\n}"'
{
"roles": [],
"deciders": {
"fixed": {
}
}
}
{
"roles" : [ "data_hot" ],
"deciders": {
"fixed": {
}
}
}
{
"acknowledged": true
}
Create a behavioral analytics collection
Deprecated
Technical preview
Path parameters
-
name
string Required The name of the analytics collection to be created or updated.
curl \
--request PUT 'http://api.example.com/_application/analytics/{name}' \
--header "Authorization: $API_KEY"
Compact and aligned text (CAT)
The compact and aligned text (CAT) APIs are intended only for human consumption using the Kibana console or command line. They are not intended for use by applications. For application consumption, use a corresponding JSON API.
All the cat commands accept a query string parameter help to see all the headers and info they provide, and the /_cat command alone lists all the available commands.
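For example, to list the columns that the allocation command supports, pass the help parameter (a sketch against the same placeholder host used in the examples below):
curl \
--request GET 'http://api.example.com/_cat/allocation?help' \
--header "Authorization: $API_KEY"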
Get shard allocation information
Get a snapshot of the number of shards allocated to each data node and their disk space.
IMPORTANT: CAT APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications.
Path parameters
-
node_id
string | array[string] Required A comma-separated list of node identifiers or names used to limit the returned information.
Query parameters
-
bytes
string The unit used to display byte values.
Values are b, kb, mb, gb, tb, or pb.
-
h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
-
s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
-
local
boolean If true, the request computes the list of selected nodes from the local cluster state. If false, the list of selected nodes is computed from the cluster state of the master node. In both cases, the coordinating node sends requests for further information to each selected node.
-
master_timeout
string Period to wait for a connection to the master node.
curl \
--request GET 'http://api.example.com/_cat/allocation/{node_id}' \
--header "Authorization: $API_KEY"
[
{
"shards": "1",
"shards.undesired": "0",
"write_load.forecast": "0.0",
"disk.indices.forecast": "260b",
"disk.indices": "260b",
"disk.used": "47.3gb",
"disk.avail": "43.4gb",
"disk.total": "100.7gb",
"disk.percent": "46",
"host": "127.0.0.1",
"ip": "127.0.0.1",
"node": "CSUXak2",
"node.role": "himrst"
}
]
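The h and s parameters compose: the following sketch (column names taken from the response above) returns only each node's name and disk usage, sorted by descending disk percentage:
curl \
--request GET 'http://api.example.com/_cat/allocation?h=node,disk.percent&s=disk.percent:desc' \
--header "Authorization: $API_KEY"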
Get a document count
Get quick access to a document count for a data stream, an index, or an entire cluster. The document count only includes live documents, not deleted documents which have not yet been removed by the merge process.
IMPORTANT: CAT APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the count API.
curl \
--request GET 'http://api.example.com/_cat/count' \
--header "Authorization: $API_KEY"
[
{
"epoch": "1475868259",
"timestamp": "15:24:20",
"count": "120"
}
]
[
{
"epoch": "1475868259",
"timestamp": "15:24:20",
"count": "121"
}
]
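The count can also be scoped by appending a data stream or index name to the path. A sketch, assuming an index named my-index-000001:
curl \
--request GET 'http://api.example.com/_cat/count/my-index-000001' \
--header "Authorization: $API_KEY"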
Get the cluster health status
IMPORTANT: CAT APIs are only intended for human consumption using the command line or Kibana console.
They are not intended for use by applications. For application consumption, use the cluster health API.
This API is often used to check malfunctioning clusters.
To help you track cluster health alongside log files and alerting systems, the API returns timestamps in two formats: HH:MM:SS, which is human-readable but includes no date information, and Unix epoch time, which is machine-sortable and includes date information. The latter format is useful for cluster recoveries that take multiple days.
You can use the cat health API to verify cluster health across multiple nodes.
You also can use the API to track the recovery of a large cluster over a longer period of time.
Query parameters
-
time
string The unit used to display time values.
Values are nanos, micros, ms, s, m, h, or d.
-
ts
boolean If true, returns HH:MM:SS and Unix epoch timestamps.
-
h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
-
s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
curl \
--request GET 'http://api.example.com/_cat/health' \
--header "Authorization: $API_KEY"
[
{
"epoch": "1475871424",
"timestamp": "16:17:04",
"cluster": "elasticsearch",
"status": "green",
"node.total": "1",
"node.data": "1",
"shards": "1",
"pri": "1",
"relo": "0",
"init": "0",
"unassign": "0",
"unassign.pri": "0",
"pending_tasks": "0",
"max_task_wait_time": "-",
"active_shards_percent": "100.0%"
}
]
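Because the HH:MM:SS column carries no date information, you can drop both timestamp columns with the ts parameter. A sketch:
curl \
--request GET 'http://api.example.com/_cat/health?ts=false' \
--header "Authorization: $API_KEY"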
Get CAT help
Get help for the CAT APIs, including a list of the available commands.
curl \
--request GET 'http://api.example.com/_cat' \
--header "Authorization: $API_KEY"
Get index information
Get high-level information about indices in a cluster, including backing indices for data streams.
Use this request to get the following information for each index in a cluster:
- shard count
- document count
- deleted document count
- primary store size
- total store size of all shards, including shard replicas
These metrics are retrieved directly from Lucene, which Elasticsearch uses internally to power indexing and search. As a result, all document counts include hidden nested documents. To get an accurate count of Elasticsearch documents, use the cat count or count APIs.
CAT APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use an index endpoint.
Query parameters
-
bytes
string The unit used to display byte values.
Values are b, kb, mb, gb, tb, or pb.
-
expand_wildcards
string | array[string] The type of index that wildcard patterns can match.
Supported values include:
- all: Match any data stream or index, including hidden ones.
- open: Match open, non-hidden indices. Also matches any non-hidden data stream.
- closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
- hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
- none: Wildcard expressions are not accepted.
-
health
string The health status used to limit returned indices. By default, the response includes indices of any health status.
Supported values include:
- green (or GREEN): All shards are assigned.
- yellow (or YELLOW): All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired.
- red (or RED): One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned.
Values are green, GREEN, yellow, YELLOW, red, or RED.
-
include_unloaded_segments
boolean If true, the response includes information from segments that are not loaded into memory.
-
pri
boolean If true, the response only includes information from primary shards.
-
time
string The unit used to display time values.
Values are nanos, micros, ms, s, m, h, or d.
-
master_timeout
string Period to wait for a connection to the master node.
-
h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
-
s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
curl \
--request GET 'http://api.example.com/_cat/indices' \
--header "Authorization: $API_KEY"
[
{
"health": "yellow",
"status": "open",
"index": "my-index-000001",
"uuid": "u8FNjxh8Rfy_awN11oDKYQ",
"pri": "1",
"rep": "1",
"docs.count": "1200",
"docs.deleted": "0",
"store.size": "88.1kb",
"pri.store.size": "88.1kb",
"dataset.size": "88.1kb"
},
{
"health": "green",
"status": "open",
"index": "my-index-000002",
"uuid": "nYFWZEO7TUiOjLQXBaYJpA ",
"pri": "1",
"rep": "0",
"docs.count": "0",
"docs.deleted": "0",
"store.size": "260b",
"pri.store.size": "260b",
"dataset.size": "260b"
}
]
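The query parameters compose in the usual way. For example, this sketch limits the output to indices with yellow health and reports primary-shard information only:
curl \
--request GET 'http://api.example.com/_cat/indices?health=yellow&pri=true' \
--header "Authorization: $API_KEY"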
Get datafeeds
Added in 7.7.0
Get configuration and usage information about datafeeds.
This API returns a maximum of 10,000 datafeeds.
If the Elasticsearch security features are enabled, you must have monitor_ml, monitor, manage_ml, or manage cluster privileges to use this API.
IMPORTANT: CAT APIs are only intended for human consumption using the Kibana console or command line. They are not intended for use by applications. For application consumption, use the get datafeed statistics API.
Query parameters
-
allow_no_match
boolean Specifies what to do when the request:
- Contains wildcard expressions and there are no datafeeds that match.
- Contains the _all string or no identifiers and there are no matches.
- Contains wildcard expressions and there are only partial matches.
If true, the API returns an empty datafeeds array when there are no matches and the subset of results when there are partial matches. If false, the API returns a 404 status code when there are no matches or only partial matches.
-
h
string | array[string] Comma-separated list of column names to display.
Supported values include:
- ae (or assignment_explanation): For started datafeeds only, contains messages relating to the selection of a node.
- bc (or buckets.count, bucketsCount): The number of buckets processed.
- id: A numerical character string that uniquely identifies the datafeed.
- na (or node.address, nodeAddress): For started datafeeds only, the network address of the node where the datafeed is started.
- ne (or node.ephemeral_id, nodeEphemeralId): For started datafeeds only, the ephemeral ID of the node where the datafeed is started.
- ni (or node.id, nodeId): For started datafeeds only, the unique identifier of the node where the datafeed is started.
- nn (or node.name, nodeName): For started datafeeds only, the name of the node where the datafeed is started.
- sba (or search.bucket_avg, searchBucketAvg): The average search time per bucket, in milliseconds.
- sc (or search.count, searchCount): The number of searches run by the datafeed.
- seah (or search.exp_avg_hour, searchExpAvgHour): The exponential average search time per hour, in milliseconds.
- st (or search.time, searchTime): The total time the datafeed spent searching, in milliseconds.
- s (or state): The status of the datafeed: starting, started, stopping, or stopped. If starting, the datafeed has been requested to start but has not yet started. If started, the datafeed is actively receiving data. If stopping, the datafeed has been requested to stop gracefully and is completing its final action. If stopped, the datafeed is stopped and will not receive data until it is re-started.
-
s
string | array[string] Comma-separated list of column names or column aliases used to sort the response.
Supported values include:
- ae (or assignment_explanation): For started datafeeds only, contains messages relating to the selection of a node.
- bc (or buckets.count, bucketsCount): The number of buckets processed.
- id: A numerical character string that uniquely identifies the datafeed.
- na (or node.address, nodeAddress): For started datafeeds only, the network address of the node where the datafeed is started.
- ne (or node.ephemeral_id, nodeEphemeralId): For started datafeeds only, the ephemeral ID of the node where the datafeed is started.
- ni (or node.id, nodeId): For started datafeeds only, the unique identifier of the node where the datafeed is started.
- nn (or node.name, nodeName): For started datafeeds only, the name of the node where the datafeed is started.
- sba (or search.bucket_avg, searchBucketAvg): The average search time per bucket, in milliseconds.
- sc (or search.count, searchCount): The number of searches run by the datafeed.
- seah (or search.exp_avg_hour, searchExpAvgHour): The exponential average search time per hour, in milliseconds.
- st (or search.time, searchTime): The total time the datafeed spent searching, in milliseconds.
- s (or state): The status of the datafeed: starting, started, stopping, or stopped. If starting, the datafeed has been requested to start but has not yet started. If started, the datafeed is actively receiving data. If stopping, the datafeed has been requested to stop gracefully and is completing its final action. If stopped, the datafeed is stopped and will not receive data until it is re-started.
-
time
string The unit used to display time values.
Values are nanos, micros, ms, s, m, h, or d.
curl \
--request GET 'http://api.example.com/_cat/ml/datafeeds' \
--header "Authorization: $API_KEY"
[
{
"id": "datafeed-high_sum_total_sales",
"state": "stopped",
"buckets.count": "743",
"search.count": "7"
},
{
"id": "datafeed-low_request_rate",
"state": "stopped",
"buckets.count": "1457",
"search.count": "3"
},
{
"id": "datafeed-response_code_rates",
"state": "stopped",
"buckets.count": "1460",
"search.count": "18"
},
{
"id": "datafeed-url_scanning",
"state": "stopped",
"buckets.count": "1460",
"search.count": "18"
}
]
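A sketch that requests exactly the columns shown in the response above via the h parameter:
curl \
--request GET 'http://api.example.com/_cat/ml/datafeeds?h=id,state,buckets.count,search.count' \
--header "Authorization: $API_KEY"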
Get datafeeds
Added in 7.7.0
Get configuration and usage information about datafeeds.
This API returns a maximum of 10,000 datafeeds.
If the Elasticsearch security features are enabled, you must have monitor_ml, monitor, manage_ml, or manage cluster privileges to use this API.
IMPORTANT: CAT APIs are only intended for human consumption using the Kibana console or command line. They are not intended for use by applications. For application consumption, use the get datafeed statistics API.
Path parameters
-
datafeed_id
string Required A numerical character string that uniquely identifies the datafeed.
Query parameters
-
allow_no_match
boolean Specifies what to do when the request:
- Contains wildcard expressions and there are no datafeeds that match.
- Contains the _all string or no identifiers and there are no matches.
- Contains wildcard expressions and there are only partial matches.
If true, the API returns an empty datafeeds array when there are no matches and the subset of results when there are partial matches. If false, the API returns a 404 status code when there are no matches or only partial matches.
-
h
string | array[string] Comma-separated list of column names to display.
Supported values include:
- ae (or assignment_explanation): For started datafeeds only, contains messages relating to the selection of a node.
- bc (or buckets.count, bucketsCount): The number of buckets processed.
- id: A numerical character string that uniquely identifies the datafeed.
- na (or node.address, nodeAddress): For started datafeeds only, the network address of the node where the datafeed is started.
- ne (or node.ephemeral_id, nodeEphemeralId): For started datafeeds only, the ephemeral ID of the node where the datafeed is started.
- ni (or node.id, nodeId): For started datafeeds only, the unique identifier of the node where the datafeed is started.
- nn (or node.name, nodeName): For started datafeeds only, the name of the node where the datafeed is started.
- sba (or search.bucket_avg, searchBucketAvg): The average search time per bucket, in milliseconds.
- sc (or search.count, searchCount): The number of searches run by the datafeed.
- seah (or search.exp_avg_hour, searchExpAvgHour): The exponential average search time per hour, in milliseconds.
- st (or search.time, searchTime): The total time the datafeed spent searching, in milliseconds.
- s (or state): The status of the datafeed: starting, started, stopping, or stopped. If starting, the datafeed has been requested to start but has not yet started. If started, the datafeed is actively receiving data. If stopping, the datafeed has been requested to stop gracefully and is completing its final action. If stopped, the datafeed is stopped and will not receive data until it is re-started.
-
s
string | array[string] Comma-separated list of column names or column aliases used to sort the response.
Supported values include:
- ae (or assignment_explanation): For started datafeeds only, contains messages relating to the selection of a node.
- bc (or buckets.count, bucketsCount): The number of buckets processed.
- id: A numerical character string that uniquely identifies the datafeed.
- na (or node.address, nodeAddress): For started datafeeds only, the network address of the node where the datafeed is started.
- ne (or node.ephemeral_id, nodeEphemeralId): For started datafeeds only, the ephemeral ID of the node where the datafeed is started.
- ni (or node.id, nodeId): For started datafeeds only, the unique identifier of the node where the datafeed is started.
- nn (or node.name, nodeName): For started datafeeds only, the name of the node where the datafeed is started.
- sba (or search.bucket_avg, searchBucketAvg): The average search time per bucket, in milliseconds.
- sc (or search.count, searchCount): The number of searches run by the datafeed.
- seah (or search.exp_avg_hour, searchExpAvgHour): The exponential average search time per hour, in milliseconds.
- st (or search.time, searchTime): The total time the datafeed spent searching, in milliseconds.
- s (or state): The status of the datafeed: starting, started, stopping, or stopped. If starting, the datafeed has been requested to start but has not yet started. If started, the datafeed is actively receiving data. If stopping, the datafeed has been requested to stop gracefully and is completing its final action. If stopped, the datafeed is stopped and will not receive data until it is re-started.
-
time
string The unit used to display time values.
Values are nanos, micros, ms, s, m, h, or d.
curl \
--request GET 'http://api.example.com/_cat/ml/datafeeds/{datafeed_id}' \
--header "Authorization: $API_KEY"
[
{
"id": "datafeed-high_sum_total_sales",
"state": "stopped",
"buckets.count": "743",
"search.count": "7"
},
{
"id": "datafeed-low_request_rate",
"state": "stopped",
"buckets.count": "1457",
"search.count": "3"
},
{
"id": "datafeed-response_code_rates",
"state": "stopped",
"buckets.count": "1460",
"search.count": "18"
},
{
"id": "datafeed-url_scanning",
"state": "stopped",
"buckets.count": "1460",
"search.count": "18"
}
]
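Because datafeed_id accepts wildcard expressions, allow_no_match controls the behavior when nothing matches. A sketch that tolerates an empty match set:
curl \
--request GET 'http://api.example.com/_cat/ml/datafeeds/datafeed-*?allow_no_match=true' \
--header "Authorization: $API_KEY"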
Get anomaly detection jobs
Added in 7.7.0
Get configuration and usage information for anomaly detection jobs.
This API returns a maximum of 10,000 jobs.
If the Elasticsearch security features are enabled, you must have monitor_ml, monitor, manage_ml, or manage cluster privileges to use this API.
IMPORTANT: CAT APIs are only intended for human consumption using the Kibana console or command line. They are not intended for use by applications. For application consumption, use the get anomaly detection job statistics API.
Query parameters
-
allow_no_match
boolean Specifies what to do when the request:
- Contains wildcard expressions and there are no jobs that match.
- Contains the _all string or no identifiers and there are no matches.
- Contains wildcard expressions and there are only partial matches.
If true, the API returns an empty jobs array when there are no matches and the subset of results when there are partial matches. If false, the API returns a 404 status code when there are no matches or only partial matches.
-
bytes
string The unit used to display byte values.
Values are b, kb, mb, gb, tb, or pb.
-
h
string | array[string] Comma-separated list of column names to display.
Supported values include:
- assignment_explanation (or ae): For open anomaly detection jobs only, contains messages relating to the selection of a node to run the job.
- buckets.count (or bc, bucketsCount): The number of bucket results produced by the job.
- buckets.time.exp_avg (or btea, bucketsTimeExpAvg): Exponential moving average of all bucket processing times, in milliseconds.
- buckets.time.exp_avg_hour (or bteah, bucketsTimeExpAvgHour): Exponentially-weighted moving average of bucket processing times calculated in a 1 hour time window, in milliseconds.
- buckets.time.max (or btmax, bucketsTimeMax): Maximum among all bucket processing times, in milliseconds.
- buckets.time.min (or btmin, bucketsTimeMin): Minimum among all bucket processing times, in milliseconds.
- buckets.time.total (or btt, bucketsTimeTotal): Sum of all bucket processing times, in milliseconds.
- data.buckets (or db, dataBuckets): The number of buckets processed.
- data.earliest_record (or der, dataEarliestRecord): The timestamp of the earliest chronologically input document.
- data.empty_buckets (or deb, dataEmptyBuckets): The number of buckets which did not contain any data.
- data.input_bytes (or dib, dataInputBytes): The number of bytes of input data posted to the anomaly detection job.
- data.input_fields (or dif, dataInputFields): The total number of fields in input documents posted to the anomaly detection job. This count includes fields that are not used in the analysis. However, be aware that if you are using a datafeed, it extracts only the required fields from the documents it retrieves before posting them to the job.
- data.input_records (or dir, dataInputRecords): The number of input documents posted to the anomaly detection job.
- data.invalid_dates (or did, dataInvalidDates): The number of input documents with either a missing date field or a date that could not be parsed.
- data.last (or dl, dataLast): The timestamp at which data was last analyzed, according to server time.
- data.last_empty_bucket (or dleb, dataLastEmptyBucket): The timestamp of the last bucket that did not contain any data.
- data.last_sparse_bucket (or dlsb, dataLastSparseBucket): The timestamp of the last bucket that was considered sparse.
- data.latest_record (or dlr, dataLatestRecord): The timestamp of the latest chronologically input document.
- data.missing_fields (or dmf, dataMissingFields): The number of input documents that are missing a field that the anomaly detection job is configured to analyze. Input documents with missing fields are still processed because it is possible that not all fields are missing.
- data.out_of_order_timestamps (or doot, dataOutOfOrderTimestamps): The number of input documents that have a timestamp chronologically preceding the start of the current anomaly detection bucket offset by the latency window. This information is applicable only when you provide data to the anomaly detection job by using the post data API. These out of order documents are discarded, since jobs require time series data to be in ascending chronological order.
- data.processed_fields (or dpf, dataProcessedFields): The total number of fields in all the documents that have been processed by the anomaly detection job. Only fields that are specified in the detector configuration object contribute to this count. The timestamp is not included in this count.
- data.processed_records (or dpr, dataProcessedRecords): The number of input documents that have been processed by the anomaly detection job. This value includes documents with missing fields, since they are nonetheless analyzed. If you use datafeeds and have aggregations in your search query, the processed record count is the number of aggregation results processed, not the number of Elasticsearch documents.
- data.sparse_buckets (or dsb, dataSparseBuckets): The number of buckets that contained few data points compared to the expected number of data points.
- forecasts.memory.avg (or fmavg, forecastsMemoryAvg): The average memory usage in bytes for forecasts related to the anomaly detection job.
- forecasts.memory.max (or fmmax, forecastsMemoryMax): The maximum memory usage in bytes for forecasts related to the anomaly detection job.
- forecasts.memory.min (or fmmin, forecastsMemoryMin): The minimum memory usage in bytes for forecasts related to the anomaly detection job.
- forecasts.memory.total (or fmt, forecastsMemoryTotal): The total memory usage in bytes for forecasts related to the anomaly detection job.
- forecasts.records.avg (or fravg, forecastsRecordsAvg): The average number of model_forecast documents written for forecasts related to the anomaly detection job.
- forecasts.records.max (or frmax, forecastsRecordsMax): The maximum number of model_forecast documents written for forecasts related to the anomaly detection job.
- forecasts.records.min (or frmin, forecastsRecordsMin): The minimum number of model_forecast documents written for forecasts related to the anomaly detection job.
- forecasts.records.total (or frt, forecastsRecordsTotal): The total number of model_forecast documents written for forecasts related to the anomaly detection job.
- forecasts.time.avg (or ftavg, forecastsTimeAvg): The average runtime in milliseconds for forecasts related to the anomaly detection job.
- forecasts.time.max (or ftmax, forecastsTimeMax): The maximum runtime in milliseconds for forecasts related to the anomaly detection job.
- forecasts.time.min (or ftmin, forecastsTimeMin): The minimum runtime in milliseconds for forecasts related to the anomaly detection job.
- forecasts.time.total (or ftt, forecastsTimeTotal): The total runtime in milliseconds for forecasts related to the anomaly detection job.
- forecasts.total (or ft, forecastsTotal): The number of individual forecasts currently available for the job.
- id: Identifier for the anomaly detection job.
- model.bucket_allocation_failures (or mbaf, modelBucketAllocationFailures): The number of buckets for which new entities in incoming data were not processed due to insufficient model memory.
- model.by_fields (or mbf, modelByFields): The number of by field values that were analyzed by the models. This value is cumulative for all detectors in the job.
- model.bytes (or mb, modelBytes): The number of bytes of memory used by the models. This is the maximum value since the last time the model was persisted. If the job is closed, this value indicates the latest size.
- model.bytes_exceeded (or mbe, modelBytesExceeded): The number of bytes over the high limit for memory usage at the last allocation failure.
- model.categorization_status (or mcs, modelCategorizationStatus): The status of categorization for the job: ok or warn. If ok, categorization is performing acceptably well (or not being used at all). If warn, categorization is detecting a distribution of categories that suggests the input data is inappropriate for categorization. Problems could be that there is only one category, more than 90% of categories are rare, the number of categories is greater than 50% of the number of categorized documents, there are no frequently matched categories, or more than 50% of categories are dead.
- model.categorized_doc_count (or mcdc, modelCategorizedDocCount): The number of documents that have had a field categorized.
- model.dead_category_count (or mdcc, modelDeadCategoryCount): The number of categories created by categorization that will never be assigned again because another category’s definition makes it a superset of the dead category. Dead categories are a side effect of the way categorization has no prior training.
- model.failed_category_count (or mdcc, modelFailedCategoryCount): The number of times that categorization wanted to create a new category but couldn’t because the job had hit its model memory limit. This count does not track which specific categories failed to be created. Therefore, you cannot use this value to determine the number of unique categories that were missed.
- model.frequent_category_count (or mfcc, modelFrequentCategoryCount): The number of categories that match more than 1% of categorized documents.
- model.log_time (or mlt, modelLogTime): The timestamp when the model stats were gathered, according to server time.
- model.memory_limit (or mml, modelMemoryLimit): The upper limit for model memory usage.
- model.memory_status (or mms, modelMemoryStatus): The status of the mathematical models: ok, soft_limit, or hard_limit. If ok, the models stayed below the configured value. If soft_limit, the models used more than 60% of the configured memory limit and older unused models will be pruned to free up space. Additionally, in categorization jobs no further category examples will be stored. If hard_limit, the models used more space than the configured memory limit. As a result, not all incoming data was processed.
- model.over_fields (or mof, modelOverFields): The number of over field values that were analyzed by the models. This value is cumulative for all detectors in the job.
- model.partition_fields (or mpf, modelPartitionFields): The number of partition field values that were analyzed by the models. This value is cumulative for all detectors in the job.
- model.rare_category_count (or mrcc, modelRareCategoryCount): The number of categories that match just one categorized document.
- model.timestamp (or mt, modelTimestamp): The timestamp of the last record when the model stats were gathered.
- model.total_category_count (or mtcc, modelTotalCategoryCount): The number of categories created by categorization.
- node.address (or na, nodeAddress): The network address of the node that runs the job. This information is available only for open jobs.
- node.ephemeral_id (or ne, nodeEphemeralId): The ephemeral ID of the node that runs the job. This information is available only for open jobs.
- node.id (or ni, nodeId): The unique identifier of the node that runs the job. This information is available only for open jobs.
- node.name (or nn, nodeName): The name of the node that runs the job. This information is available only for open jobs.
- opened_time (or ot): For open jobs only, the elapsed time for which the job has been open.
- state (or s): The status of the anomaly detection job: closed, closing, failed, opened, or opening. If closed, the job finished successfully with its model state persisted. The job must be opened before it can accept further data. If closing, the job close action is in progress and has not yet completed. A closing job cannot accept further data. If failed, the job did not finish successfully due to an error. This situation can occur due to invalid input data, a fatal error occurring during the analysis, or an external interaction such as the process being killed by the Linux out of memory (OOM) killer. If the job had irrevocably failed, it must be force closed and then deleted. If the datafeed can be corrected, the job can be closed and then re-opened. If opened, the job is available to receive and process data. If opening, the job open action is in progress and has not yet completed.
-
s
string | array[string] Comma-separated list of column names or column aliases used to sort the response.
Supported values include:
- assignment_explanation (or ae): For open anomaly detection jobs only, contains messages relating to the selection of a node to run the job.
- buckets.count (or bc, bucketsCount): The number of bucket results produced by the job.
- buckets.time.exp_avg (or btea, bucketsTimeExpAvg): Exponential moving average of all bucket processing times, in milliseconds.
- buckets.time.exp_avg_hour (or bteah, bucketsTimeExpAvgHour): Exponentially-weighted moving average of bucket processing times calculated in a 1 hour time window, in milliseconds.
- buckets.time.max (or btmax, bucketsTimeMax): Maximum among all bucket processing times, in milliseconds.
- buckets.time.min (or btmin, bucketsTimeMin): Minimum among all bucket processing times, in milliseconds.
- buckets.time.total (or btt, bucketsTimeTotal): Sum of all bucket processing times, in milliseconds.
- data.buckets (or db, dataBuckets): The number of buckets processed.
- data.earliest_record (or der, dataEarliestRecord): The timestamp of the earliest chronologically input document.
- data.empty_buckets (or deb, dataEmptyBuckets): The number of buckets which did not contain any data.
- data.input_bytes (or dib, dataInputBytes): The number of bytes of input data posted to the anomaly detection job.
- data.input_fields (or dif, dataInputFields): The total number of fields in input documents posted to the anomaly detection job. This count includes fields that are not used in the analysis. However, be aware that if you are using a datafeed, it extracts only the required fields from the documents it retrieves before posting them to the job.
- data.input_records (or dir, dataInputRecords): The number of input documents posted to the anomaly detection job.
- data.invalid_dates (or did, dataInvalidDates): The number of input documents with either a missing date field or a date that could not be parsed.
- data.last (or dl, dataLast): The timestamp at which data was last analyzed, according to server time.
- data.last_empty_bucket (or dleb, dataLastEmptyBucket): The timestamp of the last bucket that did not contain any data.
- data.last_sparse_bucket (or dlsb, dataLastSparseBucket): The timestamp of the last bucket that was considered sparse.
- data.latest_record (or dlr, dataLatestRecord): The timestamp of the latest chronologically input document.
- data.missing_fields (or dmf, dataMissingFields): The number of input documents that are missing a field that the anomaly detection job is configured to analyze. Input documents with missing fields are still processed because it is possible that not all fields are missing.
- data.out_of_order_timestamps (or doot, dataOutOfOrderTimestamps): The number of input documents that have a timestamp chronologically preceding the start of the current anomaly detection bucket offset by the latency window. This information is applicable only when you provide data to the anomaly detection job by using the post data API. These out of order documents are discarded, since jobs require time series data to be in ascending chronological order.
- data.processed_fields (or dpf, dataProcessedFields): The total number of fields in all the documents that have been processed by the anomaly detection job. Only fields that are specified in the detector configuration object contribute to this count. The timestamp is not included in this count.
- data.processed_records (or dpr, dataProcessedRecords): The number of input documents that have been processed by the anomaly detection job. This value includes documents with missing fields, since they are nonetheless analyzed. If you use datafeeds and have aggregations in your search query, the processed record count is the number of aggregation results processed, not the number of Elasticsearch documents.
- data.sparse_buckets (or dsb, dataSparseBuckets): The number of buckets that contained few data points compared to the expected number of data points.
- forecasts.memory.avg (or fmavg, forecastsMemoryAvg): The average memory usage in bytes for forecasts related to the anomaly detection job.
- forecasts.memory.max (or fmmax, forecastsMemoryMax): The maximum memory usage in bytes for forecasts related to the anomaly detection job.
- forecasts.memory.min (or fmmin, forecastsMemoryMin): The minimum memory usage in bytes for forecasts related to the anomaly detection job.
- forecasts.memory.total (or fmt, forecastsMemoryTotal): The total memory usage in bytes for forecasts related to the anomaly detection job.
- forecasts.records.avg (or fravg, forecastsRecordsAvg): The average number of model_forecast documents written for forecasts related to the anomaly detection job.
- forecasts.records.max (or frmax, forecastsRecordsMax): The maximum number of model_forecast documents written for forecasts related to the anomaly detection job.
- forecasts.records.min (or frmin, forecastsRecordsMin): The minimum number of model_forecast documents written for forecasts related to the anomaly detection job.
- forecasts.records.total (or frt, forecastsRecordsTotal): The total number of model_forecast documents written for forecasts related to the anomaly detection job.
- forecasts.time.avg (or ftavg, forecastsTimeAvg): The average runtime in milliseconds for forecasts related to the anomaly detection job.
- forecasts.time.max (or ftmax, forecastsTimeMax): The maximum runtime in milliseconds for forecasts related to the anomaly detection job.
- forecasts.time.min (or ftmin, forecastsTimeMin): The minimum runtime in milliseconds for forecasts related to the anomaly detection job.
- forecasts.time.total (or ftt, forecastsTimeTotal): The total runtime in milliseconds for forecasts related to the anomaly detection job.
- forecasts.total (or ft, forecastsTotal): The number of individual forecasts currently available for the job.
- id: Identifier for the anomaly detection job.
- model.bucket_allocation_failures (or mbaf, modelBucketAllocationFailures): The number of buckets for which new entities in incoming data were not processed due to insufficient model memory.
- model.by_fields (or mbf, modelByFields): The number of by field values that were analyzed by the models. This value is cumulative for all detectors in the job.
- model.bytes (or mb, modelBytes): The number of bytes of memory used by the models. This is the maximum value since the last time the model was persisted. If the job is closed, this value indicates the latest size.
- model.bytes_exceeded (or mbe, modelBytesExceeded): The number of bytes over the high limit for memory usage at the last allocation failure.
- model.categorization_status (or mcs, modelCategorizationStatus): The status of categorization for the job: ok or warn. If ok, categorization is performing acceptably well (or not being used at all). If warn, categorization is detecting a distribution of categories that suggests the input data is inappropriate for categorization. Problems could be that there is only one category, more than 90% of categories are rare, the number of categories is greater than 50% of the number of categorized documents, there are no frequently matched categories, or more than 50% of categories are dead.
- model.categorized_doc_count (or mcdc, modelCategorizedDocCount): The number of documents that have had a field categorized.
- model.dead_category_count (or mdcc, modelDeadCategoryCount): The number of categories created by categorization that will never be assigned again because another category’s definition makes it a superset of the dead category. Dead categories are a side effect of the way categorization has no prior training.
- model.failed_category_count (or mdcc, modelFailedCategoryCount): The number of times that categorization wanted to create a new category but couldn’t because the job had hit its model memory limit. This count does not track which specific categories failed to be created. Therefore, you cannot use this value to determine the number of unique categories that were missed.
- model.frequent_category_count (or mfcc, modelFrequentCategoryCount): The number of categories that match more than 1% of categorized documents.
- model.log_time (or mlt, modelLogTime): The timestamp when the model stats were gathered, according to server time.
- model.memory_limit (or mml, modelMemoryLimit): The upper limit for model memory usage.
- model.memory_status (or mms, modelMemoryStatus): The status of the mathematical models: ok, soft_limit, or hard_limit. If ok, the models stayed below the configured value. If soft_limit, the models used more than 60% of the configured memory limit and older unused models will be pruned to free up space. Additionally, in categorization jobs no further category examples will be stored. If hard_limit, the models used more space than the configured memory limit. As a result, not all incoming data was processed.
- model.over_fields (or mof, modelOverFields): The number of over field values that were analyzed by the models. This value is cumulative for all detectors in the job.
- model.partition_fields (or mpf, modelPartitionFields): The number of partition field values that were analyzed by the models. This value is cumulative for all detectors in the job.
- model.rare_category_count (or mrcc, modelRareCategoryCount): The number of categories that match just one categorized document.
- model.timestamp (or mt, modelTimestamp): The timestamp of the last record when the model stats were gathered.
- model.total_category_count (or mtcc, modelTotalCategoryCount): The number of categories created by categorization.
- node.address (or na, nodeAddress): The network address of the node that runs the job. This information is available only for open jobs.
- node.ephemeral_id (or ne, nodeEphemeralId): The ephemeral ID of the node that runs the job. This information is available only for open jobs.
- node.id (or ni, nodeId): The unique identifier of the node that runs the job. This information is available only for open jobs.
- node.name (or nn, nodeName): The name of the node that runs the job. This information is available only for open jobs.
- opened_time (or ot): For open jobs only, the elapsed time for which the job has been open.
- state (or s): The status of the anomaly detection job: closed, closing, failed, opened, or opening. If closed, the job finished successfully with its model state persisted. The job must be opened before it can accept further data. If closing, the job close action is in progress and has not yet completed. A closing job cannot accept further data. If failed, the job did not finish successfully due to an error. This situation can occur due to invalid input data, a fatal error occurring during the analysis, or an external interaction such as the process being killed by the Linux out of memory (OOM) killer. If the job had irrevocably failed, it must be force closed and then deleted. If the datafeed can be corrected, the job can be closed and then re-opened. If opened, the job is available to receive and process data. If opening, the job open action is in progress and has not yet completed.
-
time
string The unit used to display time values.
Values are nanos, micros, ms, s, m, h, or d.
curl \
--request GET 'http://api.example.com/_cat/ml/anomaly_detectors' \
--header "Authorization: $API_KEY"
[
{
"id": "high_sum_total_sales",
"s": "closed",
"dpr": "14022",
"mb": "1.5mb"
},
{
"id": "low_request_rate",
"s": "closed",
"dpr": "1216",
"mb": "40.5kb"
},
{
"id": "response_code_rates",
"s": "closed",
"dpr": "28146",
"mb": "132.7kb"
},
{
"id": "url_scanning",
"s": "closed",
"dpr": "28146",
"mb": "501.6kb"
}
]
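The response above uses the short column aliases. A sketch that requests them explicitly:
curl \
--request GET 'http://api.example.com/_cat/ml/anomaly_detectors?h=id,s,dpr,mb' \
--header "Authorization: $API_KEY"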
Get plugin information
Get a list of plugins running on each node of a cluster. IMPORTANT: cat APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the nodes info API.
Query parameters
-
h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
-
s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
-
include_bootstrap
boolean Include bootstrap plugins in the response.
-
local
boolean If true, the request computes the list of selected nodes from the local cluster state. If false, the list of selected nodes is computed from the cluster state of the master node. In both cases, the coordinating node sends requests for further information to each selected node.
-
master_timeout
string Period to wait for a connection to the master node.
curl \
--request GET 'http://api.example.com/_cat/plugins' \
--header "Authorization: $API_KEY"
[
{ "name": "U7321H6", "component": "analysis-icu", "version": "8.17.0", "description": "The ICU Analysis plugin integrates the Lucene ICU module into Elasticsearch, adding ICU-related analysis components."},
{"name": "U7321H6", "component": "analysis-kuromoji", "verison": "8.17.0", description: "The Japanese (kuromoji) Analysis plugin integrates Lucene kuromoji analysis module into elasticsearch."},
{"name" "U7321H6", "component": "analysis-nori", "version": "8.17.0", "description": "The Korean (nori) Analysis plugin integrates Lucene nori analysis module into elasticsearch."},
{"name": "U7321H6", "component": "analysis-phonetic", "verison": "8.17.0", "description": "The Phonetic Analysis plugin integrates phonetic token filter analysis with elasticsearch."},
{"name": "U7321H6", "component": "analysis-smartcn", "verison": "8.17.0", "description": "Smart Chinese Analysis plugin integrates Lucene Smart Chinese analysis module into elasticsearch."},
{"name": "U7321H6", "component": "analysis-stempel", "verison": "8.17.0", "description": "The Stempel (Polish) Analysis plugin integrates Lucene stempel (polish) analysis module into elasticsearch."},
{"name": "U7321H6", "component": "analysis-ukrainian", "verison": "8.17.0", "description": "The Ukrainian Analysis plugin integrates the Lucene UkrainianMorfologikAnalyzer into elasticsearch."},
{"name": "U7321H6", "component": "discovery-azure-classic", "verison": "8.17.0", "description": "The Azure Classic Discovery plugin allows to use Azure Classic API for the unicast discovery mechanism"},
{"name": "U7321H6", "component": "discovery-ec2", "verison": "8.17.0", "description": "The EC2 discovery plugin allows to use AWS API for the unicast discovery mechanism."},
{"name": "U7321H6", "component": "discovery-gce", "verison": "8.17.0", "description": "The Google Compute Engine (GCE) Discovery plugin allows to use GCE API for the unicast discovery mechanism."},
{"name": "U7321H6", "component": "mapper-annotated-text", "verison": "8.17.0", "description": "The Mapper Annotated_text plugin adds support for text fields with markup used to inject annotation tokens into the index."},
{"name": "U7321H6", "component": "mapper-murmur3", "verison": "8.17.0", "description": "The Mapper Murmur3 plugin allows to compute hashes of a field's values at index-time and to store them in the index."},
{"name": "U7321H6", "component": "mapper-size", "verison": "8.17.0", "description": "The Mapper Size plugin allows document to record their uncompressed size at index time."},
{"name": "U7321H6", "component": "store-smb", "verison": "8.17.0", "description": "The Store SMB plugin adds support for SMB stores."}
]
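Bootstrap plugins are excluded by default; a sketch that includes them:
curl \
--request GET 'http://api.example.com/_cat/plugins?include_bootstrap=true' \
--header "Authorization: $API_KEY"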
Get shard recovery information
Get information about ongoing and completed shard recoveries. Shard recovery is the process of initializing a shard copy, such as restoring a primary shard from a snapshot or syncing a replica shard from a primary shard. When a shard recovery completes, the recovered shard is available for search and indexing. For data streams, the API returns information about the stream’s backing indices. IMPORTANT: cat APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the index recovery API.
Path parameters
-
index
string | array[string] Required A comma-separated list of data streams, indices, and aliases used to limit the request. Supports wildcards (*). To target all data streams and indices, omit this parameter or use * or _all.
Query parameters
-
active_only
boolean If true, the response only includes ongoing shard recoveries.
-
bytes
string The unit used to display byte values.
Values are b, kb, mb, gb, tb, or pb.
-
detailed
boolean If true, the response includes detailed information about shard recoveries.
-
index
string | array[string] Comma-separated list or wildcard expression of index names to limit the returned information
-
h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
-
s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
-
time
string Unit used to display time values.
Values are nanos, micros, ms, s, m, h, or d.
curl \
--request GET 'http://api.example.com/_cat/recovery/{index}' \
--header "Authorization: $API_KEY"
[
{
"index": "my-index-000001 ",
"shard": "0",
"time": "13ms",
"type": "store",
"stage": "done",
"source_host": "n/a",
"source_node": "n/a",
"target_host": "127.0.0.1",
"target_node": "node-0",
"repository": "n/a",
"snapshot": "n/a",
"files": "0",
"files_recovered": "0",
"files_percent": "100.0%",
"files_total": "13",
"bytes": "0b",
"bytes_recovered": "0b",
"bytes_percent": "100.0%",
"bytes_total": "9928b",
"translog_ops": "0",
"translog_ops_recovered": "0",
"translog_ops_percent": "100.0%"
}
]
[
{
"i": "my-index-000001",
"s": "0",
"t": "1252ms",
"ty": "peer",
"st": "done",
"shost": "192.168.1.1",
"thost": "192.168.1.1",
"f": "0",
"fp": "100.0%",
"b": "0b",
"bp": "100.0%",
}
]
[
{
"i": "my-index-000001",
"s": "0",
"t": "1978ms",
"ty": "snapshot",
"st": "done",
"rep": "my-repo",
"snap": "snap-1",
"f": "79",
"fp": "8.0%",
"b": "12086",
"bp": "9.0%"
}
]
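During an active recovery you typically want only the in-flight operations. A sketch combining active_only with detailed output:
curl \
--request GET 'http://api.example.com/_cat/recovery/{index}?active_only=true&detailed=true' \
--header "Authorization: $API_KEY"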
Get snapshot repository information
Added in 2.1.0
Get a list of snapshot repositories for a cluster. IMPORTANT: cat APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the get snapshot repository API.
Query parameters
-
h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
-
s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
-
local
boolean If true, the request computes the list of selected nodes from the local cluster state. If false, the list of selected nodes is computed from the cluster state of the master node. In both cases, the coordinating node sends requests for further information to each selected node.
-
master_timeout
string Period to wait for a connection to the master node.
curl \
--request GET 'http://api.example.com/_cat/repositories' \
--header "Authorization: $API_KEY"
[
{
"id": "repo1",
"type": "fs"
},
{
"id": "repo2",
"type": "s3"
}
]
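As with the other CAT APIs, the s parameter sorts the table. A sketch that sorts repositories by type:
curl \
--request GET 'http://api.example.com/_cat/repositories?s=type' \
--header "Authorization: $API_KEY"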
Get segment information
Get low-level information about the Lucene segments in index shards. For data streams, the API returns information about the backing indices. IMPORTANT: cat APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the index segments API.
Query parameters
-
bytes
string The unit used to display byte values.
Values are b, kb, mb, gb, tb, or pb.
-
h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
-
s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
-
local
boolean If true, the request computes the list of selected nodes from the local cluster state. If false, the list of selected nodes is computed from the cluster state of the master node. In both cases, the coordinating node sends requests for further information to each selected node.
-
master_timeout
string Period to wait for a connection to the master node.
curl \
--request GET 'http://api.example.com/_cat/segments' \
--header "Authorization: $API_KEY"
[
{
"index": "test",
"shard": "0",
"prirep": "p",
"ip": "127.0.0.1",
"segment": "_0",
"generation": "0",
"docs.count": "1",
"docs.deleted": "0",
"size": "3kb",
"size.memory": "0",
"committed": "false",
"searchable": "true",
"version": "9.12.0",
"compound": "true"
},
{
"index": "test1",
"shard": "0",
"prirep": "p",
"ip": "127.0.0.1",
"segment": "_0",
"generation": "0",
"docs.count": "1",
"docs.deleted": "0",
"size": "3kb",
"size.memory": "0",
"committed": "false",
"searchable": "true",
"version": "9.12.0",
"compound": "true"
}
]
Get segment information
Get low-level information about the Lucene segments in index shards. For data streams, the API returns information about the backing indices. IMPORTANT: cat APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the index segments API.
Path parameters
-
index
string | array[string] Required A comma-separated list of data streams, indices, and aliases used to limit the request. Supports wildcards (*). To target all data streams and indices, omit this parameter or use * or _all.
Query parameters
-
bytes
string The unit used to display byte values.
Values are b, kb, mb, gb, tb, or pb.
-
h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
-
s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
-
local
boolean If true, the request computes the list of selected nodes from the local cluster state. If false, the list of selected nodes is computed from the cluster state of the master node. In both cases, the coordinating node sends requests for further information to each selected node.
-
master_timeout
string Period to wait for a connection to the master node.
curl \
--request GET 'http://api.example.com/_cat/segments/{index}' \
--header "Authorization: $API_KEY"
[
{
"index": "test",
"shard": "0",
"prirep": "p",
"ip": "127.0.0.1",
"segment": "_0",
"generation": "0",
"docs.count": "1",
"docs.deleted": "0",
"size": "3kb",
"size.memory": "0",
"committed": "false",
"searchable": "true",
"version": "9.12.0",
"compound": "true"
},
{
"index": "test1",
"shard": "0",
"prirep": "p",
"ip": "127.0.0.1",
"segment": "_0",
"generation": "0",
"docs.count": "1",
"docs.deleted": "0",
"size": "3kb",
"size.memory": "0",
"committed": "false",
"searchable": "true",
"version": "9.12.0",
"compound": "true"
}
]
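A sketch that scopes the output to one of the indices shown above and reports sizes in kilobytes:
curl \
--request GET 'http://api.example.com/_cat/segments/test?bytes=kb' \
--header "Authorization: $API_KEY"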
Get shard information
Get information about the shards in a cluster. For data streams, the API returns information about the backing indices. IMPORTANT: cat APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications.
Path parameters
-
index
string | array[string] Required A comma-separated list of data streams, indices, and aliases used to limit the request. Supports wildcards (*). To target all data streams and indices, omit this parameter or use * or _all.
Query parameters
-
bytes
string The unit used to display byte values.
Values are b, kb, mb, gb, tb, or pb.
-
h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
-
s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
-
master_timeout
string Period to wait for a connection to the master node.
-
time
string Unit used to display time values.
Values are nanos, micros, ms, s, m, h, or d.
curl \
--request GET 'http://api.example.com/_cat/shards/{index}' \
--header "Authorization: $API_KEY"
[
{
"index": "my-index-000001",
"shard": "0",
"prirep": "p",
"state": "STARTED",
"docs": "3014",
"store": "31.1mb",
"dataset": "249b",
"ip": "192.168.56.10",
"node": "H5dfFeA"
}
]
[
{
"index": "my-index-000001",
"shard": "0",
"prirep": "p",
"state": "STARTED",
"docs": "3014",
"store": "31.1mb",
"dataset": "249b",
"ip": "192.168.56.10",
"node": "H5dfFeA"
}
]
[
{
"index": "my-index-000001",
"shard": "0",
"prirep": "p",
"state": "RELOCATING",
"docs": "3014",
"store": "31.1mb",
"dataset": "249b",
"ip": "192.168.56.10",
"node": "H5dfFeA -> -> 192.168.56.30 bGG90GE"
}
]
[
{
"index": "my-index-000001",
"shard": "0",
"prirep": "p",
"state": "STARTED",
"docs": "3014",
"store": "31.1mb",
"dataset": "249b",
"ip": "192.168.56.10",
"node": "H5dfFeA"
},
{
"index": "my-index-000001",
"shard": "0",
"prirep": "r",
"state": "INITIALIZING",
"docs": "0",
"store": "14.3mb",
"dataset": "249b",
"ip": "192.168.56.30",
"node": "bGG90GE"
}
]
[
{
"index": "my-index-000001",
"shard": "0",
"prirep": "p",
"state": "STARTED",
"unassigned.reason": "3014 31.1mb 192.168.56.10 H5dfFeA"
},
{
"index": "my-index-000001",
"shard": "0",
"prirep": "r",
"state": "STARTED",
"unassigned.reason": "3014 31.1mb 192.168.56.30 bGG90GE"
},
{
"index": "my-index-000001",
"shard": "0",
"prirep": "r",
"state": "STARTED",
"unassigned.reason": "3014 31.1mb 192.168.56.20 I8hydUG"
},
{
"index": "my-index-000001",
"shard": "0",
"prirep": "r",
"state": "UNASSIGNED",
"unassigned.reason": "ALLOCATION_FAILED"
}
]
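The final example above can be produced by selecting columns explicitly with the h parameter (a sketch; the index name is illustrative):
curl \
--request GET 'http://api.example.com/_cat/shards/my-index-000001?h=index,shard,prirep,state,docs,store,ip,node,unassigned.reason' \
--header "Authorization: $API_KEY"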
Get snapshot information
Added in 2.1.0
Get information about the snapshots stored in one or more repositories. A snapshot is a backup of an index or running Elasticsearch cluster. IMPORTANT: cat APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the get snapshot API.
Query parameters
-
h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
-
s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting
:asc
or:desc
as a suffix to the column name. -
master_timeout
string Period to wait for a connection to the master node.
-
time
string Unit used to display time values.
Values are
nanos
,micros
,ms
,s
,m
,h
, ord
.
curl \
--request GET 'http://api.example.com/_cat/snapshots' \
--header "Authorization: $API_KEY"
[
{
"id": "snap1",
"repository": "repo1",
"status": "FAILED",
"start_epoch": "1445616705",
"start_time": "18:11:45",
"end_epoch": "1445616978",
"end_time": "18:16:18",
"duration": "4.6m",
"indices": "1",
"successful_shards": "4",
"failed_shards": "1",
"total_shards": "5"
},
{
"id": "snap2",
"repository": "repo1",
"status": "SUCCESS",
"start_epoch": "1445634298",
"start_time": "23:04:58",
"end_epoch": "1445634672",
"end_time": "23:11:12",
"duration": "6.2m",
"indices": "2",
"successful_shards": "10",
"failed_shards": "0",
"total_shards": "10"
}
]
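For example, to sort snapshots by ID and render time values in seconds using the s and time parameters documented above:
curl \
--request GET 'http://api.example.com/_cat/snapshots?s=id&time=s' \
--header "Authorization: $API_KEY"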
Get task information
Technical preview
Get information about tasks currently running in the cluster. IMPORTANT: cat APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the task management API.
Query parameters
-
actions
array[string] The task action names, which are used to limit the response.
-
detailed
boolean If
true
, the response includes detailed information about the running tasks. -
nodes
array[string] Unique node identifiers, which are used to limit the response.
-
parent_task_id
string The parent task identifier, which is used to limit the response.
-
h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
-
s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting
:asc
or:desc
as a suffix to the column name. -
time
string Unit used to display time values.
Values are
nanos
,micros
,ms
,s
,m
,h
, ord
. -
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
-
wait_for_completion
boolean If
true
, the request blocks until the task has completed.
curl \
--request GET 'http://api.example.com/_cat/tasks' \
--header "Authorization: $API_KEY"
[
{
"action": "cluster:monitor/tasks/lists[n]",
"task_id": "oTUltX4IQMOUUVeiohTt8A:124",
"parent_task_id": "oTUltX4IQMOUUVeiohTt8A:123",
"type": "direct",
"start_time": "1458585884904",
"timestamp": "01:48:24",
"running_time": "44.1micros",
"ip": "127.0.0.1:9300",
"node": "oTUltX4IQMOUUVeiohTt8A"
},
{
"action": "cluster:monitor/tasks/lists",
"task_id": "oTUltX4IQMOUUVeiohTt8A:123",
"parent_task_id": "-",
"type": "transport",
"start_time": "1458585884904",
"timestamp": "01:48:24",
"running_time": "186.2micros",
"ip": "127.0.0.1:9300",
"node": "oTUltX4IQMOUUVeiohTt8A"
}
]
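For example, to list only search-related tasks with detailed task information (the actions pattern is illustrative):
curl \
--request GET 'http://api.example.com/_cat/tasks?actions=*search&detailed=true' \
--header "Authorization: $API_KEY"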
Get thread pool statistics
Get thread pool statistics for each node in a cluster. Returned information includes all built-in thread pools and custom thread pools. IMPORTANT: cat APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the nodes info API.
Query parameters
-
h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
-
s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting
:asc
or:desc
as a suffix to the column name. -
time
string The unit used to display time values.
Values are
nanos
,micros
,ms
,s
,m
,h
, ord
. -
local
boolean If
true
, the request computes the list of selected nodes from the local cluster state. If false,
the list of selected nodes is computed from the cluster state of the master node. In both cases the coordinating node will send requests for further information to each selected node. -
master_timeout
string Period to wait for a connection to the master node.
curl \
--request GET 'http://api.example.com/_cat/thread_pool' \
--header "Authorization: $API_KEY"
[
{
"node_name": "node-0",
"name": "analyze",
"active": "0",
"queue": "0",
"rejected": "0"
},
{
"node_name": "node-0",
"name": "fetch_shard_started",
"active": "0",
"queue": "0",
"rejected": "0"
},
{
"node_name": "node-0",
"name": "fetch_shard_store",
"active": "0",
"queue": "0",
"rejected": "0"
},
{
"node_name": "node-0",
"name": "flush",
"active": "0",
"queue": "0",
"rejected": "0"
},
{
"node_name": "node-0",
"name": "write",
"active": "0",
"queue": "0",
"rejected": "0"
}
]
[
{
"id": "0EWUhXeBQtaVGlexUeVwMg",
"name": "generic",
"active": "0",
"rejected": "0",
"completed": "70"
}
]
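The second example above selects non-default columns with the h parameter, for instance:
curl \
--request GET 'http://api.example.com/_cat/thread_pool?h=id,name,active,rejected,completed' \
--header "Authorization: $API_KEY"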
Get thread pool statistics
Get thread pool statistics for each node in a cluster. Returned information includes all built-in thread pools and custom thread pools. IMPORTANT: cat APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the nodes info API.
Path parameters
-
thread_pool_patterns
string | array[string] Required A comma-separated list of thread pool names used to limit the request. Accepts wildcard expressions.
Query parameters
-
h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
-
s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting
:asc
or:desc
as a suffix to the column name. -
time
string The unit used to display time values.
Values are
nanos
,micros
,ms
,s
,m
,h
, ord
. -
local
boolean If
true
, the request computes the list of selected nodes from the local cluster state. If false,
the list of selected nodes is computed from the cluster state of the master node. In both cases the coordinating node will send requests for further information to each selected node. -
master_timeout
string Period to wait for a connection to the master node.
curl \
--request GET 'http://api.example.com/_cat/thread_pool/{thread_pool_patterns}' \
--header "Authorization: $API_KEY"
[
{
"node_name": "node-0",
"name": "analyze",
"active": "0",
"queue": "0",
"rejected": "0"
},
{
"node_name": "node-0",
"name": "fetch_shard_started",
"active": "0",
"queue": "0",
"rejected": "0"
},
{
"node_name": "node-0",
"name": "fetch_shard_store",
"active": "0",
"queue": "0",
"rejected": "0"
},
{
"node_name": "node-0",
"name": "flush",
"active": "0",
"queue": "0",
"rejected": "0"
},
{
"node_name": "node-0",
"name": "write",
"active": "0",
"queue": "0",
"rejected": "0"
}
]
[
{
"id": "0EWUhXeBQtaVGlexUeVwMg",
"name": "generic",
"active": "0",
"rejected": "0",
"completed": "70"
}
]
Get transform information
Added in 7.7.0
Get configuration and usage information about transforms.
CAT APIs are only intended for human consumption using the Kibana console or command line. They are not intended for use by applications. For application consumption, use the get transform statistics API.
Path parameters
-
transform_id
string Required A transform identifier or a wildcard expression. If you do not specify one of these options, the API returns information for all transforms.
Query parameters
-
allow_no_match
boolean Specifies what to do when the request: contains wildcard expressions and there are no transforms that match; contains the
_all
string or no identifiers and there are no matches; contains wildcard expressions and there are only partial matches. Iftrue
, it returns an empty transforms array when there are no matches and the subset of results when there are partial matches. Iffalse
, the request returns a 404 status code when there are no matches or only partial matches. -
from
number Skips the specified number of transforms.
-
h
string | array[string] Comma-separated list of column names to display.
Supported values include:
changes_last_detection_time
(orcldt
): The timestamp when changes were last detected in the source indices.checkpoint
(orcp
): The sequence number for the checkpoint.checkpoint_duration_time_exp_avg
(orcdtea
,checkpointTimeExpAvg
): Exponential moving average of the duration of the checkpoint, in milliseconds.checkpoint_progress
(orc
,checkpointProgress
): The progress of the next checkpoint that is currently in progress.create_time
(orct
,createTime
): The time the transform was created.delete_time
(ordtime
): The amount of time spent deleting, in milliseconds.description
(ord
): The description of the transform.dest_index
(ordi
,destIndex
): The destination index for the transform. The mappings of the destination index are deduced based on the source fields when possible. If alternate mappings are required, use the Create index API prior to starting the transform.documents_deleted
(ordocd
): The number of documents that have been deleted from the destination index due to the retention policy for this transform.documents_indexed
(ordoci
): The number of documents that have been indexed into the destination index for the transform.docs_per_second
(ordps
): Specifies a limit on the number of input documents per second. This setting throttles the transform by adding a wait time between search requests. The default value isnull
, which disables throttling.documents_processed
(ordocp
): The number of documents that have been processed from the source index of the transform.frequency
(orf
): The interval between checks for changes in the source indices when the transform is running continuously. Also determines the retry interval in the event of transient failures while the transform is searching or indexing. The minimum value is1s
and the maximum is1h
. The default value is1m
.id
: Identifier for the transform.index_failure
(orif
): The number of indexing failures.index_time
(oritime
): The amount of time spent indexing, in milliseconds.index_total
(orit
): The number of index operations.indexed_documents_exp_avg
(oridea
): Exponential moving average of the number of new documents that have been indexed.last_search_time
(orlst
,lastSearchTime
): The timestamp of the last search in the source indices. This field is only shown if the transform is running.max_page_search_size
(ormpsz
): Defines the initial page size to use for the composite aggregation for each checkpoint. If circuit breaker exceptions occur, the page size is dynamically adjusted to a lower value. The minimum value is10
and the maximum is65,536
. The default value is500
.pages_processed
(orpp
): The number of search or bulk index operations processed. Documents are processed in batches instead of individually.pipeline
(orp
): The unique identifier for an ingest pipeline.processed_documents_exp_avg
(orpdea
): Exponential moving average of the number of documents that have been processed.processing_time
(orpt
): The amount of time spent processing results, in milliseconds.reason
(orr
): If a transform has afailed
state, this property provides details about the reason for the failure.search_failure
(orsf
): The number of search failures.search_time
(orstime
): The amount of time spent searching, in milliseconds.search_total
(orst
): The number of search operations on the source index for the transform.source_index
(orsi
,sourceIndex
): The source indices for the transform. It can be a single index, an index pattern (for example,"my-index-*"
), an array of indices (for example,["my-index-000001", "my-index-000002"]
), or an array of index patterns (for example,["my-index-*", "my-other-index-*"]
. For remote indices use the syntax"remote_name:index_name"
. If any indices are in remote clusters then the master node and at least one transform node must have theremote_cluster_client
node role.state
(ors
): The status of the transform, which can be one of the following values:aborting
: The transform is aborting.failed
: The transform failed. For more information about the failure, check the reason field.indexing
: The transform is actively processing data and creating new documents.started
: The transform is running but not actively indexing data.stopped
: The transform is stopped.stopping
: The transform is stopping.
transform_type
(ortt
): Indicates the type of transform:batch
orcontinuous
.trigger_count
(ortc
): The number of times the transform has been triggered by the scheduler. For example, the scheduler triggers the transform indexer to check for updates or ingest new data at an interval specified in thefrequency
property.version
(orv
): The version of Elasticsearch that existed on the node when the transform was created.
-
s
string | array[string] Comma-separated list of column names or column aliases used to sort the response.
Supported values include:
changes_last_detection_time
(orcldt
): The timestamp when changes were last detected in the source indices.checkpoint
(orcp
): The sequence number for the checkpoint.checkpoint_duration_time_exp_avg
(orcdtea
,checkpointTimeExpAvg
): Exponential moving average of the duration of the checkpoint, in milliseconds.checkpoint_progress
(orc
,checkpointProgress
): The progress of the next checkpoint that is currently in progress.create_time
(orct
,createTime
): The time the transform was created.delete_time
(ordtime
): The amount of time spent deleting, in milliseconds.description
(ord
): The description of the transform.dest_index
(ordi
,destIndex
): The destination index for the transform. The mappings of the destination index are deduced based on the source fields when possible. If alternate mappings are required, use the Create index API prior to starting the transform.documents_deleted
(ordocd
): The number of documents that have been deleted from the destination index due to the retention policy for this transform.documents_indexed
(ordoci
): The number of documents that have been indexed into the destination index for the transform.docs_per_second
(ordps
): Specifies a limit on the number of input documents per second. This setting throttles the transform by adding a wait time between search requests. The default value isnull
, which disables throttling.documents_processed
(ordocp
): The number of documents that have been processed from the source index of the transform.frequency
(orf
): The interval between checks for changes in the source indices when the transform is running continuously. Also determines the retry interval in the event of transient failures while the transform is searching or indexing. The minimum value is1s
and the maximum is1h
. The default value is1m
.id
: Identifier for the transform.index_failure
(orif
): The number of indexing failures.index_time
(oritime
): The amount of time spent indexing, in milliseconds.index_total
(orit
): The number of index operations.indexed_documents_exp_avg
(oridea
): Exponential moving average of the number of new documents that have been indexed.last_search_time
(orlst
,lastSearchTime
): The timestamp of the last search in the source indices. This field is only shown if the transform is running.max_page_search_size
(ormpsz
): Defines the initial page size to use for the composite aggregation for each checkpoint. If circuit breaker exceptions occur, the page size is dynamically adjusted to a lower value. The minimum value is10
and the maximum is65,536
. The default value is500
.pages_processed
(orpp
): The number of search or bulk index operations processed. Documents are processed in batches instead of individually.pipeline
(orp
): The unique identifier for an ingest pipeline.processed_documents_exp_avg
(orpdea
): Exponential moving average of the number of documents that have been processed.processing_time
(orpt
): The amount of time spent processing results, in milliseconds.reason
(orr
): If a transform has afailed
state, this property provides details about the reason for the failure.search_failure
(orsf
): The number of search failures.search_time
(orstime
): The amount of time spent searching, in milliseconds.search_total
(orst
): The number of search operations on the source index for the transform.source_index
(orsi
,sourceIndex
): The source indices for the transform. It can be a single index, an index pattern (for example,"my-index-*"
), an array of indices (for example,["my-index-000001", "my-index-000002"]
), or an array of index patterns (for example,["my-index-*", "my-other-index-*"]
. For remote indices use the syntax"remote_name:index_name"
. If any indices are in remote clusters then the master node and at least one transform node must have theremote_cluster_client
node role.state
(ors
): The status of the transform, which can be one of the following values:aborting
: The transform is aborting.failed
: The transform failed. For more information about the failure, check the reason field.indexing
: The transform is actively processing data and creating new documents.started
: The transform is running but not actively indexing data.stopped
: The transform is stopped.stopping
: The transform is stopping.
transform_type
(ortt
): Indicates the type of transform:batch
orcontinuous
.trigger_count
(ortc
): The number of times the transform has been triggered by the scheduler. For example, the scheduler triggers the transform indexer to check for updates or ingest new data at an interval specified in thefrequency
property.version
(orv
): The version of Elasticsearch that existed on the node when the transform was created.
-
time
string The unit used to display time values.
Values are
nanos
,micros
,ms
,s
,m
,h
, ord
. -
size
number The maximum number of transforms to obtain.
curl \
--request GET 'http://api.example.com/_cat/transforms/{transform_id}' \
--header "Authorization: $API_KEY"
[
{
"id" : "ecommerce_transform",
"state" : "started",
"checkpoint" : "1",
"documents_processed" : "705",
"checkpoint_progress" : "100.00",
"changes_last_detection_time" : null
}
]
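For example, to select a few of the columns described above for all transforms matching a wildcard expression (the pattern is illustrative):
curl \
--request GET 'http://api.example.com/_cat/transforms/ecommerce*?h=id,state,checkpoint,documents_processed&s=id' \
--header "Authorization: $API_KEY"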
Get cluster-wide settings
By default, it returns only settings that have been explicitly defined.
Query parameters
-
flat_settings
boolean If
true
, returns settings in flat format. -
include_defaults
boolean If
true
, returns default cluster settings from the local node. -
master_timeout
string Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
-
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
curl \
--request GET 'http://api.example.com/_cluster/settings' \
--header "Authorization: $API_KEY"
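For example, to also return default settings from the local node, rendered in flat format:
curl \
--request GET 'http://api.example.com/_cluster/settings?include_defaults=true&flat_settings=true' \
--header "Authorization: $API_KEY"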
Get cluster info
Added in 8.9.0
Get basic information about the cluster.
Path parameters
-
target
string | array[string] Required Limits the information returned to the specific target. Supports a comma-separated list, such as http,ingest.
Supported values include:
_all
,http
,ingest
,thread_pool
,script
curl \
--request GET 'http://api.example.com/_info/{target}' \
--header "Authorization: $API_KEY"
Get the cluster state
Added in 1.3.0
Get comprehensive information about the state of the cluster.
The cluster state is an internal data structure which keeps track of a variety of information needed by every node, including the identity and attributes of the other nodes in the cluster; cluster-wide settings; index metadata, including the mapping and settings for each index; the location and status of every shard copy in the cluster.
The elected master node ensures that every node in the cluster has a copy of the same cluster state. This API lets you retrieve a representation of this internal state for debugging or diagnostic purposes. You may need to consult the Elasticsearch source code to determine the precise meaning of the response.
By default the API will route requests to the elected master node since this node is the authoritative source of cluster states.
You can also retrieve the cluster state held on the node handling the API request by adding the ?local=true
query parameter.
Elasticsearch may need to expend significant effort to compute a response to this API in larger clusters, and the response may comprise a very large quantity of data. If you use this API repeatedly, your cluster may become unstable.
WARNING: The response is a representation of an internal data structure. Its format is not subject to the same compatibility guarantees as other more stable APIs and may change from version to version. Do not query this API using external monitoring tools. Instead, obtain the information you require using other more stable cluster APIs.
Path parameters
-
metric
string | array[string] Required Limit the information returned to the specified metrics
Query parameters
-
allow_no_indices
boolean Whether to ignore if a wildcard indices expression resolves into no concrete indices. (This includes
_all
string or when no indices have been specified) -
expand_wildcards
string | array[string] Whether to expand wildcard expression to concrete indices that are open, closed or both.
Supported values include:
all
: Match any data stream or index, including hidden ones.open
: Match open, non-hidden indices. Also matches any non-hidden data stream.closed
: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.hidden
: Match hidden data streams and hidden indices. Must be combined withopen
,closed
, orboth
.none
: Wildcard expressions are not accepted.
-
flat_settings
boolean Return settings in flat format (default: false)
-
local
boolean Return local information, do not retrieve the state from master node (default: false)
-
master_timeout
string Specify timeout for connection to master
-
wait_for_metadata_version
number Wait for the metadata version to be equal or greater than the specified metadata version
-
wait_for_timeout
string The maximum time to wait for wait_for_metadata_version before timing out
curl \
--request GET 'http://api.example.com/_cluster/state/{metric}' \
--header "Authorization: $API_KEY"
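For example, to limit the response to the index metadata and the routing table (metadata and routing_table are two commonly used metrics; treat the metric names here as illustrative):
curl \
--request GET 'http://api.example.com/_cluster/state/metadata,routing_table' \
--header "Authorization: $API_KEY"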
Get cluster repositories metering
Technical preview
Get repositories metering information for a cluster. This API exposes monotonically non-decreasing counters and it is expected that clients would durably store the information needed to compute aggregations over a period of time. Additionally, the information exposed by this API is volatile, meaning that it will not be present after node restarts.
Path parameters
-
node_id
string | array[string] Required Comma-separated list of node IDs or names used to limit returned information.
curl \
--request GET 'http://api.example.com/_nodes/{node_id}/_repositories_metering' \
--header "Authorization: $API_KEY"
Get the hot threads for nodes
Get a breakdown of the hot threads on each selected node in the cluster. The output is plain text with a breakdown of the top hot threads for each node.
Query parameters
-
ignore_idle_threads
boolean If true, known idle threads (e.g. waiting in a socket select, or to get a task from an empty queue) are filtered out.
-
interval
string The interval to do the second sampling of threads.
-
snapshots
number Number of samples of thread stacktrace.
-
threads
number Specifies the number of hot threads to provide information for.
-
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
-
type
string The type to sample.
Values are
cpu
,wait
,block
,gpu
, ormem
. -
sort
string The sort order for 'cpu' type (default: total)
Values are
cpu
,wait
,block
,gpu
, ormem
.
curl \
--request GET 'http://api.example.com/_nodes/hot_threads' \
--header "Authorization: $API_KEY"
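For example, to sample the five hottest threads in the wait state:
curl \
--request GET 'http://api.example.com/_nodes/hot_threads?type=wait&threads=5' \
--header "Authorization: $API_KEY"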
Get the hot threads for nodes
Get a breakdown of the hot threads on each selected node in the cluster. The output is plain text with a breakdown of the top hot threads for each node.
Path parameters
-
node_id
string | array[string] Required List of node IDs or names used to limit returned information.
Query parameters
-
ignore_idle_threads
boolean If true, known idle threads (e.g. waiting in a socket select, or to get a task from an empty queue) are filtered out.
-
interval
string The interval to do the second sampling of threads.
-
snapshots
number Number of samples of thread stacktrace.
-
threads
number Specifies the number of hot threads to provide information for.
-
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
-
type
string The type to sample.
Values are
cpu
,wait
,block
,gpu
, ormem
. -
sort
string The sort order for 'cpu' type (default: total)
Values are
cpu
,wait
,block
,gpu
, ormem
.
curl \
--request GET 'http://api.example.com/_nodes/{node_id}/hot_threads' \
--header "Authorization: $API_KEY"
Get node information
Added in 1.3.0
By default, the API returns all attributes and core settings for cluster nodes.
Path parameters
-
metric
string | array[string] Required Limits the information returned to the specific metrics. Supports a comma-separated list, such as http,ingest.
Query parameters
-
flat_settings
boolean If true, returns settings in flat format.
-
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
curl \
--request GET 'http://api.example.com/_nodes/{metric}' \
--header "Authorization: $API_KEY"
{
"_nodes": {},
"cluster_name": "elasticsearch",
"nodes": {
"USpTGYaBSIKbgSUJR2Z9lg": {
"name": "node-0",
"transport_address": "192.168.17:9300",
"host": "node-0.elastic.co",
"ip": "192.168.17",
"version": "{version}",
"transport_version": 100000298,
"index_version": 100000074,
"component_versions": {
"ml_config_version": 100000162,
"transform_config_version": 100000096
},
"build_flavor": "default",
"build_type": "{build_type}",
"build_hash": "587409e",
"roles": [
"master",
"data",
"ingest"
],
"attributes": {},
"plugins": [
{
"name": "analysis-icu",
"version": "{version}",
"description": "The ICU Analysis plugin integrates Lucene ICU
module into elasticsearch, adding ICU relates analysis components.",
"classname":
"org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin",
"has_native_controller": false
}
],
"modules": [
{
"name": "lang-painless",
"version": "{version}",
"description": "An easy, safe and fast scripting language for
Elasticsearch",
"classname": "org.elasticsearch.painless.PainlessPlugin",
"has_native_controller": false
}
]
}
}
}
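For example, to return only JVM and operating system information (jvm and os are assumed metric names, not listed in this section):
curl \
--request GET 'http://api.example.com/_nodes/jvm,os' \
--header "Authorization: $API_KEY"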
Get node information
Added in 1.3.0
By default, the API returns all attributes and core settings for cluster nodes.
Query parameters
-
flat_settings
boolean If true, returns settings in flat format.
-
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
curl \
--request GET 'http://api.example.com/_nodes/{node_id}/{metric}' \
--header "Authorization: $API_KEY"
{
"_nodes": {},
"cluster_name": "elasticsearch",
"nodes": {
"USpTGYaBSIKbgSUJR2Z9lg": {
"name": "node-0",
"transport_address": "192.168.17:9300",
"host": "node-0.elastic.co",
"ip": "192.168.17",
"version": "{version}",
"transport_version": 100000298,
"index_version": 100000074,
"component_versions": {
"ml_config_version": 100000162,
"transform_config_version": 100000096
},
"build_flavor": "default",
"build_type": "{build_type}",
"build_hash": "587409e",
"roles": [
"master",
"data",
"ingest"
],
"attributes": {},
"plugins": [
{
"name": "analysis-icu",
"version": "{version}",
"description": "The ICU Analysis plugin integrates Lucene ICU
module into elasticsearch, adding ICU relates analysis components.",
"classname":
"org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin",
"has_native_controller": false
}
],
"modules": [
{
"name": "lang-painless",
"version": "{version}",
"description": "An easy, safe and fast scripting language for
Elasticsearch",
"classname": "org.elasticsearch.painless.PainlessPlugin",
"has_native_controller": false
}
]
}
}
}
Get node statistics
Get statistics for nodes in a cluster. By default, all stats are returned. You can limit the returned information by using metrics.
Query parameters
-
completion_fields
string | array[string] Comma-separated list or wildcard expressions of fields to include in fielddata and suggest statistics.
-
fielddata_fields
string | array[string] Comma-separated list or wildcard expressions of fields to include in fielddata statistics.
-
fields
string | array[string] Comma-separated list or wildcard expressions of fields to include in the statistics.
-
groups
boolean Comma-separated list of search groups to include in the search statistics.
-
include_segment_file_sizes
boolean If true, the call reports the aggregated disk usage of each one of the Lucene index files (only applies if segment stats are requested).
-
level
string Indicates whether statistics are aggregated at the cluster, index, or shard level.
Values are
cluster
,indices
, orshards
. -
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
-
types
array[string] A comma-separated list of document types for the indexing index metric.
-
include_unloaded_segments
boolean If
true
, the response includes information from segments that are not loaded into memory.
curl \
--request GET 'http://api.example.com/_nodes/stats' \
--header "Authorization: $API_KEY"
Get node statistics
Get statistics for nodes in a cluster. By default, all stats are returned. You can limit the returned information by using metrics.
Path parameters
-
metric
string | array[string] Required Limit the information returned to the specified metrics
Query parameters
-
completion_fields
string | array[string] Comma-separated list or wildcard expressions of fields to include in fielddata and suggest statistics.
-
fielddata_fields
string | array[string] Comma-separated list or wildcard expressions of fields to include in fielddata statistics.
-
fields
string | array[string] Comma-separated list or wildcard expressions of fields to include in the statistics.
-
groups
boolean Comma-separated list of search groups to include in the search statistics.
-
include_segment_file_sizes
boolean If true, the call reports the aggregated disk usage of each one of the Lucene index files (only applies if segment stats are requested).
-
level
string Indicates whether statistics are aggregated at the cluster, index, or shard level.
Values are
cluster
,indices
, orshards
. -
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
-
types
array[string] A comma-separated list of document types for the indexing index metric.
-
include_unloaded_segments
boolean If
true
, the response includes information from segments that are not loaded into memory.
curl \
--request GET 'http://api.example.com/_nodes/stats/{metric}' \
--header "Authorization: $API_KEY"
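For example, to fetch only JVM and indices statistics aggregated at the shard level (the metric names are assumed; level is documented above):
curl \
--request GET 'http://api.example.com/_nodes/stats/jvm,indices?level=shards' \
--header "Authorization: $API_KEY"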
Get node statistics
Get statistics for nodes in a cluster. By default, all stats are returned. You can limit the returned information by using metrics.
Query parameters
-
completion_fields
string | array[string] Comma-separated list or wildcard expressions of fields to include in fielddata and suggest statistics.
-
fielddata_fields
string | array[string] Comma-separated list or wildcard expressions of fields to include in fielddata statistics.
-
fields
string | array[string] Comma-separated list or wildcard expressions of fields to include in the statistics.
-
groups
boolean Comma-separated list of search groups to include in the search statistics.
-
include_segment_file_sizes
boolean If true, the call reports the aggregated disk usage of each one of the Lucene index files (only applies if segment stats are requested).
-
level
string Indicates whether statistics are aggregated at the cluster, index, or shard level.
Values are
cluster
,indices
, orshards
. -
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
-
types
array[string] A comma-separated list of document types for the indexing index metric.
-
include_unloaded_segments
boolean If
true
, the response includes information from segments that are not loaded into memory.
curl \
--request GET 'http://api.example.com/_nodes/{node_id}/stats/{metric}' \
--header "Authorization: $API_KEY"
Get the cluster health
Added in 8.7.0
Get a report with the health status of an Elasticsearch cluster. The report contains a list of indicators that compose Elasticsearch functionality.
Each indicator has a health status of: green, unknown, yellow or red. The indicator will provide an explanation and metadata describing the reason for its current health status.
The cluster’s status is controlled by the worst indicator status.
In the event that an indicator’s status is non-green, a list of impacts may be present in the indicator result which detail the functionalities that are negatively affected by the health issue. Each impact carries with it a severity level, an area of the system that is affected, and a simple description of the impact on the system.
Some health indicators can determine the root cause of a health problem and prescribe a set of steps that can be performed in order to improve the health of the system. The root cause and remediation steps are encapsulated in a diagnosis. A diagnosis contains a cause detailing a root cause analysis, an action containing a brief description of the steps to take to fix the problem, the list of affected resources (if applicable), and a detailed step-by-step troubleshooting guide to fix the diagnosed problem.
NOTE: The health indicators perform root cause analysis of non-green health statuses. This can be computationally expensive when called frequently. When setting up automated polling of the API for health status, set verbose to false to disable the more expensive analysis logic.
curl \
--request GET 'http://api.example.com/_health_report' \
--header "Authorization: $API_KEY"
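Following the note above, an automated poller would disable the more expensive analysis logic with the verbose flag (a minimal sketch):
curl \
--request GET 'http://api.example.com/_health_report?verbose=false' \
--header "Authorization: $API_KEY"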
Get the cluster health
Added in 8.7.0
Get a report with the health status of an Elasticsearch cluster. The report contains a list of indicators that compose Elasticsearch functionality.
Each indicator has a health status of: green, unknown, yellow or red. The indicator will provide an explanation and metadata describing the reason for its current health status.
The cluster’s status is controlled by the worst indicator status.
In the event that an indicator’s status is non-green, a list of impacts may be present in the indicator result which detail the functionalities that are negatively affected by the health issue. Each impact carries with it a severity level, an area of the system that is affected, and a simple description of the impact on the system.
Some health indicators can determine the root cause of a health problem and prescribe a set of steps that can be performed in order to improve the health of the system. The root cause and remediation steps are encapsulated in a diagnosis. A diagnosis contains a cause detailing a root cause analysis, an action containing a brief description of the steps to take to fix the problem, the list of affected resources (if applicable), and a detailed step-by-step troubleshooting guide to fix the diagnosed problem.
NOTE: The health indicators perform root cause analysis of non-green health statuses. This can be computationally expensive when called frequently. When setting up automated polling of the API for health status, set verbose to false to disable the more expensive analysis logic.
Path parameters
-
feature
string | array[string] Required A feature of the cluster, as returned by the top-level health report API.
curl \
--request GET 'http://api.example.com/_health_report/{feature}' \
--header "Authorization: $API_KEY"
Create a connector
Beta
Connectors are Elasticsearch integrations that bring content from third-party data sources, which can be deployed on Elastic Cloud or hosted on your own infrastructure. Elastic managed connectors (Native connectors) are a managed service on Elastic Cloud. Self-managed connectors (Connector clients) are self-managed on your infrastructure.
Body
-
description
string -
index_name
string -
is_native
boolean -
language
string -
name
string -
service_type
string
curl \
--request PUT 'http://api.example.com/_connector' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"index_name\": \"search-google-drive\",\n \"name\": \"My Connector\",\n \"service_type\": \"google_drive\"\n}"'
{
"index_name": "search-google-drive",
"name": "My Connector",
"service_type": "google_drive"
}
{
"index_name": "search-google-drive",
"name": "My Connector",
"description": "My Connector to sync data to Elastic index from Google Drive",
"service_type": "google_drive",
"language": "english"
}
{
"result": "created",
"id": "my-connector"
}
Create a connector
Beta
Connectors are Elasticsearch integrations that bring content from third-party data sources, which can be deployed on Elastic Cloud or hosted on your own infrastructure. Elastic managed connectors (Native connectors) are a managed service on Elastic Cloud. Self-managed connectors (Connector clients) are self-managed on your infrastructure.
Body
-
description
string -
index_name
string -
is_native
boolean -
language
string -
name
string -
service_type
string
curl \
--request POST 'http://api.example.com/_connector' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"description":"string","index_name":"string","is_native":true,"language":"string","name":"string","service_type":"string"}'
Get all connector sync jobs
Beta
Get information about all stored connector sync jobs listed by their creation date in ascending order.
Query parameters
-
from
number Starting offset (default: 0)
-
size
number Specifies a max number of results to get
-
status
string A sync job status to fetch connector sync jobs for
Values are
canceling
,canceled
,completed
,error
,in_progress
,pending
, orsuspended
. -
connector_id
string A connector id to fetch connector sync jobs for
-
job_type
string | array[string] A comma-separated list of job types to fetch the sync jobs for
Supported values include:
full
,incremental
,access_control
curl \
--request GET 'http://api.example.com/_connector/_sync_job' \
--header "Authorization: $API_KEY"
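For example, to page through completed full syncs for a single connector (the connector identifier is hypothetical):
curl \
--request GET 'http://api.example.com/_connector/_sync_job?status=completed&job_type=full&connector_id=my-connector&from=0&size=10' \
--header "Authorization: $API_KEY"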
Create a connector sync job
Beta
Create a connector sync job document in the internal index and initialize its counters and timestamps with default values.
Body
Required
-
id
string Required -
job_type
string Values are
full
,incremental
, oraccess_control
. -
trigger_method
string Values are
on_demand
orscheduled
.
curl \
--request POST 'http://api.example.com/_connector/_sync_job' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"id\": \"connector-id\",\n \"job_type\": \"full\",\n \"trigger_method\": \"on_demand\"\n}"'
{
"id": "connector-id",
"job_type": "full",
"trigger_method": "on_demand"
}
Update the connector configuration
Beta
Update the configuration field in the connector document.
Path parameters
-
connector_id
string Required The unique identifier of the connector to be updated
Body
Required
-
configuration
object -
values
object
curl \
--request PUT 'http://api.example.com/_connector/{connector_id}/_configuration' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"values\": {\n \"tenant_id\": \"my-tenant-id\",\n \"tenant_name\": \"my-sharepoint-site\",\n \"client_id\": \"foo\",\n \"secret_value\": \"bar\",\n \"site_collections\": \"*\"\n }\n}"'
{
"values": {
"tenant_id": "my-tenant-id",
"tenant_name": "my-sharepoint-site",
"client_id": "foo",
"secret_value": "bar",
"site_collections": "*"
}
}
{
"values": {
"secret_value": "foo-bar"
}
}
{
"result": "updated"
}
Update the connector index name
Beta
Update the index_name
field of a connector, specifying the index where the data ingested by the connector is stored.
Path parameters
-
connector_id
string Required The unique identifier of the connector to be updated
Body
Required
index_name
string | null
curl \
--request PUT 'http://api.example.com/_connector/{connector_id}/_index_name' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"index_name\": \"data-from-my-google-drive\"\n}"'
{
"index_name": "data-from-my-google-drive"
}
{
"result": "updated"
}
Update the connector name and description
Beta
Path parameters
-
connector_id
string Required The unique identifier of the connector to be updated
Body
Required
-
name
string -
description
string
curl \
--request PUT 'http://api.example.com/_connector/{connector_id}/_name' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"name\": \"Custom connector\",\n \"description\": \"This is my customized connector\"\n}"'
{
"name": "Custom connector",
"description": "This is my customized connector"
}
{
"result": "updated"
}
Update the connector is_native flag
Beta
Path parameters
-
connector_id
string Required The unique identifier of the connector to be updated
curl \
--request PUT 'http://api.example.com/_connector/{connector_id}/_native' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"is_native":true}'
Update the connector scheduling
Beta
Path parameters
-
connector_id
string Required The unique identifier of the connector to be updated
Body
Required
-
scheduling
object Required
curl \
--request PUT 'http://api.example.com/_connector/{connector_id}/_scheduling' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"scheduling\": {\n \"access_control\": {\n \"enabled\": true,\n \"interval\": \"0 10 0 * * ?\"\n },\n \"full\": {\n \"enabled\": true,\n \"interval\": \"0 20 0 * * ?\"\n },\n \"incremental\": {\n \"enabled\": false,\n \"interval\": \"0 30 0 * * ?\"\n }\n }\n}"'
{
"scheduling": {
"access_control": {
"enabled": true,
"interval": "0 10 0 * * ?"
},
"full": {
"enabled": true,
"interval": "0 20 0 * * ?"
},
"incremental": {
"enabled": false,
"interval": "0 30 0 * * ?"
}
}
}
{
"scheduling": {
"full": {
"enabled": true,
"interval": "0 10 0 * * ?"
}
}
}
{
"result": "updated"
}
Get auto-follow patterns
Added in 6.5.0
Get cross-cluster replication auto-follow patterns.
Path parameters
-
name
string Required The auto-follow pattern collection that you want to retrieve. If you do not specify a name, the API returns information for all collections.
Query parameters
-
master_timeout
string The period to wait for a connection to the master node. If the master node is not available before the timeout expires, the request fails and returns an error. It can also be set to
-1
to indicate that the request should never timeout.
curl \
--request GET 'http://api.example.com/_ccr/auto_follow/{name}' \
--header "Authorization: $API_KEY"
{
"patterns": [
{
"name": "my_auto_follow_pattern",
"pattern": {
"active": true,
"remote_cluster" : "remote_cluster",
"leader_index_patterns" :
[
"leader_index*"
],
"leader_index_exclusion_patterns":
[
"leader_index_001"
],
"follow_index_pattern" : "{{leader_index}}-follower"
}
}
]
}
Delete auto-follow patterns
Added in 6.5.0
Delete a collection of cross-cluster replication auto-follow patterns.
Path parameters
-
name
string Required The auto-follow pattern collection to delete.
Query parameters
-
master_timeout
string The period to wait for a connection to the master node. If the master node is not available before the timeout expires, the request fails and returns an error. It can also be set to
-1
to indicate that the request should never timeout.
curl \
--request DELETE 'http://api.example.com/_ccr/auto_follow/{name}' \
--header "Authorization: $API_KEY"
{
"acknowledged" : true
}
Get follower stats
Added in 6.5.0
Get cross-cluster replication follower stats. The API returns shard-level stats about the "following tasks" associated with each shard for the specified indices.
Path parameters
-
index
string | array[string] Required A comma-delimited list of index patterns.
Query parameters
-
timeout
string The period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
curl \
--request GET 'http://api.example.com/{index}/_ccr/stats' \
--header "Authorization: $API_KEY"
{
"indices" : [
{
"index" : "follower_index",
"total_global_checkpoint_lag" : 256,
"shards" : [
{
"remote_cluster" : "remote_cluster",
"leader_index" : "leader_index",
"follower_index" : "follower_index",
"shard_id" : 0,
"leader_global_checkpoint" : 1024,
"leader_max_seq_no" : 1536,
"follower_global_checkpoint" : 768,
"follower_max_seq_no" : 896,
"last_requested_seq_no" : 897,
"outstanding_read_requests" : 8,
"outstanding_write_requests" : 2,
"write_buffer_operation_count" : 64,
"follower_mapping_version" : 4,
"follower_settings_version" : 2,
"follower_aliases_version" : 8,
"total_read_time_millis" : 32768,
"total_read_remote_exec_time_millis" : 16384,
"successful_read_requests" : 32,
"failed_read_requests" : 0,
"operations_read" : 896,
"bytes_read" : 32768,
"total_write_time_millis" : 16384,
"write_buffer_size_in_bytes" : 1536,
"successful_write_requests" : 16,
"failed_write_requests" : 0,
"operations_written" : 832,
"read_exceptions" : [ ],
"time_since_last_read_millis" : 8
}
]
}
]
}
Forget a follower
Added in 6.7.0
Remove the cross-cluster replication follower retention leases from the leader.
A following index takes out retention leases on its leader index. These leases are used to increase the likelihood that the shards of the leader index retain the history of operations that the shards of the following index need to run replication. When a follower index is converted to a regular index by the unfollow API (either by directly calling the API or by index lifecycle management tasks), these leases are removed. However, removal of the leases can fail, for example when the remote cluster containing the leader index is unavailable. While the leases will eventually expire on their own, their extended existence can cause the leader index to hold more history than necessary and prevent index lifecycle management from performing some operations on the leader index. This API exists to enable manually removing the leases when the unfollow API is unable to do so.
NOTE: This API does not stop replication by a following index. If you use this API with a follower index that is still actively following, the following index will add back retention leases on the leader. The only purpose of this API is to handle the case of failure to remove the following retention leases after the unfollow API is invoked.
Path parameters
-
index
string Required the name of the leader index for which specified follower retention leases should be removed
Query parameters
-
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
Body
Required
-
follower_cluster
string -
follower_index
string -
follower_index_uuid
string -
leader_remote_cluster
string
curl \
--request POST 'http://api.example.com/{index}/_ccr/forget_follower' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"follower_cluster\" : \"\u003cfollower_cluster\u003e\",\n \"follower_index\" : \"\u003cfollower_index\u003e\",\n \"follower_index_uuid\" : \"\u003cfollower_index_uuid\u003e\",\n \"leader_remote_cluster\" : \"\u003cleader_remote_cluster\u003e\"\n}"'
{
"follower_cluster" : "<follower_cluster>",
"follower_index" : "<follower_index>",
"follower_index_uuid" : "<follower_index_uuid>",
"leader_remote_cluster" : "<leader_remote_cluster>"
}
{
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0,
"failures" : [ ]
}
}
Pause an auto-follow pattern
Added in 7.5.0
Pause a cross-cluster replication auto-follow pattern. When the API returns, the auto-follow pattern is inactive. New indices that are created on the remote cluster and match the auto-follow patterns are ignored.
You can resume auto-following with the resume auto-follow pattern API. When it resumes, the auto-follow pattern is active again and automatically configures follower indices for newly created indices on the remote cluster that match its patterns. Remote indices that were created while the pattern was paused will also be followed, unless they have been deleted or closed in the interim.
Path parameters
-
name
string Required The name of the auto-follow pattern to pause.
Query parameters
-
master_timeout
string The period to wait for a connection to the master node. If the master node is not available before the timeout expires, the request fails and returns an error. It can also be set to
-1
to indicate that the request should never timeout.
curl \
--request POST 'http://api.example.com/_ccr/auto_follow/{name}/pause' \
--header "Authorization: $API_KEY"
{
"acknowledged" : true
}
Pause a follower
Added in 6.5.0
Pause a cross-cluster replication follower index. The follower index will not fetch any additional operations from the leader index. You can resume following with the resume follower API. You can pause and resume a follower index to change the configuration of the following task.
Path parameters
-
index
string Required The name of the follower index.
Query parameters
-
master_timeout
string The period to wait for a connection to the master node. If the master node is not available before the timeout expires, the request fails and returns an error. It can also be set to
-1
to indicate that the request should never timeout.
curl \
--request POST 'http://api.example.com/{index}/_ccr/pause_follow' \
--header "Authorization: $API_KEY"
{
"acknowledged" : true
}
Get data stream lifecycles
Added in 8.11.0
Get the data stream lifecycle configuration of one or more data streams.
Path parameters
-
name
string | array[string] Required Comma-separated list of data streams to limit the request. Supports wildcards (
*
). To target all data streams, omit this parameter or use*
or_all
.
Query parameters
-
expand_wildcards
string | array[string] Type of data stream that wildcard patterns can match. Supports comma-separated values, such as
open,hidden
. Valid values are:all
,open
,closed
,hidden
,none
.Supported values include:
all
: Match any data stream or index, including hidden ones.open
: Match open, non-hidden indices. Also matches any non-hidden data stream.closed
: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.hidden
: Match hidden data streams and hidden indices. Must be combined withopen
,closed
, orboth
.none
: Wildcard expressions are not accepted.
-
include_defaults
boolean If
true
, return all default settings in the response. -
master_timeout
string Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
curl \
--request GET 'http://api.example.com/_data_stream/{name}/_lifecycle' \
--header "Authorization: $API_KEY"
{
"data_streams": [
{
"name": "my-data-stream-1",
"lifecycle": {
"enabled": true,
"data_retention": "7d"
}
},
{
"name": "my-data-stream-2",
"lifecycle": {
"enabled": true,
"data_retention": "7d"
}
}
]
}
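For example, to include default lifecycle settings and also match hidden data streams (reusing the stream name from the example above):
curl \
--request GET 'http://api.example.com/_data_stream/my-data-stream-1/_lifecycle?include_defaults=true&expand_wildcards=open,hidden' \
--header "Authorization: $API_KEY"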
Downsample an index
Technical preview
Aggregate a time series (TSDS) index and store pre-computed statistical summaries (min
, max
, sum
, value_count
and avg
) for each metric field grouped by a configured time interval.
For example, a TSDS index that contains metrics sampled every 10 seconds can be downsampled to an hourly index.
All documents within an hour interval are summarized and stored as a single document in the downsample index.
NOTE: Only indices in a time series data stream are supported.
Neither field nor document level security can be defined on the source index.
The source index must be read only (index.blocks.write: true
).
Path parameters
-
index
string Required Name of the time series index to downsample.
-
target_index
string Required Name of the index to create.
Body
Required
-
fixed_interval
string Required A date histogram interval. Similar to
Duration
with additional units:w
(week),M
(month),q
(quarter) andy
(year)
curl \
--request POST 'http://api.example.com/{index}/_downsample/{target_index}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"fixed_interval\": \"1d\"\n}"'
{
"fixed_interval": "1d"
}
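Because the source index must be write-blocked first, a typical sequence looks like this (index names are hypothetical; the add index block endpoint sets index.blocks.write as required above):
# make the source index read only, as the downsample API requires
curl \
--request PUT 'http://api.example.com/my-tsds-index/_block/write' \
--header "Authorization: $API_KEY"
# then downsample into an hourly index
curl \
--request POST 'http://api.example.com/my-tsds-index/_downsample/my-tsds-index-1h' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{ "fixed_interval": "1h" }'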
Promote a data stream
Added in 7.9.0
Promote a data stream from a replicated data stream managed by cross-cluster replication (CCR) to a regular data stream.
With CCR auto following, a data stream from a remote cluster can be replicated to the local cluster. These data streams can't be rolled over in the local cluster. These replicated data streams roll over only if the upstream data stream rolls over. In the event that the remote cluster is no longer available, the data stream in the local cluster can be promoted to a regular data stream, which allows these data streams to be rolled over in the local cluster.
NOTE: When promoting a data stream, ensure the local cluster has a data stream enabled index template that matches the data stream. If this is missing, the data stream will not be able to roll over until a matching index template is created. This will affect the lifecycle management of the data stream and interfere with the data stream size and retention.
Path parameters
-
name
string Required The name of the data stream
Query parameters
-
master_timeout
string Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
curl \
--request POST 'http://api.example.com/_data_stream/_promote/{name}' \
--header "Authorization: $API_KEY"
Bulk index or delete documents
Perform multiple index
, create
, delete
, and update
actions in a single request.
This reduces overhead and can greatly increase indexing speed.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
- To use the
create
action, you must have thecreate_doc
,create
,index
, orwrite
index privilege. Data streams support only thecreate
action. - To use the
index
action, you must have thecreate
,index
, orwrite
index privilege. - To use the
delete
action, you must have thedelete
orwrite
index privilege. - To use the
update
action, you must have theindex
orwrite
index privilege. - To automatically create a data stream or index with a bulk API request, you must have the
auto_configure
,create_index
, ormanage
index privilege. - To make the result of a bulk operation visible to search using the
refresh
parameter, you must have themaintenance
ormanage
index privilege.
Automatic data stream creation requires a matching index template with data stream enabled.
The actions are specified in the request body using a newline delimited JSON (NDJSON) structure:
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n
The index
and create
actions expect a source on the next line and have the same semantics as the op_type
parameter in the standard index API.
A create
action fails if a document with the same ID already exists in the target.
An index
action adds or replaces a document as necessary.
NOTE: Data streams support only the create
action.
To update or delete a document in a data stream, you must target the backing index containing the document.
An update
action expects that the partial doc, upsert, and script and its options are specified on the next line.
A delete
action does not expect a source on the next line and has the same semantics as the standard delete API.
NOTE: The final line of data must end with a newline character (\n
).
Each newline character may be preceded by a carriage return (\r
).
When sending NDJSON data to the _bulk
endpoint, use a Content-Type
header of application/json
or application/x-ndjson
.
Because this format uses literal newline characters (\n
) as delimiters, make sure that the JSON actions and sources are not pretty printed.
If you provide a target in the request path, it is used for any actions that don't explicitly specify an _index
argument.
A note on the format: the idea here is to make processing as fast as possible.
As some of the actions are redirected to other shards on other nodes, only action_meta_data
is parsed on the receiving node side.
Client libraries using this protocol should strive to do something similar on the client side and reduce buffering as much as possible.
There is no "correct" number of actions to perform in a single bulk request. Experiment with different settings to find the optimal size for your particular workload. Note that Elasticsearch limits the maximum size of a HTTP request to 100mb by default so clients must ensure that no request exceeds this size. It is not possible to index a single document that exceeds the size limit, so you must pre-process any such documents into smaller pieces before sending them to Elasticsearch. For instance, split documents into pages or chapters before indexing them, or store raw binary data in a system outside Elasticsearch and replace the raw data with a link to the external system in the documents that you send to Elasticsearch.
Client support for bulk requests
Some of the officially supported clients provide helpers to assist with bulk requests and reindexing:
- Go: Check out esutil.BulkIndexer
- Perl: Check out Search::Elasticsearch::Client::5_0::Bulk and Search::Elasticsearch::Client::5_0::Scroll
- Python: Check out elasticsearch.helpers.*
- JavaScript: Check out client.helpers.*
- .NET: Check out BulkAllObservable
- PHP: Check out bulk indexing.
Submitting bulk requests with cURL
If you're providing text file input to curl, you must use the --data-binary flag instead of plain -d. The latter doesn't preserve newlines. For example:
$ cat requests
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
$ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@requests"; echo
{"took":7, "errors": false, "items":[{"index":{"_index":"test","_id":"1","_version":1,"result":"created","forced_refresh":false}}]}
Optimistic concurrency control
Each index and delete action within a bulk API call may include the if_seq_no and if_primary_term parameters in their respective action and meta data lines. The if_seq_no and if_primary_term parameters control how operations are run, based on the last modification to existing documents. See Optimistic concurrency control for more details.
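For example, a minimal sketch of a conditional index action within a bulk body; the sequence number and primary term values are illustrative:
{ "index" : { "_index" : "test", "_id" : "1", "if_seq_no" : 10, "if_primary_term" : 2 } }
{ "field1" : "updated value" }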
Versioning
Each bulk item can include the version value using the version field. It automatically follows the behavior of the index or delete operation based on the _version mapping. It also supports the version_type.
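For example, a minimal sketch of a bulk item that carries an externally managed version; the version number is illustrative:
{ "index" : { "_index" : "test", "_id" : "1", "version" : 5, "version_type" : "external" } }
{ "field1" : "value1" }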
Routing
Each bulk item can include the routing value using the routing field. It automatically follows the behavior of the index or delete operation based on the _routing mapping.
NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.
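For example, a minimal sketch of a bulk item with a custom routing value; the value is illustrative:
{ "index" : { "_index" : "test", "_id" : "1", "routing" : "user1" } }
{ "field1" : "value1" }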
Wait for active shards
When making bulk calls, you can set the wait_for_active_shards parameter to require a minimum number of shard copies to be active before starting to process the bulk request, as in the sketch below.
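A hypothetical invocation, reusing the illustrative requests file from the cURL example above, that requires at least two active copies of each affected shard:
$ curl -s -H "Content-Type: application/x-ndjson" -XPOST 'localhost:9200/_bulk?wait_for_active_shards=2' --data-binary "@requests"; echo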
Refresh
Control when the changes made by this request are visible to search.
NOTE: Only the shards that receive the bulk request will be affected by refresh.
Imagine a _bulk?refresh=wait_for
request with three documents in it that happen to be routed to different shards in an index with five shards.
The request will only wait for those three shards to refresh.
The other two shards that make up the index do not participate in the _bulk
request at all.
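For example, a hypothetical request that only returns once the affected shards have refreshed; the requests file is the same illustrative NDJSON as before:
$ curl -s -H "Content-Type: application/x-ndjson" -XPOST 'localhost:9200/_bulk?refresh=wait_for' --data-binary "@requests"; echo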
Query parameters
-
include_source_on_error
boolean If true, the document source is included in the error message if there is a parsing error.
-
list_executed_pipelines
boolean If true, the response will include the ingest pipelines that were run for each index or create.
-
pipeline
string The pipeline identifier to use to preprocess incoming documents. If the index has a default ingest pipeline specified, setting the value to _none turns off the default ingest pipeline for this request. If a final pipeline is configured, it will always run regardless of the value of this parameter.
-
refresh
string If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, wait for a refresh to make this operation visible to search. If false, do nothing with refreshes. Values are true, false, or wait_for.
-
routing
string A custom value that is used to route operations to a specific shard.
-
_source
boolean | string | array[string] Indicates whether to return the _source field (true or false) or contains a list of fields to return.
-
_source_excludes
string | array[string] A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in the _source_includes query parameter. If the _source parameter is false, this parameter is ignored.
-
_source_includes
string | array[string] A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the _source_excludes query parameter. If the _source parameter is false, this parameter is ignored.
-
timeout
string The period each action waits for the following operations: automatic index creation, dynamic mapping updates, and waiting for active shards. The default is 1m (one minute), which guarantees Elasticsearch waits for at least the timeout before failing. The actual wait time could be longer, particularly when multiple waits occur.
-
wait_for_active_shards
number | string The number of shard copies that must be active before proceeding with the operation. Set to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The default is 1, which waits for each primary shard to be active.
-
require_alias
boolean If true, the request's actions must target an index alias.
-
require_data_stream
boolean If true, the request's actions must target a data stream (existing or to be created).
curl \
--request PUT 'http://api.example.com/_bulk' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{ \"index\" : { \"_index\" : \"test\", \"_id\" : \"1\" } }\n{ \"field1\" : \"value1\" }\n{ \"delete\" : { \"_index\" : \"test\", \"_id\" : \"2\" } }\n{ \"create\" : { \"_index\" : \"test\", \"_id\" : \"3\" } }\n{ \"field1\" : \"value3\" }\n{ \"update\" : {\"_id\" : \"1\", \"_index\" : \"test\"} }\n{ \"doc\" : {\"field2\" : \"value2\"} }"'
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
{ "update" : {"_id" : "1", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"} }
{ "update" : { "_id" : "0", "_index" : "index1", "retry_on_conflict" : 3} }
{ "script" : { "source": "ctx._source.counter += params.param1", "lang" : "painless", "params" : {"param1" : 1}}, "upsert" : {"counter" : 1}}
{ "update" : {"_id" : "2", "_index" : "index1", "retry_on_conflict" : 3} }
{ "doc" : {"field" : "value"}, "doc_as_upsert" : true }
{ "update" : {"_id" : "3", "_index" : "index1", "_source" : true} }
{ "doc" : {"field" : "value"} }
{ "update" : {"_id" : "4", "_index" : "index1"} }
{ "doc" : {"field" : "value"}, "_source": true}
{ "update": {"_id": "5", "_index": "index1"} }
{ "doc": {"my_field": "foo"} }
{ "update": {"_id": "6", "_index": "index1"} }
{ "doc": {"my_field": "foo"} }
{ "create": {"_id": "7", "_index": "index1"} }
{ "my_field": "foo" }
{ "index" : { "_index" : "my_index", "_id" : "1", "dynamic_templates": {"work_location": "geo_point"}} }
{ "field" : "value1", "work_location": "41.12,-71.34", "raw_location": "41.12,-71.34"}
{ "create" : { "_index" : "my_index", "_id" : "2", "dynamic_templates": {"home_location": "geo_point"}} }
{ "field" : "value2", "home_location": "41.12,-71.34"}
{
"took": 30,
"errors": false,
"items": [
{
"index": {
"_index": "test",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 201,
"_seq_no" : 0,
"_primary_term": 1
}
},
{
"delete": {
"_index": "test",
"_id": "2",
"_version": 1,
"result": "not_found",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 404,
"_seq_no" : 1,
"_primary_term" : 2
}
},
{
"create": {
"_index": "test",
"_id": "3",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 201,
"_seq_no" : 2,
"_primary_term" : 3
}
},
{
"update": {
"_index": "test",
"_id": "1",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 200,
"_seq_no" : 3,
"_primary_term" : 4
}
}
]
}
{
"took": 486,
"errors": true,
"items": [
{
"update": {
"_index": "index1",
"_id": "5",
"status": 404,
"error": {
"type": "document_missing_exception",
"reason": "[5]: document missing",
"index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA",
"shard": "0",
"index": "index1"
}
}
},
{
"update": {
"_index": "index1",
"_id": "6",
"status": 404,
"error": {
"type": "document_missing_exception",
"reason": "[6]: document missing",
"index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA",
"shard": "0",
"index": "index1"
}
}
},
{
"create": {
"_index": "index1",
"_id": "7",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1,
"status": 201
}
}
]
}
{
"items": [
{
"update": {
"error": {
"type": "document_missing_exception",
"reason": "[5]: document missing",
"index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA",
"shard": "0",
"index": "index1"
}
}
},
{
"update": {
"error": {
"type": "document_missing_exception",
"reason": "[6]: document missing",
"index_uuid": "aAsFqTI0Tc2W0LCWgPNrOA",
"shard": "0",
"index": "index1"
}
}
}
]
}
Bulk index or delete documents
The bulk API is also available as PUT /{index}/_bulk and POST /{index}/_bulk. Apart from the additional path parameter below, the description, query parameters, request format, and example responses are identical to those of the /_bulk endpoint above.
Path parameters
-
index
string Required The name of the data stream, index, or index alias to perform bulk actions on. Actions that don't explicitly specify an _index argument use this target.
curl \
--request POST 'http://api.example.com/{index}/_bulk' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{ \"index\" : { \"_index\" : \"test\", \"_id\" : \"1\" } }\n{ \"field1\" : \"value1\" }\n{ \"delete\" : { \"_index\" : \"test\", \"_id\" : \"2\" } }\n{ \"create\" : { \"_index\" : \"test\", \"_id\" : \"3\" } }\n{ \"field1\" : \"value3\" }\n{ \"update\" : {\"_id\" : \"1\", \"_index\" : \"test\"} }\n{ \"doc\" : {\"field2\" : \"value2\"} }"'
Create a new document in the index
Added in 5.0.0
You can index a new JSON document with the /<target>/_doc/ or /<target>/_create/<_id> APIs.
Using _create guarantees that the document is indexed only if it does not already exist. It returns a 409 response when a document with the same ID already exists in the index. To update an existing document, you must use the /<target>/_doc/ API.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
- To add a document using the PUT /<target>/_create/<_id> or POST /<target>/_create/<_id> request formats, you must have the create_doc, create, index, or write index privilege.
- To automatically create a data stream or index with this API request, you must have the auto_configure, create_index, or manage index privilege.
Automatic data stream creation requires a matching index template with data stream enabled.
Automatically create data streams and indices
If the request's target doesn't exist and matches an index template with a data_stream definition, the index operation automatically creates the data stream. If the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.
NOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.
If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed.
Automatic index creation is controlled by the action.auto_create_index setting. If it is true, any index can be created automatically. You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns, or set it to false to turn off automatic index creation entirely. Specify a comma-separated list of patterns you want to allow, or prefix each pattern with + or - to indicate whether it should be allowed or blocked. When a list is specified, the default behavior is to disallow.
NOTE: The action.auto_create_index setting affects the automatic creation of indices only. It does not affect the creation of data streams.
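For example, a minimal sketch of narrowing automatic index creation with the cluster settings API; the pattern names are illustrative:
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "+my-index-*,-test-*"
  }
}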
Routing
By default, shard placement (or routing) is controlled by using a hash of the document's ID value. For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing parameter.
When setting up explicit mapping, you can also use the _routing field to direct the index operation to extract the routing value from the document itself. This does come at the (very minimal) cost of an additional document parsing pass. If the _routing mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.
NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing setting enabled in the template.
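For example, a minimal sketch of supplying an explicit routing value on a create request; the index name, ID, and routing value are illustrative:
PUT my-index-000001/_create/1?routing=user1
{ "message": "hello" }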
Distributed
The index operation is directed to the primary shard based on its route and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.
Active shards
To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.
If the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.
By default, write operations only wait for the primary shards to be active before proceeding (that is to say, wait_for_active_shards is 1). This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards. To alter this behavior per operation, use the wait_for_active_shards request parameter. Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas+1). Specifying a negative value or a number greater than the number of shard copies will throw an error.
For example, suppose you have a cluster of three nodes, A, B, and C, and you create an index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes). If you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding. This means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.
If wait_for_active_shards is set on the request to 3 (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding. This requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard. However, if you set wait_for_active_shards to all (or to 4, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index. The operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.
It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.
After the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.
The _shards section of the API response reveals the number of shard copies on which replication succeeded and failed.
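For example, a minimal sketch of a create request that requires two active shard copies before proceeding; the index name and document are illustrative:
PUT my-index-000001/_create/1?wait_for_active_shards=2
{ "message": "hello" }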
Path parameters
-
index
string Required The name of the data stream or index to target. If the target doesn't exist and matches the name or wildcard (*) pattern of an index template with a data_stream definition, this request creates the data stream. If the target doesn't exist and doesn't match a data stream template, this request creates the index.
-
id
string Required A unique identifier for the document. To automatically generate a document ID, use the POST /<target>/_doc/ request format.
Query parameters
-
if_primary_term
number Only perform the operation if the document has this primary term.
-
if_seq_no
number Only perform the operation if the document has this sequence number.
-
include_source_on_error
boolean If true, the document source is included in the error message if there is a parsing error.
-
op_type
string Set to create to only index the document if it does not already exist (put if absent). If a document with the specified _id already exists, the indexing operation will fail. The behavior is the same as using the <index>/_create endpoint. If a document ID is specified, this parameter defaults to index. Otherwise, it defaults to create. If the request targets a data stream, an op_type of create is required.
Supported values include:
index: Overwrite any documents that already exist.
create: Only index documents that do not already exist.
Values are index or create.
-
pipeline
string The ID of the pipeline to use to preprocess incoming documents. If the index has a default ingest pipeline specified, setting the value to _none turns off the default ingest pipeline for this request. If a final pipeline is configured, it will always run regardless of the value of this parameter.
-
refresh
string If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, it waits for a refresh to make this operation visible to search. If false, it does nothing with refreshes. Values are true, false, or wait_for.
-
require_alias
boolean If true, the destination must be an index alias.
-
require_data_stream
boolean If true, the request's actions must target a data stream (existing or to be created).
-
routing
string A custom value that is used to route operations to a specific shard.
-
timeout
string The period the request waits for the following operations: automatic index creation, dynamic mapping updates, and waiting for active shards. Elasticsearch waits for at least the specified timeout period before failing. The actual wait time could be longer, particularly when multiple waits occur.
This parameter is useful for situations where the primary shard assigned to perform the operation might not be available when the operation runs. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the operation will wait on the primary shard to become available for at least 1 minute before failing and responding with an error.
-
version
number The explicit version number for concurrency control. It must be a non-negative long number.
-
version_type
string The version type.
Supported values include:
internal: Use internal versioning that starts at 1 and increments with each update or delete.
external: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.
external_gte: Only index the document if the specified version is equal or higher than the version of the stored document or if there is no existing document. NOTE: The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.
force: This option is deprecated because it can cause primary and replica shards to diverge.
Values are internal, external, external_gte, or force.
-
wait_for_active_shards
number | string The number of shard copies that must be active before proceeding with the operation. You can set it to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The default value of 1 means it waits for each primary shard to be active.
curl \
--request POST 'http://api.example.com/{index}/_create/{id}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"@timestamp\": \"2099-11-15T13:12:00\",\n \"message\": \"GET /search HTTP/1.1 200 1070000\",\n \"user\": {\n \"id\": \"kimchy\"\n }\n}"'
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
Get a document by its ID
Get a document and its source or stored fields from an index.
By default, this API is realtime and is not affected by the refresh rate of the index (when data will become visible for search). In the case where stored fields are requested with the stored_fields parameter and the document has been updated but is not yet refreshed, the API will have to parse and analyze the source to extract the stored fields. To turn off realtime behavior, set the realtime parameter to false.
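For example, a minimal sketch of turning off realtime behavior for a single get; the index name is illustrative:
GET my-index-000001/_doc/0?realtime=false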
Source filtering
By default, the API returns the contents of the _source field unless you have used the stored_fields parameter or the _source field is turned off. You can turn off _source retrieval by using the _source parameter:
GET my-index-000001/_doc/0?_source=false
If you only need one or two fields from the _source, use the _source_includes or _source_excludes parameters to include or filter out particular fields. This can be helpful with large documents where partial retrieval can save on network overhead. Both parameters take a comma-separated list of fields or wildcard expressions. For example:
GET my-index-000001/_doc/0?_source_includes=*.id&_source_excludes=entities
If you only want to specify includes, you can use a shorter notation:
GET my-index-000001/_doc/0?_source=*.id
Routing
If routing is used during indexing, the routing value also needs to be specified to retrieve a document. For example:
GET my-index-000001/_doc/2?routing=user1
This request gets the document with ID 2, but it is routed based on the user. The document is not fetched if the correct routing is not specified.
Distributed
The GET operation is hashed into a specific shard ID. It is then redirected to one of the replicas within that shard ID and returns the result. The replicas are the primary shard and its replicas within that shard ID group. This means that the more replicas you have, the better your GET scaling will be.
Versioning support
You can use the version parameter to retrieve the document only if its current version is equal to the specified one.
Internally, Elasticsearch has marked the old document as deleted and added an entirely new document. The old version of the document doesn't disappear immediately, although you won't be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.
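For example, a minimal sketch of a version-conditional get; the request succeeds only if the document's current version is 1, and the values are illustrative:
GET my-index-000001/_doc/0?version=1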
Query parameters
-
preference
string The node or shard the operation should be performed on. By default, the operation is randomized between the shard replicas.
If it is set to _local, the operation will prefer to be run on a local allocated shard when possible. If it is set to a custom value, the value is used to guarantee that the same shards will be used for the same custom value. This can help with "jumping values" when hitting different shards in different refresh states. A sample value can be something like the web session ID or the user name.
-
realtime
boolean If true, the request is real-time as opposed to near-real-time.
-
refresh
boolean If true, the request refreshes the relevant shards before retrieving the document. Setting it to true should be done after careful thought and verification that this does not cause a heavy load on the system (and slow down indexing).
-
routing
string A custom value used to route operations to a specific shard.
-
_source
boolean | string | array[string] Indicates whether to return the _source field (true or false) or lists the fields to return.
-
_source_excludes
string | array[string] A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in the _source_includes query parameter. If the _source parameter is false, this parameter is ignored.
-
_source_includes
string | array[string] A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the _source_excludes query parameter. If the _source parameter is false, this parameter is ignored.
-
stored_fields
string | array[string] A comma-separated list of stored fields to return as part of a hit. If no fields are specified, no stored fields are included in the response. If this field is specified, the _source parameter defaults to false. Only leaf fields can be retrieved with the stored_fields option. Object fields can't be returned; if specified, the request fails.
-
version
number The version number for concurrency control. It must match the current version of the document for the request to succeed.
-
version_type
string The version type.
Supported values include:
internal: Use internal versioning that starts at 1 and increments with each update or delete.
external: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.
external_gte: Only index the document if the specified version is equal or higher than the version of the stored document or if there is no existing document. NOTE: The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.
force: This option is deprecated because it can cause primary and replica shards to diverge.
Values are internal, external, external_gte, or force.
curl \
--request GET 'http://api.example.com/{index}/_doc/{id}' \
--header "Authorization: $API_KEY"
{
"_index": "my-index-000001",
"_id": "0",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"found": true,
"_source": {
"@timestamp": "2099-11-15T14:12:12",
"http": {
"request": {
"method": "get"
},
"response": {
"status_code": 200,
"bytes": 1070000
},
"version": "1.1"
},
"source": {
"ip": "127.0.0.1"
},
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
}
{
"_index": "my-index-000001",
"_id": "1",
"_version": 1,
"_seq_no" : 22,
"_primary_term" : 1,
"found": true,
"fields": {
"tags": [
"production"
]
}
}
{
"_index": "my-index-000001",
"_id": "2",
"_version": 1,
"_seq_no" : 13,
"_primary_term" : 1,
"_routing": "user1",
"found": true,
"fields": {
"tags": [
"env2"
]
}
}
Delete a document
Remove a JSON document from the specified index.
NOTE: You cannot send deletion requests directly to a data stream. To delete a document in a data stream, you must target the backing index containing the document.
Optimistic concurrency control
Delete operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409.
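For example, a minimal sketch of a conditional delete; the sequence number and primary term are illustrative:
DELETE my-index-000001/_doc/1?if_seq_no=5&if_primary_term=1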
Versioning
Each document indexed is versioned. When deleting a document, the version can be specified to make sure the relevant document you are trying to delete is actually being deleted and it has not changed in the meantime. Every write operation run on a document, deletes included, causes its version to be incremented. The version number of a deleted document remains available for a short time after deletion to allow for control of concurrent operations. The length of time for which a deleted document's version remains available is determined by the index.gc_deletes index setting.
Routing
If routing is used during indexing, the routing value also needs to be specified to delete a document. If the _routing mapping is set to required and no routing value is specified, the delete API throws a RoutingMissingException and rejects the request. For example:
DELETE /my-index-000001/_doc/1?routing=shard-1
This request deletes the document with ID 1, but it is routed based on the user. The document is not deleted if the correct routing is not specified.
Distributed
The delete operation gets hashed into a specific shard ID. It then gets redirected into the primary shard within that ID group and replicated (if needed) to shard replicas within that ID group.
Query parameters
-
if_primary_term
number Only perform the operation if the document has this primary term.
-
if_seq_no
number Only perform the operation if the document has this sequence number.
-
refresh
string If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, it waits for a refresh to make this operation visible to search. If false, it does nothing with refreshes. Values are true, false, or wait_for.
-
routing
string A custom value used to route operations to a specific shard.
-
timeout
string The period to wait for active shards.
This parameter is useful for situations where the primary shard assigned to perform the delete operation might not be available when the delete operation runs. Some reasons for this might be that the primary shard is currently recovering from a store or undergoing relocation. By default, the delete operation will wait on the primary shard to become available for up to 1 minute before failing and responding with an error.
-
version
number An explicit version number for concurrency control. It must match the current version of the document for the request to succeed.
-
version_type
string The version type.
Supported values include:
internal: Use internal versioning that starts at 1 and increments with each update or delete.
external: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.
external_gte: Only index the document if the specified version is equal or higher than the version of the stored document or if there is no existing document. NOTE: The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.
force: This option is deprecated because it can cause primary and replica shards to diverge.
Values are internal, external, external_gte, or force.
-
wait_for_active_shards
number | string The minimum number of shard copies that must be active before proceeding with the operation. You can set it to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The default value of 1 means it waits for each primary shard to be active.
curl \
--request DELETE 'http://api.example.com/{index}/_doc/{id}' \
--header "Authorization: $API_KEY"
{
"_shards": {
"total": 2,
"failed": 0,
"successful": 2
},
"_index": "my-index-000001",
"_id": "1",
"_version": 2,
"_primary_term": 1,
"_seq_no": 5,
"result": "deleted"
}
Get a document's source
Get the source of a document. For example:
GET my-index-000001/_source/1
You can use the source filtering parameters to control which parts of the _source are returned:
GET my-index-000001/_source/1/?_source_includes=*.id&_source_excludes=entities
Query parameters
-
preference
string The node or shard the operation should be performed on. By default, the operation is randomized between the shard replicas.
-
realtime
boolean If true, the request is real-time as opposed to near-real-time.
-
refresh
boolean If true, the request refreshes the relevant shards before retrieving the document. Setting it to true should be done after careful thought and verification that this does not cause a heavy load on the system (and slow down indexing).
-
routing
string A custom value used to route operations to a specific shard.
-
_source
boolean | string | array[string] Indicates whether to return the _source field (true or false) or lists the fields to return.
-
_source_excludes
string | array[string] A comma-separated list of source fields to exclude in the response.
-
_source_includes
string | array[string] A comma-separated list of source fields to include in the response.
-
stored_fields
string | array[string] A comma-separated list of stored fields to return as part of a hit.
-
version
number The version number for concurrency control. It must match the current version of the document for the request to succeed.
-
version_type
string The version type.
Supported values include:
internal: Use internal versioning that starts at 1 and increments with each update or delete.
external: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.
external_gte: Only index the document if the specified version is equal or higher than the version of the stored document or if there is no existing document. NOTE: The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.
force: This option is deprecated because it can cause primary and replica shards to diverge.
Values are internal, external, external_gte, or force.
curl \
--request GET 'http://api.example.com/{index}/_source/{id}' \
--header "Authorization: $API_KEY"
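For example, the source filtering request shown above maps onto this reference's curl convention as follows (the index name, document ID, and field patterns are illustrative):
curl \
--request GET 'http://api.example.com/my-index-000001/_source/1?_source_includes=*.id&_source_excludes=entities' \
--header "Authorization: $API_KEY"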
Get multiple documents
Added in 1.3.0
Get multiple JSON documents by ID from one or more indices. If you specify an index in the request URI, you only need to specify the document IDs in the request body. To ensure fast responses, this multi get (mget) API responds with partial results if one or more shards fail.
Filter source fields
By default, the _source
field is returned for every document (if stored).
Use the _source
and _source_includes
or _source_excludes
attributes to filter what fields are returned for a particular document.
You can include the _source
, _source_includes
, and _source_excludes
query parameters in the request URI to specify the defaults to use when there are no per-document instructions.
Get stored fields
Use the stored_fields
attribute to specify the set of stored fields you want to retrieve.
Any requested fields that are not stored are ignored.
You can include the stored_fields
query parameter in the request URI to specify the defaults to use when there are no per-document instructions.
Query parameters
-
preference
string Specifies the node or shard the operation should be performed on. Random by default.
-
realtime
boolean If
true
, the request is real-time as opposed to near-real-time. -
refresh
boolean If
true
, the request refreshes relevant shards before retrieving documents. -
routing
string Custom value used to route operations to a specific shard.
-
_source
boolean | string | array[string] True or false to return the
_source
field or not, or a list of fields to return. -
_source_excludes
string | array[string] A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in
_source_includes
query parameter. -
_source_includes
string | array[string] A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the
_source_excludes
query parameter. If the_source
parameter isfalse
, this parameter is ignored. -
stored_fields
string | array[string] If
true
, retrieves the document fields stored in the index rather than the document_source
.
curl \
--request POST 'http://api.example.com/_mget' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"docs\": [\n {\n \"_id\": \"1\"\n },\n {\n \"_id\": \"2\"\n }\n ]\n}"'
{
"docs": [
{
"_id": "1"
},
{
"_id": "2"
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"_source": false
},
{
"_index": "test",
"_id": "2",
"_source": [ "field3", "field4" ]
},
{
"_index": "test",
"_id": "3",
"_source": {
"include": [ "user" ],
"exclude": [ "user.location" ]
}
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"stored_fields": [ "field1", "field2" ]
},
{
"_index": "test",
"_id": "2",
"stored_fields": [ "field3", "field4" ]
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"routing": "key2"
},
{
"_index": "test",
"_id": "2"
}
]
}
Get multiple documents
Added in 1.3.0
Get multiple JSON documents by ID from one or more indices. If you specify an index in the request URI, you only need to specify the document IDs in the request body. To ensure fast responses, this multi get (mget) API responds with partial results if one or more shards fail.
Filter source fields
By default, the _source
field is returned for every document (if stored).
Use the _source
and _source_includes
or _source_excludes
attributes to filter what fields are returned for a particular document.
You can include the _source
, _source_includes
, and _source_excludes
query parameters in the request URI to specify the defaults to use when there are no per-document instructions.
Get stored fields
Use the stored_fields
attribute to specify the set of stored fields you want to retrieve.
Any requested fields that are not stored are ignored.
You can include the stored_fields
query parameter in the request URI to specify the defaults to use when there are no per-document instructions.
Path parameters
-
index
string Required Name of the index to retrieve documents from when
ids
are specified, or when a document in thedocs
array does not specify an index.
Query parameters
-
preference
string Specifies the node or shard the operation should be performed on. Random by default.
-
realtime
boolean If
true
, the request is real-time as opposed to near-real-time. -
refresh
boolean If
true
, the request refreshes relevant shards before retrieving documents. -
routing
string Custom value used to route operations to a specific shard.
-
_source
boolean | string | array[string] True or false to return the
_source
field or not, or a list of fields to return. -
_source_excludes
string | array[string] A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in
_source_includes
query parameter. -
_source_includes
string | array[string] A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the
_source_excludes
query parameter. If the_source
parameter isfalse
, this parameter is ignored. -
stored_fields
string | array[string] If
true
, retrieves the document fields stored in the index rather than the document_source
.
curl \
--request GET 'http://api.example.com/{index}/_mget' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"docs\": [\n {\n \"_id\": \"1\"\n },\n {\n \"_id\": \"2\"\n }\n ]\n}"'
{
"docs": [
{
"_id": "1"
},
{
"_id": "2"
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"_source": false
},
{
"_index": "test",
"_id": "2",
"_source": [ "field3", "field4" ]
},
{
"_index": "test",
"_id": "3",
"_source": {
"include": [ "user" ],
"exclude": [ "user.location" ]
}
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"stored_fields": [ "field1", "field2" ]
},
{
"_index": "test",
"_id": "2",
"stored_fields": [ "field3", "field4" ]
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"routing": "key2"
},
{
"_index": "test",
"_id": "2"
}
]
}
Get multiple documents
Added in 1.3.0
Get multiple JSON documents by ID from one or more indices. If you specify an index in the request URI, you only need to specify the document IDs in the request body. To ensure fast responses, this multi get (mget) API responds with partial results if one or more shards fail.
Filter source fields
By default, the _source
field is returned for every document (if stored).
Use the _source
and _source_includes
or _source_excludes
attributes to filter what fields are returned for a particular document.
You can include the _source
, _source_includes
, and _source_excludes
query parameters in the request URI to specify the defaults to use when there are no per-document instructions.
Get stored fields
Use the stored_fields
attribute to specify the set of stored fields you want to retrieve.
Any requested fields that are not stored are ignored.
You can include the stored_fields
query parameter in the request URI to specify the defaults to use when there are no per-document instructions.
Path parameters
-
index
string Required Name of the index to retrieve documents from when
ids
are specified, or when a document in thedocs
array does not specify an index.
Query parameters
-
preference
string Specifies the node or shard the operation should be performed on. Random by default.
-
realtime
boolean If
true
, the request is real-time as opposed to near-real-time. -
refresh
boolean If
true
, the request refreshes relevant shards before retrieving documents. -
routing
string Custom value used to route operations to a specific shard.
-
_source
boolean | string | array[string] True or false to return the
_source
field or not, or a list of fields to return. -
_source_excludes
string | array[string] A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in
_source_includes
query parameter. -
_source_includes
string | array[string] A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the
_source_excludes
query parameter. If the_source
parameter isfalse
, this parameter is ignored. -
stored_fields
string | array[string] If
true
, retrieves the document fields stored in the index rather than the document_source
.
curl \
--request POST 'http://api.example.com/{index}/_mget' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"docs\": [\n {\n \"_id\": \"1\"\n },\n {\n \"_id\": \"2\"\n }\n ]\n}"'
{
"docs": [
{
"_id": "1"
},
{
"_id": "2"
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"_source": false
},
{
"_index": "test",
"_id": "2",
"_source": [ "field3", "field4" ]
},
{
"_index": "test",
"_id": "3",
"_source": {
"include": [ "user" ],
"exclude": [ "user.location" ]
}
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"stored_fields": [ "field1", "field2" ]
},
{
"_index": "test",
"_id": "2",
"stored_fields": [ "field3", "field4" ]
}
]
}
{
"docs": [
{
"_index": "test",
"_id": "1",
"routing": "key2"
},
{
"_index": "test",
"_id": "2"
}
]
}
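When every requested document lives in the index named in the path, the docs array can be replaced by the shorter ids form. A minimal sketch, with the index test and the document IDs as placeholders:
curl \
--request POST 'http://api.example.com/test/_mget' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"ids": ["1", "2"]}'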
Get term vector information
Get information and statistics about terms in the fields of a particular document.
You can retrieve term vectors for documents stored in the index or for artificial documents passed in the body of the request.
You can specify the fields you are interested in through the fields
parameter or by adding the fields to the request body.
For example:
GET /my-index-000001/_termvectors/1?fields=message
Fields can be specified using wildcards, similar to the multi match query.
Term vectors are real-time by default, not near real-time.
This can be changed by setting the realtime
parameter to false
.
You can request three types of values: term information, term statistics, and field statistics. By default, all term information and field statistics are returned for all fields, but term statistics are excluded.
Term information
- term frequency in the field (always returned)
- term positions (
positions: true
) - start and end offsets (
offsets: true
) - term payloads (
payloads: true
), as base64 encoded bytes
If the requested information wasn't stored in the index, it is computed on the fly if possible. Additionally, term vectors can be computed for documents that don't exist in the index but are instead provided by the user.
Start and end offsets assume UTF-16 encoding is being used. If you want to use these offsets to get the original text that produced this token, make sure that the string you are taking a substring of is also encoded using UTF-16.
Behaviour
The term and field statistics are not accurate.
Deleted documents are not taken into account.
The information is only retrieved for the shard the requested document resides in.
The term and field statistics are therefore only useful as relative measures whereas the absolute numbers have no meaning in this context.
By default, when requesting term vectors of artificial documents, a shard to get the statistics from is randomly selected.
Use routing
only to hit a particular shard.
Path parameters
-
index
string Required The name of the index that contains the document.
Query parameters
-
fields
string | array[string] A comma-separated list or wildcard expressions of fields to include in the statistics. It is used as the default list unless a specific field list is provided in the request body.
field_statistics
boolean If
true
, the response includes:- The document count (how many documents contain this field).
- The sum of document frequencies (the sum of document frequencies for all terms in this field).
- The sum of total term frequencies (the sum of total term frequencies of each term in this field).
-
offsets
boolean If
true
, the response includes term offsets. -
payloads
boolean If
true
, the response includes term payloads. -
positions
boolean If
true
, the response includes term positions. -
preference
string The node or shard the operation should be performed on. It is random by default.
-
realtime
boolean If true, the request is real-time as opposed to near-real-time.
-
routing
string A custom value that is used to route operations to a specific shard.
-
term_statistics
boolean If
true
, the response includes:- The total term frequency (how often a term occurs in all documents).
- The document frequency (the number of documents containing the current term).
By default these values are not returned since term statistics can have a serious performance impact.
-
version
number If
true
, returns the document version as part of a hit. -
version_type
string The version type.
Supported values include:
internal
: Use internal versioning that starts at 1 and increments with each update or delete.external
: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.external_gte
: Only index the document if the specified version is equal or higher than the version of the stored document or if there is no existing document. NOTE: Theexternal_gte
version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.force
: This option is deprecated because it can cause primary and replica shards to diverge.
Values are
internal
,external
,external_gte
, orforce
.
Body
-
doc
object An artificial document (a document not present in the index) for which you want to retrieve term vectors.
-
filter
object -
per_field_analyzer
object Override the default per-field analyzer. This is useful in order to generate term vectors in any fashion, especially when using artificial documents. When providing an analyzer for a field that already stores term vectors, the term vectors will be regenerated.
-
fields
string | array[string] -
field_statistics
boolean If
true
, the response includes:- The document count (how many documents contain this field).
- The sum of document frequencies (the sum of document frequencies for all terms in this field).
- The sum of total term frequencies (the sum of total term frequencies of each term in this field).
-
offsets
boolean If
true
, the response includes term offsets. -
payloads
boolean If
true
, the response includes term payloads. -
positions
boolean If
true
, the response includes term positions. -
term_statistics
boolean If
true
, the response includes:- The total term frequency (how often a term occurs in all documents).
- The document frequency (the number of documents containing the current term).
By default these values are not returned since term statistics can have a serious performance impact.
-
routing
string -
version
number -
version_type
string Values are
internal
,external
,external_gte
, orforce
.
curl \
--request GET 'http://api.example.com/{index}/_termvectors' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"fields\" : [\"text\"],\n \"offsets\" : true,\n \"payloads\" : true,\n \"positions\" : true,\n \"term_statistics\" : true,\n \"field_statistics\" : true\n}"'
{
"fields" : ["text"],
"offsets" : true,
"payloads" : true,
"positions" : true,
"term_statistics" : true,
"field_statistics" : true
}
{
"doc" : {
"fullname" : "John Doe",
"text" : "test test test"
},
"fields": ["fullname"],
"per_field_analyzer" : {
"fullname": "keyword"
}
}
{
"doc": {
"plot": "When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil."
},
"term_statistics": true,
"field_statistics": true,
"positions": false,
"offsets": false,
"filter": {
"max_num_terms": 3,
"min_term_freq": 1,
"min_doc_freq": 1
}
}
{
"fields" : ["text", "some_field_without_term_vectors"],
"offsets" : true,
"positions" : true,
"term_statistics" : true,
"field_statistics" : true
}
{
"doc" : {
"fullname" : "John Doe",
"text" : "test test test"
}
}
{
"_index": "my-index-000001",
"_id": "1",
"_version": 1,
"found": true,
"took": 6,
"term_vectors": {
"text": {
"field_statistics": {
"sum_doc_freq": 4,
"doc_count": 2,
"sum_ttf": 6
},
"terms": {
"test": {
"doc_freq": 2,
"ttf": 4,
"term_freq": 3,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 4,
"payload": "d29yZA=="
},
{
"position": 1,
"start_offset": 5,
"end_offset": 9,
"payload": "d29yZA=="
},
{
"position": 2,
"start_offset": 10,
"end_offset": 14,
"payload": "d29yZA=="
}
]
}
}
}
}
}
{
"_index": "my-index-000001",
"_version": 0,
"found": true,
"took": 6,
"term_vectors": {
"fullname": {
"field_statistics": {
"sum_doc_freq": 2,
"doc_count": 4,
"sum_ttf": 4
},
"terms": {
"John Doe": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 8
}
]
}
}
}
}
}
{
"_index": "imdb",
"_version": 0,
"found": true,
"term_vectors": {
"plot": {
"field_statistics": {
"sum_doc_freq": 3384269,
"doc_count": 176214,
"sum_ttf": 3753460
},
"terms": {
"armored": {
"doc_freq": 27,
"ttf": 27,
"term_freq": 1,
"score": 9.74725
},
"industrialist": {
"doc_freq": 88,
"ttf": 88,
"term_freq": 1,
"score": 8.590818
},
"stark": {
"doc_freq": 44,
"ttf": 47,
"term_freq": 1,
"score": 9.272792
}
}
}
}
}
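For an existing document, the per-document request shown at the start of this section maps onto the curl convention used in this reference as follows (the index name and document ID are illustrative):
curl \
--request GET 'http://api.example.com/my-index-000001/_termvectors/1?fields=message' \
--header "Authorization: $API_KEY"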
Throttle an update by query operation
Added in 6.5.0
Change the number of requests per second for a particular update by query operation. Rethrottling that speeds up the query takes effect immediately, but rethrottling that slows down the query takes effect after completing the current batch to prevent scroll timeouts.
Path parameters
-
task_id
string Required The ID for the task.
Query parameters
-
requests_per_second
number The throttle for this request in sub-requests per second. To turn off throttling, set it to
-1
.
curl \
--request POST 'http://api.example.com/_update_by_query/{task_id}/_rethrottle' \
--header "Authorization: $API_KEY"
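For example, to turn off throttling entirely for a running task, set requests_per_second to -1 (the task ID below is an illustrative placeholder):
curl \
--request POST 'http://api.example.com/_update_by_query/r1A2WoRbTwKZ516z6NEs5A:36619/_rethrottle?requests_per_second=-1' \
--header "Authorization: $API_KEY"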
Get an enrich policy
Added in 7.5.0
Returns information about an enrich policy.
Path parameters
-
name
string | array[string] Required Comma-separated list of enrich policy names used to limit the request. To return information for all enrich policies, omit this parameter.
Query parameters
-
master_timeout
string Period to wait for a connection to the master node.
curl \
--request GET 'http://api.example.com/_enrich/policy/{name}' \
--header "Authorization: $API_KEY"
Create an enrich policy
Added in 7.5.0
Creates an enrich policy.
Path parameters
-
name
string Required Name of the enrich policy to create or update.
Query parameters
-
master_timeout
string Period to wait for a connection to the master node.
curl \
--request PUT 'http://api.example.com/_enrich/policy/{name}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"additionalProperty1":{"enrich_fields":"string","indices":"string","match_field":"string","query":{},"name":"string","elasticsearch_version":"string"},"additionalProperty2":{"enrich_fields":"string","indices":"string","match_field":"string","query":{},"name":"string","elasticsearch_version":"string"}}'
Delete an enrich policy
Added in 7.5.0
Deletes an existing enrich policy and its enrich index.
Path parameters
-
name
string Required Enrich policy to delete.
Query parameters
-
master_timeout
string Period to wait for a connection to the master node.
curl \
--request DELETE 'http://api.example.com/_enrich/policy/{name}' \
--header "Authorization: $API_KEY"
Run an enrich policy
Added in 7.5.0
Create the enrich index for an existing enrich policy.
Path parameters
-
name
string Required Enrich policy to execute.
Query parameters
-
master_timeout
string Period to wait for a connection to the master node.
-
wait_for_completion
boolean If
true
, the request blocks other enrich policy execution requests until complete.
curl \
--request PUT 'http://api.example.com/_enrich/policy/{name}/_execute' \
--header "Authorization: $API_KEY"
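Executing a policy over a large source index can take a while, so the request can be made non-blocking by disabling wait_for_completion. A sketch, with my-policy as a placeholder name:
curl \
--request PUT 'http://api.example.com/_enrich/policy/my-policy/_execute?wait_for_completion=false' \
--header "Authorization: $API_KEY"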
Get an enrich policy
Added in 7.5.0
Returns information about an enrich policy. Omit the policy name to return information for all enrich policies.
Query parameters
-
master_timeout
string Period to wait for a connection to the master node.
curl \
--request GET 'http://api.example.com/_enrich/policy' \
--header "Authorization: $API_KEY"
Get enrich stats
Added in 7.5.0
Returns enrich coordinator statistics and information about enrich policies that are currently executing.
Query parameters
-
master_timeout
string Period to wait for a connection to the master node.
curl \
--request GET 'http://api.example.com/_enrich/_stats' \
--header "Authorization: $API_KEY"
EQL
Event Query Language (EQL) is a query language for event-based time series data, such as logs, metrics, and traces.
Get the async EQL status
Added in 7.9.0
Get the current status for an async EQL search or a stored synchronous EQL search without returning results.
Path parameters
-
id
string Required Identifier for the search.
curl \
--request GET 'http://api.example.com/_eql/search/status/{id}' \
--header "Authorization: $API_KEY"
{
"id": "FmNJRUZ1YWZCU3dHY1BIOUhaenVSRkEaaXFlZ3h4c1RTWFNocDdnY2FSaERnUTozNDE=",
"is_running" : true,
"is_partial" : true,
"start_time_in_millis" : 1611690235000,
"expiration_time_in_millis" : 1611690295000
}
Get EQL search results
Added in 7.9.0
Returns search results for an Event Query Language (EQL) query. EQL assumes each document in a data stream or index corresponds to an event.
Path parameters
-
index
string | array[string] Required The name of the index to scope the operation to.
Query parameters
-
allow_no_indices
boolean -
allow_partial_search_results
boolean If true, returns partial results if there are shard failures. If false, returns an error with no partial results.
-
allow_partial_sequence_results
boolean If true, sequence queries will return partial results in case of shard failures. If false, they will return no results at all. This flag has effect only if allow_partial_search_results is true.
-
expand_wildcards
string | array[string] Supported values include:
all
: Match any data stream or index, including hidden ones.open
: Match open, non-hidden indices. Also matches any non-hidden data stream.closed
: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.hidden
: Match hidden data streams and hidden indices. Must be combined withopen
,closed
, orboth
.none
: Wildcard expressions are not accepted.
-
keep_alive
string Period for which the search and its results are stored on the cluster.
-
keep_on_completion
boolean If true, the search and its results are stored on the cluster.
-
wait_for_completion_timeout
string Timeout duration to wait for the request to finish. Defaults to no timeout, meaning the request waits for complete search results.
Body
Required
-
query
string Required EQL query you wish to run.
-
case_sensitive
boolean -
event_category_field
string Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
tiebreaker_field
string Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
timestamp_field
string Path to field or array of paths. Some APIs support wildcards in the path to select multiple fields.
-
fetch_size
number -
filter
object | array[object] Query, written in Query DSL, used to filter the events on which the EQL query runs.
One of: An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
External documentation -
keep_alive
string A duration. Units can be
nanos
,micros
,ms
(milliseconds),s
(seconds),m
(minutes),h
(hours) andd
(days). Also accepts "0" without a unit and "-1" to indicate an unspecified value. -
keep_on_completion
boolean -
wait_for_completion_timeout
string A duration. Units can be
nanos
,micros
,ms
(milliseconds),s
(seconds),m
(minutes),h
(hours) andd
(days). Also accepts "0" without a unit and "-1" to indicate an unspecified value. -
allow_partial_search_results
boolean Allow query execution even if there are shard failures. If true, the query keeps running and returns results based on the available shards. For sequences, the behavior can be further refined using allow_partial_sequence_results.
-
allow_partial_sequence_results
boolean This flag applies only to sequences and has effect only if allow_partial_search_results=true. If true, the sequence query will return results based on the available shards, ignoring the others. If false, the sequence query will return successfully, but will always have empty results.
-
size
number -
fields
object | array[object] Array of wildcard (*) patterns. The response returns values for field names matching these patterns in the fields property of each hit.
-
result_position
string Values are
tail
orhead
. -
runtime_mappings
object -
max_samples_per_key
number By default, the response of a sample query contains up to
10
samples, with one sample per unique set of join keys. Use thesize
parameter to get a smaller or larger set of samples. To retrieve more than one sample per set of join keys, use themax_samples_per_key
parameter. Pipes are not supported for sample queries.
curl \
--request GET 'http://api.example.com/{index}/_eql/search' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"query\": \"\"\"\n process where (process.name == \"cmd.exe\" and process.pid != 2013)\n \"\"\"\n}"'
{
"query": """
process where (process.name == "cmd.exe" and process.pid != 2013)
"""
}
{
"query": """
sequence by process.pid
[ file where file.name == "cmd.exe" and process.pid != 2013 ]
[ process where stringContains(process.executable, "regsvr32") ]
"""
}
{
"is_partial": false,
"is_running": false,
"took": 6,
"timed_out": false,
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"sequences": [
{
"join_keys": [
2012
],
"events": [
{
"_index": ".ds-my-data-stream-2099.12.07-000001",
"_id": "AtOJ4UjUBAAx3XR5kcCM",
"_source": {
"@timestamp": "2099-12-06T11:04:07.000Z",
"event": {
"category": "file",
"id": "dGCHwoeS",
"sequence": 2
},
"file": {
"accessed": "2099-12-07T11:07:08.000Z",
"name": "cmd.exe",
"path": "C:\\Windows\\System32\\cmd.exe",
"type": "file",
"size": 16384
},
"process": {
"pid": 2012,
"name": "cmd.exe",
"executable": "C:\\Windows\\System32\\cmd.exe"
}
}
},
{
"_index": ".ds-my-data-stream-2099.12.07-000001",
"_id": "OQmfCaduce8zoHT93o4H",
"_source": {
"@timestamp": "2099-12-07T11:07:09.000Z",
"event": {
"category": "process",
"id": "aR3NWVOs",
"sequence": 4
},
"process": {
"pid": 2012,
"name": "regsvr32.exe",
"command_line": "regsvr32.exe /s /u /i:https://...RegSvr32.sct scrobj.dll",
"executable": "C:\\Windows\\System32\\regsvr32.exe"
}
}
}
]
}
]
}
}
Run an async ES|QL query
Added in 8.13.0
Asynchronously run an ES|QL (Elasticsearch query language) query, monitor its progress, and retrieve results when they become available.
The API accepts the same parameters and request body as the synchronous query API, along with additional async-related properties.
Query parameters
-
delimiter
string The character to use between values within a CSV row. It is valid only for the CSV format.
-
drop_null_columns
boolean Indicates whether columns that are entirely
null
will be removed from thecolumns
andvalues
portion of the results. Iftrue
, the response will include an extra section under the nameall_columns
which has the name of all the columns. -
format
string A short version of the Accept header, for example
json
oryaml
.Values are
csv
,json
,tsv
,txt
,yaml
,cbor
,smile
, orarrow
. -
keep_alive
string The period for which the query and its results are stored in the cluster. The default period is five days. When this period expires, the query and its results are deleted, even if the query is still ongoing. If the
keep_on_completion
parameter is false, Elasticsearch only stores async queries that do not complete within the period set by thewait_for_completion_timeout
parameter, regardless of this value. -
keep_on_completion
boolean Indicates whether the query and its results are stored in the cluster. If false, the query and its results are stored in the cluster only if the request does not complete during the period set by the
wait_for_completion_timeout
parameter. -
wait_for_completion_timeout
string The period to wait for the request to finish. By default, the request waits for 1 second for the query results. If the query completes during this period, results are returned. Otherwise, a query ID is returned that can later be used to retrieve the results.
Body
Required
-
columnar
boolean By default, ES|QL returns results as rows. For example, FROM returns each individual document as one row. For the JSON, YAML, CBOR and smile formats, ES|QL can return the results in a columnar fashion where one row represents all the values of a certain column in the results.
-
filter
object An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
External documentation -
locale
string -
params
array[number | string | boolean | null] To avoid any attempt at hacking or code injection, extract the values into a separate list of parameters. Use question mark placeholders (?) in the query string for each of the parameters.
-
profile
boolean If provided and
true
the response will include an extraprofile
object with information on how the query was executed. This information is for human debugging and its format can change at any time but it can give some insight into the performance of each part of the query. -
query
string Required The ES|QL query API accepts an ES|QL query string in the query parameter, runs it, and returns the results.
-
tables
object Tables to use with the LOOKUP operation. The top level key is the table name and the next level key is the column name.
-
include_ccs_metadata
boolean When set to
true
and performing a cross-cluster query, the response will include an extra_clusters
object with information about the clusters that participated in the search, along with info such as shard counts.
wait_for_completion_timeout
string A duration. Units can be
nanos
,micros
,ms
(milliseconds),s
(seconds),m
(minutes),h
(hours) andd
(days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.
curl \
--request POST 'http://api.example.com/_query/async' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"query\": \"\"\"\n FROM library,remote-*:library\n | EVAL year = DATE_TRUNC(1 YEARS, release_date)\n | STATS MAX(page_count) BY year\n | SORT year\n | LIMIT 5\n \"\"\",\n \"wait_for_completion_timeout\": \"2s\",\n \"include_ccs_metadata\": true\n}"'
{
"query": """
FROM library,remote-*:library
| EVAL year = DATE_TRUNC(1 YEARS, release_date)
| STATS MAX(page_count) BY year
| SORT year
| LIMIT 5
""",
"wait_for_completion_timeout": "2s",
"include_ccs_metadata": true
}
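If the query does not finish within wait_for_completion_timeout, the response includes a query ID instead of full results. As a sketch, the results can later be retrieved by polling the async query endpoint with that ID (the ID below is an illustrative placeholder):
curl \
--request GET 'http://api.example.com/_query/async/FmNJRUZ1YWZCU3dHY1BIOUhaenVSRkEaaXFlZ3h4c1RTWFNocDdnY2FSaERnUTozNDE=' \
--header "Authorization: $API_KEY"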
Features
The feature APIs enable you to introspect and manage features provided by Elasticsearch and Elasticsearch plugins.