Get shard allocation information
Get a snapshot of the number of shards allocated to each data node and their disk space.
IMPORTANT: CAT APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications.
Path parameters
-
node_id
string | array[string] Required A comma-separated list of node identifiers or names used to limit the returned information.
Query parameters
-
bytes
string The unit used to display byte values.
Values are b, kb, mb, gb, tb, or pb.
-
h
string | array[string] List of columns to appear in the response. Supports simple wildcards.
-
s
string | array[string] List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting :asc or :desc as a suffix to the column name.
-
local
boolean If true, the request computes the list of selected nodes from the local cluster state. If false, the list of selected nodes is computed from the cluster state of the master node. In both cases the coordinating node will send requests for further information to each selected node.
-
master_timeout
string Period to wait for a connection to the master node.
Values are -1 or 0.
GET /_cat/allocation?v=true&format=json
curl \
--request GET 'http://api.example.com/_cat/allocation/{node_id}' \
--header "Authorization: $API_KEY"
[
{
"shards": "1",
"shards.undesired": "0",
"write_load.forecast": "0.0",
"disk.indices.forecast": "260b",
"disk.indices": "260b",
"disk.used": "47.3gb",
"disk.avail": "43.4gb",
"disk.total": "100.7gb",
"disk.percent": "46",
"host": "127.0.0.1",
"ip": "127.0.0.1",
"node": "CSUXak2",
"node.role": "himrst"
}
]
Get datafeeds
Added in 7.7.0
Get configuration and usage information about datafeeds.
This API returns a maximum of 10,000 datafeeds.
If the Elasticsearch security features are enabled, you must have monitor_ml, monitor, manage_ml, or manage cluster privileges to use this API.
IMPORTANT: CAT APIs are only intended for human consumption using the Kibana console or command line. They are not intended for use by applications. For application consumption, use the get datafeed statistics API.
Query parameters
-
allow_no_match
boolean Specifies what to do when the request:
- Contains wildcard expressions and there are no datafeeds that match.
- Contains the _all string or no identifiers and there are no matches.
- Contains wildcard expressions and there are only partial matches.
If true, the API returns an empty datafeeds array when there are no matches and the subset of results when there are partial matches. If false, the API returns a 404 status code when there are no matches or only partial matches.
-
h
string | array[string] Comma-separated list of column names to display.
Supported values include:
- ae (or assignment_explanation): For started datafeeds only, contains messages relating to the selection of a node.
- bc (or buckets.count, bucketsCount): The number of buckets processed.
- id: A numerical character string that uniquely identifies the datafeed.
- na (or node.address, nodeAddress): For started datafeeds only, the network address of the node where the datafeed is started.
- ne (or node.ephemeral_id, nodeEphemeralId): For started datafeeds only, the ephemeral ID of the node where the datafeed is started.
- ni (or node.id, nodeId): For started datafeeds only, the unique identifier of the node where the datafeed is started.
- nn (or node.name, nodeName): For started datafeeds only, the name of the node where the datafeed is started.
- sba (or search.bucket_avg, searchBucketAvg): The average search time per bucket, in milliseconds.
- sc (or search.count, searchCount): The number of searches run by the datafeed.
- seah (or search.exp_avg_hour, searchExpAvgHour): The exponential average search time per hour, in milliseconds.
- st (or search.time, searchTime): The total time the datafeed spent searching, in milliseconds.
- s (or state): The status of the datafeed: starting, started, stopping, or stopped. If starting, the datafeed has been requested to start but has not yet started. If started, the datafeed is actively receiving data. If stopping, the datafeed has been requested to stop gracefully and is completing its final action. If stopped, the datafeed is stopped and will not receive data until it is re-started.
Values are ae, assignment_explanation, bc, buckets.count, bucketsCount, id, na, node.address, nodeAddress, ne, node.ephemeral_id, nodeEphemeralId, ni, node.id, nodeId, nn, node.name, nodeName, sba, search.bucket_avg, searchBucketAvg, sc, search.count, searchCount, seah, search.exp_avg_hour, searchExpAvgHour, st, search.time, searchTime, s, or state.
-
s
string | array[string] Comma-separated list of column names or column aliases used to sort the response.
Supported values include:
- ae (or assignment_explanation): For started datafeeds only, contains messages relating to the selection of a node.
- bc (or buckets.count, bucketsCount): The number of buckets processed.
- id: A numerical character string that uniquely identifies the datafeed.
- na (or node.address, nodeAddress): For started datafeeds only, the network address of the node where the datafeed is started.
- ne (or node.ephemeral_id, nodeEphemeralId): For started datafeeds only, the ephemeral ID of the node where the datafeed is started.
- ni (or node.id, nodeId): For started datafeeds only, the unique identifier of the node where the datafeed is started.
- nn (or node.name, nodeName): For started datafeeds only, the name of the node where the datafeed is started.
- sba (or search.bucket_avg, searchBucketAvg): The average search time per bucket, in milliseconds.
- sc (or search.count, searchCount): The number of searches run by the datafeed.
- seah (or search.exp_avg_hour, searchExpAvgHour): The exponential average search time per hour, in milliseconds.
- st (or search.time, searchTime): The total time the datafeed spent searching, in milliseconds.
- s (or state): The status of the datafeed: starting, started, stopping, or stopped. If starting, the datafeed has been requested to start but has not yet started. If started, the datafeed is actively receiving data. If stopping, the datafeed has been requested to stop gracefully and is completing its final action. If stopped, the datafeed is stopped and will not receive data until it is re-started.
Values are ae, assignment_explanation, bc, buckets.count, bucketsCount, id, na, node.address, nodeAddress, ne, node.ephemeral_id, nodeEphemeralId, ni, node.id, nodeId, nn, node.name, nodeName, sba, search.bucket_avg, searchBucketAvg, sc, search.count, searchCount, seah, search.exp_avg_hour, searchExpAvgHour, st, search.time, searchTime, s, or state.
-
time
string The unit used to display time values.
Values are nanos, micros, ms, s, m, h, or d.
GET _cat/ml/datafeeds?v=true&format=json
curl \
--request GET 'http://api.example.com/_cat/ml/datafeeds' \
--header "Authorization: $API_KEY"
[
{
"id": "datafeed-high_sum_total_sales",
"state": "stopped",
"buckets.count": "743",
"search.count": "7"
},
{
"id": "datafeed-low_request_rate",
"state": "stopped",
"buckets.count": "1457",
"search.count": "3"
},
{
"id": "datafeed-response_code_rates",
"state": "stopped",
"buckets.count": "1460",
"search.count": "18"
},
{
"id": "datafeed-url_scanning",
"state": "stopped",
"buckets.count": "1460",
"search.count": "18"
}
]
Explain the shard allocations
Added in 5.0.0
Get explanations for shard allocations in the cluster. For unassigned shards, it provides an explanation for why the shard is unassigned. For assigned shards, it provides an explanation for why the shard is remaining on its current node and has not moved or rebalanced to another node. This API can be very useful when attempting to diagnose why a shard is unassigned or why a shard continues to remain on its current node when you might expect otherwise.
Query parameters
-
include_disk_info
boolean If true, returns information about disk usage and shard sizes.
-
include_yes_decisions
boolean If true, returns YES decisions in explanation.
-
master_timeout
string Period to wait for a connection to the master node.
Values are -1 or 0.
Body
-
current_node
string Specifies the node ID or the name of the node to only explain a shard that is currently located on the specified node.
-
index
string -
primary
boolean If true, returns explanation for the primary shard for the given shard ID.
-
shard
number Specifies the ID of the shard that you would like an explanation for.
GET _cluster/allocation/explain
{
"index": "my-index-000001",
"shard": 0,
"primary": false,
"current_node": "my-node"
}
curl \
--request POST 'http://api.example.com/_cluster/allocation/explain' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"index\": \"my-index-000001\",\n \"shard\": 0,\n \"primary\": false,\n \"current_node\": \"my-node\"\n}"'
{
"index": "my-index-000001",
"shard": 0,
"primary": false,
"current_node": "my-node"
}
{
"index" : "my-index-000001",
"shard" : 0,
"primary" : true,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "INDEX_CREATED",
"at" : "2017-01-04T18:08:16.600Z",
"last_allocation_status" : "no"
},
"can_allocate" : "no",
"allocate_explanation" : "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
"node_allocation_decisions" : [
{
"node_id" : "8qt2rY-pT6KNZB3-hGfLnw",
"node_name" : "node-0",
"transport_address" : "127.0.0.1:9401",
"roles" : ["data", "data_cold", "data_content", "data_frozen", "data_hot", "data_warm", "ingest", "master", "ml", "remote_cluster_client", "transform"],
"node_attributes" : {},
"node_decision" : "no",
"weight_ranking" : 1,
"deciders" : [
{
"decider" : "filter",
"decision" : "NO",
"explanation" : "node does not match index setting [index.routing.allocation.include] filters [_name:\"nonexistent_node\"]"
}
]
}
]
}
{
"index" : "my-index-000001",
"shard" : 0,
"primary" : true,
"current_state" : "unassigned",
"unassigned_info" : {
"at" : "2017-01-04T18:03:28.464Z",
"failed shard on node [mEKjwwzLT1yJVb8UxT6anw]: failed recovery, failure RecoveryFailedException",
"reason": "ALLOCATION_FAILED",
"failed_allocation_attempts": 5,
"last_allocation_status": "no",
},
"can_allocate": "no",
"allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisions" : [
{
"node_id" : "3sULLVJrRneSg0EfBB-2Ew",
"node_name" : "node_t0",
"transport_address" : "127.0.0.1:9400",
"roles" : ["data_content", "data_hot"],
"node_decision" : "no",
"store" : {
"matching_size" : "4.2kb",
"matching_size_in_bytes" : 4325
},
"deciders" : [
{
"decider": "max_retry",
"decision" : "NO",
"explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [POST /_cluster/reroute?retry_failed] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2024-07-30T21:04:12.166Z], failed_attempts[5], failed_nodes[[mEKjwwzLT1yJVb8UxT6anw]], delayed=false, details[failed shard on node [mEKjwwzLT1yJVb8UxT6anw]: failed recovery, failure RecoveryFailedException], allocation_status[deciders_no]]]"
}
]
}
]
}
Get a connector sync job
Beta
Path parameters
-
connector_sync_job_id
string Required The unique identifier of the connector sync job
curl \
--request GET 'http://api.example.com/_connector/_sync_job/{connector_sync_job_id}' \
--header "Authorization: $API_KEY"
Update the connector draft filtering validation
Technical preview
Update the draft filtering validation info for a connector.
Path parameters
-
connector_id
string Required The unique identifier of the connector to be updated
Body
Required
-
validation
object Required
curl \
--request PUT 'http://api.example.com/_connector/{connector_id}/_filtering/_validation' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"validation":{"errors":[{"ids":["string"],"messages":["string"]}],"state":"edited"}}'
Update the connector service type
Beta
Path parameters
-
connector_id
string Required The unique identifier of the connector to be updated
Body
Required
-
service_type
string Required
PUT _connector/my-connector/_service_type
{
"service_type": "sharepoint_online"
}
curl \
--request PUT 'http://api.example.com/_connector/{connector_id}/_service_type' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"service_type\": \"sharepoint_online\"\n}"'
{
"service_type": "sharepoint_online"
}
{
"result": "updated"
}
Create a follower
Added in 6.5.0
Create a cross-cluster replication follower index that follows a specific leader index. When the API returns, the follower index exists and cross-cluster replication starts replicating operations from the leader index to the follower index.
Path parameters
-
index
string Required The name of the follower index.
Query parameters
-
master_timeout
string Period to wait for a connection to the master node.
Values are -1 or 0.
-
wait_for_active_shards
number | string Specifies the number of shards to wait on being active before responding. This defaults to waiting on none of the shards to be active. A shard must be restored from the leader index before being active. Restoring a follower shard requires transferring all the remote Lucene segment files to the follower index.
Values are all or index-setting.
Body
Required
-
data_stream_name
string If the leader index is part of a data stream, the name to which the local data stream for the followed index should be renamed.
-
leader_index
string Required
-
max_outstanding_read_requests
number The maximum number of outstanding read requests from the remote cluster.
-
max_outstanding_write_requests
number The maximum number of outstanding write requests on the follower.
-
max_read_request_operation_count
number The maximum number of operations to pull per read from the remote cluster.
-
max_read_request_size
number | string
-
max_retry_delay
string A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.
-
max_write_buffer_count
number The maximum number of operations that can be queued for writing. When this limit is reached, reads from the remote cluster will be deferred until the number of queued operations goes below the limit.
-
max_write_buffer_size
number | string
-
max_write_request_operation_count
number The maximum number of operations per bulk write request executed on the follower.
-
max_write_request_size
number | string
-
read_poll_timeout
string A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.
-
remote_cluster
string Required The remote cluster containing the leader index.
-
settings
object Index settings
PUT /follower_index/_ccr/follow?wait_for_active_shards=1
{
"remote_cluster" : "remote_cluster",
"leader_index" : "leader_index",
"settings": {
"index.number_of_replicas": 0
},
"max_read_request_operation_count" : 1024,
"max_outstanding_read_requests" : 16,
"max_read_request_size" : "1024k",
"max_write_request_operation_count" : 32768,
"max_write_request_size" : "16k",
"max_outstanding_write_requests" : 8,
"max_write_buffer_count" : 512,
"max_write_buffer_size" : "512k",
"max_retry_delay" : "10s",
"read_poll_timeout" : "30s"
}
curl \
--request PUT 'http://api.example.com/{index}/_ccr/follow' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"remote_cluster\" : \"remote_cluster\",\n \"leader_index\" : \"leader_index\",\n \"settings\": {\n \"index.number_of_replicas\": 0\n },\n \"max_read_request_operation_count\" : 1024,\n \"max_outstanding_read_requests\" : 16,\n \"max_read_request_size\" : \"1024k\",\n \"max_write_request_operation_count\" : 32768,\n \"max_write_request_size\" : \"16k\",\n \"max_outstanding_write_requests\" : 8,\n \"max_write_buffer_count\" : 512,\n \"max_write_buffer_size\" : \"512k\",\n \"max_retry_delay\" : \"10s\",\n \"read_poll_timeout\" : \"30s\"\n}"'
{
"remote_cluster" : "remote_cluster",
"leader_index" : "leader_index",
"settings": {
"index.number_of_replicas": 0
},
"max_read_request_operation_count" : 1024,
"max_outstanding_read_requests" : 16,
"max_read_request_size" : "1024k",
"max_write_request_operation_count" : 32768,
"max_write_request_size" : "16k",
"max_outstanding_write_requests" : 8,
"max_write_buffer_count" : 512,
"max_write_buffer_size" : "512k",
"max_retry_delay" : "10s",
"read_poll_timeout" : "30s"
}
{
"follow_index_created" : true,
"follow_index_shards_acked" : true,
"index_following_started" : true
}
Get follower information
Added in 6.7.0
Get information about all cross-cluster replication follower indices. For example, the results include follower index names, leader index names, replication options, and whether the follower indices are active or paused.
Path parameters
-
index
string | array[string] Required A comma-delimited list of follower index patterns.
Query parameters
-
master_timeout
string The period to wait for a connection to the master node. If the master node is not available before the timeout expires, the request fails and returns an error. It can also be set to -1 to indicate that the request should never time out.
Values are -1 or 0.
GET /follower_index/_ccr/info
curl \
--request GET 'http://api.example.com/{index}/_ccr/info' \
--header "Authorization: $API_KEY"
{
"follower_indices": [
{
"follower_index": "follower_index",
"remote_cluster": "remote_cluster",
"leader_index": "leader_index",
"status": "active",
"parameters": {
"max_read_request_operation_count": 5120,
"max_read_request_size": "32mb",
"max_outstanding_read_requests": 12,
"max_write_request_operation_count": 5120,
"max_write_request_size": "9223372036854775807b",
"max_outstanding_write_requests": 9,
"max_write_buffer_count": 2147483647,
"max_write_buffer_size": "512mb",
"max_retry_delay": "500ms",
"read_poll_timeout": "1m"
}
}
]
}
{
"follower_indices": [
{
"follower_index": "follower_index",
"remote_cluster": "remote_cluster",
"leader_index": "leader_index",
"status": "paused"
}
]
}
Create a new document in the index
Added in 5.0.0
You can index a new JSON document with the /<target>/_doc/ or /<target>/_create/<_id> APIs.
Using _create guarantees that the document is indexed only if it does not already exist.
It returns a 409 response when a document with the same ID already exists in the index.
To update an existing document, you must use the /<target>/_doc/ API.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
- To add a document using the PUT /<target>/_create/<_id> or POST /<target>/_create/<_id> request formats, you must have the create_doc, create, index, or write index privilege.
- To automatically create a data stream or index with this API request, you must have the auto_configure, create_index, or manage index privilege.
Automatic data stream creation requires a matching index template with data stream enabled.
Automatically create data streams and indices
If the request's target doesn't exist and matches an index template with a data_stream
definition, the index operation automatically creates the data stream.
If the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.
NOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.
If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed.
Automatic index creation is controlled by the action.auto_create_index
setting.
If it is true
, any index can be created automatically.
You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns or set it to false
to turn off automatic index creation entirely.
Specify a comma-separated list of patterns you want to allow or prefix each pattern with +
or -
to indicate whether it should be allowed or blocked.
When a list is specified, the default behaviour is to disallow.
NOTE: The action.auto_create_index
setting affects the automatic creation of indices only.
It does not affect the creation of data streams.
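For example, a minimal sketch of changing this setting with the cluster update settings API (the index patterns shown are illustrative, not defaults):
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "my-index-*,-test-*,+other-*"
  }
}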
Routing
By default, shard placement — or routing — is controlled by using a hash of the document's ID value.
For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the routing
parameter.
When setting up explicit mapping, you can also use the _routing
field to direct the index operation to extract the routing value from the document itself.
This does come at the (very minimal) cost of an additional document parsing pass.
If the _routing
mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.
NOTE: Data streams do not support custom routing unless they were created with the allow_custom_routing
setting enabled in the template.
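For example, a sketch of supplying a routing value on a create request (the index name and routing value are illustrative):
PUT my-index-000001/_create/1?routing=user1
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "GET /search HTTP/1.1 200 1070000"
}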
Distributed
The index operation is directed to the primary shard based on its route and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.
Active shards
To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.
If the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.
By default, write operations only wait for the primary shards to be active before proceeding (that is to say wait_for_active_shards
is 1
).
This default can be overridden in the index settings dynamically by setting index.write.wait_for_active_shards
.
To alter this behavior per operation, use the wait_for_active_shards request
parameter.
Valid values are all or any positive integer up to the total number of configured copies per shard in the index (which is number_of_replicas
+1).
Specifying a negative value or a number greater than the number of shard copies will throw an error.
For example, suppose you have a cluster of three nodes, A, B, and C and you create an index index with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).
If you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.
This means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.
If wait_for_active_shards
is set on the request to 3
(and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.
This requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.
However, if you set wait_for_active_shards
to all
(or to 4
, which is the same in this situation), the indexing operation will not proceed as you do not have all 4 copies of each shard active in the index.
The operation will timeout unless a new node is brought up in the cluster to host the fourth copy of the shard.
It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.
After the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.
The _shards
section of the API response reveals the number of shard copies on which replication succeeded and failed.
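As a sketch, a create request that waits for at least two active copies of each shard before proceeding (the index name, document, and timeout are illustrative):
PUT my-index-000001/_create/2?wait_for_active_shards=2&timeout=30s
{
  "message": "example document"
}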
Path parameters
-
index
string Required The name of the data stream or index to target. If the target doesn't exist and matches the name or wildcard (*) pattern of an index template with a data_stream definition, this request creates the data stream. If the target doesn't exist and doesn't match a data stream template, this request creates the index.
-
id
string Required A unique identifier for the document. To automatically generate a document ID, use the
POST /<target>/_doc/
request format.
Query parameters
-
if_primary_term
number Only perform the operation if the document has this primary term.
-
if_seq_no
number Only perform the operation if the document has this sequence number.
-
include_source_on_error
boolean If true, the document source is included in the error message in case of parsing errors.
-
op_type
string Set to create to only index the document if it does not already exist (put if absent). If a document with the specified _id already exists, the indexing operation will fail. The behavior is the same as using the <index>/_create endpoint. If a document ID is specified, this parameter defaults to index. Otherwise, it defaults to create. If the request targets a data stream, an op_type of create is required.
Supported values include:
- index: Overwrite any documents that already exist.
- create: Only index documents that do not already exist.
Values are index or create.
-
pipeline
string The ID of the pipeline to use to preprocess incoming documents. If the index has a default ingest pipeline specified, setting the value to
_none
turns off the default ingest pipeline for this request. If a final pipeline is configured, it will always run regardless of the value of this parameter. -
refresh
string If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, it waits for a refresh to make this operation visible to search. If false, it does nothing with refreshes.
Values are true, false, or wait_for.
-
require_alias
boolean If
true
, the destination must be an index alias. -
require_data_stream
boolean If
true
, the request's actions must target a data stream (existing or to be created). -
routing
string A custom value that is used to route operations to a specific shard.
-
timeout
string The period the request waits for the following operations: automatic index creation, dynamic mapping updates, waiting for active shards. Elasticsearch waits for at least the specified timeout period before failing. The actual wait time could be longer, particularly when multiple waits occur.
This parameter is useful for situations where the primary shard assigned to perform the operation might not be available when the operation runs. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the operation will wait on the primary shard to become available for at least 1 minute before failing and responding with an error. The actual wait time could be longer, particularly when multiple waits occur.
Values are -1 or 0.
-
version
number The explicit version number for concurrency control. It must be a non-negative long number.
-
version_type
string The version type.
Supported values include:
- internal: Use internal versioning that starts at 1 and increments with each update or delete.
- external: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.
- external_gte: Only index the document if the specified version is equal or higher than the version of the stored document or if there is no existing document. NOTE: The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.
- force: This option is deprecated because it can cause primary and replica shards to diverge.
Values are internal, external, external_gte, or force.
-
wait_for_active_shards
number | string The number of shard copies that must be active before proceeding with the operation. You can set it to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The default value of 1 means it waits for each primary shard to be active.
Values are all or index-setting.
PUT my-index-000001/_create/1
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
curl \
--request PUT 'http://api.example.com/{index}/_create/{id}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"@timestamp\": \"2099-11-15T13:12:00\",\n \"message\": \"GET /search HTTP/1.1 200 1070000\",\n \"user\": {\n \"id\": \"kimchy\"\n }\n}"'
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
Delete documents
Added in 5.0.0
Deletes documents that match the specified query.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or alias:
- read
- delete or write
You can specify the query criteria in the request URI or the request body using the same syntax as the search API. When you submit a delete by query request, Elasticsearch gets a snapshot of the data stream or index when it begins processing the request and deletes matching documents using internal versioning. If a document changes between the time that the snapshot is taken and the delete operation is processed, it results in a version conflict and the delete operation fails.
NOTE: Documents with a version equal to 0 cannot be deleted using delete by query because internal versioning does not support 0 as a valid version number.
While processing a delete by query request, Elasticsearch performs multiple search requests sequentially to find all of the matching documents to delete. A bulk delete request is performed for each batch of matching documents. If a search or bulk request is rejected, the requests are retried up to 10 times, with exponential back off. If the maximum retry limit is reached, processing halts and all failed requests are returned in the response. Any delete requests that completed successfully still stick, they are not rolled back.
You can opt to count version conflicts instead of halting and returning by setting conflicts
to proceed
.
Note that if you opt to count version conflicts the operation could attempt to delete more documents from the source than max_docs
until it has successfully deleted max_docs documents
, or it has gone through every document in the source query.
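For example, a hedged sketch of a request that counts conflicts rather than aborting (the index name and query are illustrative):
POST my-index-000001/_delete_by_query?conflicts=proceed
{
  "query": {
    "match_all": {}
  }
}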
Throttling delete requests
To control the rate at which delete by query issues batches of delete operations, you can set requests_per_second
to any positive decimal number.
This pads each batch with a wait time to throttle the rate.
Set requests_per_second
to -1
to disable throttling.
Throttling uses a wait time between batches so that the internal scroll requests can be given a timeout that takes the request padding into account.
The padding time is the difference between the batch size divided by the requests_per_second
and the time spent writing.
By default the batch size is 1000
, so if requests_per_second
is set to 500
:
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
Since the batch is issued as a single _bulk
request, large batch sizes cause Elasticsearch to create many requests and wait before starting the next set.
This is "bursty" instead of "smooth".
Slicing
Delete by query supports sliced scroll to parallelize the delete process. This can improve efficiency and provide a convenient way to break the request down into smaller parts.
Setting slices
to auto
lets Elasticsearch choose the number of slices to use.
This setting will use one slice per shard, up to a certain limit.
If there are multiple source data streams or indices, it will choose the number of slices based on the index or backing index with the smallest number of shards.
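For example, automatic slicing can be requested on the URL (the index name and query are illustrative):
POST my-index-000001/_delete_by_query?slices=auto
{
  "query": {
    "range": {
      "http.response.bytes": {
        "lt": 2000000
      }
    }
  }
}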
Adding slices to the delete by query operation creates sub-requests which means it has some quirks:
- You can see these requests in the tasks APIs. These sub-requests are "child" tasks of the task for the request with slices.
- Fetching the status of the task for the request with slices only contains the status of completed slices.
- These sub-requests are individually addressable for things like cancellation and rethrottling.
- Rethrottling the request with slices will rethrottle the unfinished sub-request proportionally.
- Canceling the request with slices will cancel each sub-request.
- Due to the nature of slices, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.
- Parameters like requests_per_second and max_docs on a request with slices are distributed proportionally to each sub-request. Combine that with the earlier point about distribution being uneven and you should conclude that using max_docs with slices might not result in exactly max_docs documents being deleted.
- Each sub-request gets a slightly different snapshot of the source data stream or index though these are all taken at approximately the same time.
If you're slicing manually or otherwise tuning automatic slicing, keep in mind that:
- Query performance is most efficient when the number of slices is equal to the number of shards in the index or backing index. If that number is large (for example, 500), choose a lower number as too many slices hurts performance. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.
- Delete performance scales linearly across available resources with the number of slices.
Whether query or delete performance dominates the runtime depends on the documents being reindexed and cluster resources.
Cancel a delete by query operation
Any delete by query can be canceled using the task cancel API. For example:
POST _tasks/r1A2WoRbTwKZ516z6NEs5A:36619/_cancel
The task ID can be found by using the get tasks API.
Cancellation should happen quickly but might take a few seconds. The get task status API will continue to list the delete by query task until this task checks that it has been cancelled and terminates itself.
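One way to find the task ID is the tasks API, filtered to delete by query actions (a sketch):
GET _tasks?detailed=true&actions=*/delete/byquery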
Path parameters
-
index
string | array[string] Required A comma-separated list of data streams, indices, and aliases to search. It supports wildcards (*). To search all data streams or indices, omit this parameter or use * or _all.
Query parameters
-
allow_no_indices
boolean If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.
-
analyzer
string Analyzer to use for the query string. This parameter can be used only when the
q
query string parameter is specified. -
analyze_wildcard
boolean If true, wildcard and prefix queries are analyzed. This parameter can be used only when the q query string parameter is specified.
-
conflicts
string What to do if delete by query hits version conflicts: abort or proceed.
Supported values include:
- abort: Stop reindexing if there are conflicts.
- proceed: Continue reindexing even if there are conflicts.
Values are abort or proceed.
-
default_operator
string The default operator for query string query: AND or OR. This parameter can be used only when the q query string parameter is specified.
Values are and, AND, or, or OR.
-
df
string The field to use as default where no field prefix is given in the query string. This parameter can be used only when the
q
query string parameter is specified. -
expand_wildcards
string | array[string] The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. It supports comma-separated values, such as open,hidden.
Supported values include:
- all: Match any data stream or index, including hidden ones.
- open: Match open, non-hidden indices. Also matches any non-hidden data stream.
- closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
- hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
- none: Wildcard expressions are not accepted.
Values are all, open, closed, hidden, or none.
-
from
number Skips the specified number of documents.
-
lenient
boolean If true, format-based query failures (such as providing text to a numeric field) in the query string will be ignored. This parameter can be used only when the q query string parameter is specified.
-
max_docs
number The maximum number of documents to process. Defaults to all documents. When set to a value less than or equal to scroll_size, a scroll will not be used to retrieve the results for the operation.
-
preference
string The node or shard the operation should be performed on. It is random by default.
-
refresh
boolean If true, Elasticsearch refreshes all shards involved in the delete by query after the request completes. This is different than the delete API's refresh parameter, which causes just the shard that received the delete request to be refreshed. Unlike the delete API, it does not support wait_for.
-
request_cache
boolean If
true
, the request cache is used for this request. Defaults to the index-level setting. -
requests_per_second
number The throttle for this request in sub-requests per second.
-
routing
string A custom value used to route operations to a specific shard.
-
q
string A query in the Lucene query string syntax.
-
scroll
string The period to retain the search context for scrolling.
Values are -1 or 0.
-
scroll_size
number The size of the scroll request that powers the operation.
-
search_timeout
string The explicit timeout for each search request. It defaults to no timeout.
Values are -1 or 0.
-
search_type
string The type of the search operation. Available options include query_then_fetch and dfs_query_then_fetch.
Supported values include:
- query_then_fetch: Documents are scored using local term and document frequencies for the shard. This is usually faster but less accurate.
- dfs_query_then_fetch: Documents are scored using global term and document frequencies across all shards. This is usually slower but more accurate.
Values are query_then_fetch or dfs_query_then_fetch.
-
slices
number | string The number of slices this task should be divided into.
Value is auto.
-
sort
array[string] A comma-separated list of
<field>:<direction>
pairs. -
stats
array[string] The specific
tag
of the request for logging and statistical purposes. -
terminate_after
number The maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting.
Use with caution. Elasticsearch applies this parameter to each shard handling the request. When possible, let Elasticsearch perform early termination automatically. Avoid specifying this parameter for requests that target data streams with backing indices across multiple data tiers.
-
timeout
string The period each deletion request waits for active shards.
Values are -1 or 0.
-
version
boolean If
true
, returns the document version as part of a hit. -
wait_for_active_shards
number | string The number of shard copies that must be active before proceeding with the operation. Set to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The timeout value controls how long each write request waits for unavailable shards to become available.
Values are all or index-setting.
-
wait_for_completion
boolean If true, the request blocks until the operation is complete. If false, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to cancel or get the status of the task. Elasticsearch creates a record of this task as a document at .tasks/task/${taskId}. When you are done with a task, you should delete the task document so Elasticsearch can reclaim the space.
Body
Required
-
max_docs
number The maximum number of documents to delete.
-
query
object An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.
External documentation -
slice
object
POST /my-index-000001,my-index-000002/_delete_by_query
{
"query": {
"match_all": {}
}
}
curl \
--request POST 'http://api.example.com/{index}/_delete_by_query' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"query\": {\n \"match_all\": {}\n }\n}"'
{
"query": {
"match_all": {}
}
}
{
"query": {
"term": {
"user.id": "kimchy"
}
},
"max_docs": 1
}
{
"slice": {
"id": 0,
"max": 2
},
"query": {
"range": {
"http.response.bytes": {
"lt": 2000000
}
}
}
}
{
"query": {
"range": {
"http.response.bytes": {
"lt": 2000000
}
}
}
}
{
"took" : 147,
"timed_out": false,
"total": 119,
"deleted": 119,
"batches": 1,
"version_conflicts": 0,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1.0,
"throttled_until_millis": 0,
"failures" : [ ]
}
Get term vector information
Get information and statistics about terms in the fields of a particular document.
You can retrieve term vectors for documents stored in the index or for artificial documents passed in the body of the request.
You can specify the fields you are interested in through the fields
parameter or by adding the fields to the request body.
For example:
GET /my-index-000001/_termvectors/1?fields=message
Fields can be specified using wildcards, similar to the multi match query.
Term vectors are real-time by default, not near real-time.
This can be changed by setting the realtime parameter to false.
You can request three types of values: term information, term statistics, and field statistics. By default, all term information and field statistics are returned for all fields but term statistics are excluded.
Term information
- term frequency in the field (always returned)
- term positions (positions: true)
- start and end offsets (offsets: true)
- term payloads (payloads: true), as base64 encoded bytes
If the requested information wasn't stored in the index, it will be computed on the fly if possible. Additionally, term vectors could be computed for documents not even existing in the index, but instead provided by the user.
Start and end offsets assume UTF-16 encoding is being used. If you want to use these offsets in order to get the original text that produced this token, you should make sure that the string you are taking a sub-string of is also encoded using UTF-16.
Behaviour
The term and field statistics are not accurate.
Deleted documents are not taken into account.
The information is only retrieved for the shard the requested document resides in.
The term and field statistics are therefore only useful as relative measures whereas the absolute numbers have no meaning in this context.
By default, when requesting term vectors of artificial documents, a shard to get the statistics from is randomly selected.
Use routing
only to hit a particular shard.
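As a sketch, an artificial document can be pinned to a particular shard by adding a routing value (the routing value is illustrative):
GET /my-index-000001/_termvectors?routing=user1
{
  "doc": {
    "fullname": "John Doe",
    "text": "test test test"
  }
}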
Query parameters
-
fields
string | array[string] A comma-separated list or wildcard expressions of fields to include in the statistics. It is used as the default list unless a specific field list is provided in the
completion_fields
or fielddata_fields
parameters. -
field_statistics
boolean If
true
, the response includes:
- The document count (how many documents contain this field).
- The sum of document frequencies (the sum of document frequencies for all terms in this field).
- The sum of total term frequencies (the sum of total term frequencies of each term in this field).
-
offsets
boolean If
true
, the response includes term offsets. -
payloads
boolean If
true
, the response includes term payloads. -
positions
boolean If
true
, the response includes term positions. -
preference
string The node or shard the operation should be performed on. It is random by default.
-
realtime
boolean If true, the request is real-time as opposed to near-real-time.
-
routing
string A custom value that is used to route operations to a specific shard.
-
term_statistics
boolean If
true
, the response includes:
- The total term frequency (how often a term occurs in all documents).
- The document frequency (the number of documents containing the current term).
By default these values are not returned since term statistics can have a serious performance impact.
-
version
number If
true
, returns the document version as part of a hit. -
version_type
string The version type.
Supported values include:
- internal: Use internal versioning that starts at 1 and increments with each update or delete.
- external: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.
- external_gte: Only index the document if the specified version is equal or higher than the version of the stored document or if there is no existing document. NOTE: The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.
- force: This option is deprecated because it can cause primary and replica shards to diverge.
Values are internal, external, external_gte, or force.
Body
-
doc
object An artificial document (a document not present in the index) for which you want to retrieve term vectors.
-
filter
object -
per_field_analyzer
object Override the default per-field analyzer. This is useful in order to generate term vectors in any fashion, especially when using artificial documents. When providing an analyzer for a field that already stores term vectors, the term vectors will be regenerated.
-
fields
string | array[string] -
field_statistics
boolean If
true
, the response includes:
- The document count (how many documents contain this field).
- The sum of document frequencies (the sum of document frequencies for all terms in this field).
- The sum of total term frequencies (the sum of total term frequencies of each term in this field).
-
offsets
boolean If
true
, the response includes term offsets. -
payloads
boolean If
true
, the response includes term payloads. -
positions
boolean If
true
, the response includes term positions. -
term_statistics
boolean If
true
, the response includes:- The total term frequency (how often a term occurs in all documents).
- The document frequency (the number of documents containing the current term).
By default these values are not returned since term statistics can have a serious performance impact.
-
routing
string -
version
number -
version_type
string Values are internal, external, external_gte, or force.
GET /my-index-000001/_termvectors/1
{
"fields" : ["text"],
"offsets" : true,
"payloads" : true,
"positions" : true,
"term_statistics" : true,
"field_statistics" : true
}
curl \
--request GET 'http://api.example.com/{index}/_termvectors/{id}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"fields\" : [\"text\"],\n \"offsets\" : true,\n \"payloads\" : true,\n \"positions\" : true,\n \"term_statistics\" : true,\n \"field_statistics\" : true\n}"'
{
"fields" : ["text"],
"offsets" : true,
"payloads" : true,
"positions" : true,
"term_statistics" : true,
"field_statistics" : true
}
{
"doc" : {
"fullname" : "John Doe",
"text" : "test test test"
},
"fields": ["fullname"],
"per_field_analyzer" : {
"fullname": "keyword"
}
}
{
"doc": {
"plot": "When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil."
},
"term_statistics": true,
"field_statistics": true,
"positions": false,
"offsets": false,
"filter": {
"max_num_terms": 3,
"min_term_freq": 1,
"min_doc_freq": 1
}
}
{
"fields" : ["text", "some_field_without_term_vectors"],
"offsets" : true,
"positions" : true,
"term_statistics" : true,
"field_statistics" : true
}
{
"doc" : {
"fullname" : "John Doe",
"text" : "test test test"
}
}
{
"_index": "my-index-000001",
"_id": "1",
"_version": 1,
"found": true,
"took": 6,
"term_vectors": {
"text": {
"field_statistics": {
"sum_doc_freq": 4,
"doc_count": 2,
"sum_ttf": 6
},
"terms": {
"test": {
"doc_freq": 2,
"ttf": 4,
"term_freq": 3,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 4,
"payload": "d29yZA=="
},
{
"position": 1,
"start_offset": 5,
"end_offset": 9,
"payload": "d29yZA=="
},
{
"position": 2,
"start_offset": 10,
"end_offset": 14,
"payload": "d29yZA=="
}
]
}
}
}
}
}
{
"_index": "my-index-000001",
"_version": 0,
"found": true,
"took": 6,
"term_vectors": {
"fullname": {
"field_statistics": {
"sum_doc_freq": 2,
"doc_count": 4,
"sum_ttf": 4
},
"terms": {
"John Doe": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 8
}
]
}
}
}
}
}
{
"_index": "imdb",
"_version": 0,
"found": true,
"term_vectors": {
"plot": {
"field_statistics": {
"sum_doc_freq": 3384269,
"doc_count": 176214,
"sum_ttf": 3753460
},
"terms": {
"armored": {
"doc_freq": 27,
"ttf": 27,
"term_freq": 1,
"score": 9.74725
},
"industrialist": {
"doc_freq": 88,
"ttf": 88,
"term_freq": 1,
"score": 8.590818
},
"stark": {
"doc_freq": 44,
"ttf": 47,
"term_freq": 1,
"score": 9.272792
}
}
}
}
}
Get term vector information
Get information and statistics about terms in the fields of a particular document.
You can retrieve term vectors for documents stored in the index or for artificial documents passed in the body of the request.
You can specify the fields you are interested in through the fields
parameter or by adding the fields to the request body.
For example:
GET /my-index-000001/_termvectors/1?fields=message
Fields can be specified using wildcards, similar to the multi match query.
Term vectors are real-time by default, not near real-time.
This can be changed by setting the realtime parameter to false.
You can request three types of values: term information, term statistics, and field statistics. By default, all term information and field statistics are returned for all fields but term statistics are excluded.
Term information
- term frequency in the field (always returned)
- term positions (positions: true)
- start and end offsets (offsets: true)
- term payloads (payloads: true), as base64 encoded bytes
If the requested information wasn't stored in the index, it will be computed on the fly if possible. Additionally, term vectors could be computed for documents not even existing in the index, but instead provided by the user.
Start and end offsets assume UTF-16 encoding is being used. If you want to use these offsets in order to get the original text that produced this token, you should make sure that the string you are taking a sub-string of is also encoded using UTF-16.
Behaviour
The term and field statistics are not accurate.
Deleted documents are not taken into account.
The information is only retrieved for the shard the requested document resides in.
The term and field statistics are therefore only useful as relative measures whereas the absolute numbers have no meaning in this context.
By default, when requesting term vectors of artificial documents, a shard to get the statistics from is randomly selected.
Use routing
only to hit a particular shard.
Path parameters
-
index
string Required The name of the index that contains the document.
Query parameters
-
fields
string | array[string] A comma-separated list or wildcard expressions of fields to include in the statistics. It is used as the default list unless a specific field list is provided in the
completion_fields
or fielddata_fields
parameters. -
field_statistics
boolean If
true
, the response includes:
- The document count (how many documents contain this field).
- The sum of document frequencies (the sum of document frequencies for all terms in this field).
- The sum of total term frequencies (the sum of total term frequencies of each term in this field).
-
offsets
boolean If
true
, the response includes term offsets. -
payloads
boolean If
true
, the response includes term payloads. -
positions
boolean If
true
, the response includes term positions. -
preference
string The node or shard the operation should be performed on. It is random by default.
-
realtime
boolean If true, the request is real-time as opposed to near-real-time.
-
routing
string A custom value that is used to route operations to a specific shard.
-
term_statistics
boolean If
true
, the response includes:
- The total term frequency (how often a term occurs in all documents).
- The document frequency (the number of documents containing the current term).
By default these values are not returned since term statistics can have a serious performance impact.
-
version
number If
true
, returns the document version as part of a hit. -
version_type
string The version type.
Supported values include:
- internal: Use internal versioning that starts at 1 and increments with each update or delete.
- external: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.
- external_gte: Only index the document if the specified version is equal or higher than the version of the stored document or if there is no existing document. NOTE: The external_gte version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.
- force: This option is deprecated because it can cause primary and replica shards to diverge.
Values are internal, external, external_gte, or force.
Body
-
doc
object An artificial document (a document not present in the index) for which you want to retrieve term vectors.
-
filter
object -
per_field_analyzer
object Override the default per-field analyzer. This is useful in order to generate term vectors in any fashion, especially when using artificial documents. When providing an analyzer for a field that already stores term vectors, the term vectors will be regenerated.
-
fields
string | array[string] -
field_statistics
boolean If
true
, the response includes:
- The document count (how many documents contain this field).
- The sum of document frequencies (the sum of document frequencies for all terms in this field).
- The sum of total term frequencies (the sum of total term frequencies of each term in this field).
-
offsets
boolean If
true
, the response includes term offsets. -
payloads
boolean If
true
, the response includes term payloads. -
positions
boolean If
true
, the response includes term positions. -
term_statistics
boolean If
true
, the response includes:- The total term frequency (how often a term occurs in all documents).
- The document frequency (the number of documents containing the current term).
By default these values are not returned since term statistics can have a serious performance impact.
-
routing
string -
version
number -
version_type
string Values are internal, external, external_gte, or force.
GET /my-index-000001/_termvectors/1
{
"fields" : ["text"],
"offsets" : true,
"payloads" : true,
"positions" : true,
"term_statistics" : true,
"field_statistics" : true
}
curl \
--request GET 'http://api.example.com/{index}/_termvectors' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '"{\n \"fields\" : [\"text\"],\n \"offsets\" : true,\n \"payloads\" : true,\n \"positions\" : true,\n \"term_statistics\" : true,\n \"field_statistics\" : true\n}"'
{
"fields" : ["text"],
"offsets" : true,
"payloads" : true,
"positions" : true,
"term_statistics" : true,
"field_statistics" : true
}
{
"doc" : {
"fullname" : "John Doe",
"text" : "test test test"
},
"fields": ["fullname"],
"per_field_analyzer" : {
"fullname": "keyword"
}
}
{
"doc": {
"plot": "When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil."
},
"term_statistics": true,
"field_statistics": true,
"positions": false,
"offsets": false,
"filter": {
"max_num_terms": 3,
"min_term_freq": 1,
"min_doc_freq": 1
}
}
{
"fields" : ["text", "some_field_without_term_vectors"],
"offsets" : true,
"positions" : true,
"term_statistics" : true,
"field_statistics" : true
}
{
"doc" : {
"fullname" : "John Doe",
"text" : "test test test"
}
}
{
"_index": "my-index-000001",
"_id": "1",
"_version": 1,
"found": true,
"took": 6,
"term_vectors": {
"text": {
"field_statistics": {
"sum_doc_freq": 4,
"doc_count": 2,
"sum_ttf": 6
},
"terms": {
"test": {
"doc_freq": 2,
"ttf": 4,
"term_freq": 3,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 4,
"payload": "d29yZA=="
},
{
"position": 1,
"start_offset": 5,
"end_offset": 9,
"payload": "d29yZA=="
},
{
"position": 2,
"start_offset": 10,
"end_offset": 14,
"payload": "d29yZA=="
}
]
}
}
}
}
}
{
"_index": "my-index-000001",
"_version": 0,
"found": true,
"took": 6,
"term_vectors": {
"fullname": {
"field_statistics": {
"sum_doc_freq": 2,
"doc_count": 4,
"sum_ttf": 4
},
"terms": {
"John Doe": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 8
}
]
}
}
}
}
}
{
"_index": "imdb",
"_version": 0,
"found": true,
"term_vectors": {
"plot": {
"field_statistics": {
"sum_doc_freq": 3384269,
"doc_count": 176214,
"sum_ttf": 3753460
},
"terms": {
"armored": {
"doc_freq": 27,
"ttf": 27,
"term_freq": 1,
"score": 9.74725
},
"industrialist": {
"doc_freq": 88,
"ttf": 88,
"term_freq": 1,
"score": 8.590818
},
"stark": {
"doc_freq": 44,
"ttf": 47,
"term_freq": 1,
"score": 9.272792
}
}
}
}
}
Create or update a component template
Added in 7.8.0
Component templates are building blocks for constructing index templates that specify index mappings, settings, and aliases.
An index template can be composed of multiple component templates.
To use a component template, specify it in an index template’s composed_of
list.
Component templates are only applied to new data streams and indices as part of a matching index template.
Settings and mappings specified directly in the index template or the create index request override any settings or mappings specified in a component template.
Component templates are only used during index creation. For data streams, this includes data stream creation and the creation of a stream’s backing indices. Changes to component templates do not affect existing indices, including a stream’s backing indices.
You can use C-style /* */ block comments in component templates.
You can include comments anywhere in the request body except before the opening curly bracket.
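For example, a minimal sketch of a request body with a comment placed immediately after the opening curly bracket (the template name and comment text are illustrative):
PUT _component_template/template_with_comment
{ /* this component disables _source for matching indices */
  "template": {
    "mappings": {
      "_source": {
        "enabled": false
      }
    }
  }
}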
Applying component templates
You cannot directly apply a component template to a data stream or index.
To be applied, a component template must be included in an index template's composed_of
list.
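For illustration, a minimal index template that applies the component template named template_1 could look like the following sketch (the wrapper template name and index pattern are illustrative):
PUT _index_template/template_1_wrapper
{
  "index_patterns": ["my-data-*"],
  "composed_of": ["template_1"]
}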
Path parameters
-
name
string Required Name of the component template to create. Elasticsearch includes the following built-in component templates: logs-mappings; logs-settings; metrics-mappings; metrics-settings; synthetics-mappings; synthetics-settings. Elastic Agent uses these templates to configure backing indices for its data streams. If you use Elastic Agent and want to overwrite one of these templates, set the version for your replacement template higher than the current version. If you don't use Elastic Agent and want to disable all built-in component and index templates, set stack.templates.enabled to false using the cluster update settings API.
Query parameters
-
create
boolean If
true
, this request cannot replace or update existing component templates. -
master_timeout
string Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
Values are
-1
or0
.
Body
Required
-
template
object Required -
version
number -
_meta
object -
deprecated
boolean Marks this component template as deprecated. When creating or updating a non-deprecated index template that uses deprecated components, Elasticsearch will emit a deprecation warning.
PUT _component_template/template_1
{
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    }
  }
}
curl \
--request POST 'http://api.example.com/_component_template/{name}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"template":{"settings":{"number_of_shards":1},"mappings":{"_source":{"enabled":false},"properties":{"host_name":{"type":"keyword"},"created_at":{"type":"date","format":"EEE MMM dd HH:mm:ss Z yyyy"}}}}}'
{
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    }
  }
}
{
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "aliases": {
      "alias1": {},
      "alias2": {
        "filter": {
          "term": {
            "user.id": "kimchy"
          }
        },
        "routing": "shard-1"
      },
      "{index}-alias": {}
    }
  }
}
Path parameters
-
index
string | array[string] Required Comma-separated list of data streams, indices, and aliases. Supports wildcards (
*
).
Query parameters
-
allow_no_indices
boolean If
false
, the request returns an error if any wildcard expression, index alias, or_all
value targets only missing or closed indices. This behavior applies even if the request targets other open indices. -
expand_wildcards
string | array[string] Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as open,hidden. Valid values are: all, open, closed, hidden, none.
Supported values include:
all: Match any data stream or index, including hidden ones.
open: Match open, non-hidden indices. Also matches any non-hidden data stream.
closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
none: Wildcard expressions are not accepted.
Values are all, open, closed, hidden, or none.
-
flat_settings
boolean If
true
, returns settings in flat format. -
include_defaults
boolean If
true
, return all default settings in the response. -
local
boolean If
true
, the request retrieves information from the local node only.
curl \
--request HEAD 'http://api.example.com/{index}' \
--header "Authorization: $API_KEY"
Delete data stream options
Added in 8.19.0
Removes the data stream options from a data stream.
Path parameters
-
name
string | array[string] Required A comma-separated list of data streams for which the data stream options will be deleted; use * to target all data streams.
Query parameters
-
expand_wildcards
string | array[string] Whether wildcard expressions should get expanded to open or closed indices (default: open)
Supported values include:
all: Match any data stream or index, including hidden ones.
open: Match open, non-hidden indices. Also matches any non-hidden data stream.
closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
none: Wildcard expressions are not accepted.
Values are all, open, closed, hidden, or none.
-
master_timeout
string Period to wait for a connection to the master node.
Values are
-1
or0
. -
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
Values are
-1
or0
.
curl \
--request DELETE 'http://api.example.com/_data_stream/{name}/_options' \
--header "Authorization: $API_KEY"
{
"acknowledged": true
}
Update index settings
Changes dynamic index settings in real time. For data streams, index setting changes are applied to all backing indices by default.
To revert a setting to the default value, use a null value.
The list of per-index settings that can be updated dynamically on live indices can be found in index settings documentation.
To preserve existing settings from being updated, set the preserve_existing
parameter to true
.
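For example, the following sketch (index name and setting values are illustrative) applies the listed settings only where they are not already set explicitly on the index:
PUT /my-index-000001/_settings?preserve_existing=true
{
  "index.number_of_replicas": 1,
  "index.refresh_interval": "30s"
}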
There are multiple valid ways to represent index settings in the request body. You can specify only the setting, for example:
{
"number_of_replicas": 1
}
Or you can use an index
setting object:
{
"index": {
"number_of_replicas": 1
}
}
Or you can use dot notation:
{
"index.number_of_replicas": 1
}
Or you can embed any of the aforementioned options in a settings
object. For example:
{
"settings": {
"index": {
"number_of_replicas": 1
}
}
}
NOTE: You can only define new analyzers on closed indices. To add an analyzer, you must close the index, define the analyzer, and reopen the index. You cannot close the write index of a data stream. To update the analyzer for a data stream's write index and future backing indices, update the analyzer in the index template used by the stream. Then roll over the data stream to apply the new analyzer to the stream's write index and future backing indices. This affects searches and any new data added to the stream after the rollover. However, it does not affect the data stream's backing indices or their existing data. To change the analyzer for existing backing indices, you must create a new data stream and reindex your data into it.
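For a plain index, the close, update, and reopen sequence described in the note can be sketched as follows (index and analyzer names are illustrative; the same analyzer body appears in the request examples below):
POST /my-index-000001/_close

PUT /my-index-000001/_settings
{
  "analysis": {
    "analyzer": {
      "content": {
        "type": "custom",
        "tokenizer": "whitespace"
      }
    }
  }
}

POST /my-index-000001/_open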
Path parameters
-
index
string | array[string] Required Comma-separated list of data streams, indices, and aliases used to limit the request. Supports wildcards (
*
). To target all data streams and indices, omit this parameter or use*
or_all
.
Query parameters
-
allow_no_indices
boolean If
false
, the request returns an error if any wildcard expression, index alias, or_all
value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targetingfoo*,bar*
returns an error if an index starts withfoo
but no index starts withbar
. -
expand_wildcards
string | array[string] Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as open,hidden.
Supported values include:
all: Match any data stream or index, including hidden ones.
open: Match open, non-hidden indices. Also matches any non-hidden data stream.
closed: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
hidden: Match hidden data streams and hidden indices. Must be combined with open, closed, or both.
none: Wildcard expressions are not accepted.
Values are all, open, closed, hidden, or none.
-
flat_settings
boolean If
true
, returns settings in flat format. -
master_timeout
string Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
Values are
-1
or0
. -
preserve_existing
boolean If
true
, existing index settings remain unchanged. -
reopen
boolean Whether to close and reopen the index to apply non-dynamic settings. If set to
true
, the indices to which the settings are being applied will be closed temporarily and then reopened in order to apply the changes.
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
Values are
-1
or0
.
Body
Required
PUT /my-index-000001/_settings
{
"index" : {
"number_of_replicas" : 2
}
}
curl \
--request PUT 'http://api.example.com/{index}/_settings' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"index":{"number_of_replicas":2}}'
{
"index" : {
"number_of_replicas" : 2
}
}
{
"index" : {
"refresh_interval" : null
}
}
{
"analysis" : {
"analyzer":{
"content":{
"type":"custom",
"tokenizer":"whitespace"
}
}
}
}
POST /my-index-000001/_open
Split an index
Added in 6.1.0
Split an index into a new index with more primary shards.
Before you can split an index:
The index must be read-only.
The cluster health status must be green.
You can make an index read-only with the following request using the add index block API:
PUT /my_source_index/_block/write
The current write index on a data stream cannot be split. In order to split the current write index, the data stream must first be rolled over so that a new write index is created and then the previous write index can be split.
The number of times the index can be split (and the number of shards that each original shard can be split into) is determined by the index.number_of_routing_shards
setting.
The number of routing shards specifies the hashing space that is used internally to distribute documents across shards with consistent hashing.
For instance, a 5 shard index with number_of_routing_shards
set to 30 (5 x 2 x 3) could be split by a factor of 2 or 3.
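For example, with 5 primary shards and index.number_of_routing_shards set to 30, the index could be split 5 → 10 → 30 (by a factor of 2, then 3), 5 → 15 → 30 (by a factor of 3, then 2), or 5 → 30 (by a factor of 6) in a single operation.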
A split operation:
- Creates a new target index with the same definition as the source index, but with a larger number of primary shards.
- Hard-links segments from the source index into the target index. If the file system doesn't support hard-linking, all segments are copied into the new index, which is a much more time-consuming process.
- Hashes all documents again, after low-level files are created, to delete documents that belong to a different shard.
- Recovers the target index as though it were a closed index which had just been re-opened.
IMPORTANT: Indices can only be split if they satisfy the following requirements:
- The target index must not exist.
- The source index must have fewer primary shards than the target index.
- The number of primary shards in the target index must be a multiple of the number of primary shards in the source index.
- The node handling the split process must have sufficient free disk space to accommodate a second copy of the existing index.
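A minimal end-to-end sketch that combines the write block prerequisite with the split request (index names are illustrative, and the target shard count assumes the source index has a single primary shard):
PUT /my_source_index/_block/write

POST /my_source_index/_split/my_target_index
{
  "settings": {
    "index.number_of_shards": 2
  }
}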
Query parameters
-
master_timeout
string Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
Values are
-1
or0
. -
timeout
string Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
Values are
-1
or0
. -
wait_for_active_shards
number | string The number of shard copies that must be active before proceeding with the operation. Set to
all
or any positive integer up to the total number of shards in the index (number_of_replicas+1
).Values are
all
orindex-setting
.
POST /my-index-000001/_split/split-my-index-000001
{
"settings": {
"index.number_of_shards": 2
}
}
curl \
--request POST 'http://api.example.com/{index}/_split/{target}' \
--header "Authorization: $API_KEY" \
--header "Content-Type: application/json" \
--data '{"settings":{"index.number_of_shards":2}}'
{
"settings": {
"index.number_of_shards": 2
}
}
Retry a policy
Added in 6.6.0
Retry running the lifecycle policy for an index that is in the ERROR step. The API sets the policy back to the step where the error occurred and runs the step. Use the explain lifecycle state API to determine whether an index is in the ERROR step.
Path parameters
-
index
string Required The name of the indices (comma-separated) whose failed lifecycle step is to be retried.
curl \
--request POST 'http://api.example.com/{index}/_ilm/retry' \
--header "Authorization: $API_KEY"
Inference
Inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.