Get a document count
Generally available
All methods and paths for this operation:
Get quick access to a document count for a data stream, an index, or an entire cluster. The document count only includes live documents, not deleted documents which have not yet been removed by the merge process.
IMPORTANT: CAT APIs are only intended for human consumption using the command line or Kibana console. They are not intended for use by applications. For application consumption, use the count API.
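For application code, the count API returns the same live-document count as a structured response. A minimal sketch with the Python client (the cluster URL and index name are illustrative):
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Structured count of live documents: the application-friendly
# alternative to parsing cat output.
resp = client.count(index="my-index-000001")
print(resp["count"])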
Required authorization
- Index privileges:
read
Path parameters
-
A comma-separated list of data streams, indices, and aliases used to limit the request. It supports wildcards (`*`). To target all data streams and indices, omit this parameter or use `*` or `_all`.
Query parameters
-
A comma-separated list of column names to display. It supports simple wildcards.
Supported values include:
- `epoch` (or `t`, `time`): The Unix epoch time in seconds since 1970-01-01 00:00:00.
- `timestamp` (or `ts`, `hms`, `hhmmss`): The current time in HH:MM:SS format.
- `count` (or `dc`, `docs.count`, `docsCount`): The document count in the cluster or index.
Values are `epoch`, `t`, `time`, `timestamp`, `ts`, `hms`, `hhmmss`, `count`, `dc`, `docs.count`, or `docsCount`.
-
List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting `:asc` or `:desc` as a suffix to the column name.
GET /_cat/count/my-index-000001?v=true&format=json
resp = client.cat.count(
index="my-index-000001",
v=True,
format="json",
)
const response = await client.cat.count({
index: "my-index-000001",
v: "true",
format: "json",
});
response = client.cat.count(
index: "my-index-000001",
v: "true",
format: "json"
)
$resp = $client->cat()->count([
"index" => "my-index-000001",
"v" => "true",
"format" => "json",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_cat/count/my-index-000001?v=true&format=json"
client.cat().count();
[
{
"epoch": "1475868259",
"timestamp": "15:24:20",
"count": "120"
}
]
[
{
"epoch": "1475868259",
"timestamp": "15:24:20",
"count": "121"
}
]
Get the cluster health status
Generally available
IMPORTANT: CAT APIs are only intended for human consumption using the command line or Kibana console.
They are not intended for use by applications. For application consumption, use the cluster health API.
This API is often used to check malfunctioning clusters.
To help you track cluster health alongside log files and alerting systems, the API returns timestamps in two formats:
HH:MM:SS
, which is human-readable but includes no date information;
Unix epoch time
, which is machine-sortable and includes date information.
The latter format is useful for cluster recoveries that take multiple days.
You can use the cat health API to verify cluster health across multiple nodes.
You also can use the API to track the recovery of a large cluster over a longer period of time.
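For example, a long recovery can be tracked by polling the API and correlating the machine-sortable epoch timestamps with log lines. A rough sketch with the Python client (the cluster URL and polling interval are assumptions):
import time
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Poll cat health; each row carries both timestamp formats.
for _ in range(3):
    for row in client.cat.health(format="json"):
        print(row["epoch"], row["timestamp"], row["status"])
    time.sleep(10)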
Required authorization
- Cluster privileges:
monitor
Query parameters
-
The unit used to display time values. Values are `nanos`, `micros`, `ms`, `s`, `m`, `h`, or `d`.
-
If `true`, returns `HH:MM:SS` and Unix epoch timestamps.
-
List of columns to appear in the response. It supports simple wildcards.
-
List of columns that determine how the table should be sorted. Sorting defaults to ascending and can be changed by setting `:asc` or `:desc` as a suffix to the column name.
GET /_cat/health?v=true&format=json
resp = client.cat.health(
v=True,
format="json",
)
const response = await client.cat.health({
v: "true",
format: "json",
});
response = client.cat.health(
v: "true",
format: "json"
)
$resp = $client->cat()->health([
"v" => "true",
"format" => "json",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_cat/health?v=true&format=json"
client.cat().health();
[
{
"epoch": "1475871424",
"timestamp": "16:17:04",
"cluster": "elasticsearch",
"status": "green",
"node.total": "1",
"node.data": "1",
"shards": "1",
"pri": "1",
"relo": "0",
"init": "0",
"unassign": "0",
"unassign.pri": "0",
"pending_tasks": "0",
"max_task_wait_time": "-",
"active_shards_percent": "100.0%"
}
]
Get data frame analytics jobs
Generally available; Added in 7.7.0
All methods and paths for this operation:
Get configuration and usage information about data frame analytics jobs.
IMPORTANT: CAT APIs are only intended for human consumption using the Kibana console or command line. They are not intended for use by applications. For application consumption, use the get data frame analytics jobs statistics API.
Required authorization
- Cluster privileges:
monitor_ml
Query parameters
-
Whether to ignore a wildcard expression that matches no configs. (This includes the `_all` string, or when no configs have been specified.)
-
The unit in which to display byte values. Values are `b`, `kb`, `mb`, `gb`, `tb`, or `pb`.
-
Comma-separated list of column names to display.
Supported values include:
- `assignment_explanation` (or `ae`): Contains messages relating to the selection of a node.
- `create_time` (or `ct`, `createTime`): The time when the data frame analytics job was created.
- `description` (or `d`): A description of a job.
- `dest_index` (or `di`, `destIndex`): Name of the destination index.
- `failure_reason` (or `fr`, `failureReason`): Contains messages about the reason why a data frame analytics job failed.
- `id`: Identifier for the data frame analytics job.
- `model_memory_limit` (or `mml`, `modelMemoryLimit`): The approximate maximum amount of memory resources that are permitted for the data frame analytics job.
- `node.address` (or `na`, `nodeAddress`): The network address of the node that the data frame analytics job is assigned to.
- `node.ephemeral_id` (or `ne`, `nodeEphemeralId`): The ephemeral ID of the node that the data frame analytics job is assigned to.
- `node.id` (or `ni`, `nodeId`): The unique identifier of the node that the data frame analytics job is assigned to.
- `node.name` (or `nn`, `nodeName`): The name of the node that the data frame analytics job is assigned to.
- `progress` (or `p`): The progress report of the data frame analytics job by phase.
- `source_index` (or `si`, `sourceIndex`): Name of the source index.
- `state` (or `s`): Current state of the data frame analytics job.
- `type` (or `t`): The type of analysis that the data frame analytics job performs.
- `version` (or `v`): The Elasticsearch version number in which the data frame analytics job was created.
-
Comma-separated list of column names or column aliases used to sort the response. It supports the same values as the column list above.
-
Unit used to display time values. Values are `nanos`, `micros`, `ms`, `s`, `m`, `h`, or `d`.
GET _cat/ml/data_frame/analytics?v=true&format=json
resp = client.cat.ml_data_frame_analytics(
v=True,
format="json",
)
const response = await client.cat.mlDataFrameAnalytics({
v: "true",
format: "json",
});
response = client.cat.ml_data_frame_analytics(
v: "true",
format: "json"
)
$resp = $client->cat()->mlDataFrameAnalytics([
"v" => "true",
"format" => "json",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_cat/ml/data_frame/analytics?v=true&format=json"
client.cat().mlDataFrameAnalytics();
[
{
"id": "classifier_job_1",
"type": "classification",
"create_time": "2020-02-12T11:49:09.594Z",
"state": "stopped"
},
{
"id": "classifier_job_2",
"type": "classification",
"create_time": "2020-02-12T11:49:14.479Z",
"state": "stopped"
},
{
"id": "classifier_job_3",
"type": "classification",
"create_time": "2020-02-12T11:49:16.928Z",
"state": "stopped"
},
{
"id": "classifier_job_4",
"type": "classification",
"create_time": "2020-02-12T11:49:19.127Z",
"state": "stopped"
},
{
"id": "classifier_job_5",
"type": "classification",
"create_time": "2020-02-12T11:49:21.349Z",
"state": "stopped"
}
]
Get node statistics
Generally available
All methods and paths for this operation:
Get statistics for nodes in a cluster. By default, all stats are returned. You can limit the returned information by using metrics.
Required authorization
- Cluster privileges:
monitor, manage
Path parameters
-
Comma-separated list of node IDs or names used to limit returned information.
-
Limit the information returned to the specified metrics.
-
Limit the information returned for the indices metric to the specified index metrics. It can be used only if the indices (or all) metric is specified.
Query parameters
-
Comma-separated list or wildcard expressions of fields to include in fielddata and suggest statistics.
-
Comma-separated list or wildcard expressions of fields to include in fielddata statistics.
-
Comma-separated list or wildcard expressions of fields to include in the statistics.
-
Comma-separated list of search groups to include in the search statistics.
-
If true, the call reports the aggregated disk usage of each one of the Lucene index files (only applies if segment stats are requested).
-
Indicates whether statistics are aggregated at the cluster, index, or shard level.
Values are `cluster`, `indices`, or `shards`.
-
Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
Values are `-1` or `0`.
-
A comma-separated list of document types for the indexing index metric.
-
If `true`, the response includes information from segments that are not loaded into memory.
GET _nodes/stats/process?filter_path=**.max_file_descriptors
resp = client.nodes.stats(
metric="process",
filter_path="**.max_file_descriptors",
)
const response = await client.nodes.stats({
metric: "process",
filter_path: "**.max_file_descriptors",
});
response = client.nodes.stats(
metric: "process",
filter_path: "**.max_file_descriptors"
)
$resp = $client->nodes()->stats([
"metric" => "process",
"filter_path" => "**.max_file_descriptors",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_nodes/stats/process?filter_path=**.max_file_descriptors"
Get follower information
Generally available; Added in 6.7.0
Get information about all cross-cluster replication follower indices. For example, the results include follower index names, leader index names, replication options, and whether the follower indices are active or paused.
Required authorization
- Cluster privileges:
monitor
GET /follower_index/_ccr/info
resp = client.ccr.follow_info(
index="follower_index",
)
const response = await client.ccr.followInfo({
index: "follower_index",
});
response = client.ccr.follow_info(
index: "follower_index"
)
$resp = $client->ccr()->followInfo([
"index" => "follower_index",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/follower_index/_ccr/info"
client.ccr().followInfo(f -> f
.index("follower_index")
);
{
"follower_indices": [
{
"follower_index": "follower_index",
"remote_cluster": "remote_cluster",
"leader_index": "leader_index",
"status": "active",
"parameters": {
"max_read_request_operation_count": 5120,
"max_read_request_size": "32mb",
"max_outstanding_read_requests": 12,
"max_write_request_operation_count": 5120,
"max_write_request_size": "9223372036854775807b",
"max_outstanding_write_requests": 9,
"max_write_buffer_count": 2147483647,
"max_write_buffer_size": "512mb",
"max_retry_delay": "500ms",
"read_poll_timeout": "1m"
}
}
]
}
{
"follower_indices": [
{
"follower_index": "follower_index",
"remote_cluster": "remote_cluster",
"leader_index": "leader_index",
"status": "paused"
}
]
}
Create or update a document in an index
Generally available
All methods and paths for this operation:
Add a JSON document to the specified data stream or index and make it searchable. If the target is an index and the document already exists, the request updates the document and increments its version.
NOTE: You cannot use this API to send update requests for existing documents in a data stream.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or index alias:
- To add or overwrite a document using the `PUT /<target>/_doc/<_id>` request format, you must have the `create`, `index`, or `write` index privilege.
- To add a document using the `POST /<target>/_doc/` request format, you must have the `create_doc`, `create`, `index`, or `write` index privilege.
- To automatically create a data stream or index with this API request, you must have the `auto_configure`, `create_index`, or `manage` index privilege.
Automatic data stream creation requires a matching index template with data stream enabled.
NOTE: Replica shards might not all be started when an indexing operation returns successfully.
By default, only the primary is required. Set `wait_for_active_shards` to change this default behavior.
Automatically create data streams and indices
If the request's target doesn't exist and matches an index template with a `data_stream` definition, the index operation automatically creates the data stream.
If the target doesn't exist and doesn't match a data stream template, the operation automatically creates the index and applies any matching index templates.
NOTE: Elasticsearch includes several built-in index templates. To avoid naming collisions with these templates, refer to index pattern documentation.
If no mapping exists, the index operation creates a dynamic mapping. By default, new fields and objects are automatically added to the mapping if needed.
Automatic index creation is controlled by the `action.auto_create_index` setting.
If it is `true`, any index can be created automatically.
You can modify this setting to explicitly allow or block automatic creation of indices that match specified patterns, or set it to `false` to turn off automatic index creation entirely.
Specify a comma-separated list of patterns you want to allow, or prefix each pattern with `+` or `-` to indicate whether it should be allowed or blocked.
When a list is specified, the default behavior is to disallow.
NOTE: The `action.auto_create_index` setting affects the automatic creation of indices only.
It does not affect the creation of data streams.
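For example, auto-creation can be limited to explicit patterns with a cluster settings update. A sketch using the Python client, with the pattern list taken as illustrative (patterns are checked in order, and the `+`/`-` prefixes allow or block):
resp = client.cluster.put_settings(
    persistent={
        # Allow "my-index-000001" and "index10", block anything else
        # starting with "index1", and allow the rest of "ind*".
        "action.auto_create_index": "my-index-000001,index10,-index1*,+ind*"
    },
)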
Optimistic concurrency control
Index operations can be made conditional and only performed if the last modification to the document was assigned the sequence number and primary term specified by the `if_seq_no` and `if_primary_term` parameters.
If a mismatch is detected, the operation will result in a `VersionConflictException` and a status code of `409`.
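In the Python client these map to the `if_seq_no` and `if_primary_term` parameters; the values in this sketch are placeholders that would come from a previous read of the document:
resp = client.index(
    index="my-index-000001",
    id="1",
    if_seq_no=10,        # placeholder: _seq_no from a prior GET
    if_primary_term=1,   # placeholder: _primary_term from the same GET
    document={"user": {"id": "elkbee"}},
)
# If another write changed the document in the meantime, the client
# raises a ConflictError (HTTP 409) instead of overwriting it.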
Routing
By default, shard placement — or routing — is controlled by using a hash of the document's ID value.
For more explicit control, the value fed into the hash function used by the router can be directly specified on a per-operation basis using the `routing` parameter.
When setting up explicit mapping, you can also use the `_routing` field to direct the index operation to extract the routing value from the document itself.
This does come at the (very minimal) cost of an additional document parsing pass.
If the `_routing` mapping is defined and set to be required, the index operation will fail if no routing value is provided or extracted.
NOTE: Data streams do not support custom routing unless they were created with the `allow_custom_routing` setting enabled in the template.
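A sketch of per-operation routing with the Python client (the routing value is illustrative; all documents sharing it land on the same shard):
resp = client.index(
    index="my-index-000001",
    routing="user1",  # hashed instead of the document ID
    document={"message": "routed write"},
)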
Distributed
The index operation is directed to the primary shard based on its route and performed on the actual node containing this shard. After the primary shard completes the operation, if needed, the update is distributed to applicable replicas.
Active shards
To improve the resiliency of writes to the system, indexing operations can be configured to wait for a certain number of active shard copies before proceeding with the operation.
If the requisite number of active shard copies are not available, then the write operation must wait and retry, until either the requisite shard copies have started or a timeout occurs.
By default, write operations only wait for the primary shards to be active before proceeding (that is to say, `wait_for_active_shards` is `1`).
This default can be overridden in the index settings dynamically by setting `index.write.wait_for_active_shards`.
To alter this behavior per operation, use the `wait_for_active_shards` request parameter.
Valid values are `all` or any positive integer up to the total number of configured copies per shard in the index (which is `number_of_replicas + 1`).
Specifying a negative value or a number greater than the number of shard copies will throw an error.
For example, suppose you have a cluster of three nodes, A, B, and C, and you create an index named `index` with the number of replicas set to 3 (resulting in 4 shard copies, one more copy than there are nodes).
If you attempt an indexing operation, by default the operation will only ensure the primary copy of each shard is available before proceeding.
This means that even if B and C went down and A hosted the primary shard copies, the indexing operation would still proceed with only one copy of the data.
If `wait_for_active_shards` is set on the request to `3` (and all three nodes are up), the indexing operation will require 3 active shard copies before proceeding.
This requirement should be met because there are 3 active nodes in the cluster, each one holding a copy of the shard.
However, if you set `wait_for_active_shards` to `all` (or to `4`, which is the same in this situation), the indexing operation will not proceed, as you do not have all 4 copies of each shard active in the index.
The operation will time out unless a new node is brought up in the cluster to host the fourth copy of the shard.
It is important to note that this setting greatly reduces the chances of the write operation not writing to the requisite number of shard copies, but it does not completely eliminate the possibility, because this check occurs before the write operation starts.
After the write operation is underway, it is still possible for replication to fail on any number of shard copies but still succeed on the primary.
The `_shards` section of the API response reveals the number of shard copies on which replication succeeded and failed.
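As a sketch, requiring two active copies per shard before indexing might look like this in the Python client (the values are illustrative):
resp = client.index(
    index="my-index-000001",
    wait_for_active_shards="2",  # primary plus one replica
    timeout="30s",               # fail if the copies don't start in time
    document={"message": "write that waits for two shard copies"},
)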
No operation (noop) updates
When updating a document by using this API, a new version of the document is always created even if the document hasn't changed.
If this isn't acceptable, use the `_update` API with `detect_noop` set to `true`.
The `detect_noop` option isn't available on this API because it doesn't fetch the old source and isn't able to compare it against the new source.
There isn't a definitive rule for when noop updates aren't acceptable. It's a combination of many factors, such as how frequently your data source sends updates that are actually noops and how many queries per second Elasticsearch runs on the shard receiving the updates.
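A sketch of the `_update` alternative with the Python client, where `detect_noop` suppresses the write when the merged source is unchanged (the field values are illustrative):
resp = client.update(
    index="my-index-000001",
    id="1",
    doc={"user": {"id": "kimchy"}},
    detect_noop=True,
)
print(resp["result"])  # "noop" if nothing changed, "updated" otherwise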
Versioning
Each indexed document is given a version number.
By default, internal versioning is used that starts at 1 and increments with each update, deletes included.
Optionally, the version number can be set to an external value (for example, if maintained in a database).
To enable this functionality, `version_type` should be set to `external`.
The value provided must be a numeric, long value greater than or equal to 0, and less than around `9.2e+18`.
NOTE: Versioning is completely real time, and is not affected by the near real time aspects of search operations. If no version is provided, the operation runs without any version checks.
When using the external version type, the system checks to see if the version number passed to the index request is greater than the version of the currently stored document. If true, the document will be indexed and the new version number used. If the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail. For example:
PUT my-index-000001/_doc/1?version=2&version_type=external
{
"user": {
"id": "elkbee"
}
}
In this example, the operation will succeed since the supplied version of 2 is higher than the current document version of 1.
If the document was already updated and its version was set to 2 or higher, the indexing command will fail and result in a conflict (409 HTTP status code).
A nice side effect is that there is no need to maintain strict ordering of async indexing operations run as a result of changes to a source database, as long as version numbers from the source database are used.
Even the simple case of updating the Elasticsearch index using data from a database is simplified if external versioning is used, as only the latest version will be used if the index operations arrive out of order.
Required authorization
- Index privileges:
index
Path parameters
-
The name of the data stream or index to target. If the target doesn't exist and matches the name or wildcard (`*`) pattern of an index template with a `data_stream` definition, this request creates the data stream. If the target doesn't exist and doesn't match a data stream template, this request creates the index. You can check for existing targets with the resolve index API.
-
A unique identifier for the document. To automatically generate a document ID, use the `POST /<target>/_doc/` request format and omit this parameter.
Query parameters
-
Only perform the operation if the document has this primary term.
-
Only perform the operation if the document has this sequence number.
-
Whether to include the document source in the error message in case of parsing errors.
-
Set to `create` to only index the document if it does not already exist (put if absent). If a document with the specified `_id` already exists, the indexing operation will fail. The behavior is the same as using the `<index>/_create` endpoint. If a document ID is specified, this parameter defaults to `index`. Otherwise, it defaults to `create`. If the request targets a data stream, an `op_type` of `create` is required.
Supported values include:
- `index`: Overwrite any documents that already exist.
- `create`: Only index documents that do not already exist.
Values are `index` or `create`.
-
The ID of the pipeline to use to preprocess incoming documents. If the index has a default ingest pipeline specified, then setting the value to `_none` disables the default ingest pipeline for this request. If a final pipeline is configured, it will always run regardless of the value of this parameter.
-
If `true`, Elasticsearch refreshes the affected shards to make this operation visible to search. If `wait_for`, it waits for a refresh to make this operation visible to search. If `false`, it does nothing with refreshes.
Values are `true`, `false`, or `wait_for`.
-
A custom value that is used to route operations to a specific shard.
-
The period the request waits for the following operations: automatic index creation, dynamic mapping updates, waiting for active shards.
This parameter is useful for situations where the primary shard assigned to perform the operation might not be available when the operation runs. Some reasons for this might be that the primary shard is currently recovering from a gateway or undergoing relocation. By default, the operation will wait on the primary shard to become available for at least 1 minute before failing and responding with an error. The actual wait time could be longer, particularly when multiple waits occur.
Values are `-1` or `0`.
-
An explicit version number for concurrency control. It must be a non-negative long number.
-
The version type.
Supported values include:
- `internal`: Use internal versioning that starts at 1 and increments with each update or delete.
- `external`: Only index the document if the specified version is strictly higher than the version of the stored document or if there is no existing document.
- `external_gte`: Only index the document if the specified version is equal to or higher than the version of the stored document or if there is no existing document. NOTE: The `external_gte` version type is meant for special use cases and should be used with care. If used incorrectly, it can result in loss of data.
- `force`: This option is deprecated because it can cause primary and replica shards to diverge.
Values are `internal`, `external`, `external_gte`, or `force`.
-
The number of shard copies that must be active before proceeding with the operation. You can set it to `all` or any positive integer up to the total number of shards in the index (`number_of_replicas + 1`). The default value of `1` means it waits for each primary shard to be active.
Values are `all` or `index-setting`.
-
If `true`, the destination must be an index alias.
-
If `true`, the request's actions must target a data stream (existing or to be created).
POST my-index-000001/_doc/
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
resp = client.index(
index="my-index-000001",
document={
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
},
)
const response = await client.index({
index: "my-index-000001",
document: {
"@timestamp": "2099-11-15T13:12:00",
message: "GET /search HTTP/1.1 200 1070000",
user: {
id: "kimchy",
},
},
});
response = client.index(
index: "my-index-000001",
body: {
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
)
$resp = $client->index([
"index" => "my-index-000001",
"body" => [
"@timestamp" => "2099-11-15T13:12:00",
"message" => "GET /search HTTP/1.1 200 1070000",
"user" => [
"id" => "kimchy",
],
],
]);
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"@timestamp":"2099-11-15T13:12:00","message":"GET /search HTTP/1.1 200 1070000","user":{"id":"kimchy"}}' "$ELASTICSEARCH_URL/my-index-000001/_doc/"
client.index(i -> i
.index("my-index-000001")
.document(JsonData.fromJson("{\"@timestamp\":\"2099-11-15T13:12:00\",\"message\":\"GET /search HTTP/1.1 200 1070000\",\"user\":{\"id\":\"kimchy\"}}"))
);
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
{
"@timestamp": "2099-11-15T13:12:00",
"message": "GET /search HTTP/1.1 200 1070000",
"user": {
"id": "kimchy"
}
}
{
"_shards": {
"total": 2,
"failed": 0,
"successful": 2
},
"_index": "my-index-000001",
"_id": "W0tpsmIBdwcYyG50zbta",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"result": "created"
}
{
"_shards": {
"total": 2,
"failed": 0,
"successful": 2
},
"_index": "my-index-000001",
"_id": "1",
"_version": 1,
"_seq_no": 0,
"_primary_term": 1,
"result": "created"
}
Update documents
Generally available; Added in 2.4.0
Updates documents that match the specified query. If no query is specified, performs an update on every document in the data stream or index without modifying the source, which is useful for picking up mapping changes.
If the Elasticsearch security features are enabled, you must have the following index privileges for the target data stream, index, or alias:
- read
- index or write
You can specify the query criteria in the request URI or the request body using the same syntax as the search API.
When you submit an update by query request, Elasticsearch gets a snapshot of the data stream or index when it begins processing the request and updates matching documents using internal versioning.
When the versions match, the document is updated and the version number is incremented.
If a document changes between the time that the snapshot is taken and the update operation is processed, it results in a version conflict and the operation fails.
You can opt to count version conflicts instead of halting and returning by setting `conflicts` to `proceed`.
Note that if you opt to count version conflicts, the operation could attempt to update more documents from the source than `max_docs`, until it has successfully updated `max_docs` documents or it has gone through every document in the source query.
NOTE: Documents with a version equal to 0 cannot be updated using update by query because internal versioning does not support 0 as a valid version number.
While processing an update by query request, Elasticsearch performs multiple search requests sequentially to find all of the matching documents. A bulk update request is performed for each batch of matching documents. Any query or update failures cause the update by query request to fail, and the failures are shown in the response. Any update requests that completed successfully still stick; they are not rolled back.
Throttling update requests
To control the rate at which update by query issues batches of update operations, you can set `requests_per_second` to any positive decimal number.
This pads each batch with a wait time to throttle the rate.
Set `requests_per_second` to `-1` to turn off throttling.
Throttling uses a wait time between batches so that the internal scroll requests can be given a timeout that takes the request padding into account.
The padding time is the difference between the batch size divided by the `requests_per_second` and the time spent writing.
By default the batch size is 1000, so if `requests_per_second` is set to `500`:
target_time = 1000 / 500 per second = 2 seconds
wait_time = target_time - write_time = 2 seconds - .5 seconds = 1.5 seconds
Since the batch is issued as a single _bulk request, large batch sizes cause Elasticsearch to create many requests and wait before starting the next set. This is "bursty" instead of "smooth".
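A throttled request might look like the following sketch in the Python client (the rate is illustrative); a running task can later be adjusted with the update by query rethrottle API:
resp = client.update_by_query(
    index="my-index-000001",
    conflicts="proceed",
    requests_per_second=500,  # pads each 1000-document batch to ~2 seconds
)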
Slicing
Update by query supports sliced scroll to parallelize the update process. This can improve efficiency and provide a convenient way to break the request down into smaller parts.
Setting `slices` to `auto` chooses a reasonable number for most data streams and indices.
This setting will use one slice per shard, up to a certain limit.
If there are multiple source data streams or indices, it will choose the number of slices based on the index or backing index with the smallest number of shards.
Adding `slices` to `_update_by_query` just automates the manual process of creating sub-requests, which means it has some quirks:
- You can see these requests in the tasks APIs. These sub-requests are "child" tasks of the task for the request with slices.
- Fetching the status of the task for the request with `slices` only contains the status of completed slices.
- These sub-requests are individually addressable for things like cancellation and rethrottling.
- Rethrottling the request with `slices` will rethrottle the unfinished sub-requests proportionally.
- Canceling the request with `slices` will cancel each sub-request.
- Due to the nature of slices, each sub-request won't get a perfectly even portion of the documents. All documents will be addressed, but some slices may be larger than others. Expect larger slices to have a more even distribution.
- Parameters like `requests_per_second` and `max_docs` on a request with slices are distributed proportionally to each sub-request. Combine that with the point above about distribution being uneven, and you should conclude that using `max_docs` with `slices` might not result in exactly `max_docs` documents being updated.
- Each sub-request gets a slightly different snapshot of the source data stream or index, though these are all taken at approximately the same time.
If you're slicing manually or otherwise tuning automatic slicing, keep in mind that:
- Query performance is most efficient when the number of slices is equal to the number of shards in the index or backing index. If that number is large (for example, 500), choose a lower number as too many slices hurts performance. Setting slices higher than the number of shards generally does not improve efficiency and adds overhead.
- Update performance scales linearly across available resources with the number of slices.
Whether query or update performance dominates the runtime depends on the documents being reindexed and cluster resources.
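A sketch of automatic slicing run as a background task with the Python client (the parameters are illustrative):
resp = client.update_by_query(
    index="my-index-000001",
    slices="auto",              # one slice per shard, up to a limit
    conflicts="proceed",
    wait_for_completion=False,  # run as a task instead of blocking
)
print(resp["task"])  # task ID for the tasks APIs (status, cancel, rethrottle)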
Update the document source
Update by query supports scripts to update the document source.
As with the update API, you can set `ctx.op` to change the operation that is performed.
Set `ctx.op = "noop"` if your script decides that it doesn't have to make any changes.
The update by query operation skips updating the document and increments the `noop` counter.
Set `ctx.op = "delete"` if your script decides that the document should be deleted.
The update by query operation deletes the document and increments the `deleted` counter.
Update by query supports only `index`, `noop`, and `delete`.
Setting `ctx.op` to anything else is an error.
Setting any other field in `ctx` is an error.
This API enables you to only modify the source of matching documents; you cannot move them.
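For instance, a script that deletes matches lacking a field might look like this sketch in the Python client (the `count` field and the query are illustrative):
resp = client.update_by_query(
    index="my-index-000001",
    conflicts="proceed",
    query={"term": {"user.id": "kimchy"}},
    script={
        "lang": "painless",
        # Delete matches without a `count` field; skip the rest.
        "source": "if (ctx._source.count == null) { ctx.op = 'delete' } else { ctx.op = 'noop' }",
    },
)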
Required authorization
- Index privileges:
read, write
Path parameters
-
A comma-separated list of data streams, indices, and aliases to search. It supports wildcards (`*`). To search all data streams or indices, omit this parameter or use `*` or `_all`.
Query parameters
-
If `false`, the request returns an error if any wildcard expression, index alias, or `_all` value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting `foo*,bar*` returns an error if an index starts with `foo` but no index starts with `bar`.
-
The analyzer to use for the query string. This parameter can be used only when the `q` query string parameter is specified.
-
If `true`, wildcard and prefix queries are analyzed. This parameter can be used only when the `q` query string parameter is specified.
-
The preferred behavior when update by query hits version conflicts: `abort` or `proceed`.
Supported values include:
- `abort`: Stop reindexing if there are conflicts.
- `proceed`: Continue reindexing even if there are conflicts.
Values are `abort` or `proceed`.
-
The default operator for query string query: `AND` or `OR`. This parameter can be used only when the `q` query string parameter is specified.
Values are `and`, `AND`, `or`, or `OR`.
-
The field to use as the default when no field prefix is given in the query string. This parameter can be used only when the `q` query string parameter is specified.
-
The type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. It supports comma-separated values, such as `open,hidden`.
Supported values include:
- `all`: Match any data stream or index, including hidden ones.
- `open`: Match open, non-hidden indices. Also matches any non-hidden data stream.
- `closed`: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.
- `hidden`: Match hidden data streams and hidden indices. Must be combined with `open`, `closed`, or both.
- `none`: Wildcard expressions are not accepted.
Values are `all`, `open`, `closed`, `hidden`, or `none`.
-
Skips the specified number of documents.
-
If `true`, format-based query failures (such as providing text to a numeric field) in the query string will be ignored. This parameter can be used only when the `q` query string parameter is specified.
-
The maximum number of documents to process. It defaults to all documents. When set to a value less than or equal to `scroll_size`, a scroll will not be used to retrieve the results for the operation.
-
The ID of the pipeline to use to preprocess incoming documents. If the index has a default ingest pipeline specified, then setting the value to `_none` disables the default ingest pipeline for this request. If a final pipeline is configured, it will always run regardless of the value of this parameter.
-
The node or shard the operation should be performed on. It is random by default.
-
A query in the Lucene query string syntax.
-
If `true`, Elasticsearch refreshes affected shards to make the operation visible to search after the request completes. This is different than the update API's `refresh` parameter, which causes just the shard that received the request to be refreshed.
-
If `true`, the request cache is used for this request. It defaults to the index-level setting.
-
The throttle for this request in sub-requests per second.
-
A custom value used to route operations to a specific shard.
-
The period to retain the search context for scrolling.
Values are `-1` or `0`.
-
The size of the scroll request that powers the operation.
-
An explicit timeout for each search request. By default, there is no timeout.
Values are `-1` or `0`.
-
The type of the search operation. Available options include `query_then_fetch` and `dfs_query_then_fetch`.
Supported values include:
- `query_then_fetch`: Documents are scored using local term and document frequencies for the shard. This is usually faster but less accurate.
- `dfs_query_then_fetch`: Documents are scored using global term and document frequencies across all shards. This is usually slower but more accurate.
Values are `query_then_fetch` or `dfs_query_then_fetch`.
-
The number of slices this task should be divided into.
Value is `auto`.
-
A comma-separated list of `<field>:<direction>` pairs.
-
The specific `tag`
of the request for logging and statistical purposes. -
The maximum number of documents to collect for each shard. If a query reaches this limit, Elasticsearch terminates the query early. Elasticsearch collects documents before sorting.
IMPORTANT: Use with caution. Elasticsearch applies this parameter to each shard handling the request. When possible, let Elasticsearch perform early termination automatically. Avoid specifying this parameter for requests that target data streams with backing indices across multiple data tiers.
-
The period each update request waits for the following operations: dynamic mapping updates, waiting for active shards. By default, it is one minute. This guarantees Elasticsearch waits for at least the timeout before failing. The actual wait time could be longer, particularly when multiple waits occur.
Values are `-1` or `0`.
-
If `true`, returns the document version as part of a hit.
-
Whether the document should increment the version number (internal) on hit or not (reindex).
-
The number of shard copies that must be active before proceeding with the operation. Set to `all` or any positive integer up to the total number of shards in the index (`number_of_replicas + 1`). The `timeout` parameter controls how long each write request waits for unavailable shards to become available. Both work exactly the way they work in the bulk API.
Values are `all` or `index-setting`.
-
If `true`, the request blocks until the operation is complete. If `false`, Elasticsearch performs some preflight checks, launches the request, and returns a task ID that you can use to cancel or get the status of the task. Elasticsearch creates a record of this task as a document at `.tasks/task/${taskId}`.
Body
-
The maximum number of documents to update.
-
The documents to update using the Query DSL.
-
The script to run to update the document source or metadata when updating.
-
Slice the request manually using the provided slice ID and total number of slices.
-
The preferred behavior when update by query hits version conflicts: `abort` or `proceed`.
Supported values include:
- `abort`: Stop reindexing if there are conflicts.
- `proceed`: Continue reindexing even if there are conflicts.
Values are `abort` or `proceed`.
POST my-index-000001/_update_by_query?conflicts=proceed
{
"query": {
"term": {
"user.id": "kimchy"
}
}
}
resp = client.update_by_query(
index="my-index-000001",
conflicts="proceed",
query={
"term": {
"user.id": "kimchy"
}
},
)
const response = await client.updateByQuery({
index: "my-index-000001",
conflicts: "proceed",
query: {
term: {
"user.id": "kimchy",
},
},
});
response = client.update_by_query(
index: "my-index-000001",
conflicts: "proceed",
body: {
"query": {
"term": {
"user.id": "kimchy"
}
}
}
)
$resp = $client->updateByQuery([
"index" => "my-index-000001",
"conflicts" => "proceed",
"body" => [
"query" => [
"term" => [
"user.id" => "kimchy",
],
],
],
]);
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"query":{"term":{"user.id":"kimchy"}}}' "$ELASTICSEARCH_URL/my-index-000001/_update_by_query?conflicts=proceed"
client.updateByQuery(u -> u
.conflicts(Conflicts.Proceed)
.index("my-index-000001")
.query(q -> q
.term(t -> t
.field("user.id")
.value(FieldValue.of("kimchy"))
)
)
);
{
"query": {
"term": {
"user.id": "kimchy"
}
}
}
{
"script": {
"source": "ctx._source.count++",
"lang": "painless"
},
"query": {
"term": {
"user.id": "kimchy"
}
}
}
{
"slice": {
"id": 0,
"max": 2
},
"script": {
"source": "ctx._source['extra'] = 'test'"
}
}
{
"script": {
"source": "ctx._source['extra'] = 'test'"
}
}
Reset the features
Technical preview; Added in 7.12.0
Clear all of the state information stored in system indices by Elasticsearch features, including the security and machine learning indices.
WARNING: Intended for development and testing use only. Do not reset features on a production cluster.
Return a cluster to the same state as a new installation by resetting the feature state for all Elasticsearch features. This deletes all state information stored in system indices.
The response code is HTTP 200 if the state is successfully reset for all features. It is HTTP 500 if the reset operation failed for any feature.
Note that select features might provide a way to reset particular system indices. Using this API resets all features, both those that are built in and those implemented as plugins.
To list the features that will be affected, use the get features API.
IMPORTANT: The features installed on the node you submit this request to are the features that will be reset. Run on the master node if you have any doubts about which plugins are installed on individual nodes.
POST /_features/_reset
resp = client.features.reset_features()
const response = await client.features.resetFeatures();
response = client.features.reset_features
$resp = $client->features()->resetFeatures();
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_features/_reset"
client.features().resetFeatures(r -> r);
{
"features" : [
{
"feature_name" : "security",
"status" : "SUCCESS"
},
{
"feature_name" : "tasks",
"status" : "SUCCESS"
}
]
}
Delete a dangling index
Generally available; Added in 7.9.0
If Elasticsearch encounters index data that is absent from the current cluster state, those indices are considered to be dangling.
For example, this can happen if you delete more than `cluster.indices.tombstones.size` indices while an Elasticsearch node is offline.
Required authorization
- Cluster privileges:
manage
DELETE /_dangling/<index-uuid>?accept_data_loss=true
resp = client.dangling_indices.delete_dangling_index(
index_uuid="<index-uuid>",
accept_data_loss=True,
)
const response = await client.danglingIndices.deleteDanglingIndex({
index_uuid: "<index-uuid>",
accept_data_loss: "true",
});
response = client.dangling_indices.delete_dangling_index(
index_uuid: "<index-uuid>",
accept_data_loss: "true"
)
$resp = $client->danglingIndices()->deleteDanglingIndex([
"index_uuid" => "<index-uuid>",
"accept_data_loss" => "true",
]);
curl -X DELETE -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_dangling/<index-uuid>?accept_data_loss=true"
client.danglingIndices().deleteDanglingIndex(d -> d
.acceptDataLoss(true)
.indexUuid("<index-uuid>")
);
Create an index
Generally available
You can use the create index API to add a new index to an Elasticsearch cluster. When creating an index, you can specify the following:
- Settings for the index.
- Mappings for fields in the index.
- Index aliases
Wait for active shards
By default, index creation will only return a response to the client when the primary copies of each shard have been started, or the request times out.
The index creation response will indicate what happened.
For example, `acknowledged` indicates whether the index was successfully created in the cluster, while `shards_acknowledged` indicates whether the requisite number of shard copies were started for each shard in the index before timing out.
Note that it is still possible for either `acknowledged` or `shards_acknowledged` to be `false`, but for the index creation to be successful.
These values simply indicate whether the operation completed before the timeout.
If `acknowledged` is `false`, the request timed out before the cluster state was updated with the newly created index, but it probably will be created sometime soon.
If `shards_acknowledged` is `false`, then the request timed out before the requisite number of shards were started (by default just the primaries), even if the cluster state was successfully updated to reflect the newly created index (that is to say, `acknowledged` is `true`).
You can change the default of only waiting for the primary shards to start through the index setting `index.write.wait_for_active_shards`.
Note that changing this setting will also affect the `wait_for_active_shards` value on all subsequent write operations.
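For example, waiting for two active copies per shard at creation time might look like this sketch in the Python client (the values are illustrative):
resp = client.indices.create(
    index="my-index-000001",
    wait_for_active_shards="2",  # primary plus one replica per shard
    timeout="30s",
)
print(resp["acknowledged"], resp["shards_acknowledged"])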
Required authorization
- Index privileges:
create_index, manage
Query parameters
-
Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
Values are `-1` or `0`.
-
Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
Values are `-1` or `0`.
-
The number of shard copies that must be active before proceeding with the operation. Set to `all` or any positive integer up to the total number of shards in the index (`number_of_replicas + 1`).
Values are `all` or `index-setting`.
Body
-
Aliases for the index.
-
Mapping for fields in the index. If specified, this mapping can include:
- Field names
- Field data types
- Mapping parameters
-
Configuration options for the index (see the index settings documentation).
PUT /my-index-000001
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
resp = client.indices.create(
index="my-index-000001",
settings={
"number_of_shards": 3,
"number_of_replicas": 2
},
)
const response = await client.indices.create({
index: "my-index-000001",
settings: {
number_of_shards: 3,
number_of_replicas: 2,
},
});
response = client.indices.create(
index: "my-index-000001",
body: {
"settings": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
)
$resp = $client->indices()->create([
"index" => "my-index-000001",
"body" => [
"settings" => [
"number_of_shards" => 3,
"number_of_replicas" => 2,
],
],
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"settings":{"number_of_shards":3,"number_of_replicas":2}}' "$ELASTICSEARCH_URL/my-index-000001"
client.indices().create(c -> c
.index("my-index-000001")
.settings(s -> s
.numberOfShards("3")
.numberOfReplicas("2")
)
);
{
"settings": {
"number_of_shards": 3,
"number_of_replicas": 2
}
}
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"properties": {
"field1": { "type": "text" }
}
}
}
{
"aliases": {
"alias_1": {},
"alias_2": {
"filter": {
"term": {
"user.id": "kimchy"
}
},
"routing": "shard-1"
}
}
}
Create or update an index template
Generally available; Added in 7.9.0
All methods and paths for this operation:
Index templates define settings, mappings, and aliases that can be applied automatically to new indices.
Elasticsearch applies templates to new indices based on a wildcard pattern that matches the index name. Index templates are applied during data stream or index creation. For data streams, these settings and mappings are applied when the stream's backing indices are created. Settings and mappings specified in a create index API request override any settings or mappings specified in an index template. Changes to index templates do not affect existing indices, including the existing backing indices of a data stream.
You can use C-style `/* */` block comments in index templates.
You can include comments anywhere in the request body, except before the opening curly bracket.
Multiple matching templates
If multiple index templates match the name of a new index or data stream, the template with the highest priority is used.
Multiple templates with overlapping index patterns at the same priority are not allowed and an error will be thrown when attempting to create a template matching an existing index template at identical priorities.
Composing aliases, mappings, and settings
When multiple component templates are specified in the `composed_of` field for an index template, they are merged in the order specified, meaning that later component templates override earlier component templates.
Any mappings, settings, or aliases from the parent index template are merged in next.
Finally, any configuration on the index request itself is merged.
Mapping definitions are merged recursively, which means that later mapping components can introduce new field mappings and update the mapping configuration.
If a field mapping is already contained in an earlier component, its definition will be completely overwritten by the later one.
This recursive merging strategy applies not only to field mappings, but also to root options like `dynamic_templates` and `meta`.
If an earlier component contains a `dynamic_templates` block, then by default new `dynamic_templates` entries are appended onto the end.
If an entry already exists with the same key, then it is overwritten by the new definition.
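As an illustration, the following sketch composes two hypothetical component templates; the inline `template` section is merged last, so it wins on conflicts (the names and patterns are placeholders, and the component templates would need to exist first):
resp = client.indices.put_index_template(
    name="template_composed",
    index_patterns=["composed-*"],
    composed_of=["ct_settings", "ct_mappings"],  # merged in this order
    priority=10,
    template={
        "settings": {"number_of_shards": 1}  # overrides the components
    },
)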
Required authorization
- Cluster privileges:
manage_index_templates
Query parameters
-
If `true`, this request cannot replace or update existing index templates.
-
Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
Values are `-1` or `0`.
-
A user-defined reason for creating or updating the index template.
Body
Required
-
Name of the index template to create.
-
An ordered list of component template names. Component templates are merged in the order specified, meaning that the last component template specified has the highest precedence.
-
Template to be applied. It may optionally include an `aliases`, `mappings`, or `settings` configuration.
-
If this object is included, the template is used to create data streams and their backing indices. Supports an empty object. Data streams require a matching index template with a `data_stream` object.
-
Priority to determine index template precedence when a new data stream or index is created. The index template with the highest priority is chosen. If no priority is specified the template is treated as though it is of priority 0 (lowest priority). This number is not automatically generated by Elasticsearch.
-
Version number used to manage index templates externally. This number is not automatically generated by Elasticsearch. External systems can use these version numbers to simplify template management. To unset a version, replace the template without specifying one.
-
Optional user metadata about the index template. It may have any contents. It is not automatically generated or used by Elasticsearch. This user-defined object is stored in the cluster state, so keeping it short is preferable. To unset the metadata, replace the template without specifying it.
-
This setting overrides the value of the `action.auto_create_index` cluster setting. If set to `true` in a template, then indices can be automatically created using that template even if auto-creation of indices is disabled via `actions.auto_create_index`. If set to `false`, then indices or data streams matching the template must always be explicitly created, and may never be automatically created.
The configuration option `ignore_missing_component_templates` can be used when an index template references a component template that might not exist.
-
Marks this index template as deprecated. When creating or updating a non-deprecated index template that uses deprecated components, Elasticsearch will emit a deprecation warning.
PUT /_index_template/template_1
{
"index_patterns" : ["template*"],
"priority" : 1,
"template": {
"settings" : {
"number_of_shards" : 2
}
}
}
resp = client.indices.put_index_template(
name="template_1",
index_patterns=[
"template*"
],
priority=1,
template={
"settings": {
"number_of_shards": 2
}
},
)
const response = await client.indices.putIndexTemplate({
name: "template_1",
index_patterns: ["template*"],
priority: 1,
template: {
settings: {
number_of_shards: 2,
},
},
});
response = client.indices.put_index_template(
name: "template_1",
body: {
"index_patterns": [
"template*"
],
"priority": 1,
"template": {
"settings": {
"number_of_shards": 2
}
}
}
)
$resp = $client->indices()->putIndexTemplate([
"name" => "template_1",
"body" => [
"index_patterns" => array(
"template*",
),
"priority" => 1,
"template" => [
"settings" => [
"number_of_shards" => 2,
],
],
],
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"index_patterns":["template*"],"priority":1,"template":{"settings":{"number_of_shards":2}}}' "$ELASTICSEARCH_URL/_index_template/template_1"
client.indices().putIndexTemplate(p -> p
.indexPatterns("template*")
.name("template_1")
.priority(1L)
.template(t -> t
.settings(s -> s
.numberOfShards("2")
)
)
);
{
"index_patterns" : ["template*"],
"priority" : 1,
"template": {
"settings" : {
"number_of_shards" : 2
}
}
}
{
"index_patterns": [
"template*"
],
"template": {
"settings": {
"number_of_shards": 1
},
"aliases": {
"alias1": {},
"alias2": {
"filter": {
"term": {
"user.id": "kimchy"
}
},
"routing": "shard-1"
},
"{index}-alias": {}
}
}
}
Check index templates
Path parameters
-
Comma-separated list of index template names used to limit the request. Wildcard (*) expressions are supported.
Query parameters
-
If true, the request retrieves information from the local node only. Defaults to false, which means information is retrieved from the master node.
-
If true, returns settings in flat format.
-
Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
Values are `-1` or `0`.
curl \
--request HEAD 'http://api.example.com/_index_template/{name}' \
--header "Authorization: $API_KEY"
Delete a legacy index template
Path parameters
-
The name of the legacy index template to delete. Wildcard (`*`) expressions are supported.
Query parameters
-
Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
Values are `-1` or `0`.
-
Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
Values are `-1` or `0`.
DELETE _template/.cloud-hot-warm-allocation-0
resp = client.indices.delete_template(
name=".cloud-hot-warm-allocation-0",
)
const response = await client.indices.deleteTemplate({
name: ".cloud-hot-warm-allocation-0",
});
response = client.indices.delete_template(
name: ".cloud-hot-warm-allocation-0"
)
$resp = $client->indices()->deleteTemplate([
"name" => ".cloud-hot-warm-allocation-0",
]);
curl -X DELETE -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_template/.cloud-hot-warm-allocation-0"
client.indices().deleteTemplate(d -> d
.name(".cloud-hot-warm-allocation-0")
);
Check existence of index templates
Generally available
Get information about whether index templates exist. Index templates define settings, mappings, and aliases that can be applied automatically to new indices.
IMPORTANT: This documentation is about legacy index templates, which are deprecated and will be replaced by the composable templates introduced in Elasticsearch 7.8.
Required authorization
- Cluster privileges:
manage_index_templates
Path parameters
-
A comma-separated list of index template names used to limit the request. Wildcard (`*`) expressions are supported.
Query parameters
-
Indicates whether to use a flat format for the response.
-
Indicates whether to get information from the local node only.
-
The period to wait for the master node. If the master node is not available before the timeout expires, the request fails and returns an error. To indicate that the request should never time out, set it to `-1`.
Values are `-1` or `0`.
HEAD /_template/template_1
resp = client.indices.exists_template(
name="template_1",
)
const response = await client.indices.existsTemplate({
name: "template_1",
});
response = client.indices.exists_template(
name: "template_1"
)
$resp = $client->indices()->existsTemplate([
"name" => "template_1",
]);
curl --head -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_template/template_1"
client.indices().existsTemplate(e -> e
.name("template_1")
);
Check aliases
Generally available
All methods and paths for this operation:
Check if one or more data stream or index aliases exist.
Path parameters
-
Comma-separated list of data streams or indices used to limit the request. Supports wildcards (
*
). To target all data streams and indices, omit this parameter or use*
or_all
. -
Comma-separated list of aliases to check. Supports wildcards (
*
).
Query parameters
-
If
false
, the request returns an error if any wildcard expression, index alias, or_all
value targets only missing or closed indices. This behavior applies even if the request targets other open indices. -
Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as
open,hidden
.Supported values include:
all
: Match any data stream or index, including hidden ones.open
: Match open, non-hidden indices. Also matches any non-hidden data stream.closed
: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.hidden
: Match hidden data streams and hidden indices. Must be combined withopen
,closed
, orboth
.none
: Wildcard expressions are not accepted.
Values are
all
,open
,closed
,hidden
, ornone
. -
If
true
, the request retrieves information from the local node only.
HEAD _alias/my-alias
resp = client.indices.exists_alias(
name="my-alias",
)
const response = await client.indices.existsAlias({
name: "my-alias",
});
response = client.indices.exists_alias(
name: "my-alias"
)
$resp = $client->indices()->existsAlias([
"name" => "my-alias",
]);
curl --head -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_alias/my-alias"
client.indices().existsAlias(e -> e
.name("my-alias")
);
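The request returns a 200 status code when the alias exists and 404 when it does not. With the Python client, the response can be treated as a boolean; a minimal sketch, assuming the my-alias name from the example above:
if client.indices.exists_alias(name="my-alias"):
    # The alias exists; it is safe to query through it.
    print("alias exists")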
Force a merge
Generally available; Added in 2.1.0
All methods and paths for this operation:
Perform the force merge operation on the shards of one or more indices. For data streams, the API forces a merge on the shards of the stream's backing indices.
Merging reduces the number of segments in each shard by merging some of them together and also frees up the space used by deleted documents. Merging normally happens automatically, but sometimes it is useful to trigger a merge manually.
WARNING: We recommend force merging only a read-only index (meaning the index is no longer receiving writes). When documents are updated or deleted, the old version is not immediately removed but instead soft-deleted and marked with a "tombstone". These soft-deleted documents are automatically cleaned up during regular segment merges. But force merge can cause very large (greater than 5 GB) segments to be produced, which are not eligible for regular merges. So the number of soft-deleted documents can then grow rapidly, resulting in higher disk usage and worse search performance. If you regularly force merge an index receiving writes, this can also make snapshots more expensive, since the new documents can't be backed up incrementally.
Blocks during a force merge
Calls to this API block until the merge is complete (unless request contains wait_for_completion=false
).
If the client connection is lost before completion then the force merge process will continue in the background.
Any new requests to force merge the same indices will also block until the ongoing force merge is complete.
Running force merge asynchronously
If the request contains wait_for_completion=false
, Elasticsearch performs some preflight checks, launches the request, and returns a task you can use to get the status of the task.
However, you cannot cancel this task because the force merge task is not cancelable.
Elasticsearch creates a record of this task as a document at _tasks/<task_id>
.
When you are done with a task, you should delete the task document so Elasticsearch can reclaim the space.
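As a rough sketch with the Python client (the index name is illustrative, and the task record is assumed to be stored as a document in the .tasks system index):
resp = client.indices.forcemerge(
    index="my-index-000001",
    wait_for_completion=False,
)
task_id = resp["task"]
# Check on the force merge task; "completed" flips to true when it finishes.
status = client.tasks.get(task_id=task_id)
if status["completed"]:
    # Delete the task record so Elasticsearch can reclaim the space.
    client.delete(index=".tasks", id=task_id)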
Force merging multiple indices
You can force merge multiple indices with a single request (see the example after this list) by targeting:
- One or more data streams that contain multiple backing indices
- Multiple indices
- One or more aliases
- All data streams and indices in a cluster
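For example, a minimal sketch with the Python client that targets two indices in one request (the index names are illustrative):
resp = client.indices.forcemerge(
    index="my-index-000001,my-index-000002",
)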
Each targeted shard is force-merged separately using the force_merge threadpool.
By default each node only has a single force_merge
thread which means that the shards on that node are force-merged one at a time.
If you expand the force_merge
threadpool on a node, then it will force merge its shards in parallel.
Force merge temporarily increases the storage used by the shard being merged, because rewriting all segments into a single new one can require free space of up to triple the shard's size when the max_num_segments parameter
is set to 1
. For example, force merging a 5GB shard down to one segment may transiently need up to 15GB of free disk space.
Data streams and time-based indices
Force-merging is useful for managing a data stream's older backing indices and other time-based indices, particularly after a rollover. In these cases, each index only receives indexing traffic for a certain period of time. Once an index receives no more writes, its shards can be force-merged to a single segment. This can be a good idea because single-segment shards can sometimes use simpler and more efficient data structures to perform searches. For example:
POST /.ds-my-data-stream-2099.03.07-000001/_forcemerge?max_num_segments=1
Required authorization
- Index privileges:
maintenance
Path parameters
-
A comma-separated list of index names; use
_all
or empty string to perform the operation on all indices
Query parameters
-
Whether to ignore a wildcard indices expression that resolves into no concrete indices. (This includes the
_all
string or when no indices have been specified.)
Whether to expand wildcard expression to concrete indices that are open, closed or both.
Supported values include:
all
: Match any data stream or index, including hidden ones.open
: Match open, non-hidden indices. Also matches any non-hidden data stream.closed
: Match closed, non-hidden indices. Also matches any non-hidden data stream. Data streams cannot be closed.hidden
: Match hidden data streams and hidden indices. Must be combined withopen
,closed
, orboth
.none
: Wildcard expressions are not accepted.
Values are
all
,open
,closed
,hidden
, ornone
. -
Specify whether the index should be flushed after performing the operation (default: true)
-
The number of segments the index should be merged into (default: dynamic)
-
Specify whether the operation should only expunge deleted documents
-
Whether the request should wait until the force merge is completed.
POST my-index-000001/_forcemerge
resp = client.indices.forcemerge(
index="my-index-000001",
)
const response = await client.indices.forcemerge({
index: "my-index-000001",
});
response = client.indices.forcemerge(
index: "my-index-000001"
)
$resp = $client->indices()->forcemerge([
"index" => "my-index-000001",
]);
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/my-index-000001/_forcemerge"
client.indices().forcemerge(f -> f
.index("my-index-000001")
);
Create or update a lifecycle policy
Generally available; Added in 6.6.0
If the specified policy exists, it is replaced and the policy version is incremented.
NOTE: Only the latest version of the policy is stored; you cannot revert to previous versions.
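For example, you can confirm the version bump after replacing a policy; a minimal sketch with the Python client, assuming the my_policy name from the example below:
resp = client.ilm.get_lifecycle(name="my_policy")
# The version field increments each time the policy is replaced.
print(resp["my_policy"]["version"])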
Required authorization
- Index privileges:
manage
- Cluster privileges:
manage_ilm
Query parameters
-
Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.
Values are
-1
or0
. -
Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.
Values are
-1
or0
.
PUT _ilm/policy/my_policy
{
"policy": {
"_meta": {
"description": "used for nginx log",
"project": {
"name": "myProject",
"department": "myDepartment"
}
},
"phases": {
"warm": {
"min_age": "10d",
"actions": {
"forcemerge": {
"max_num_segments": 1
}
}
},
"delete": {
"min_age": "30d",
"actions": {
"delete": {}
}
}
}
}
}
resp = client.ilm.put_lifecycle(
name="my_policy",
policy={
"_meta": {
"description": "used for nginx log",
"project": {
"name": "myProject",
"department": "myDepartment"
}
},
"phases": {
"warm": {
"min_age": "10d",
"actions": {
"forcemerge": {
"max_num_segments": 1
}
}
},
"delete": {
"min_age": "30d",
"actions": {
"delete": {}
}
}
}
},
)
const response = await client.ilm.putLifecycle({
name: "my_policy",
policy: {
_meta: {
description: "used for nginx log",
project: {
name: "myProject",
department: "myDepartment",
},
},
phases: {
warm: {
min_age: "10d",
actions: {
forcemerge: {
max_num_segments: 1,
},
},
},
delete: {
min_age: "30d",
actions: {
delete: {},
},
},
},
},
});
response = client.ilm.put_lifecycle(
policy: "my_policy",
body: {
"policy": {
"_meta": {
"description": "used for nginx log",
"project": {
"name": "myProject",
"department": "myDepartment"
}
},
"phases": {
"warm": {
"min_age": "10d",
"actions": {
"forcemerge": {
"max_num_segments": 1
}
}
},
"delete": {
"min_age": "30d",
"actions": {
"delete": {}
}
}
}
}
}
)
$resp = $client->ilm()->putLifecycle([
"policy" => "my_policy",
"body" => [
"policy" => [
"_meta" => [
"description" => "used for nginx log",
"project" => [
"name" => "myProject",
"department" => "myDepartment",
],
],
"phases" => [
"warm" => [
"min_age" => "10d",
"actions" => [
"forcemerge" => [
"max_num_segments" => 1,
],
],
],
"delete" => [
"min_age" => "30d",
"actions" => [
"delete" => new ArrayObject([]),
],
],
],
],
],
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"policy":{"_meta":{"description":"used for nginx log","project":{"name":"myProject","department":"myDepartment"}},"phases":{"warm":{"min_age":"10d","actions":{"forcemerge":{"max_num_segments":1}}},"delete":{"min_age":"30d","actions":{"delete":{}}}}}}' "$ELASTICSEARCH_URL/_ilm/policy/my_policy"
client.ilm().putLifecycle(p -> p
.name("my_policy")
.policy(po -> po
.phases(ph -> ph
.delete(d -> d
.actions(a -> a
.delete(de -> de)
)
.minAge(m -> m
.time("30d")
)
)
.warm(w -> w
.actions(a -> a
.forcemerge(f -> f
.maxNumSegments(1)
)
)
.minAge(m -> m
.time("10d")
)
)
)
.meta(Map.of("description", JsonData.fromJson("\"used for nginx log\""),"project", JsonData.fromJson("{\"name\":\"myProject\",\"department\":\"myDepartment\"}")))
)
);
{
"policy": {
"_meta": {
"description": "used for nginx log",
"project": {
"name": "myProject",
"department": "myDepartment"
}
},
"phases": {
"warm": {
"min_age": "10d",
"actions": {
"forcemerge": {
"max_num_segments": 1
}
}
},
"delete": {
"min_age": "30d",
"actions": {
"delete": {}
}
}
}
}
}
{
"acknowledged": true
}
GET _ilm/status
resp = client.ilm.get_status()
const response = await client.ilm.getStatus();
response = client.ilm.get_status
$resp = $client->ilm()->getStatus();
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_ilm/status"
client.ilm().getStatus();
{
"operation_mode": "RUNNING"
}
Perform chat completion inference
Generally available; Added in 8.18.0
The chat completion inference API enables real-time responses for chat completion tasks by delivering answers incrementally, reducing response times during computation.
It only works with the chat_completion
task type for openai
and elastic
inference services.
NOTE: The chat_completion
task type is only available within the _stream API and only supports streaming.
The Chat completion inference API and the Stream inference API differ in their response structure and capabilities.
The Chat completion inference API provides more comprehensive customization options through more fields and function calling support.
If you use the openai
, hugging_face
, or the elastic
service, use the Chat completion inference API.
Query parameters
-
Specifies the amount of time to wait for the inference request to complete.
Values are
-1
or0
.
Body
Required
-
A list of objects representing the conversation. Requests should generally only add new messages from the user (role
user
). The other message roles (assistant
,system
, ortool
) should generally only be copied from the response to a previous completion request, such that the messages array is built up throughout a conversation.An object representing part of the conversation.
-
The ID of the model to use.
-
The upper bound limit for the number of tokens that can be generated for a completion request.
-
A sequence of strings to control when the model should stop generating additional tokens.
-
The sampling temperature to use.
tool_choice
string | object Controls which tool is called by the model. String representation: One of auto, none, or required. auto allows the model to choose between calling tools and generating a message. none causes the model to not call any tools. required forces the model to call one or more tools. Example (object representation): { "tool_choice": { "type": "function", "function": { "name": "get_current_weather" } } }
-
A list of tools that the model can call. Example:
{ "tools": [ { "type": "function", "function": { "name": "get_price_of_item", "description": "Get the current price of an item", "parameters": { "type": "object", "properties": { "item": { "id": "12345" }, "unit": { "type": "currency" } } } } } ] }
-
Nucleus sampling, an alternative to sampling with temperature.
POST _inference/chat_completion/openai-completion/_stream
{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "What is Elastic?"
}
]
}
resp = client.inference.chat_completion_unified(
inference_id="openai-completion",
chat_completion_request={
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "What is Elastic?"
}
]
},
)
const response = await client.inference.chatCompletionUnified({
inference_id: "openai-completion",
chat_completion_request: {
model: "gpt-4o",
messages: [
{
role: "user",
content: "What is Elastic?",
},
],
},
});
response = client.inference.chat_completion_unified(
inference_id: "openai-completion",
body: {
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "What is Elastic?"
}
]
}
)
$resp = $client->inference()->chatCompletionUnified([
"inference_id" => "openai-completion",
"body" => [
"model" => "gpt-4o",
"messages" => array(
[
"role" => "user",
"content" => "What is Elastic?",
],
),
],
]);
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"model":"gpt-4o","messages":[{"role":"user","content":"What is Elastic?"}]}' "$ELASTICSEARCH_URL/_inference/chat_completion/openai-completion/_stream"
client.inference().chatCompletionUnified(c -> c
.inferenceId("openai-completion")
.chatCompletionRequest(ch -> ch
.messages(m -> m
.content(co -> co
.string("What is Elastic?")
)
.role("user")
)
.model("gpt-4o")
)
);
{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "What is Elastic?"
}
]
}
{
"messages": [
{
"role": "assistant",
"content": "Let's find out what the weather is",
"tool_calls": [
{
"id": "call_KcAjWtAww20AihPHphUh46Gd",
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"location\":\"Boston, MA\"}"
}
}
]
},
{
"role": "tool",
"content": "The weather is cold",
"tool_call_id": "call_KcAjWtAww20AihPHphUh46Gd"
}
]
}
{
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's the price of a scarf?"
}
]
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_current_price",
"description": "Get the current price of a item",
"parameters": {
"type": "object",
"properties": {
"item": {
"id": "123"
}
}
}
}
}
],
"tool_choice": {
"type": "function",
"function": {
"name": "get_current_price"
}
}
}
event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":"","role":"assistant"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}
event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":Elastic"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}
event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[{"delta":{"content":" is"},"index":0}],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk"}}
(...)
event: message
data: {"chat_completion":{"id":"chatcmpl-Ae0TWsy2VPnSfBbv5UztnSdYUMFP3","choices":[],"model":"gpt-4o-2024-08-06","object":"chat.completion.chunk","usage":{"completion_tokens":28,"prompt_tokens":16,"total_tokens":44}}}
event: message
data: [DONE]
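The response above is a stream of server-sent events. A rough sketch of consuming it over raw HTTP with the requests library (the URL, API key, and request body are placeholders; this bypasses the official client):
import json

import requests

url = "https://localhost:9200/_inference/chat_completion/openai-completion/_stream"  # placeholder
headers = {
    "Authorization": "ApiKey YOUR_API_KEY",  # placeholder
    "Content-Type": "application/json",
}
body = {"model": "gpt-4o", "messages": [{"role": "user", "content": "What is Elastic?"}]}

with requests.post(url, headers=headers, json=body, stream=True) as r:
    for line in r.iter_lines():
        # Event payloads are prefixed with "data: "; the stream ends with "[DONE]".
        if line.startswith(b"data: "):
            payload = line[len(b"data: "):]
            if payload == b"[DONE]":
                break
            chunk = json.loads(payload)
            for choice in chunk["chat_completion"]["choices"]:
                print(choice["delta"].get("content", ""), end="")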
Create a Hugging Face inference endpoint
Generally available; Added in 8.12.0
Create an inference endpoint to perform an inference task with the hugging_face
service.
Supported tasks include: text_embedding
, completion
, and chat_completion
.
To configure the endpoint, first visit the Hugging Face Inference Endpoints page and create a new endpoint. Select a model that supports the task you intend to use.
For Elastic's text_embedding
task:
The selected model must support the Sentence Embeddings
task. On the new endpoint creation page, select the Sentence Embeddings
task under the Advanced Configuration
section.
After the endpoint has initialized, copy the generated endpoint URL.
Recommended models for text_embedding
task:
all-MiniLM-L6-v2
all-MiniLM-L12-v2
all-mpnet-base-v2
e5-base-v2
e5-small-v2
multilingual-e5-base
multilingual-e5-small
For Elastic's chat_completion
and completion
tasks:
The selected model must support the Text Generation
task and expose OpenAI API. HuggingFace supports both serverless and dedicated endpoints for Text Generation
. When creating a dedicated endpoint, select the Text Generation
task.
After the endpoint is initialized (for dedicated) or ready (for serverless), ensure that it supports the OpenAI API and that its URL includes the /v1/chat/completions
path. Then copy the full endpoint URL for use.
Recommended models for chat_completion
and completion
tasks:
Mistral-7B-Instruct-v0.2
QwQ-32B
Phi-3-mini-128k-instruct
For Elastic's rerank
task:
The selected model must support the sentence-ranking
task and expose OpenAI API.
HuggingFace supports only dedicated (not serverless) endpoints for Rerank
so far.
After the endpoint is initialized, copy the full endpoint URL for use.
Tested models for rerank
task:
bge-reranker-base
jina-reranker-v1-turbo-en-GGUF
Required authorization
- Cluster privileges:
manage_inference
Path parameters
-
The type of the inference task that the model will perform.
Values are
chat_completion
,completion
,rerank
, ortext_embedding
. -
The unique identifier of the inference endpoint.
Query parameters
-
Specifies the amount of time to wait for the inference endpoint to be created.
Values are
-1
or0
.
Body
-
The chunking configuration object.
-
The type of service supported for the specified task type. In this case,
hugging_face
.Value is
hugging_face
. -
Settings used to install the inference model. These settings are specific to the
hugging_face
service. -
Settings to configure the inference task. These settings are specific to the task type you specified.
PUT _inference/text_embedding/hugging-face-embeddings
{
"service": "hugging_face",
"service_settings": {
"api_key": "hugging-face-access-token",
"url": "url-endpoint"
}
}
resp = client.inference.put(
task_type="text_embedding",
inference_id="hugging-face-embeddings",
inference_config={
"service": "hugging_face",
"service_settings": {
"api_key": "hugging-face-access-token",
"url": "url-endpoint"
}
},
)
const response = await client.inference.put({
task_type: "text_embedding",
inference_id: "hugging-face-embeddings",
inference_config: {
service: "hugging_face",
service_settings: {
api_key: "hugging-face-access-token",
url: "url-endpoint",
},
},
});
response = client.inference.put(
task_type: "text_embedding",
inference_id: "hugging-face-embeddings",
body: {
"service": "hugging_face",
"service_settings": {
"api_key": "hugging-face-access-token",
"url": "url-endpoint"
}
}
)
$resp = $client->inference()->put([
"task_type" => "text_embedding",
"inference_id" => "hugging-face-embeddings",
"body" => [
"service" => "hugging_face",
"service_settings" => [
"api_key" => "hugging-face-access-token",
"url" => "url-endpoint",
],
],
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"service":"hugging_face","service_settings":{"api_key":"hugging-face-access-token","url":"url-endpoint"}}' "$ELASTICSEARCH_URL/_inference/text_embedding/hugging-face-embeddings"
client.inference().put(p -> p
.inferenceId("hugging-face-embeddings")
.taskType(TaskType.TextEmbedding)
.inferenceConfig(i -> i
.service("hugging_face")
.serviceSettings(JsonData.fromJson("{\"api_key\":\"hugging-face-access-token\",\"url\":\"url-endpoint\"}"))
)
);
{
"service": "hugging_face",
"service_settings": {
"api_key": "hugging-face-access-token",
"url": "url-endpoint"
}
}
{
"service": "hugging_face",
"service_settings": {
"api_key": "hugging-face-access-token",
"url": "url-endpoint"
},
"task_settings": {
"return_documents": true,
"top_n": 3
}
}
Ingest
Ingest APIs enable you to manage tasks and resources related to ingest pipelines and processors.