Optimize ES|QL query performance

This guide covers practical techniques for writing fast ES|QL queries and operating ES|QL workloads at scale. It starts with common anti-patterns, then shows how to reduce scanned data, reduce returned data, and avoid expensive operations. It also covers tools for monitoring query performance and investigating slow queries across multiple clusters.

Tip

For a quick overview of the most common issues with ES|QL queries, refer to Common anti-patterns.

Before you begin

This guide assumes familiarity with ES|QL syntax and command pipelines. To learn the basics, refer to Get started with ES|QL queries.

This guide serves two audiences:

If you write ES|QL queries, the following sections cover the techniques most relevant to you:
If you administer clusters and need to monitor query performance across multiple clusters, the following section describes the tools available:
- Monitor query performance

Check your Elastic Stack version

If you're not on Elastic Cloud Serverless, check your Elastic Stack version. The ES|QL query engine improves with each release, so upgrading is often one of the highest-impact performance changes you can make.

Some tips on this page require a recent version of the Elastic Stack, and individual subsections carry an applicability badge when this is the case. Sections without a version badge apply to all versions where ES|QL is generally available.

The most important version-specific performance improvements are visible in the table below, including improvements in 9.x for query logging, time series support, query activity, and full-text search functions.

For clusters on a version before 8.17, upgrading provides the largest single performance improvement, because full-text search functions and Lucene pushdowns become available. For clusters on 8.17 but before 8.18, upgrading to 8.18 provides the next largest improvement. That release adds LIKE and RLIKE pushdown to Lucene and a mapping discovery optimization that reduces overhead on clusters with many indices.

Version	What improved	Impact
8.13	`CIDR_MATCH` pushed to Lucene	Faster IP filtering in security queries
8.16	Per-aggregation `WHERE`	Replaces slow `CASE`-based conditional aggregation
8.17	`MATCH` and `QSTR` full-text search functions	Orders of magnitude faster than `LIKE` or `RLIKE` for text search
8.18, 9.0	`LIKE` and `RLIKE` pushed to Lucene, mapping discovery optimization, `LOOKUP JOIN`	Faster pattern matching, cheaper queries on clusters with many indices, and native lookup joins
9.1	ES\|QL query log, full-text functions GA	Dedicated query performance logging, `MATCH`, `QSTR`, and `KQL` stable
9.2	`INLINE STATS`, `TS` command with `RATE` and `TBUCKET` in preview, `CHANGE_POINT` GA	Window-function-like queries, native time series support
9.3	`INLINE STATS` GA, `TRANGE`, Lucene-pushable `LOOKUP JOIN` predicates	Faster filtered joins, simpler time range syntax
9.4+	`TS` and time series aggregation functions GA, Query activity, unified query logging	Native time series support, real-time view of in-flight queries in Kibana, single log for all query types

Index only what you need

Query performance starts at index time. Your field mappings control what ES|QL can do efficiently.

If you only aggregate or sort on a field, and never filter, set index: false to save disk space. ES|QL can still read the field through doc values. Keep doc values enabled for fields that ES|QL needs to read, group, sort, or return. For fields that are rarely needed in results, remove them from the query output with KEEP or DROP.

Know your circuit breaker limits

ES|QL enforces memory limits through circuit breakers. When a query exceeds the limit, the cluster rejects it to protect node stability. High-cardinality aggregations are the most common trigger. To learn more, refer to Avoid high-cardinality STATS BY.

Common anti-patterns

These anti-patterns are the most common causes of ES|QL query latency in production.

Tip

AutoOps detects most of these patterns automatically and surfaces actionable recommendations. To browse detected events, refer to AutoOps events.

Pattern	What to look for	Why it's slow
Broad index pattern	`FROM *` or wide wildcards	Expensive mapping discovery, plus scans across many indices
Wide time range	`@timestamp` range spanning weeks or months	Scans proportionally more data
Missing `WHERE`	No filter conditions at all	Full index scan
Missing `KEEP`	No column selection	Returns all fields, producing large payloads
Missing `LIMIT`	Unbounded result set	Slow serialization, can trigger deserialization errors in Kibana
High-cardinality `STATS BY`	Grouping by raw timestamps, full URLs, or document IDs	Produces millions of buckets, can trip circuit breakers
`LIKE` or `RLIKE`	Wildcard or regex text matching	Slower than full-text functions for text search, especially pre 8.18/9.0
`GROK` or `DISSECT`	Text parsing on large datasets	CPU-intensive regex or tokenization per row
`CASE`	Conditional aggregation through `CASE`	Lazy evaluation, slow
`LOOKUP JOIN`	Join against a large lookup index	Cost is proportional to the lookup index size

Tip

The most impactful fixes are usually: add a time range filter, add a WHERE, and add a KEEP.

Reduce what you scan

Most ES|QL queries spend the bulk of their time reading data from disk. The fastest queries read the least data. This section covers the levers that most directly control scan size.

Narrow the time range

A tight time range is the single biggest performance lever in most workloads. ES|QL uses the @timestamp field to skip entire shards and segments that fall outside the range, so a narrower window directly reduces the amount of data read.

Avoid running queries that span more time than the result actually needs:

❌ Don't: Query without a time bound

		FROM logs-*
| WHERE host.name == "web-01"
| STATS count = COUNT(*) BY log.level
		
	

✅ Do: Add an explicit @timestamp filter to bound the scan

		FROM logs-*
| WHERE @timestamp > NOW() - 1 day
  AND host.name == "web-01"
| STATS count = COUNT(*) BY log.level
		
	

Bound the scan to a tight window

In Kibana, the time picker automatically applies a time range filter. When writing queries directly in the Kibana Console or through the API, always include an explicit @timestamp filter.

Filter early with WHERE

A WHERE clause earlier in the pipeline reduces the dataset before downstream commands process it. Conditions on indexed fields such as keyword, numeric, date, ip, geo_point, geo_shape, cartesian_point, or cartesian_shape types are pushed down to Lucene, which skips irrelevant documents entirely.

Without a WHERE, ES|QL scans every document in the matched indices:

❌ Don't: Filter after the aggregation

		FROM logs-*
| STATS count = COUNT(*) BY host.name, log.level
| WHERE log.level == "error"
		
	

✅ Do: Push the filter up so it runs before the aggregation

		FROM logs-*
| WHERE @timestamp > NOW() - 1 day
  AND log.level == "error"
| STATS count = COUNT(*) BY host.name
		
	

Filter pushed to Lucene

Restrict the index pattern

A broad FROM * forces ES|QL to discover field mappings across every index and then query each one. On clusters with thousands of indices, that discovery overhead alone can dominate query time. In Serverless, FROM * can also expand to all linked projects in cross-project search. Use project_routing to limit a cross-project query to the projects that it actually needs.

❌ Don't: Use wildcards that match more indices than the query needs

		FROM *
| WHERE @timestamp > NOW() - 1 hour
  AND event.category == "authentication"
| STATS failures = COUNT(*) BY user.name
		
	

✅ Do: Target a specific index pattern instead

		FROM logs-system-*
| WHERE @timestamp > NOW() - 1 hour
  AND event.category == "authentication"
| STATS failures = COUNT(*) BY user.name
		
	

When a query genuinely needs multiple patterns, list them explicitly with FROM. For example:

FROM logs-system-*, logs-auth-*

Use TS for time series data

For time series data streams (TSDS), use TS rather than FROM. TS understands time series structure, including dimensions, metrics, and time ordering, and skips data more efficiently than FROM paired with WHERE. It also unlocks time series functions such as RATE and bucketing through TBUCKET.

❌ Don't: Query TSDS indices through FROM when you intend to aggregate metrics

		FROM metrics-system.cpu-*
| WHERE @timestamp > NOW() - 1 hour
| STATS avg_cpu = AVG(system.cpu.user.pct) BY host.name, bucket = DATE_TRUNC(5 minutes, @timestamp)
		
	

✅ Do: Use TS with TBUCKET for time series metrics

		TS metrics-system.cpu-*
| STATS avg_cpu = AVG(AVG_OVER_TIME(system.cpu.user.pct))
        BY host.name, TBUCKET(5 minutes)
		
	

Inner-then-outer aggregation pattern
TBUCKET replaces DATE_TRUNC under TS

Important

TS only works on indices created as time series data streams. For non-TSDS indices, continue to use FROM.

Reduce what you return

Every column returned has to be read from storage, serialized, and transmitted. Shrinking the result set, by returning fewer columns or rows, often produces significant gains on large indices.

Select columns with KEEP

KEEP selects which columns to return. DROP does the inverse. Without either, ES|QL returns every field in every matching document. This is the single biggest source of avoidable overhead on indices with hundreds or thousands of fields.

❌ Don't: Return every field by default

		FROM logs-*
| WHERE @timestamp > NOW() - 1 hour AND log.level == "error"
| SORT @timestamp DESC
| LIMIT 100
		
	

✅ Do: Project only the fields the consumer actually needs

		FROM logs-*
| WHERE @timestamp > NOW() - 1 hour AND log.level == "error"
| KEEP @timestamp, host.name, message, log.level
| SORT @timestamp DESC
| LIMIT 100
		
	

Return only the four fields needed downstream

Tip

Use wildcards in KEEP sparingly. host.* is better than no KEEP at all, but host.name is better than host.* because it avoids pulling in adjacent fields.

When using the REST API on sparse datasets where many columns are null, consider setting the drop_null_columns query parameter. This removes columns that contain only null values from the response, which can significantly reduce serialization overhead.

Cap rows with LIMIT

Always include a LIMIT on queries that return raw rows. ES|QL appends a default limit of 1000 rows to every query. Reducing it with an explicit LIMIT is one of the simplest ways to speed up a query. Increasing it beyond the default makes serialization slower and can trigger deserialization errors in Kibana. The maximum configurable limit is 10,000 rows.

❌ Don't: Leave the result set unbounded

		FROM logs-*
| WHERE @timestamp > NOW() - 1 day AND log.level == "error"
| SORT @timestamp DESC
		
	

✅ Do: Cap the result to the rows the consumer actually needs

		FROM logs-*
| WHERE @timestamp > NOW() - 1 day AND log.level == "error"
| SORT @timestamp DESC
| LIMIT 100
		
	

Avoid expensive operations

Some ES|QL operations are intrinsically more expensive than their alternatives. Knowing the cheaper substitute, and when it applies, often replaces a slow query with a fast one. The subsections below are ordered roughly by impact, with the highest-leverage changes first.

Use full-text search instead of LIKE or RLIKE

For text search, prefer MATCH, MATCH_PHRASE, QSTR, or KQL over LIKE or RLIKE. The full-text search functions use the inverted index and are optimized for analyzed text. LIKE and RLIKE are pattern-matching operators. Pre 8.18/9.0 they are especially costly because they are not pushed down to Lucene. Leading wildcards (for example *something) are particularly expensive because they cannot use the inverted index efficiently.

❌ Don't: Use pattern matching on free text with LIKE

		FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
  AND message LIKE "*connection refused*"
		
	

✅ Do: Use MATCH_PHRASE against the inverted index

		FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
  AND MATCH_PHRASE(message, "connection refused")
		
	

Inverted index lookup, not a row scan

MATCH works on text and keyword fields. Use MATCH_PHRASE when the words must appear together in order. For Lucene query syntax with field:value and boolean operators, use QSTR. For Kibana Query Language syntax, use KQL.

MATCH can also target expressions that are not backed by an index, such as columns produced by EVAL or STATS. When the target is not an indexed field, MATCH evaluates by scanning values row by row instead of using the inverted index, which is slower on large datasets. For best performance, prefer searching indexed fields when possible.

Tip

To learn more about using ES|QL for search use cases, refer to ES|QL for search.

Avoid high-cardinality STATS BY

Each unique combination of BY values creates a bucket in memory. Grouping by high-cardinality fields such as raw timestamps, full URLs, or document IDs, or by many fields at once, can produce millions of buckets.

Warning

High-cardinality groupings can exhaust memory and trip circuit breakers. Always bucket timestamps and choose the lowest-cardinality representation of a field that still answers the question.

❌ Don't: Group by raw, high-cardinality fields

		FROM logs-*
| WHERE @timestamp > NOW() - 1 day
| STATS count = COUNT(*) BY url.full, user.name, @timestamp
		
	

✅ Do: Reduce cardinality before grouping

		FROM logs-*
| WHERE @timestamp > NOW() - 1 day
| STATS count = COUNT(*) BY url.path, user.name, bucket = DATE_TRUNC(1 hour, @timestamp)
		
	

Uses url.path instead of url.full, bucketed timestamps instead of raw

Common reductions include: bucketing timestamps with DATE_TRUNC or BUCKET, using url.path instead of url.full, and filtering to a known subset before the STATS.

Prefer fields backed by doc values

ES|QL reads field values through a block-loading system that strongly prefers doc values. Fields with doc values, such as keyword, numeric, date, ip, geo_point, geo_shape, cartesian_point, and cartesian_shape types, are read in fast columnar batches. Fields without doc values, such as text and match_only_text, fall back to reading _source, which requires decompressing and parsing the full JSON document per row. This applies to any operation that reads the field value, including filtering, grouping, sorting, and returning fields through KEEP.

If an exact .keyword subfield exists, the query planner automatically rewrites expressions to use it, so message and message.keyword perform the same in that case. However, if the text field has no keyword subfield, or if the subfield is not exact (for example, it uses ignore_above), the planner cannot rewrite and falls back to reading from _source, which is significantly slower. When no exact subfield is available, filter aggressively to limit the number of documents that require _source reads.

❌ Don't: Group by an analyzed field

		FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
| STATS count = COUNT(*) BY message
		
	

✅ Do: Use the .keyword subfield

		FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
| STATS count = COUNT(*) BY message.keyword
		
	

For free-text grouping, CATEGORIZE groups similar messages automatically:

		FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
| STATS count = COUNT(*) BY category = CATEGORIZE(message)
| SORT count DESC
| LIMIT 20
		
	

Return spatial fields only when you need source precision

The spatial types geo_point, geo_shape, cartesian_point, and cartesian_shape are maintained at source precision in the original documents, but indexed at reduced precision by Lucene for performance. Reading spatial values from doc values is fast and usually precise enough. Returning the original spatial field preserves source precision, but requires reading from _source, which is slower. To prioritize performance, drop original spatial fields from the result unless the query consumer needs the exact original value.

To learn more, refer to Spatial precision.

Use per-aggregation WHERE instead of CASE

For conditional aggregations, attach a WHERE clause directly to each STATS expression rather than wrapping values in CASE. CASE is lazy-evaluated and slow for this pattern.

❌ Don't: Emulate conditional aggregations through CASE and SUM

		FROM logs-*
| WHERE @timestamp > NOW() - 1 day
| EVAL is_error = CASE(log.level == "error", 1, 0)
| EVAL is_warn = CASE(log.level == "warning", 1, 0)
| STATS
    total = COUNT(*),
    errors = SUM(is_error),
    warnings = SUM(is_warn)
  BY service.name
		
	

✅ Do: Compute each conditional metric directly with a per-aggregation WHERE

		FROM logs-*
| WHERE @timestamp > NOW() - 1 day
| STATS
    total = COUNT(*),
    errors = COUNT(*) WHERE log.level == "error",
    warnings = COUNT(*) WHERE log.level == "warning"
  BY service.name
		
	

Per-aggregation filter, no CASE needed
One filter per metric

Prefer DISSECT over GROK

GROK uses regular expressions, which are CPU-intensive per row. DISSECT uses delimiter-based tokenization and is much cheaper. When the log format uses consistent delimiters, prefer DISSECT. When you must use GROK, filter aggressively first to shrink the dataset.

❌ Don't: Use regex parsing when a delimiter is available

		FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
| GROK message "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}"
		
	

✅ Do: Use DISSECT for delimiter-based formats

		FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
| DISSECT message "%{ts} %{level} %{msg}"
		
	

Delimiter tokenization, no regex engine

Filter before LOOKUP JOIN

LOOKUP JOIN combines each incoming row with matching rows from a lookup index. Joining fewer incoming rows is usually faster, and large lookup matches can increase memory pressure.

Filter the source data before joining, and keep the lookup index as small and purpose-built as possible:

		FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
  AND event.category == "network"
| LOOKUP JOIN threat_list ON source.ip
| KEEP @timestamp, source.ip, threat_list.risk, event.action
		
	

ES|QL tries to push filters before the join when possible. Write the query with the selective filters before LOOKUP JOIN so the intended execution order is clear and the join receives the smallest practical input.

Use approximate aggregations where possible

For large STATS queries, exact results can be expensive. If approximate results are acceptable, approximate STATS queries can trade exactness for much faster execution on large datasets.

Approximation is useful for exploratory analysis, dashboard panels, and high-cardinality aggregations where a close estimate is enough. Use exact aggregations when the result feeds billing, compliance, alerting, or other workflows that require precise values.

Monitor query performance

Once a query is written, several tools help confirm whether it is actually fast and identify regressions over time. When reviewing query logs, scan for common anti-patterns first.

Inspect panel: Check a query from Discover or a dashboard.
Query activity: Find and cancel in-flight queries.
Profile API responses: Inspect how a query executed through the API.
Query logging: Analyze historical slow queries.
Task management API: Monitor or cancel ES|QL tasks from the API.

Inspect panel in Kibana

In Discover or within a dashboard, select Inspect to see the ES|QL query sent to the cluster and the took value, which is the server-side execution time in milliseconds. This helps clarify if the root cause is the query itself, the network, or Kibana's rendering.

Query activity

The Query activity page in Kibana provides a real-time view of all in-flight search work in your cluster, including ES|QL, Query DSL, EQL, and SQL queries. Use it to find long-running queries, trace them back to their source in Kibana, and cancel them when needed.

Profile API responses

When running an ES|QL query through the ES|QL query API, set the profile body parameter to true to include a profile object in the response. The profile output is intended for human debugging and can help identify which parts of a query contribute to its runtime. The response format can change at any time, so use it for investigation rather than automation.

Query logging

Query logging captures Query DSL, EQL, KQL, and ES|QL queries that exceed configurable duration thresholds and stores them in a managed data stream for analysis. This is the recommended way to log slow queries. To configure it, refer to Query logging.

For clusters on earlier versions, a legacy ES|QL-specific query log writes slow queries to a _esql_querylog.json file in the Elasticsearch log directory. To configure it, refer to ES|QL query log.

Task management API

The task management API lets you monitor and cancel long-running ES|QL queries.

List running ES|QL tasks:

				GET _tasks?actions=*esql*&detailed

Cancel a specific task:

				POST _tasks/<task_id>/_cancel

Inspect query logs:
- (recommended) Query logging : Log all query types through a managed data stream
- ES|QL query log : Log ES|QL queries to a file on each node
Circuit breaker settings: Configure ES|QL memory limits and troubleshoot circuit breaker errors
ES|QL task management: Monitor and cancel long-running queries
Approximate STATS queries : Trade exact results for faster aggregations on large datasets
Filing a support case: Learn what to include when reporting a slow or failing query
Explicit mapping: Control which fields are indexed and how
ES|QL for search: Use ES|QL for full-text search, vector search, and AI-powered retrieval
doc_values: Learn how columnar storage enables fast sorting, aggregations, and field reads
_source: Learn how the original JSON document is stored with each record