Loading

Optimize ES|QL query performance

This guide covers practical techniques for writing fast ES|QL queries and operating ES|QL workloads at scale. It starts with common anti-patterns, then shows how to reduce scanned data, reduce returned data, and avoid expensive operations. It also covers tools for monitoring query performance and investigating slow queries across multiple clusters.

Tip

For a quick overview of the most common issues with ES|QL queries, refer to Common anti-patterns.

This guide assumes familiarity with ES|QL syntax and command pipelines. To learn the basics, refer to Get started with ES|QL queries.

This guide serves two audiences:

If you're not on Elastic Cloud Serverless, check your Elastic Stack version. The ES|QL query engine improves with each release, so upgrading is often one of the highest-impact performance changes you can make.

Some tips on this page require a recent version of the Elastic Stack, and individual subsections carry an applicability badge when this is the case. Sections without a version badge apply to all versions where ES|QL is generally available.

The most important version-specific performance improvements are visible in the table below, including improvements in 9.x for query logging, time series support, query activity, and full-text search functions.

For clusters on a version before 8.17, upgrading provides the largest single performance improvement, because full-text search functions and Lucene pushdowns become available. For clusters on 8.17 but before 8.18, upgrading to 8.18 provides the next largest improvement. That release adds LIKE and RLIKE pushdown to Lucene and a mapping discovery optimization that reduces overhead on clusters with many indices.

Query performance starts at index time. Your field mappings control what ES|QL can do efficiently.

If you only aggregate or sort on a field, and never filter, set index: false to save disk space. ES|QL can still read the field through doc values. Keep doc values enabled for fields that ES|QL needs to read, group, sort, or return. For fields that are rarely needed in results, remove them from the query output with KEEP or DROP.

ES|QL enforces memory limits through circuit breakers. When a query exceeds the limit, the cluster rejects it to protect node stability. High-cardinality aggregations are the most common trigger. To learn more, refer to Avoid high-cardinality STATS BY.

These anti-patterns are the most common causes of ES|QL query latency in production.

Tip

AutoOps detects most of these patterns automatically and surfaces actionable recommendations. To browse detected events, refer to AutoOps events.

Pattern What to look for Why it's slow
Broad index pattern FROM * or wide wildcards Expensive mapping discovery, plus scans across many indices
Wide time range @timestamp range spanning weeks or months Scans proportionally more data
Missing WHERE No filter conditions at all Full index scan
Missing KEEP No column selection Returns all fields, producing large payloads
Missing LIMIT Unbounded result set Slow serialization, can trigger deserialization errors in Kibana
High-cardinality STATS BY Grouping by raw timestamps, full URLs, or document IDs Produces millions of buckets, can trip circuit breakers
LIKE or RLIKE Wildcard or regex text matching Slower than full-text functions for text search, especially pre 8.18/9.0
GROK or DISSECT Text parsing on large datasets CPU-intensive regex or tokenization per row
CASE Conditional aggregation through CASE Lazy evaluation, slow
LOOKUP JOIN Join against a large lookup index Cost is proportional to the lookup index size
Tip

The most impactful fixes are usually: add a time range filter, add a WHERE, and add a KEEP.

Most ES|QL queries spend the bulk of their time reading data from disk. The fastest queries read the least data. This section covers the levers that most directly control scan size.

A tight time range is the single biggest performance lever in most workloads. ES|QL uses the @timestamp field to skip entire shards and segments that fall outside the range, so a narrower window directly reduces the amount of data read.

Avoid running queries that span more time than the result actually needs:

Don't: Query without a time bound

FROM logs-*
| WHERE host.name == "web-01"
| STATS count = COUNT(*) BY log.level
		

Do: Add an explicit @timestamp filter to bound the scan

FROM logs-*
| WHERE @timestamp > NOW() - 1 day
  AND host.name == "web-01"
| STATS count = COUNT(*) BY log.level
		
  1. Bound the scan to a tight window

In Kibana, the time picker automatically applies a time range filter. When writing queries directly in the Kibana Console or through the API, always include an explicit @timestamp filter.

A WHERE clause earlier in the pipeline reduces the dataset before downstream commands process it. Conditions on indexed fields such as keyword, numeric, date, ip, geo_point, geo_shape, cartesian_point, or cartesian_shape types are pushed down to Lucene, which skips irrelevant documents entirely.

Without a WHERE, ES|QL scans every document in the matched indices:

Don't: Filter after the aggregation

FROM logs-*
| STATS count = COUNT(*) BY host.name, log.level
| WHERE log.level == "error"
		

Do: Push the filter up so it runs before the aggregation

FROM logs-*
| WHERE @timestamp > NOW() - 1 day
  AND log.level == "error"
| STATS count = COUNT(*) BY host.name
		
  1. Filter pushed to Lucene

A broad FROM * forces ES|QL to discover field mappings across every index and then query each one. On clusters with thousands of indices, that discovery overhead alone can dominate query time. In Serverless, FROM * can also expand to all linked projects in cross-project search. Use project_routing to limit a cross-project query to the projects that it actually needs.

Don't: Use wildcards that match more indices than the query needs

FROM *
| WHERE @timestamp > NOW() - 1 hour
  AND event.category == "authentication"
| STATS failures = COUNT(*) BY user.name
		

Do: Target a specific index pattern instead

FROM logs-system-*
| WHERE @timestamp > NOW() - 1 hour
  AND event.category == "authentication"
| STATS failures = COUNT(*) BY user.name
		

When a query genuinely needs multiple patterns, list them explicitly with FROM. For example:

FROM logs-system-*, logs-auth-*
		

For time series data streams (TSDS), use TS rather than FROM. TS understands time series structure, including dimensions, metrics, and time ordering, and skips data more efficiently than FROM paired with WHERE. It also unlocks time series functions such as RATE and bucketing through TBUCKET.

Don't: Query TSDS indices through FROM when you intend to aggregate metrics

FROM metrics-system.cpu-*
| WHERE @timestamp > NOW() - 1 hour
| STATS avg_cpu = AVG(system.cpu.user.pct) BY host.name, bucket = DATE_TRUNC(5 minutes, @timestamp)
		

Do: Use TS with TBUCKET for time series metrics

TS metrics-system.cpu-*
| STATS avg_cpu = AVG(AVG_OVER_TIME(system.cpu.user.pct))
        BY host.name, TBUCKET(5 minutes)
		
  1. Inner-then-outer aggregation pattern
  2. TBUCKET replaces DATE_TRUNC under TS
Important

TS only works on indices created as time series data streams. For non-TSDS indices, continue to use FROM.

Every column returned has to be read from storage, serialized, and transmitted. Shrinking the result set, by returning fewer columns or rows, often produces significant gains on large indices.

KEEP selects which columns to return. DROP does the inverse. Without either, ES|QL returns every field in every matching document. This is the single biggest source of avoidable overhead on indices with hundreds or thousands of fields.

Don't: Return every field by default

FROM logs-*
| WHERE @timestamp > NOW() - 1 hour AND log.level == "error"
| SORT @timestamp DESC
| LIMIT 100
		

Do: Project only the fields the consumer actually needs

FROM logs-*
| WHERE @timestamp > NOW() - 1 hour AND log.level == "error"
| KEEP @timestamp, host.name, message, log.level
| SORT @timestamp DESC
| LIMIT 100
		
  1. Return only the four fields needed downstream
Tip

Use wildcards in KEEP sparingly. host.* is better than no KEEP at all, but host.name is better than host.* because it avoids pulling in adjacent fields.

When using the REST API on sparse datasets where many columns are null, consider setting the drop_null_columns query parameter. This removes columns that contain only null values from the response, which can significantly reduce serialization overhead.

Always include a LIMIT on queries that return raw rows. ES|QL appends a default limit of 1000 rows to every query. Reducing it with an explicit LIMIT is one of the simplest ways to speed up a query. Increasing it beyond the default makes serialization slower and can trigger deserialization errors in Kibana. The maximum configurable limit is 10,000 rows.

Don't: Leave the result set unbounded

FROM logs-*
| WHERE @timestamp > NOW() - 1 day AND log.level == "error"
| SORT @timestamp DESC
		

Do: Cap the result to the rows the consumer actually needs

FROM logs-*
| WHERE @timestamp > NOW() - 1 day AND log.level == "error"
| SORT @timestamp DESC
| LIMIT 100
		

Some ES|QL operations are intrinsically more expensive than their alternatives. Knowing the cheaper substitute, and when it applies, often replaces a slow query with a fast one. The subsections below are ordered roughly by impact, with the highest-leverage changes first.

For text search, prefer MATCH, MATCH_PHRASE, QSTR, or KQL over LIKE or RLIKE. The full-text search functions use the inverted index and are optimized for analyzed text. LIKE and RLIKE are pattern-matching operators. Pre 8.18/9.0 they are especially costly because they are not pushed down to Lucene. Leading wildcards (for example *something) are particularly expensive because they cannot use the inverted index efficiently.

Don't: Use pattern matching on free text with LIKE

FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
  AND message LIKE "*connection refused*"
		

Do: Use MATCH_PHRASE against the inverted index

FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
  AND MATCH_PHRASE(message, "connection refused")
		
  1. Inverted index lookup, not a row scan

MATCH works on text and keyword fields. Use MATCH_PHRASE when the words must appear together in order. For Lucene query syntax with field:value and boolean operators, use QSTR. For Kibana Query Language syntax, use KQL.

Tip

To learn more about using ES|QL for search use cases, refer to ES|QL for search.

Each unique combination of BY values creates a bucket in memory. Grouping by high-cardinality fields such as raw timestamps, full URLs, or document IDs, or by many fields at once, can produce millions of buckets.

Warning

High-cardinality groupings can exhaust memory and trip circuit breakers. Always bucket timestamps and choose the lowest-cardinality representation of a field that still answers the question.

Don't: Group by raw, high-cardinality fields

FROM logs-*
| WHERE @timestamp > NOW() - 1 day
| STATS count = COUNT(*) BY url.full, user.name, @timestamp
		

Do: Reduce cardinality before grouping

FROM logs-*
| WHERE @timestamp > NOW() - 1 day
| STATS count = COUNT(*) BY url.path, user.name, bucket = DATE_TRUNC(1 hour, @timestamp)
		
  1. Uses url.path instead of url.full, bucketed timestamps instead of raw

Common reductions include: bucketing timestamps with DATE_TRUNC or BUCKET, using url.path instead of url.full, and filtering to a known subset before the STATS.

ES|QL reads field values through a block-loading system that strongly prefers doc values. Fields with doc values, such as keyword, numeric, date, ip, geo_point, geo_shape, cartesian_point, and cartesian_shape types, are read in fast columnar batches. Fields without doc values, such as text and match_only_text, fall back to reading _source, which requires decompressing and parsing the full JSON document per row. This applies to any operation that reads the field value, including filtering, grouping, sorting, and returning fields through KEEP.

If an exact .keyword subfield exists, the query planner automatically rewrites expressions to use it, so message and message.keyword perform the same in that case. However, if the text field has no keyword subfield, or if the subfield is not exact (for example, it uses ignore_above), the planner cannot rewrite and falls back to reading from _source, which is significantly slower. When no exact subfield is available, filter aggressively to limit the number of documents that require _source reads.

Don't: Group by an analyzed field

FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
| STATS count = COUNT(*) BY message
		

Do: Use the .keyword subfield

FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
| STATS count = COUNT(*) BY message.keyword
		

For free-text grouping, CATEGORIZE groups similar messages automatically:

FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
| STATS count = COUNT(*) BY category = CATEGORIZE(message)
| SORT count DESC
| LIMIT 20
		

The spatial types geo_point, geo_shape, cartesian_point, and cartesian_shape are maintained at source precision in the original documents, but indexed at reduced precision by Lucene for performance. Reading spatial values from doc values is fast and usually precise enough. Returning the original spatial field preserves source precision, but requires reading from _source, which is slower. To prioritize performance, drop original spatial fields from the result unless the query consumer needs the exact original value.

To learn more, refer to Spatial precision.

For conditional aggregations, attach a WHERE clause directly to each STATS expression rather than wrapping values in CASE. CASE is lazy-evaluated and slow for this pattern.

Don't: Emulate conditional aggregations through CASE and SUM

FROM logs-*
| WHERE @timestamp > NOW() - 1 day
| EVAL is_error = CASE(log.level == "error", 1, 0)
| EVAL is_warn = CASE(log.level == "warning", 1, 0)
| STATS
    total = COUNT(*),
    errors = SUM(is_error),
    warnings = SUM(is_warn)
  BY service.name
		

Do: Compute each conditional metric directly with a per-aggregation WHERE

FROM logs-*
| WHERE @timestamp > NOW() - 1 day
| STATS
    total = COUNT(*),
    errors = COUNT(*) WHERE log.level == "error",
    warnings = COUNT(*) WHERE log.level == "warning"
  BY service.name
		
  1. Per-aggregation filter, no CASE needed
  2. One filter per metric

GROK uses regular expressions, which are CPU-intensive per row. DISSECT uses delimiter-based tokenization and is much cheaper. When the log format uses consistent delimiters, prefer DISSECT. When you must use GROK, filter aggressively first to shrink the dataset.

Don't: Use regex parsing when a delimiter is available

FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
| GROK message "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}"
		

Do: Use DISSECT for delimiter-based formats

FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
| DISSECT message "%{ts} %{level} %{msg}"
		
  1. Delimiter tokenization, no regex engine

LOOKUP JOIN combines each incoming row with matching rows from a lookup index. Joining fewer incoming rows is usually faster, and large lookup matches can increase memory pressure.

Filter the source data before joining, and keep the lookup index as small and purpose-built as possible:

FROM logs-*
| WHERE @timestamp > NOW() - 1 hour
  AND event.category == "network"
| LOOKUP JOIN threat_list ON source.ip
| KEEP @timestamp, source.ip, threat_list.risk, event.action
		

ES|QL tries to push filters before the join when possible. Write the query with the selective filters before LOOKUP JOIN so the intended execution order is clear and the join receives the smallest practical input.

For large STATS queries, exact results can be expensive. If approximate results are acceptable, approximate STATS queries can trade exactness for much faster execution on large datasets.

Approximation is useful for exploratory analysis, dashboard panels, and high-cardinality aggregations where a close estimate is enough. Use exact aggregations when the result feeds billing, compliance, alerting, or other workflows that require precise values.

Once a query is written, several tools help confirm whether it is actually fast and identify regressions over time. When reviewing query logs, scan for common anti-patterns first.

In Discover or within a dashboard, select Inspect to see the ES|QL query sent to the cluster and the took value, which is the server-side execution time in milliseconds. This helps clarify if the root cause is the query itself, the network, or Kibana's rendering.

The Query activity page in Kibana provides a real-time view of all in-flight search work in your cluster, including ES|QL, Query DSL, EQL, and SQL queries. Use it to find long-running queries, trace them back to their source in Kibana, and cancel them when needed.

When running an ES|QL query through the ES|QL query API, set the profile body parameter to true to include a profile object in the response. The profile output is intended for human debugging and can help identify which parts of a query contribute to its runtime. The response format can change at any time, so use it for investigation rather than automation.

Query logging captures Query DSL, EQL, KQL, and ES|QL queries that exceed configurable duration thresholds and stores them in a managed data stream for analysis. This is the recommended way to log slow queries. To configure it, refer to Query logging.

For clusters on earlier versions, a legacy ES|QL-specific query log writes slow queries to a _esql_querylog.json file in the Elasticsearch log directory. To configure it, refer to ES|QL query log.

The task management API lets you monitor and cancel long-running ES|QL queries.

List running ES|QL tasks:

				GET _tasks?actions=*esql*&detailed
		

Cancel a specific task:

				POST _tasks/<task_id>/_cancel