Introduction to ESQL — A new query language for flexible, iterative analytics

blog-thumb-elevate-our-work-1680x980.png

The Elastic Platform has long been well-regarded as an analytical system for search use cases and machine-generated data. Analytics are focused on processing data-as-ingested, where significant thought is put into how to optimally structure data as it is indexed in Elasticsearch. Kibana exposes Elasticsearch aggregations and uses them to create interactive dashboards, visualizations, and alerts.

But as the Elastic Platform is more widely adopted as a search, security, observability, and general analytics platform, analyst users require the ability to take data-as-ingested, transform it to fit their investigative needs post-ingestion, and derive insights from underlying Elasticsearch index data. They need a concise, integrated, and efficient workflow enabled by rich and expressive queries where search, filter, aggregation, and transformation are performed via a single query expression with little-to-no UI context switching.

To solve these challenges, the Elastic team is currently developing the Elasticsearch Query Language (ESQL). ESQL provides Elastic users with a flexible, powerful, and robust query expression language to interrogate data. ESQL also provides a superior query UX with post-ingest processing capabilities that fundamentally transforms and expands the analytics and data processing capabilities of Elasticsearch.

New query and aggregation compute architecture 

ESQL is more than a language. It represents a significant investment in new compute capabilities within Elasticsearch. To achieve both the functional and performance requirements for ESQL, it is necessary to build an entirely new compute architecture. ESQL search, aggregation, and transformation functions are directly executed within Elasticsearch itself. Query expressions are not transpiled to QueryDSL for execution. Rather, we have built native support for ESQL functions within Elasticsearch.

ESQL introduces distributed compute capabilities to users in disparate roles and with varying skill levels. These compute capabilities enable ESQL to simplify user workflows in several key ways.

With ESQL you can:

  • Utilize a superior query UX: ESQL query expressions support complex analytics and data processing. They are easy to learn, read, and share.
  • Use the filter, aggregation, and transformation capabilities of Elasticsearch with subqueries and lookups, made possible by new Elasticsearch compute and data processing capabilities.
  • Use ESQL across Kibana in Discover, Kibana Lens, and Elastic Solutions, giving you seamless workflows. You will be able to visualize ESQL queries, share them with teams on dashboards or as queries, and use queries to create custom alerts.

How you can use ESQL 

ESQL is a piped query language wherein users process Elasticsearch data via a sequence of commands, delimited by pipes. The output of one command becomes the input for the next to define a logical data pipeline. ESQL expressions are linear, logical, and easily readable. They are simple enough to be authored, used, and modified by analysts of all experience levels. A simple example:

search index_name
| eval field_c = (field_a + field_b)
| sort field_c desc

The above expression retrieves all data from the index, creates a new fieldC for each record that is the sum of fieldA and fieldB. Finally, the results are sorted on fieldC.

Use ESQL for security

ESQL is particularly useful for ad-hoc threat hunting by security analysts. An analyst may begin by querying log data to show the unique processes spawned by “powershell.exe” and sorting by the string length of the command line arguments.

from winlog
  | where host.os.family == ‘windows’
  | where process.name == "powershell.exe"
  | unique process.command_line
  | sort len(process.command_lin) desc
  | limit 3

host.os.family

process.name

process.command_line

windows

powershell.exe

(get-acl \\smb_file\share).access | ft IdentityReference,FileSystemRights,AccessControlType,IsInherited,InheritanceFlags -auto

windows

powershell.exe

Get-ADComputer -property * -filter { ipv4address -eq ‘172.16.0.3’}

windowspowershell.exeGet-ADGroupMember -identity Helpdesk

The results show that PowerShell is being used to retrieve file system information and also information about Active Directory. This could be normal system behavior, but it could also indicate malicious activity.

To further investigate, the query is modified to filter for Active Directory and file system-related command line arguments. It then counts the unique values of process.command_line and groups by hostname.

from winlog
  | where host.os.family == ‘windows’
  | where process.name == "powershell.exe"
  | where process.command_line in (‘*get-acl*’, ‘*Get-AD*’)
  | stats count(unique process.command_line) as cl_count by hostname
  | sort cl_count desc
  | limit 3

cl_count

hostname

155

host2

74

host1

67host3

The results show that host2 has far more powershell processes invoking file and AD-related command-line arguments than other hosts. An analyst can continue modifying and extending the ESQL query expression to determine if there is malicious activity by host2. This investigation ultimately leads to an understanding of the threat vector such that both remedial action can be taken and how this threat can be prevented in the future.

Use ESQL for search

Elasticsearch is and always will be, you know, for search. Therefore, ESQL supports the search, relevance, and ranking functions that have long-been a part of Elasticsearch. ESQL makes it extremely simple to access the full power of these search functions.

Consider a simple terms aggregation where we want to produce buckets with the top three terms in the genre field, sorted by document count.

from music
| stats terms((genre), 3, doc_count, unwind)
| sort doc_count desc

doc_count

term

6

electronic

3

rock

2jazz

Bucketing aggregations such as date histogram are also supported. In this query, we create a histogram from the sales index using intervals of 50 based on the price field.

from sales
| stats histogram(price, 50)
| sort bucket desc

doc_count

bucket

3

200

2

150

0100
150
10

ESQL also makes it extremely simple to process data in a way that is similar to pipeline aggregations today. For example, let’s say you want to calculate the derivative of a derivative. In its most simple form:

from sales
| eval (stats derivative(sales)) as fist_der
| eval (stats derivative(first_der)) as second_der

Use ESQL for observability

Site reliability engineers (SREs) face a challenge in dealing with massive amounts of data. They are responsible for using this data to prevent and remediate system downtime and other related issues. They monitor thousands of systems that generate important trace, log, and metric data. This data is then used by the SREs to identify problems and implement measures to prevent system or application interruptions in the future. So, for an SRE, the ability to analyze system behavior with a combined understanding from multiple data sets is essential.  

Observability ingested data is inherently unpredictable. ESQL provides the means for SREs to correlate and reshape data to open up deeper insights into system and application behavior. It expands their ability to perform post-hoc analysis after an issue is identified, and this invaluable insight is directly used to prevent similar issues in the future. 

The section on data processing below will use Observability examples.

Explore a whole new way of processing data with new Elasticsearch compute capabilities

ESQL expressions are both flexible and powerful in unlocking insights from index data. But the heavy lifting behind these expressions happens in Elasticsearch itself. We have built a new compute engine to support the data processing functionality exposed by ESQL. Most notably, improved post-ingest data processing, intermediate data states, lookup functions, and subqueries. 

ESQL relies entirely on the new compute and processing capabilities in Elasticsearch. ESQL is not interpreted and executed via transpilation. Rather, ESQL expressions are processed entirely within Elasticsearch itself. 

Post-ingest processing

ESQL, as integrated throughout Kibana, makes it quick and easy to access the most common, useful aggregations and projections. Imagine working with an index of metrics data. An analyst may want to calculate ratios or perform aggregations on the ingested data. With ESQL, these can be easily derived from an underlying index.

from network_flow 
| stats count(*) filter (where transport == ‘udp’) as udp_count
| stats count(*) filter (where transport == ‘tcp’) as tcp_count
| eval total_transport_events = udp_count + tcp_count
| eval udp_per_total = udp_count / total_transport_events

By enabling the quick and easy aggregation and transformation of underlying indices, ESQL opens up significant new insights as analysts shape and explore the data. This simplifies data ingestion, as ESQL allows analysts to derive new structures and insights from a broad set of underlying indices.

Data pipelines and intermediate data

Another key part of supporting ESQL expressions is handling data in intermediate states. The ability to handle and modify data as it passes through distinct pipeline stages is core to the processing capabilities exposed by ESQL.  

Consider the following query, where we search the metrics index to find the five hostnames with the highest peak cpu utilization.

from metrics
| stats max(system.process.cpu.total.pct) as max_cpu by hostname
| where max_cpu > .800 and hostname == '*web*' 
| sort max_cpu desc
| limit 5

Each stage of this data pipeline produces tabular output. The initial from metrics command retrieves all index data. This table is then aggregated on system.process.cpu.total.pct and grouped by hostname, resulting in a unique table. These tabular results are then filtered and sorted to produce the desired output.

max_cpu

hostname

.989

1webapache

.978

1websftp

.964nfsweb
.9552webredis
.943web_staging_primary

This output can then be used as the basis for visualizations or alerts.

Lookups and subqueries

ESQL also introduces lookup and subquery functionality to Elasticsearch. 

ESQL can lookup values in a separate lookup index. This is most commonly used to enrich results at query time. Lookups are similar to SQL left joins in that they return fields from a foreign index using a specified key.

For example, a lookup index contains information on system users identified by a unique key. An ESQL expression can look up data in these indices to return this foreign data in results. This query enriches results from the access_logs index with user data from the user_info_lookup index. Specifically, the email and state fields from the lookup index are to be returned.

from access_logs where user != 'root'
| lookup user_info_lookup['email', 'state'] on userid

userid

[ … access_logs … ]

emailstateuserid

3455

bobNY3455

Subqueries enable users to embed distinct queries as arguments within other queries. For example, an SRE may want to use an aggregation as an argument in a query against the same index. Here, an SRE is using the count of total logins to calculate the % of logins by a specific user.

from user_login where userid = 1234
| eval stats count(*) as 1234_logins
| eval total_logins = [from logs where userid = *| stats count(*) as  total_logins]
| eval round((1234_logins / total_logins), 2) as 1234_pct

By using lookups and subqueries, analysts and SREs alike can leverage all of their Elasticsearch to generate incredibly rich data structures and unlock new insights from data in Elasticsearch.

ESQL fundamentally changes and improves how you interact with data in the Elastic Search Platform. ESQL unlocks the value of your data with powerful capabilities to quickly transform and join large data sets; to search, filter, and process vast amounts of data; and ultimately, to reduce response and resolution time. We are looking forward to seeing you put it to work!

Join us on this analytical journey 

Excited about ESQL, transforms and joins, and using multi-step queries? So are we. Keep an eye out for more news as the Elastic team continues to develop and prepare these capabilities for release.

Interested in trying out this solution before everyone else? You can reach out to us on the discussion forums or on our Elastic community Slack channel. We would love your feedback to help shape the direction of our new query language, compute engine, and query-based investigative workflows.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.

  • We're hiring

    Work for a global, distributed team where finding someone like you is just a Zoom meeting away. Flexible work with impact? Development opportunities from the start?