Elasticsearch piped query language, ES|QL, now generally available

ES_QL_blog

Today, we are pleased to announce the general availability of ES|QL (Elasticsearch Query Language), a dynamic language designed from the ground up to transform, enrich, and simplify data investigations. Powered by a new query engine, ES|QL delivers advanced search using simple and familiar query syntax with concurrent processing, enhancing speed and efficiency regardless of the data source and structure. 

With ES|QL's piped syntax, users can easily chain multiple operations, simplifying complex data investigations and making querying more intuitive and iterative. To security and observability users, ES|QL will feel both familiar and innovative for exposing Elasticsearch's advanced search capabilities with an easy-to-use query language. Integrated with Kibana, ES|QL enhances the data visualization and analysis experience enabling users to conduct their entire investigation on one screen, without switching between multiple windows. 

With continuous development, we aim to establish ES|QL as a versatile language for all Elasticsearch use cases, including retrieval augmented generation (RAG). Integrating RAG with geospatial capabilities and ES|QL will enhance query accuracy from diverse data sources. The combination of ES|QL and the new Search AI Lake architecture provides enhanced scalability, cost efficiency, and simplified management by automatically adjusting resources based on demand. Decoupling compute from storage and index from search improves performance and flexibility, ensuring faster data retrieval and investigations across vast amounts of data.

ES|QL will be a differentiator for teams facing increasing observability and security demands. This article will dive into the various benefits and ways you can use ES|QL for your own use cases.

Advancements in Elasticsearch

For over 14 years, QueryDSL has served as the foundational language in Elasticsearch, delivering search, observability, and security to numerous organizations. As user needs evolved, it became clear that they required more than what QueryDSL alone could provide. They sought a query language that could not only simplify and streamline data investigations but also enhance the querying experience by integrating searching, enrichment, aggregation, and visualization into a singular, efficient interface. They desired advanced search capabilities, including lookups with concurrent processing to handle vast data volumes from varied sources and structures. 

In response, we developed the Elasticsearch Query Language (ES|QL), drawing inspiration from vectorized query execution and other database technologies. With ES|QL, users can utilize a familiar pipe ('|') syntax to chain operations, allowing for transformative and detailed data analysis.

FROM logs-system.auth*
| WHERE host.os.type == "linux"
  AND event.outcome == "success"
  AND system.auth.ssh.event == "Accepted"
  AND (user.name IS NOT NULL
       AND source.ip IS NOT NULL
       AND source.port IS NOT NULL
       AND system.auth.ssh.method IS NOT NULL)
| STATS auth_count = COUNT(*) BY user.name, source.ip,
  source.port, system.auth.ssh.method
| SORT auth_count DESC

Powered by a robust query engine, ES|QL offers advanced search capabilities with concurrent processing across cores and nodes, enabling users to query across diverse data sources and structures seamlessly.

There is no translation or transpilation to Query DSL; each ES|QL query is parsed, analyzed, and validated for semantics and optimized into an execution plan executed in parallel on the relevant nodes holding the data. The target nodes handle the query, making on-the-fly adjustments to the execution plan using the framework provided by ES|QL. The result is lightning-fast queries that you get out of the box. 

The road to GA

Since its introduction in 8.11, ES|QL has been on a journey of refinement and enhancement. The beta phase allowed our engineering team to gather valuable feedback from the community, enabling us to iterate and address the top needs of our users. Throughout this process, we enhanced ES|QL's capabilities while ensuring stability, performance, and seamless integration into core data exploration and visualization UX and workflows you use daily. Here are some features that brought ES|QL to general availability.

Stability and performance

We have been busy enhancing the dedicated ES|QL query engine to ensure it maintains robust performance under load, safeguarding the stability of the running node. To wit, see below the improvements in grouping in the last 6 months (for more tests and exact details about the underlying change see the dedicated benchmark page).

Performance_ESQL

Additionally, we've implemented memory tracking for precise resource management and conducted thorough stress tests, including the rigorous HeapAttack, to ensure that memory usage is carefully monitored during resource-intensive queries. Our circuit breakers are also in place to prevent OutOfMemoryErrors (OOMEs) on large and small heap sizes nodes.

Visualize data in Kibana Discover in a whole new way with ES|QL

ES|QL together with Elastic AI assistant 

We are excited about bringing generative AI and ES|QL together by first integrating them into the Observability and Security  AI assistant, allowing users to input natural language translated into ES|QL commands for an easy, iterative, and smooth workflow.

Visualize and perform ES|QL queries or edit them using the inline editing flyout, and seamlessly embed them into dashboards. This enhancement shortens the workflow by allowing in-line visualization editing when creating charts, making it easier for users to manage and save their visualizations directly within the assistant. 

Delivering significant improvements in query generation and performance. Users can now use natural language to visualize ES|QL queries, edit them using the inline editing flyout, and seamlessly embed them into dashboards. This enhancement shortens the workflow by allowing in-line visualization editing when creating charts, making it easier for users to manage and save their visualizations directly within the assistant.

Video Thumbnail

Create and edit ES|QL charts directly from the Kibana dashboard

Streamline your workflow and deliver quick insights into your data by creating and modifying charts built with ES|QL directly from within the Kibana Dashboard. You can also perform inline editing of the ES|QL query while in the chart to adapt to changes in troubleshooting or threat hunting quickly. 

Video Thumbnail

ES|QL Query History

It can be frustrating to repeat yourself and equally annoying if you need to rerun a query you executed a few moments ago. Now with ES|QL, you can quickly access recent queries with ES|QL query history. View, re-run your last 20 ES|QL queries directly within Kibana Discover, ES|QL visualizations, Kibana alerts, or Kibana maps for quick and easy access.

es_ql_discover

Hybrid planning and dynamic data reduction

For large Elasticsearch deployments, we have been testing ES|QL across hundreds of nodes and up to hundreds of thousands of shards and fields to ensure that query performance consistently remains performant as the cluster grows and more nodes are added.

We have extended ES|QL ability to perform hybrid planning to better deal with the dynamic nature of the data (whether it’s new fields added or new segments) and exploit the local data patterns particular to each node:

Coordinating_Node

After the coordinating node (that receives the ES|QL query and drives its execution) performs global planning based on the global view of the data, it broadcasts the plan to all data nodes that can execute the plan. However, before executing, each node changes the plan locally based on the actual storage statistics individual to each node. A common scenario is early filter evaluation in sparse mappings due to the schema evolution.

We are proactively developing a dynamic data reduction technique for scenarios with large shard sizes that minimize I/O traffic between the coordinator and data nodes, as well as reducing the duration that Lucene readers remain open during queries. This approach, which includes sharing intermediate results, shows great promise in enhancing the efficiency and runtime of queries across multiple shards. Stay tuned for more information about query execution and architecture in future blogs.

Async querying

Async querying empowers users to run long-running ES|QL queries asynchronously. Clients no longer have to wait idly for results; instead, they can monitor progress and retrieve data once it's ready. By utilizing the wait_for_completion_timeout parameter, users can tailor their experience, choosing whether to wait synchronously or switch to asynchronous mode after a specified timeout. This enhancement not only offers greater flexibility but also optimizes resource management, ensuring a smoother and more efficient querying process for our users

Long-running ES|QL queries can be executed asynchronously so the client can monitor the progress and retrieve the results when available instead of blocking for them:

POST /_query/async
{
  "query": """
    FROM library
    | STATS MAX(page_count) BY year = BUCKET(release_date, 1 year)
    | SORT year
    | LIMIT 5
  """,
  "wait_for_completion_timeout": "2s"
}

Through the wait_for_completion_timeout clients can pick a comfortable timeout to wait for the result (and have synchronous behavior) before switching to an asynchronous one.

Improved language and ergonomics 

We've streamlined the STATS command to offer greater flexibility and simplicity in data analysis. Previously, users had to resort to additional EVAL commands for arbitrary computations alongside aggregations and groupings which required a separate EVAL command:

FROM company
// use eval to manipulate the grouping column and
// create a conditional for data sanitization
| EVAL g = tenure % 10, trips = COALESCE(trips, 0) 
| STATS avg_trips = AVG() BY g

This restriction is no longer necessary as aggregations accept expressions (and themselves can also be combined) directly inside the STATS command, eliminating the need for extra EVALs and column pollution due to temporary fields:

FROM company
| STATS avg_trips = AVG(COALESCE(trips, 0)) BY g = tenure %10

Date time units 

ES|QL now boasts improved support for datetime filtering. Recognizing the common need for date-time arithmetic in filtering tasks, ES|QL now supports abbreviated units, making queries more intuitive and efficient. For example, users can now easily specify date ranges using familiar abbreviations like 'year,' 'month,' and 'week.' 

FROM index
| WHERE @timestamp > now() - 1 year + 1 month + 1 week

This update simplifies query construction, enabling users to express datetime conditions more succinctly and accurately.

Implicit data type conversion for string literals

To minimize the friction of creating dedicated types (such as dates) from string declarations, ES|QL now performs implicit conversions of string constants to their target type by using the built-in conversion functions:

FROM index
// convert the declared strings to ip and date-time
| WHERE host_ip in ("127.0.0.1", "::1")
  AND access_time > "2024-05-15T12:34:56Z"

Note that

  • Only constants (or literals) are candidates for conversions, columns are ignored - the user has to use conversion functions for those explicitly.
  • Converting string literals to their numeric equivalent is NOT supported, as these can be directly declared as such; that is “1” + 2 will throw an error, simply declare the expression as 1+2 instead.

Native ES|QL clients

While ES|QL is universally available through the _query REST endpoint, work is underway for offering rich, opinionated APIs for accessing ES|QL natively in various popular languages.

ESQL_Clients

While completing all the items above will take several releases, one can use ES|QL already through the regular Elasticsearch clients, for example, to access ES|QL results as Java or PHP objects and manipulate them as dataframes in Python; Jupyter users should refer to the dedicated getting started guide notebook.

Since the initial release as technical preview in 8.11, ES|QL has been making its way through various parts of the Elasticsearch ecosystem. Such as observability where it is used to streamline OTel operations using a specialized AI assistant. And if we had more time, we’d also mention the many other functions introduced, like multi-value scalar fields, geo-spatial analysis (both scalar and aggregate functions) and date time handling. 

ES|QL in Cross-Cluster Search in technical preview

Cross-cluster search in Elasticsearch enables users to query data across multiple Elasticsearch clusters as if it were stored in a single cluster, delivering unified querying, global insights, and many other efficiencies. Now, in technical preview, ES|QL with cross-cluster search capabilities extends its querying power to span across distributed clusters, empowering users to leverage ES|QL for querying and analyzing data regardless of its location all from a single UI. 

While ES|QL is available as a basic license at no cost, using ES|QL in cross cluster search will require an Enterprise level license. To use ES|QL in cross-cluster search, use the FROM command with the format <remote_cluster_name>:<target>, to retrieve data from my-index-000001 on the remote cluster. 

FROM cluster_one:my-index-000001
| LIMIT 10

Looking to the future

Search, embeddings and RAG

We are thrilled to share an exciting development: leveraging ES|QL for advanced information retrieval, including full-text search and AI/ML-powered exploration. Our team is dedicated to making ES|QL the optimal tool for scoring, hybrid ranking, and integrating with Large Language Models (LLMs) within Elasticsearch.

This dedicated command will streamline the retrieval process, enabling users to filter and score results. In the below example, we showcase a comprehensive search scenario, combining range filters, fast queries, and hybrid search techniques.

This is a preview of how it might look like, naming TBD (SEARCH or RETRIEVAL):

// dedicated search command
SEARCH images [
  // range filter
  | WHERE date > now() - 1 month
  // (fast) query for filtering and scoring (returned in _score column)
  | RANK MATCH "mountain lake"
  // filter by score
  | WHERE _score > 0.1
  // keep only top 100 results 
  | LIMIT 100
  // perform hybrid search on a user defined image vector using knn 
  | RANK KNN image user_image_vector K 10
]
// break down the results by rating and count the votes
| STATS c = COUNT(votes) BY rating
// return only the top 5 resuls
| LIMIT 5

For instance, the query above demonstrates retrieving the top 5 most popular images by rating, featuring the terms 'mountain lake' in their description and resembling a user-defined image vector. Behind the scenes, the engine intelligently manages filters, rearranges queries, and applies reranking strategies, ensuring optimal search performance.

This advancement promises to revolutionize information retrieval in Elasticsearch, offering users unparalleled control and efficiency in exploring and discovering relevant content.

Timeseries, metrics and O11y

Elasticsearch provides a dedicated solution for metrics through the timeseries data streams (TSDS), a powerful concept that can reduce disk storage by up to 70% by using specialized types and routing. 

We plan on leveraging fully these capabilities in ES|QL - first by introducing a dedicated command:

METRICS pods load=avg(cpu), writes=max(rate(indexing_requests)) BY pod
| SORT pod

Inline stats - aggregations without data reduction

The STATS command in ES|QL is invaluable for summarizing statistics, but it often poses a challenge when users want to aggregate data without losing its original context. For instance, if you wish to display the average category price alongside each individual t-shirt price, traditional aggregation methods can obscure the original data. Enter INLINESTATS: a feature designed to address this issue by performing 'inline' statistics. 

With INLINESTATS, users can compute statistics within each group and seamlessly integrate the results back into the original dataset, preserving the context of the originating groups. This powerful capability enhances the clarity and depth of statistical analysis in ES|QL, empowering users to derive meaningful insights while maintaining the integrity of their data.

FROM shop
// compute the average price for each category
// and add it under 'avg_price' column;
// the produced column is now available on all
// entries used to perform the inline grouping
| INLINESTATS avg_price = AVG(price) BY category

Get started today

The introduction of ES|QL marks a significant stride forward in Elastic's capabilities, offering users a powerful and intuitive tool for data querying and analysis. With its streamlined syntax, robust functionality, and innovative features, ES|QL opens up new avenues for users to unlock insights and derive value from their data. Whether you're a seasoned Elasticsearch user or just getting started, ES|QL invites you to explore, experiment, and experience the power of Elasticsearch Query Language firsthand.

Be sure to check out our demo playground full of examples or try on Elastic Cloud. Already have Elasticsearch running? Just upgrade your clusters to 8.14 and give it a try.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.

Ready to try this out on your own? Start a free trial.
Want to get Elastic certified? Find out when the next Elasticsearch Engineer training is running!