---
title: ES|QL aggregation functions
description: Aggregate functions supported by the ES|QL STATS and INLINE STATS commands.
url: https://www.elastic.co/docs/reference/query-languages/esql/functions-operators/aggregation-functions
products:
  - Elasticsearch
applies_to:
  - Elastic Cloud Serverless: Generally available
  - Elastic Stack: Generally available
---

# ES|QL aggregation functions
The [`STATS`](https://www.elastic.co/docs/reference/query-languages/esql/commands/stats-by) and [`INLINE STATS`](https://www.elastic.co/docs/reference/query-languages/esql/commands/inlinestats-by) commands support these aggregate functions:
- [`ABSENT`](#esql-absent) <applies-to>Elastic Stack: Generally available since 9.2</applies-to>
- [`AVG`](#esql-avg)
- [`COUNT`](#esql-count)
- [`COUNT_DISTINCT`](#esql-count_distinct)
- [`MAX`](#esql-max)
- [`MEDIAN`](#esql-median)
- [`MEDIAN_ABSOLUTE_DEVIATION`](#esql-median_absolute_deviation)
- [`MIN`](#esql-min)
- [`PERCENTILE`](#esql-percentile)
- [`PRESENT`](#esql-present) <applies-to>Elastic Stack: Generally available since 9.2</applies-to>
- [`SAMPLE`](#esql-sample)
- [`ST_CENTROID_AGG`](#esql-st_centroid_agg) <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to>
- [`ST_EXTENT_AGG`](#esql-st_extent_agg) <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to>
- [`STD_DEV`](#esql-std_dev)
- [`SUM`](#esql-sum)
- [`TOP`](#esql-top)
- [`VALUES`](#esql-values) <applies-to>Elastic Stack: Preview</applies-to> <applies-to>Elastic Cloud Serverless: Preview</applies-to>
- [`VARIANCE`](#esql-variance)
- [`WEIGHTED_AVG`](#esql-weighted_avg)
- [`FIRST`](#esql-first)
- [`LAST`](#esql-last)


## `ABSENT`

<applies-to>
  - Elastic Stack: Generally available since 9.2
</applies-to>

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/absent.svg)

**Parameters**
<definitions>
  <definition term="field">
    Expression that outputs values to be checked for absence.
  </definition>
</definitions>

**Description**
Returns true if the input expression yields no non-null values within the current aggregation context. Otherwise it returns false.
**Supported types**

| field                   | result  |
|-------------------------|---------|
| aggregate_metric_double | boolean |
| boolean                 | boolean |
| cartesian_point         | boolean |
| cartesian_shape         | boolean |
| date                    | boolean |
| date_nanos              | boolean |
| dense_vector            | boolean |
| double                  | boolean |
| exponential_histogram   | boolean |
| geo_point               | boolean |
| geo_shape               | boolean |
| geohash                 | boolean |
| geohex                  | boolean |
| geotile                 | boolean |
| histogram               | boolean |
| integer                 | boolean |
| ip                      | boolean |
| keyword                 | boolean |
| long                    | boolean |
| tdigest                 | boolean |
| text                    | boolean |
| unsigned_long           | boolean |
| version                 | boolean |

**Examples**
```esql
FROM employees
| WHERE emp_no == 10020
| STATS is_absent = ABSENT(languages)
```


| is_absent:boolean |
|-------------------|
| true              |

To check for absence within each group, combine `ABSENT()` with a `BY` clause:
```esql
FROM employees
| STATS is_absent = ABSENT(salary) BY languages
```


| is_absent:boolean | languages:integer |
|-------------------|-------------------|
| false             | 1                 |
| false             | 2                 |
| false             | 3                 |
| false             | 4                 |
| false             | 5                 |
| false             | null              |

To return 1 when the field is absent and 0 otherwise, wrap `ABSENT()` in `TO_INTEGER()`:
```esql
FROM employees
| WHERE emp_no == 10020
| STATS is_absent = TO_INTEGER(ABSENT(languages))
```


| is_absent:integer |
|-------------------|
| 1                 |
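
The Description above can also be sketched outside ES|QL. The following Python snippet only illustrates the stated semantics and is not how Elasticsearch implements them: `None` stands in for a null value, and the helper names `absent` and `present` are invented for this sketch.

```python
from typing import Iterable, Optional


def absent(values: Iterable[Optional[object]]) -> bool:
    """True when the group yields no non-null value (None models null)."""
    return all(v is None for v in values)


def present(values: Iterable[Optional[object]]) -> bool:
    """True when the group yields at least one non-null value."""
    return any(v is not None for v in values)


group_with_nulls_only = [None, None]
group_with_a_value = [None, 3]

print(absent(group_with_nulls_only))       # True
print(absent(group_with_a_value))          # False
print(int(absent(group_with_nulls_only)))  # 1, mirroring the TO_INTEGER example
```

Under this reading, `absent` and `present` always return opposite answers for the same non-empty group.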


## `AVG`

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/avg.svg)

**Parameters**
<definitions>
  <definition term="number">
    Expression that outputs values to average.
  </definition>
</definitions>

**Description**
The average of a numeric field.
**Supported types**

| number                  | result |
|-------------------------|--------|
| aggregate_metric_double | double |
| double                  | double |
| exponential_histogram   | double |
| integer                 | double |
| long                    | double |
| tdigest                 | double |

**Examples**
```esql
FROM employees
| STATS AVG(height)
```


| AVG(height):double |
|--------------------|
| 1.7682             |

The expression can use inline functions. For example, to calculate the average over a multivalued column, first use `MV_AVG` to average the multiple values per row, then pass the result to the `AVG` function:
```esql
FROM employees
| STATS avg_salary_change = ROUND(AVG(MV_AVG(salary_change)), 10)
```


| avg_salary_change:double |
|--------------------------|
| 1.3904535865             |


## `COUNT`

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/count.svg)

**Parameters**
<definitions>
  <definition term="field">
    Expression that outputs values to be counted. If omitted, equivalent to `COUNT(*)` (the number of rows).
  </definition>
</definitions>

**Description**
Returns the total number (count) of input values.
**Supported types**

| field                   | result |
|-------------------------|--------|
| aggregate_metric_double | long   |
| boolean                 | long   |
| cartesian_point         | long   |
| cartesian_shape         | long   |
| date                    | long   |
| date_nanos              | long   |
| dense_vector            | long   |
| double                  | long   |
| exponential_histogram   | long   |
| geo_point               | long   |
| geo_shape               | long   |
| geohash                 | long   |
| geohex                  | long   |
| geotile                 | long   |
| integer                 | long   |
| ip                      | long   |
| keyword                 | long   |
| long                    | long   |
| tdigest                 | long   |
| text                    | long   |
| unsigned_long           | long   |
| version                 | long   |

**Examples**
```esql
FROM employees
| STATS COUNT(height)
```


| COUNT(height):long |
|--------------------|
| 100                |

To count the number of rows, use `COUNT()` or `COUNT(*)`:
```esql
FROM employees
| STATS count = COUNT(*) BY languages
| SORT languages DESC
```


| count:long | languages:integer |
|------------|-------------------|
| 10         | null              |
| 21         | 5                 |
| 18         | 4                 |
| 17         | 3                 |
| 19         | 2                 |
| 15         | 1                 |

The expression can use inline functions. This example splits a string into multiple values
using the `SPLIT` function and counts the values.
```esql
ROW words="foo;bar;baz;qux;quux;foo"
| STATS word_count = COUNT(SPLIT(words, ";"))
```


| word_count:long |
|-----------------|
| 6               |

To count the number of times an expression returns `TRUE`, use a
[`WHERE`](https://www.elastic.co/docs/reference/query-languages/esql/commands/where) command to remove rows that
shouldn’t be included:
```esql
ROW n=1
| WHERE n < 0
| STATS COUNT(n)
```


| COUNT(n):long |
|---------------|
| 0             |

To count the number of times *multiple* expressions return `TRUE`, use a `WHERE` clause inside the `STATS` command:
```esql
FROM employees
| STATS
    gte20 = COUNT(*) WHERE height >= 2,
    lte18 = COUNT(*) WHERE height <= 1.8
```


| gte20:long | lte18:long |
|------------|------------|
| 20         | 56         |

`COUNT`ing a multivalued field returns the number of values. `COUNT`ing `NULL` returns 0.
`COUNT`ing `true` returns 1. `COUNT`ing `false` returns 1.
```esql
ROW mv = [1, 2], n = NULL, t = TRUE, f = FALSE
| STATS COUNT(mv), COUNT(n), COUNT(t), COUNT(f)
```


| COUNT(mv):long | COUNT(n):long | COUNT(t):long | COUNT(f):long |
|----------------|---------------|---------------|---------------|
| 2              | 0             | 1             | 1             |

You may see a pattern like `COUNT(<expression> OR NULL)`. This has the same meaning as
`COUNT() WHERE <expression>`. It relies on `COUNT(NULL)` returning `0` and builds on
three-valued logic ([3VL](https://en.wikipedia.org/wiki/Three-valued_logic)): `TRUE OR NULL` is `TRUE`, but
`FALSE OR NULL` is `NULL`. Prefer the `COUNT() WHERE <expression>` pattern.
```esql
ROW n=1
| STATS COUNT(n > 0 OR NULL), COUNT(n < 0 OR NULL)
```


| COUNT(n > 0 OR NULL):long | COUNT(n < 0 OR NULL):long |
|---------------------------|---------------------------|
| 1                         | 0                         |
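
To see why the two patterns agree, the three-valued `OR` can be modeled outside ES|QL. This is a minimal Python sketch under stated assumptions: `None` stands in for `NULL`, the helper names `or3` and `count` are invented here, and `count` simply counts non-null inputs, which matches the behavior the examples above show for `COUNT`.

```python
from typing import Iterable, Optional


def or3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued OR: TRUE wins, otherwise any NULL makes the result NULL."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def count(values: Iterable[Optional[object]]) -> int:
    """Count non-null inputs; FALSE still counts, NULL does not."""
    return sum(1 for v in values if v is not None)


rows = [1, -2, 3, -4]  # hypothetical values of a column n

# COUNT(n > 0 OR NULL): FALSE becomes NULL and is therefore not counted.
count_or_null = count(or3(n > 0, None) for n in rows)

# COUNT(*) WHERE n > 0: rows are filtered first, then counted.
count_where = count(n for n in rows if n > 0)

print(count_or_null, count_where)  # 2 2
```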


## `COUNT_DISTINCT`

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/count_distinct.svg)

**Parameters**
<definitions>
  <definition term="field">
    Column or literal for which to count the number of distinct values.
  </definition>
  <definition term="precision">
    Precision threshold. Refer to [Counts are approximate](#esql-agg-count-distinct-approximate). The maximum supported value is 40000; higher thresholds have the same effect as a threshold of 40000. The default value is 3000.
  </definition>
</definitions>

**Description**
Returns the approximate number of distinct values.
<note>
  [Counts are approximate](#esql-agg-count-distinct-approximate).
</note>

**Supported types**

| field      | precision     | result |
|------------|---------------|--------|
| boolean    | integer       | long   |
| boolean    | long          | long   |
| boolean    | unsigned_long | long   |
| boolean    |               | long   |
| date       | integer       | long   |
| date       | long          | long   |
| date       | unsigned_long | long   |
| date       |               | long   |
| date_nanos | integer       | long   |
| date_nanos | long          | long   |
| date_nanos | unsigned_long | long   |
| date_nanos |               | long   |
| double     | integer       | long   |
| double     | long          | long   |
| double     | unsigned_long | long   |
| double     |               | long   |
| integer    | integer       | long   |
| integer    | long          | long   |
| integer    | unsigned_long | long   |
| integer    |               | long   |
| ip         | integer       | long   |
| ip         | long          | long   |
| ip         | unsigned_long | long   |
| ip         |               | long   |
| keyword    | integer       | long   |
| keyword    | long          | long   |
| keyword    | unsigned_long | long   |
| keyword    |               | long   |
| long       | integer       | long   |
| long       | long          | long   |
| long       | unsigned_long | long   |
| long       |               | long   |
| text       | integer       | long   |
| text       | long          | long   |
| text       | unsigned_long | long   |
| text       |               | long   |
| version    | integer       | long   |
| version    | long          | long   |
| version    | unsigned_long | long   |
| version    |               | long   |

**Examples**
```esql
FROM hosts
| STATS COUNT_DISTINCT(ip0), COUNT_DISTINCT(ip1)
```


| COUNT_DISTINCT(ip0):long | COUNT_DISTINCT(ip1):long |
|--------------------------|--------------------------|
| 7                        | 8                        |

Use the optional second parameter to configure the precision threshold:
```esql
FROM hosts
| STATS COUNT_DISTINCT(ip0, 80000), COUNT_DISTINCT(ip1, 5)
```


| COUNT_DISTINCT(ip0, 80000):long | COUNT_DISTINCT(ip1, 5):long |
|---------------------------------|-----------------------------|
| 7                               | 9                           |

The expression can use inline functions. This example splits a string into multiple values using the `SPLIT` function and counts the unique values:
```esql
ROW words="foo;bar;baz;qux;quux;foo"
| STATS distinct_word_count = COUNT_DISTINCT(SPLIT(words, ";"))
```


| distinct_word_count:long |
|--------------------------|
| 5                        |


### Counts are approximate

Computing exact counts requires loading values into a set and returning its
size. This doesn’t scale for high-cardinality sets or large values: the memory
required, plus the cost of sending the per-shard sets between nodes, would
consume too many cluster resources.
The `COUNT_DISTINCT` function is based on the
[HyperLogLog++](https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf)
algorithm, which counts based on the hashes of the values and has some useful
properties:
- Configurable precision, which decides how to trade memory for accuracy.
- Excellent accuracy on low-cardinality sets.
- Fixed memory usage: whether there are tens or billions of unique values, memory usage depends only on the configured precision.

For a precision threshold of `c`, the implementation that we are using requires about `c * 8` bytes.
The following chart shows how the error varies before and after the threshold:
![cardinality error](https://www.elastic.co/docs/reference/query-languages/images/cardinality_error.png)
For all three thresholds, counts are accurate up to the configured threshold. This is typical but
not guaranteed: accuracy in practice depends on the dataset, although most datasets show
consistently good accuracy. Even with a threshold as low as 100, the error remains very low
(1-6% in the chart above) when counting millions of items.

Because the HyperLogLog++ algorithm depends on the leading zeros of hashed values, the exact
distribution of hashes in a dataset can affect the accuracy of the cardinality estimate.

The `COUNT_DISTINCT` function takes an optional second parameter to configure
the precision threshold. The precision threshold lets you trade memory
for accuracy and defines the unique count below which counts are expected to be
close to accurate. Above this value, counts might become less accurate. The
maximum supported value is `40000`; higher thresholds have the
same effect as a threshold of `40000`. The default value is `3000`.
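
As a worked example of the `c * 8` estimate, the following Python sketch (not ES|QL) computes the approximate memory for a few thresholds. It assumes only what this section states: about 8 bytes per unit of threshold, a maximum effective threshold of 40000, and a default of 3000. The function and constant names are invented for illustration, and the results are rough estimates rather than measured values.

```python
MAX_PRECISION_THRESHOLD = 40_000     # maximum supported value stated above
DEFAULT_PRECISION_THRESHOLD = 3_000  # default value stated above


def approx_sketch_bytes(threshold: int = DEFAULT_PRECISION_THRESHOLD) -> int:
    """Rough memory estimate: thresholds above the maximum behave like the maximum."""
    effective = min(threshold, MAX_PRECISION_THRESHOLD)
    return effective * 8


for c in (100, 3_000, 40_000, 80_000):
    print(c, approx_sketch_bytes(c))
# 100 800
# 3000 24000
# 40000 320000
# 80000 320000
```

The last two lines come out identical because a threshold of 80000 is treated as 40000.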

## `MAX`

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/max.svg)

**Parameters**
<definitions>
  <definition term="field">
  </definition>
</definitions>

**Description**
The maximum value of a field.
**Supported types**

| field                                                                               | result        |
|-------------------------------------------------------------------------------------|---------------|
| aggregate_metric_double                                                             | double        |
| boolean                                                                             | boolean       |
| date                                                                                | date          |
| date_nanos                                                                          | date_nanos    |
| double                                                                              | double        |
| exponential_histogram                                                               | double        |
| integer                                                                             | integer       |
| ip                                                                                  | ip            |
| keyword                                                                             | keyword       |
| long                                                                                | long          |
| tdigest                                                                             | double        |
| text                                                                                | keyword       |
| unsigned_long <applies-to>Elastic Stack: Generally available since 9.2</applies-to> | unsigned_long |
| version                                                                             | version       |

**Examples**
```esql
FROM employees
| STATS MAX(languages)
```


| MAX(languages):integer |
|------------------------|
| 5                      |

The expression can use inline functions. For example, to calculate the maximum over an average of a multivalued column, first use `MV_AVG` to average the multiple values per row, then pass the result to the `MAX` function:
```esql
FROM employees
| STATS max_avg_salary_change = MAX(MV_AVG(salary_change))
```


| max_avg_salary_change:double |
|------------------------------|
| 13.75                        |


## `MEDIAN`

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/median.svg)

**Parameters**
<definitions>
  <definition term="number">
    Expression that outputs values to calculate the median of.
  </definition>
</definitions>

**Description**
The value that is greater than half of all values and less than half of all values, also known as the 50% [`PERCENTILE`](#esql-percentile).
<note>
  Like [`PERCENTILE`](#esql-percentile), `MEDIAN` is [usually approximate](#esql-percentile-approximate).
</note>

**Supported types**

| number                | result |
|-----------------------|--------|
| double                | double |
| exponential_histogram | double |
| integer               | double |
| long                  | double |

**Examples**
```esql
FROM employees
| STATS MEDIAN(salary), PERCENTILE(salary, 50)
```


| MEDIAN(salary):double | PERCENTILE(salary, 50):double |
|-----------------------|-------------------------------|
| 47003                 | 47003                         |

The expression can use inline functions. For example, to calculate the median of the maximum values of a multivalued column, first use `MV_MAX` to get the maximum value per row, then pass the result to the `MEDIAN` function:
```esql
FROM employees
| STATS median_max_salary_change = MEDIAN(MV_MAX(salary_change))
```


| median_max_salary_change:double |
|---------------------------------|
| 7.69                            |

<warning>
  `MEDIAN` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm).
  This means you can get slightly different results using the same data.
</warning>


## `MEDIAN_ABSOLUTE_DEVIATION`

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/median_absolute_deviation.svg)

**Parameters**
<definitions>
  <definition term="number">
  </definition>
</definitions>

**Description**
Returns the median absolute deviation, a measure of variability. It is a robust statistic, meaning that it is useful for describing data that may have outliers, or may not be normally distributed. For such data it can be more descriptive than standard deviation. It is calculated as the median of each data point’s deviation from the median of the entire sample. That is, for a random variable `X`, the median absolute deviation is `median(|median(X) - X|)`.
<note>
  Like [`PERCENTILE`](#esql-percentile), `MEDIAN_ABSOLUTE_DEVIATION` is [usually approximate](#esql-percentile-approximate).
</note>
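
The formula `median(|median(X) - X|)` can be followed step by step in a short Python sketch (not ES|QL). It computes the exact statistic on a tiny, made-up sample, whereas the ES|QL function is usually approximate as noted above. The function name and the sample values are illustrative only.

```python
from statistics import median, pstdev


def median_absolute_deviation(xs):
    """Exact median absolute deviation: median(|median(X) - X|)."""
    m = median(xs)                          # median of the entire sample
    return median(abs(m - x) for x in xs)   # median of the absolute deviations


sample = [1, 2, 4, 7, 100]  # hypothetical data with one large outlier

print(median(sample))                     # 4
print(median_absolute_deviation(sample))  # 3
print(pstdev(sample))                     # much larger, dominated by the outlier
```

The outlier `100` barely moves the median absolute deviation, which is what makes it a robust statistic.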

**Supported types**

| number  | result |
|---------|--------|
| double  | double |
| integer | double |
| long    | double |

**Examples**
```esql
FROM employees
| STATS MEDIAN(salary), MEDIAN_ABSOLUTE_DEVIATION(salary)
```


| MEDIAN(salary):double | MEDIAN_ABSOLUTE_DEVIATION(salary):double |
|-----------------------|------------------------------------------|
| 47003                 | 10096.5                                  |

The expression can use inline functions. For example, to calculate the median absolute deviation of the maximum values of a multivalued column, first use `MV_MAX` to get the maximum value per row, then pass the result to the `MEDIAN_ABSOLUTE_DEVIATION` function:
```esql
FROM employees
| STATS m_a_d_max_salary_change = MEDIAN_ABSOLUTE_DEVIATION(MV_MAX(salary_change))
```


| m_a_d_max_salary_change:double |
|--------------------------------|
| 5.69                           |

<warning>
  `MEDIAN_ABSOLUTE_DEVIATION` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm).
  This means you can get slightly different results using the same data.
</warning>


## `MIN`

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/min.svg)

**Parameters**
<definitions>
  <definition term="field">
  </definition>
</definitions>

**Description**
The minimum value of a field.
**Supported types**

| field                                                                               | result        |
|-------------------------------------------------------------------------------------|---------------|
| aggregate_metric_double                                                             | double        |
| boolean                                                                             | boolean       |
| date                                                                                | date          |
| date_nanos                                                                          | date_nanos    |
| double                                                                              | double        |
| exponential_histogram                                                               | double        |
| integer                                                                             | integer       |
| ip                                                                                  | ip            |
| keyword                                                                             | keyword       |
| long                                                                                | long          |
| tdigest                                                                             | double        |
| text                                                                                | keyword       |
| unsigned_long <applies-to>Elastic Stack: Generally available since 9.2</applies-to> | unsigned_long |
| version                                                                             | version       |

**Examples**
```esql
FROM employees
| STATS MIN(languages)
```


| MIN(languages):integer |
|------------------------|
| 1                      |

The expression can use inline functions. For example, to calculate the minimum over an average of a multivalued column, first use `MV_AVG` to average the multiple values per row, then pass the result to the `MIN` function:
```esql
FROM employees
| STATS min_avg_salary_change = MIN(MV_AVG(salary_change))
```


| min_avg_salary_change:double |
|------------------------------|
| -8.46                        |


## `PERCENTILE`

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/percentile.svg)

**Parameters**
<definitions>
  <definition term="number">
  </definition>
  <definition term="percentile">
  </definition>
</definitions>

**Description**
Returns the value at which a certain percentage of observed values occur. For example, the 95th percentile is the value which is greater than 95% of the observed values and the 50th percentile is the `MEDIAN`.
**Supported types**

| number                | percentile | result |
|-----------------------|------------|--------|
| double                | double     | double |
| double                | integer    | double |
| double                | long       | double |
| exponential_histogram | double     | double |
| exponential_histogram | integer    | double |
| exponential_histogram | long       | double |
| integer               | double     | double |
| integer               | integer    | double |
| integer               | long       | double |
| long                  | double     | double |
| long                  | integer    | double |
| long                  | long       | double |
| tdigest               | double     | double |
| tdigest               | integer    | double |
| tdigest               | long       | double |

**Examples**
```esql
FROM employees
| STATS p0 = PERCENTILE(salary,  0)
     , p50 = PERCENTILE(salary, 50)
     , p99 = PERCENTILE(salary, 99)
```


| p0:double | p50:double | p99:double |
|-----------|------------|------------|
| 25324     | 47003      | 74970.29   |

The expression can use inline functions. For example, to calculate a percentile of the maximum values of a multivalued column, first use `MV_MAX` to get the maximum value per row, then pass the result to the `PERCENTILE` function:
```esql
FROM employees
| STATS p80_max_salary_change = PERCENTILE(MV_MAX(salary_change), 80)
```


| p80_max_salary_change:double |
|------------------------------|
| 12.132                       |


### `PERCENTILE` is (usually) approximate

There are many different algorithms to calculate percentiles. The naive implementation simply stores all the values in a sorted array. To find the 50th percentile, you simply find the value that is at `my_array[count(my_array) * 0.5]`.
Clearly, the naive implementation does not scale — the sorted array grows linearly with the number of values in your dataset. To calculate percentiles across potentially billions of values in an Elasticsearch cluster, *approximate* percentiles are calculated.
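
The naive approach described above can be sketched in a few lines of Python (not ES|QL). This illustrates the sorted-array lookup only and is not how `PERCENTILE` is implemented: the function name is invented, the index is clamped so that the 100th percentile stays in bounds, and no interpolation between neighboring values is attempted.

```python
def naive_percentile(values, percentile):
    """Sort every value, then index into the sorted array."""
    if not values:
        raise ValueError("values must not be empty")
    my_array = sorted(values)                          # memory grows with the input
    index = int(len(my_array) * percentile / 100)      # e.g. count * 0.5 for the median
    index = min(index, len(my_array) - 1)              # keep percentile=100 in bounds
    return my_array[index]


salaries = [30, 10, 50, 20, 40]  # hypothetical values

print(naive_percentile(salaries, 0))    # 10
print(naive_percentile(salaries, 50))   # 30
print(naive_percentile(salaries, 100))  # 50
```

Every input value is held in memory and sorted, which is the cost that approximate algorithms avoid.
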
The `PERCENTILE` function uses the TDigest algorithm (introduced by Ted Dunning in [Computing Accurate Quantiles using T-Digests](https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf)).
When using this function, keep a few guidelines in mind:
- The approximation error is proportional to `q(1-q)`. This means that extreme percentiles (for example, 99%) are more accurate than less extreme percentiles, such as the median.
- For small sets of values, percentiles are highly accurate (and potentially 100% accurate if the data is small enough).
- As the quantity of values in a group grows, the algorithm begins to approximate the percentiles. It is effectively trading accuracy for memory savings. The exact level of inaccuracy is difficult to generalize, since it depends on your data distribution and the volume of data being aggregated.
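
To make the `q(1-q)` guideline concrete, this small Python sketch (not ES|QL) evaluates the factor for a few percentiles. It assumes that `q` is the percentile expressed as a fraction between 0 and 1 and that the factor scales the approximation error. The function name `error_weight` is made up for this illustration, and the numbers are relative weights, not actual error rates.

```python
def error_weight(percentile: float) -> float:
    """Evaluate q(1-q) for a percentile given on the 0-100 scale."""
    q = percentile / 100
    return q * (1 - q)


for p in (1, 50, 99):
    print(p, round(error_weight(p), 4))
# 1 0.0099
# 50 0.25
# 99 0.0099
```

The factor peaks at the median and shrinks toward both ends, which is why extreme percentiles come out more accurate.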

The following chart shows the relative error on a uniform distribution depending on the number of collected values and the requested percentile:
![percentiles error](https://www.elastic.co/docs/reference/query-languages/images/percentiles_error.png)
The chart shows that precision is better for extreme percentiles. The error diminishes for a large number of values because the law of large numbers makes the distribution of values more and more uniform, so the t-digest tree can summarize it better. This would not be the case for more skewed distributions.
<warning>
  `PERCENTILE` is also [non-deterministic](https://en.wikipedia.org/wiki/Nondeterministic_algorithm).
  This means you can get slightly different results using the same data.
</warning>


## `PRESENT`

<applies-to>
  - Elastic Stack: Generally available since 9.2
</applies-to>

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/present.svg)

**Parameters**
<definitions>
  <definition term="field">
    Expression that outputs values to be checked for presence.
  </definition>
</definitions>

**Description**
Returns true if the input expression yields any non-null values within the current aggregation context. Otherwise it returns false.
**Supported types**

| field                   | result  |
|-------------------------|---------|
| aggregate_metric_double | boolean |
| boolean                 | boolean |
| cartesian_point         | boolean |
| cartesian_shape         | boolean |
| date                    | boolean |
| date_nanos              | boolean |
| dense_vector            | boolean |
| double                  | boolean |
| exponential_histogram   | boolean |
| geo_point               | boolean |
| geo_shape               | boolean |
| geohash                 | boolean |
| geohex                  | boolean |
| geotile                 | boolean |
| histogram               | boolean |
| integer                 | boolean |
| ip                      | boolean |
| keyword                 | boolean |
| long                    | boolean |
| tdigest                 | boolean |
| text                    | boolean |
| unsigned_long           | boolean |
| version                 | boolean |

**Examples**
```esql
FROM employees
| STATS is_present = PRESENT(languages)
```


| is_present:boolean |
|--------------------|
| true               |

To check for presence within each group, combine `PRESENT()` with a `BY` clause:
```esql
FROM employees
| STATS is_present = PRESENT(salary) BY languages
```


| is_present:boolean | languages:integer |
|--------------------|-------------------|
| true               | 1                 |
| true               | 2                 |
| true               | 3                 |
| true               | 4                 |
| true               | 5                 |
| true               | null              |

To return 1 when the field is present and 0 otherwise, wrap `PRESENT()` in `TO_INTEGER()`:
```esql
FROM employees
| WHERE emp_no == 10020
| STATS is_present = TO_INTEGER(PRESENT(languages))
```


| is_present:integer |
|--------------------|
| 0                  |


## `SAMPLE`

<applies-to>
  - Elastic Stack: Generally available since 9.1
</applies-to>

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/sample.svg)

**Parameters**
<definitions>
  <definition term="field">
    The field to collect sample values for.
  </definition>
  <definition term="limit">
    The maximum number of values to collect.
  </definition>
</definitions>

**Description**
Collects sample values for a field.
**Supported types**

| field           | limit   | result          |
|-----------------|---------|-----------------|
| boolean         | integer | boolean         |
| cartesian_point | integer | cartesian_point |
| cartesian_shape | integer | cartesian_shape |
| date            | integer | date            |
| date_nanos      | integer | date_nanos      |
| double          | integer | double          |
| geo_point       | integer | geo_point       |
| geo_shape       | integer | geo_shape       |
| geohash         | integer | geohash         |
| geohex          | integer | geohex          |
| geotile         | integer | geotile         |
| integer         | integer | integer         |
| ip              | integer | ip              |
| keyword         | integer | keyword         |
| long            | integer | long            |
| text            | integer | keyword         |
| unsigned_long   | integer | unsigned_long   |
| version         | integer | version         |

**Example**
```esql
FROM employees
| STATS sample = SAMPLE(gender, 5)
```


| sample:keyword  |
|-----------------|
| [F, M, M, F, M] |


## `ST_CENTROID_AGG`

<applies-to>
  - Elastic Cloud Serverless: Preview
  - Elastic Stack: Preview
</applies-to>

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/st_centroid_agg.svg)

**Parameters**
<definitions>
  <definition term="field">
  </definition>
</definitions>

**Description**
Calculates the spatial centroid over a field with a spatial geometry type. Supports `geo_point` and `cartesian_point`, as well as `geo_shape` and `cartesian_shape` <applies-to>Elastic Stack: Planned</applies-to>.
**Supported types**

| field                                                           | result          |
|-----------------------------------------------------------------|-----------------|
| cartesian_point                                                 | cartesian_point |
| cartesian_shape <applies-to>Elastic Stack: Planned</applies-to> | cartesian_point |
| geo_point                                                       | geo_point       |
| geo_shape <applies-to>Elastic Stack: Planned</applies-to>       | geo_point       |

**Example**
```esql
FROM airports
| STATS centroid=ST_CENTROID_AGG(location)
```


| centroid:geo_point                             |
|------------------------------------------------|
| POINT(-0.030548143003023033 24.37553649504829) |

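For point data, a centroid can be pictured as the average position of all points. The Python sketch below shows that idea for planar (x, y) pairs. It is an illustration under that planar assumption, not a description of how ES|QL computes the result for `geo_point` or shape fields. The function name `centroid` and the sample coordinates are made up.

```python
def centroid(points):
    """Arithmetic mean of (x, y) pairs, or None when there are no points."""
    if not points:
        return None
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return (mean_x, mean_y)


corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 6.0), (0.0, 6.0)]  # made-up points
print(centroid(corners))  # (2.0, 3.0)
```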

## `ST_EXTENT_AGG`

<applies-to>
  - Elastic Cloud Serverless: Preview
  - Elastic Stack: Preview
</applies-to>

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/st_extent_agg.svg)

**Parameters**
<definitions>
  <definition term="field">
  </definition>
</definitions>

**Description**
Calculates the spatial extent over a field with a geometry type. Returns a bounding box that contains all values of the field.
**Supported types**

| field           | result          |
|-----------------|-----------------|
| cartesian_point | cartesian_shape |
| cartesian_shape | cartesian_shape |
| geo_point       | geo_shape       |
| geo_shape       | geo_shape       |

**Example**
```esql
FROM airports
| WHERE country == "India"
| STATS extent = ST_EXTENT_AGG(location)
```


| extent:geo_shape                                                               |
|--------------------------------------------------------------------------------|
| BBOX (70.77995480038226, 91.5882289968431, 33.9830909203738, 8.47650992218405) |

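In the example output, the `BBOX` values read as minimum x, maximum x, maximum y, minimum y. That ordering is an inference from the numbers shown, not a documented rule. The Python sketch below computes a bounding box for planar points in the same order. The function name `extent` and the coordinates are hypothetical, and the sketch ignores geographic edge cases such as boxes that cross the antimeridian.

```python
def extent(points):
    """Return (min_x, max_x, max_y, min_y), or None when there are no points."""
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), max(xs), max(ys), min(ys))


locations = [(72.9, 19.1), (77.1, 28.6), (88.4, 22.7), (76.9, 8.5)]  # made-up points
print(extent(locations))  # (72.9, 88.4, 28.6, 8.5)
```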

## `STD_DEV`

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/std_dev.svg)

**Parameters**
<definitions>
  <definition term="number">
  </definition>
</definitions>

**Description**
The population standard deviation of a numeric field.
**Supported types**

| number  | result |
|---------|--------|
| double  | double |
| integer | double |
| long    | double |

**Examples**
```esql
FROM employees
| STATS std_dev_height = STD_DEV(height)
```


| std_dev_height:double |
|-----------------------|
| 0.2063704             |

The expression can use inline functions. For example, to calculate the population standard deviation of each employee’s maximum salary change, first apply `MV_MAX` to each row, and then apply `STD_DEV` to the result:
```esql
FROM employees
| STATS stddev_salary_change = STD_DEV(MV_MAX(salary_change))
```


| stddev_salary_change:double |
|-----------------------------|
| 6.87583                     |

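A population standard deviation divides by the number of values `n`, not by `n - 1` as a sample standard deviation does. The Python sketch below spells out that arithmetic and mirrors the `MV_MAX` step by reducing each row to its maximum first. The helper name and the input rows are made up, and skipping null values is an assumption of the sketch.

```python
import math


def population_std_dev(values):
    """Square root of the mean squared deviation from the mean."""
    present = [v for v in values if v is not None]  # assumption: nulls are skipped
    if not present:
        return None
    mean = sum(present) / len(present)
    return math.sqrt(sum((v - mean) ** 2 for v in present) / len(present))


salary_change = [[1.0, 3.0], [5.0], [-2.0, 7.0, 4.0]]  # made-up multivalued rows
row_max = [max(row) for row in salary_change]          # per-row maximum: [3.0, 5.0, 7.0]
print(population_std_dev(row_max))                     # sqrt(8/3), about 1.633
```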

## `SUM`

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/sum.svg)

**Parameters**
<definitions>
  <definition term="number">
  </definition>
</definitions>

**Description**
The sum of a numeric expression.
**Supported types**

| number                  | result |
|-------------------------|--------|
| aggregate_metric_double | double |
| double                  | double |
| exponential_histogram   | double |
| integer                 | long   |
| long                    | long   |
| tdigest                 | double |

**Examples**
```esql
FROM employees
| STATS SUM(languages)
```


| SUM(languages):long |
|---------------------|
| 281                 |

The expression can use inline functions. For example, to calculate the sum of each employee’s maximum salary change, apply the `MV_MAX` function to each row and then sum the results:
```esql
FROM employees
| STATS total_salary_changes = SUM(MV_MAX(salary_change))
```


| total_salary_changes:double |
|-----------------------------|
| 446.75                      |

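The two-step evaluation described above (reduce each row with `MV_MAX`, then sum) can be sketched in Python as follows. The helper name and the rows are hypothetical. Skipping null rows and returning `None` when no row contributes are assumptions of this sketch.

```python
def sum_of_row_max(rows):
    """Sum the maximum of each multivalued row."""
    # Assumption: null or empty rows contribute nothing.
    maxima = [max(row) for row in rows if row]
    return sum(maxima) if maxima else None


rows = [[1.5, -2.0], [3.25], None, [0.5, 4.0]]  # made-up salary changes
print(sum_of_row_max(rows))  # 8.75
```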

## `TOP`

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/top.svg)

**Parameters**
<definitions>
  <definition term="field">
    The field to collect the top values for.
  </definition>
  <definition term="limit">
    The maximum number of values to collect.
  </definition>
  <definition term="order">
    The order in which to calculate the top values: either `asc` or `desc`. Defaults to `asc` if omitted.
  </definition>
  <definition term="outputField">
    The extra field that, if present, will be the output of the `TOP` call instead of `field`. <applies-to>Elastic Stack: Generally available since 9.3</applies-to>
  </definition>
</definitions>

**Description**
Collects the top values for a field. Includes repeated values.
**Supported types**

| field   | limit   | order   | outputField | result  |
|---------|---------|---------|-------------|---------|
| boolean | integer | keyword |             | boolean |
| boolean | integer |         |             | boolean |
| date    | integer | keyword | date        | date    |
| date    | integer | keyword | double      | double  |
| date    | integer | keyword | integer     | integer |
| date    | integer | keyword | long        | long    |
| date    | integer | keyword |             | date    |
| date    | integer |         |             | date    |
| double  | integer | keyword | date        | date    |
| double  | integer | keyword | double      | double  |
| double  | integer | keyword | integer     | integer |
| double  | integer | keyword | long        | long    |
| double  | integer | keyword |             | double  |
| double  | integer |         |             | double  |
| integer | integer | keyword | date        | date    |
| integer | integer | keyword | double      | double  |
| integer | integer | keyword | integer     | integer |
| integer | integer | keyword | long        | long    |
| integer | integer | keyword |             | integer |
| integer | integer |         |             | integer |
| ip      | integer | keyword |             | ip      |
| ip      | integer |         |             | ip      |
| keyword | integer | keyword |             | keyword |
| keyword | integer |         |             | keyword |
| long    | integer | keyword | date        | date    |
| long    | integer | keyword | double      | double  |
| long    | integer | keyword | integer     | integer |
| long    | integer | keyword | long        | long    |
| long    | integer | keyword |             | long    |
| long    | integer |         |             | long    |
| text    | integer | keyword |             | keyword |
| text    | integer |         |             | keyword |

**Example**
```esql
FROM employees
| STATS top_salaries = TOP(salary, 3, "desc"), top_salary = MAX(salary)
```


| top_salaries:integer  | top_salary:integer |
|-----------------------|--------------------|
| [74999, 74970, 74572] | 74999              |

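Conceptually, `TOP` behaves like sorting the values in the requested order and keeping the first `limit` entries, duplicates included. The Python sketch below models only that behavior. It does not cover `outputField`, the salaries are made up, and skipping null values is an assumption.

```python
def top(values, limit, order="asc"):
    """Return the first `limit` values in the given order, keeping repeats."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    present = [v for v in values if v is not None]  # assumption: nulls are skipped
    return sorted(present, reverse=(order == "desc"))[:limit]


salaries = [74999, 25324, 74970, 74572, 74970]  # made-up values
print(top(salaries, 3, "desc"))  # [74999, 74970, 74970]
print(top(salaries, 2))          # [25324, 74572]
```

The repeated `74970` appears twice in the descending result, which is the difference from `VALUES`.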

## `VALUES`

<applies-to>
  - Elastic Cloud Serverless: Preview
  - Elastic Stack: Preview
</applies-to>

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/values.svg)

**Parameters**
<definitions>
  <definition term="field">
  </definition>
</definitions>

**Description**
Returns unique values as a multivalued field. The order of the returned values isn’t guaranteed. If you need the values returned in order, use [`MV_SORT`](/docs/reference/query-languages/esql/functions-operators/mv-functions#esql-mv_sort).
**Supported types**

| field           | result          |
|-----------------|-----------------|
| boolean         | boolean         |
| cartesian_point | cartesian_point |
| cartesian_shape | cartesian_shape |
| date            | date            |
| date_nanos      | date_nanos      |
| double          | double          |
| geo_point       | geo_point       |
| geo_shape       | geo_shape       |
| geohash         | geohash         |
| geohex          | geohex          |
| geotile         | geotile         |
| integer         | integer         |
| ip              | ip              |
| keyword         | keyword         |
| long            | long            |
| text            | keyword         |
| unsigned_long   | unsigned_long   |
| version         | version         |

**Example**
```esql
FROM employees
| EVAL first_letter = SUBSTRING(first_name, 0, 1)
| STATS first_name = MV_SORT(VALUES(first_name)) BY first_letter
| SORT first_letter
```


| first_name:keyword                                                                                | first_letter:keyword |
|---------------------------------------------------------------------------------------------------|----------------------|
| [Alejandro, Amabile, Anneke, Anoosh, Arumugam]                                                    | A                    |
| [Basil, Berhard, Berni, Bezalel, Bojan, Breannda, Brendon]                                        | B                    |
| [Charlene, Chirstian, Claudi, Cristinel]                                                          | C                    |
| [Danel, Divier, Domenick, Duangkaew]                                                              | D                    |
| [Ebbe, Eberhardt, Erez]                                                                           | E                    |
| Florian                                                                                           | F                    |
| [Gao, Georgi, Georgy, Gino, Guoxiang]                                                             | G                    |
| [Heping, Hidefumi, Hilari, Hironobu, Hironoby, Hisao]                                             | H                    |
| [Jayson, Jungsoon]                                                                                | J                    |
| [Kazuhide, Kazuhito, Kendra, Kenroku, Kshitij, Kwee, Kyoichi]                                     | K                    |
| [Lillian, Lucien]                                                                                 | L                    |
| [Magy, Margareta, Mary, Mayuko, Mayumi, Mingsen, Mokhtar, Mona, Moss]                             | M                    |
| Otmar                                                                                             | O                    |
| [Parto, Parviz, Patricio, Prasadram, Premal]                                                      | P                    |
| [Ramzi, Remzi, Reuven]                                                                            | R                    |
| [Sailaja, Saniya, Sanjiv, Satosi, Shahaf, Shir, Somnath, Sreekrishna, Sudharsan, Sumant, Suzette] | S                    |
| [Tse, Tuval, Tzvetan]                                                                             | T                    |
| [Udi, Uri]                                                                                        | U                    |
| [Valdiodio, Valter, Vishv]                                                                        | V                    |
| Weiyi                                                                                             | W                    |
| Xinglin                                                                                           | X                    |
| [Yinghua, Yishay, Yongqiao]                                                                       | Y                    |
| [Zhongwei, Zvonko]                                                                                | Z                    |
| null                                                                                              | null                 |

<tip>
  Use [`TOP`](#esql-top)
  if you need to keep repeated values.
</tip>

<warning>
  This can use a significant amount of memory, and ES|QL doesn’t yet
  grow aggregations beyond the memory available. This aggregation will
  work until it is used to collect more values than can fit into memory,
  at which point it will fail the query with
  a [Circuit Breaker Error](https://www.elastic.co/docs/troubleshoot/elasticsearch/circuit-breaker-errors).
</warning>

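The behavior of `VALUES` resembles collecting values into a set: duplicates collapse and no order is promised, so an explicit sort is needed for stable output. The Python sketch below is an analogy only. The helper name and the names in the list are illustrative, and dropping null values is an assumption.

```python
def distinct_values(field_values):
    """Collect unique non-null values; a set has no guaranteed order."""
    return {v for v in field_values if v is not None}


names = ["Georgi", "Gino", "Georgi", None, "Gao"]  # made-up input
unique = distinct_values(names)
print(sorted(unique))  # ['Gao', 'Georgi', 'Gino'], analogous to wrapping in MV_SORT
```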

## `VARIANCE`

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/variance.svg)

**Parameters**
<definitions>
  <definition term="number">
  </definition>
</definitions>

**Description**
The population variance of a numeric field.
**Supported types**

| number  | result |
|---------|--------|
| double  | double |
| integer | double |
| long    | double |

**Example**
```esql
FROM employees
| STATS var_height = VARIANCE(height)
```


| var_height:double |
|-------------------|
| 0.0425888         |

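Population variance is the mean squared deviation from the mean, which makes it the square of the population standard deviation. The Python sketch below shows the arithmetic on made-up heights and checks that the `STD_DEV(height)` and `VARIANCE(height)` results shown on this page agree with that relationship. The helper name is hypothetical, and skipping null values is an assumption.

```python
import math


def population_variance(values):
    """Mean squared deviation from the mean (divides by n, not n - 1)."""
    present = [v for v in values if v is not None]  # assumption: nulls are skipped
    if not present:
        return None
    mean = sum(present) / len(present)
    return sum((v - mean) ** 2 for v in present) / len(present)


heights = [1.5, 1.7, 1.9, 2.1]  # made-up values
print(population_variance(heights))  # about 0.05

# The documented results are consistent: 0.2063704 squared is about 0.0425888.
print(math.isclose(0.2063704 ** 2, 0.0425888, rel_tol=1e-5))  # True
```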

## `WEIGHTED_AVG`

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/weighted_avg.svg)

**Parameters**
<definitions>
  <definition term="number">
    A numeric value.
  </definition>
  <definition term="weight">
    A numeric weight.
  </definition>
</definitions>

**Description**
The weighted average of a numeric expression.
**Supported types**

| number  | weight  | result |
|---------|---------|--------|
| double  | double  | double |
| double  | integer | double |
| double  | long    | double |
| integer | double  | double |
| integer | integer | double |
| integer | long    | double |
| long    | double  | double |
| long    | integer | double |
| long    | long    | double |

**Example**
```esql
FROM employees
| STATS w_avg = WEIGHTED_AVG(salary, height) BY languages
| EVAL w_avg = ROUND(w_avg)
| KEEP w_avg, languages
| SORT languages
```


| w_avg:double | languages:integer |
|--------------|-------------------|
| 51464.0      | 1                 |
| 48477.0      | 2                 |
| 52379.0      | 3                 |
| 47990.0      | 4                 |
| 42119.0      | 5                 |
| 52142.0      | null              |

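A weighted average is the sum of each value multiplied by its weight, divided by the sum of the weights. The Python sketch below shows that formula. The helper name and the salary and height pairs are made up. Skipping pairs with a null component and returning `None` when the weights sum to zero are assumptions of the sketch.

```python
def weighted_avg(pairs):
    """sum(value * weight) / sum(weight) over (value, weight) pairs."""
    numerator = 0.0
    denominator = 0.0
    for value, weight in pairs:
        if value is None or weight is None:  # assumption: incomplete pairs are skipped
            continue
        numerator += value * weight
        denominator += weight
    return numerator / denominator if denominator else None


rows = [(50000, 1.5), (60000, 2.0), (40000, 1.5)]  # made-up (salary, height) pairs
print(weighted_avg(rows))  # 51000.0
```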

## `FIRST`

<applies-to>
  - Elastic Cloud Serverless: Preview
  - Elastic Stack: Preview
</applies-to>

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/first.svg)

**Parameters**
<definitions>
  <definition term="field">
    The search field.
  </definition>
  <definition term="sortField">
    The sort field.
  </definition>
</definitions>

**Description**
This function calculates the earliest occurrence of the search field (the first parameter), where sorting order is determined by the sort field (the second parameter). This sorting order is always ascending and null values always sort last. Both fields support null, single-valued, and multi-valued input. If the earliest sort field value appears in multiple documents, this function is allowed to return any corresponding search field value.
**Supported types**

| field      | sortField  | result     |
|------------|------------|------------|
| boolean    | date       | boolean    |
| boolean    | date_nanos | boolean    |
| boolean    | long       | boolean    |
| date       | date       | date       |
| date       | date_nanos | date       |
| date       | long       | date       |
| date_nanos | date       | date_nanos |
| date_nanos | date_nanos | date_nanos |
| date_nanos | long       | date_nanos |
| double     | date       | double     |
| double     | date_nanos | double     |
| double     | long       | double     |
| integer    | date       | integer    |
| integer    | date_nanos | integer    |
| integer    | long       | integer    |
| ip         | date       | ip         |
| ip         | date_nanos | ip         |
| ip         | long       | ip         |
| keyword    | date       | keyword    |
| keyword    | date_nanos | keyword    |
| keyword    | long       | keyword    |
| long       | date       | long       |
| long       | date_nanos | long       |
| long       | long       | long       |
| text       | date       | keyword    |
| text       | date_nanos | keyword    |
| text       | long       | keyword    |

**Example**
```esql
// Sample rows in `dataset`:
//         @timestamp        |  name   | number
// 2025-11-25T00:00:00.000Z | alpha   | 1
// 2025-11-25T00:00:01.000Z | alpha   | 2
// 2025-11-25T00:00:02.000Z | bravo   | null
// 2025-11-25T00:00:03.000Z | alpha   | 4
// 2025-11-25T00:00:04.000Z | bravo   | 5
// 2025-11-25T00:00:05.000Z | charlie | [6, 7, 8]
// 2025-11-25T00:00:06.000Z | delta   | null

FROM dataset
| STATS first_val = FIRST(number, @timestamp)
```


| first_val:long |
|----------------|
| 1              |

<warning>
  This can use a significant amount of memory and ES|QL doesn’t yet
  grow aggregations beyond the memory available. This function will
  continue to work until it is used to collect more values than can
  fit into memory, in which case it will fail the query with a
  [Circuit Breaker Error](https://www.elastic.co/docs/troubleshoot/elasticsearch/circuit-breaker-errors).
  This is especially the case when grouping on a field with a large
  number of unique values, and even more so if the search field
  has multi-values of high cardinality.
</warning>

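As a rough model of the description, `FIRST` picks the search field value from the row whose sort field value is smallest. The Python sketch below applies that rule to the sample rows from the example. The helper name `first` is hypothetical, and the sketch does not model ties or null sort values.

```python
# (timestamp, name, number) rows copied from the example dataset.
rows = [
    ("2025-11-25T00:00:00.000Z", "alpha", 1),
    ("2025-11-25T00:00:01.000Z", "alpha", 2),
    ("2025-11-25T00:00:02.000Z", "bravo", None),
    ("2025-11-25T00:00:03.000Z", "alpha", 4),
    ("2025-11-25T00:00:04.000Z", "bravo", 5),
    ("2025-11-25T00:00:05.000Z", "charlie", [6, 7, 8]),
    ("2025-11-25T00:00:06.000Z", "delta", None),
]


def first(rows):
    """Return the `number` of the row with the earliest timestamp."""
    # ISO 8601 timestamps in one fixed format sort chronologically as strings.
    earliest = min(rows, key=lambda row: row[0])
    return earliest[2]


print(first(rows))  # 1
```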

## `LAST`

<applies-to>
  - Elastic Cloud Serverless: Preview
  - Elastic Stack: Preview
</applies-to>

**Syntax**
![Embedded](https://www.elastic.co/docs/reference/query-languages/esql/images/functions/last.svg)

**Parameters**
<definitions>
  <definition term="field">
    The search field.
  </definition>
  <definition term="sortField">
    The sort field.
  </definition>
</definitions>

**Description**
This function calculates the latest occurrence of the search field (the first parameter), where sorting order is determined by the sort field (the second parameter). This sorting order is always ascending and null values always sort last. Both fields support null, single-valued, and multi-valued input. If the latest sort field value appears in multiple documents, this function is allowed to return any corresponding search field value.
**Supported types**

| field      | sortField  | result     |
|------------|------------|------------|
| boolean    | date       | boolean    |
| boolean    | date_nanos | boolean    |
| boolean    | long       | boolean    |
| date       | date       | date       |
| date       | date_nanos | date       |
| date       | long       | date       |
| date_nanos | date       | date_nanos |
| date_nanos | date_nanos | date_nanos |
| date_nanos | long       | date_nanos |
| double     | date       | double     |
| double     | date_nanos | double     |
| double     | long       | double     |
| integer    | date       | integer    |
| integer    | date_nanos | integer    |
| integer    | long       | integer    |
| ip         | date       | ip         |
| ip         | date_nanos | ip         |
| ip         | long       | ip         |
| keyword    | date       | keyword    |
| keyword    | date_nanos | keyword    |
| keyword    | long       | keyword    |
| long       | date       | long       |
| long       | date_nanos | long       |
| long       | long       | long       |
| text       | date       | keyword    |
| text       | date_nanos | keyword    |
| text       | long       | keyword    |

**Example**
```esql
// Sample rows in `dataset`:
//         @timestamp        |  name   | number
// 2025-11-25T00:00:00.000Z | alpha   | 1
// 2025-11-25T00:00:01.000Z | alpha   | 2
// 2025-11-25T00:00:02.000Z | bravo   | null
// 2025-11-25T00:00:03.000Z | alpha   | 4
// 2025-11-25T00:00:04.000Z | bravo   | 5
// 2025-11-25T00:00:05.000Z | charlie | [6, 7, 8]
// 2025-11-25T00:00:06.000Z | delta   | null

FROM dataset
| STATS last_val = LAST(number, @timestamp) BY name
```


| last_val:long | name:keyword |
|---------------|--------------|
| 4             | alpha        |
| 5             | bravo        |
| [6, 7, 8]     | charlie      |
| null          | delta        |

<warning>
  This can use a significant amount of memory and ES|QL doesn’t yet
  grow aggregations beyond the memory available. This function will
  continue to work until it is used to collect more values than can
  fit into memory, in which case it will fail the query with a
  [Circuit Breaker Error](https://www.elastic.co/docs/troubleshoot/elasticsearch/circuit-breaker-errors).
  This is especially the case when grouping on a field with a large
  number of unique values, and even more so if the search field
  has multi-values of high cardinality.
</warning>
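
As a rough model of the grouped example, `LAST ... BY name` keeps, for each group, the search field value from the row with the largest sort field value. The Python sketch below reproduces the example's result from the same sample rows. The helper name `last_by_name` is hypothetical, and the sketch does not model ties or null sort values.

```python
# (timestamp, name, number) rows copied from the example dataset.
rows = [
    ("2025-11-25T00:00:00.000Z", "alpha", 1),
    ("2025-11-25T00:00:01.000Z", "alpha", 2),
    ("2025-11-25T00:00:02.000Z", "bravo", None),
    ("2025-11-25T00:00:03.000Z", "alpha", 4),
    ("2025-11-25T00:00:04.000Z", "bravo", 5),
    ("2025-11-25T00:00:05.000Z", "charlie", [6, 7, 8]),
    ("2025-11-25T00:00:06.000Z", "delta", None),
]


def last_by_name(rows):
    """Map each name to the `number` of its row with the latest timestamp."""
    latest = {}
    for timestamp, name, number in rows:
        # Keep the row only if it is newer than what the group already holds.
        if name not in latest or timestamp > latest[name][0]:
            latest[name] = (timestamp, number)
    return {name: number for name, (_, number) in latest.items()}


result = last_by_name(rows)
print(result)  # {'alpha': 4, 'bravo': 5, 'charlie': [6, 7, 8], 'delta': None}
```

Each group grows its own state, which is one way to picture why grouping on a high-cardinality field increases memory use.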