ES|QL `CHUNK` function

Parameters

field: The input to chunk. It can be a single-valued or multi-valued field; if it is multi-valued, each value is chunked separately.
chunking_settings: Optional. Options that customize chunking behavior. Defaults to {"strategy":"sentence","max_chunk_size":300,"sentence_overlap":0}.

Description

Use CHUNK to split a text field into smaller chunks.

CHUNK works on keyword fields and on fields from the text family, such as text and semantic_text. By default it uses a sentence-based chunking strategy. The strategy and its options, such as the maximum chunk size in words and the overlap between adjacent chunks, are configurable through the chunking_settings parameter.
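The following is a minimal Python sketch of the idea behind the default sentence strategy, not Elasticsearch's implementation. It assumes a naive regex sentence splitter and whitespace word counting. Whole sentences are packed greedily into chunks of at most max_chunk_size words. With sentence_overlap set to 1, the last sentence of a chunk is repeated at the start of the next chunk when it fits. A single sentence longer than max_chunk_size is simply kept whole here; how CHUNK handles that case is not described on this page.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive sentence splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_count(text: str) -> int:
    # Whitespace-separated tokens are counted as words (assumption).
    return len(text.split())


def sentence_chunks(text: str, max_chunk_size: int = 300, sentence_overlap: int = 0) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chunk_size words."""
    if sentence_overlap not in (0, 1):
        raise ValueError("sentence_overlap must be 0 or 1")
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for sentence in split_sentences(text):
        n = word_count(sentence)
        if current and size + n > max_chunk_size:
            # The next sentence does not fit: close the current chunk.
            chunks.append(current)
            last = current[-1]
            if sentence_overlap and word_count(last) + n <= max_chunk_size:
                # Repeat the previous chunk's last sentence at the start of the new chunk.
                current, size = [last], word_count(last)
            else:
                current, size = [], 0
        current.append(sentence)
        size += n
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]
```

For example, sentence_chunks("One two three. Four five six. Seven eight nine.", max_chunk_size=6, sentence_overlap=1) returns two chunks that share the sentence "Four five six." The limit of 6 is only for readability; CHUNK itself requires at least 20 for this strategy.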

Supported types

field	chunking_settings	result
keyword	named parameters	keyword
keyword	(omitted)	keyword
text	named parameters	keyword
text	(omitted)	keyword

Supported function named parameters

strategy: (keyword) The chunking strategy to use. Default value is sentence. Available strategies:

sentence: splits at sentence boundaries. Use sentence_overlap to share a sentence between adjacent chunks.
word: splits on individual words. Use overlap to share words between adjacent chunks.
recursive: splits using configurable separator patterns, either a predefined separator_group (plaintext or markdown) or a custom list of separators, and falls back to sentence-level splitting when no separator produces a chunk within max_chunk_size.
none: returns the entire input as a single chunk.

For a full description of each strategy and how its options interact, refer to chunking strategies.
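The recursive strategy can be pictured with the following Python sketch. It is a simplified illustration under stated assumptions, not Elasticsearch's algorithm. Every separator is treated as a regex, so plain strings containing regex metacharacters would need re.escape. Words are counted by whitespace, and the sentence fallback uses a naive punctuation-based boundary. The predefined plaintext and markdown separator groups are not reproduced. Text that fits within max_chunk_size is returned as is. Otherwise it is split on the first separator, and any piece that is still too large is split again with the remaining separators. Sentence-level splitting is used once no separators are left. Adjacent pieces are then merged back together while they fit.

```python
import re

# Naive sentence boundary (assumption): after '.', '!' or '?' and before whitespace.
SENTENCE_BOUNDARY = r"(?<=[.!?])(?=\s)"


def word_count(text: str) -> int:
    # Whitespace-separated tokens are counted as words (assumption).
    return len(text.split())


def split_keep(text: str, pattern: str) -> list[str]:
    # Zero-width split, so nothing is dropped and "".join(pieces) == text.
    return [piece for piece in re.split(pattern, text) if piece]


def recursive_chunks(text: str, separators: list[str], max_chunk_size: int) -> list[str]:
    if word_count(text) <= max_chunk_size:
        return [text]
    if separators:
        # Split before each match of the first separator, then recurse with the rest.
        pieces = split_keep(text, f"(?={separators[0]})")
        small = [chunk for piece in pieces
                 for chunk in recursive_chunks(piece, separators[1:], max_chunk_size)]
    else:
        # No separators left: fall back to sentence-level splitting.
        # An over-long single sentence is kept whole in this sketch.
        small = split_keep(text, SENTENCE_BOUNDARY)
    # Recombine adjacent pieces while their summed word counts stay within the limit.
    merged: list[str] = []
    for piece in small:
        if merged and word_count(merged[-1]) + word_count(piece) <= max_chunk_size:
            merged[-1] += piece
        else:
            merged.append(piece)
    return merged
```

Each piece keeps its separator at its start, so joining the returned chunks with an empty string gives back the original text.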

max_chunk_size: (integer) The maximum size of a chunk, in words. This value cannot be lower than 20 for the sentence strategy or 10 for the word and recursive strategies. It should not exceed the input window size of any model that consumes the output of this function.
overlap: (integer) The number of words shared between adjacent chunks. Applies only to the word strategy. This value cannot be higher than half of max_chunk_size.
sentence_overlap: (integer) The number of sentences shared between adjacent chunks. Applies only to the sentence strategy. Must be 0 or 1. Defaults to 0.
separator_group: (keyword) Selects a predefined list of separators for the given text type. Valid values are markdown and plaintext. Applies only to the recursive strategy. When using the recursive strategy, one of separators or separator_group must be specified.
separators: (keyword) A list of strings used as possible split points when chunking text. Each string can be a plain string or a regular expression (regex) pattern. Separators are tried in order, starting with the first item in the list. After splitting, adjacent smaller pieces are recombined into larger chunks that stay within the max_chunk_size limit, which reduces the total number of chunks. Applies only to the recursive strategy. When using the recursive strategy, one of separators or separator_group must be specified.
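The constraints documented above can be collected into a small Python checker. check_chunking_settings is a hypothetical helper, not part of Elasticsearch. It encodes only what this page states and does not reproduce CHUNK's actual error behavior or messages. Two assumptions are marked in the comments. First, an omitted max_chunk_size is taken to be the documented default of 300. Second, options documented for a single strategy are flagged when used with another, although this page does not say whether CHUNK rejects or ignores them.

```python
# Documented default settings and minimum max_chunk_size per strategy.
DEFAULTS = {"strategy": "sentence", "max_chunk_size": 300, "sentence_overlap": 0}
MIN_CHUNK_SIZE = {"sentence": 20, "word": 10, "recursive": 10}  # none: no minimum documented
STRATEGIES = ("sentence", "word", "recursive", "none")
# Options documented as applying to a single strategy only.
ONLY_FOR = {
    "overlap": "word",
    "sentence_overlap": "sentence",
    "separator_group": "recursive",
    "separators": "recursive",
}


def check_chunking_settings(settings: dict) -> list[str]:
    """Return a list of problems found in a chunking_settings map (empty if none)."""
    strategy = settings.get("strategy", DEFAULTS["strategy"])
    if strategy not in STRATEGIES:
        return [f"unknown strategy: {strategy}"]
    problems: list[str] = []
    # Assumption: an omitted max_chunk_size falls back to the documented default.
    size = settings.get("max_chunk_size", DEFAULTS["max_chunk_size"])
    minimum = MIN_CHUNK_SIZE.get(strategy)
    if minimum is not None and size < minimum:
        problems.append(f"max_chunk_size must be at least {minimum} for {strategy}")
    # Assumption: flag options outside their strategy (CHUNK may reject or ignore them).
    for option, owner in ONLY_FOR.items():
        if option in settings and strategy != owner:
            problems.append(f"{option} applies only to the {owner} strategy")
    if settings.get("overlap", 0) > size / 2:
        problems.append("overlap cannot exceed half of max_chunk_size")
    if settings.get("sentence_overlap", 0) not in (0, 1):
        problems.append("sentence_overlap must be 0 or 1")
    if settings.get("separator_group", "plaintext") not in ("plaintext", "markdown"):
        problems.append("separator_group must be plaintext or markdown")
    if strategy == "recursive" and "separators" not in settings and "separator_group" not in settings:
        problems.append("recursive requires separators or separator_group")
    return problems
```

For the settings used in the example on this page, {"strategy": "word", "max_chunk_size": 10, "overlap": 1}, the checker reports no problems. An overlap of 6 with the same max_chunk_size is flagged because it exceeds half of 10.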

Example

ROW result = CHUNK("It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief.", {"strategy": "word", "max_chunk_size": 10, "overlap": 1})
| MV_EXPAND result

result:keyword
It was the best of times, it was the worst
worst of times, it was the age of wisdom, it
, it was the age of foolishness, it was the epoch
epoch of belief.
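The example output follows a sliding-window pattern. Each chunk holds at most max_chunk_size words, and each new chunk starts max_chunk_size - overlap words (here 10 - 1 = 9) after the previous one, so it repeats the previous chunk's last word. The following Python sketch reproduces the word groupings of the four rows under a simplifying assumption: words are extracted with a regex and punctuation is discarded, whereas CHUNK returns spans of the original text. It is an illustration, not Elasticsearch's implementation.

```python
import re


def word_windows(text: str, max_chunk_size: int, overlap: int = 0) -> list[list[str]]:
    """Sliding windows of max_chunk_size words; consecutive windows share `overlap` words."""
    if max_chunk_size < 1 or not 0 <= overlap <= max_chunk_size // 2:
        raise ValueError("need max_chunk_size >= 1 and 0 <= overlap <= max_chunk_size / 2")
    # Simplification: words are runs of word characters; punctuation is dropped.
    words = re.findall(r"\w+", text)
    step = max_chunk_size - overlap
    windows: list[list[str]] = []
    start = 0
    while start < len(words):
        windows.append(words[start:start + max_chunk_size])
        if start + max_chunk_size >= len(words):
            break  # this window already reaches the end of the text
        start += step
    return windows


text = ("It was the best of times, it was the worst of times, it was the age of wisdom, "
        "it was the age of foolishness, it was the epoch of belief.")
for window in word_windows(text, max_chunk_size=10, overlap=1):
    print(" ".join(window))
```

The four printed lines contain the same words as the four rows of the example output, without the punctuation.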
