
ES|QL CHUNK function

field
The input to chunk. The input can be a single-valued or multi-valued field. In the case of a multi-valued argument, each value is chunked separately.
chunking_settings

Options to customize chunking behavior. Defaults to {"strategy":"sentence","max_chunk_size":300,"sentence_overlap":0}.
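As a minimal sketch of the multi-valued case described above, the query below passes a two-value constant to CHUNK without any chunking_settings, so the defaults apply. The column names passages and chunks and the sample strings are illustrative assumptions, not part of the reference.

```esql
// Hypothetical multi-valued input: each value is chunked separately.
ROW passages = ["First passage. It has two sentences.", "Second passage."]
// No chunking_settings given, so the documented defaults are used.
| EVAL chunks = CHUNK(passages)
// One output row per chunk.
| MV_EXPAND chunks
```

Both values are far shorter than the default max_chunk_size of 300 words, so no value should be split further.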

Use CHUNK to split a text field into smaller chunks.

CHUNK can be used on fields from the text family, such as text and semantic_text. It splits a text field into smaller chunks, using a sentence-based chunking strategy by default. The chunking strategy and the maximum chunk size can be customized through chunking_settings.
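The following sketch shows how CHUNK might be applied to an indexed text field with the default settings. The index name articles and the fields title and body are hypothetical and stand in for whatever text-family field you want to split.

```esql
// Hypothetical index and field names.
FROM articles
// Default settings: sentence strategy, max_chunk_size 300, sentence_overlap 0.
| EVAL chunks = CHUNK(body)
// Expand the multi-valued result into one row per chunk.
| MV_EXPAND chunks
| KEEP title, chunks
```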

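Supported types

| field | chunking_settings | result |
| --- | --- | --- |
| keyword | named parameters | keyword |
| keyword | (omitted) | keyword |
| text | named parameters | keyword |
| text | (omitted) | keyword |

Supported chunking_settings named parameters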
separator_group
(keyword) Sets a predefined list of separators based on the selected text type. Values may be markdown or plaintext. Only applicable to the recursive chunking strategy. When using the recursive chunking strategy, one of separators or separator_group must be specified.
overlap
(integer) The number of overlapping words between consecutive chunks. Only applicable to the word chunking strategy. This value cannot be higher than half the max_chunk_size value.
sentence_overlap
(integer) The number of overlapping sentences between consecutive chunks. Only applicable to the sentence chunking strategy. It can be either 1 or 0.
strategy
(keyword) The chunking strategy to use. Default value is sentence.
max_chunk_size
(integer) The maximum size of a chunk in words. This value cannot be lower than 20 (for the sentence strategy) or 10 (for the word or recursive strategies). This value should not exceed the window size of any associated models that use the output of this function.
separators

(keyword) A list of strings used as possible split points when chunking text. Each string can be a plain string or a regular expression (regex) pattern. The system tries each separator in order to split the text, starting from the first item in the list. After splitting, it attempts to recombine smaller pieces into larger chunks that stay within the max_chunk_size limit, to reduce the total number of chunks generated. Only applicable to the recursive chunking strategy. When using the recursive chunking strategy one of separators or separator_group must be specified.
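To illustrate how these named parameters combine, here is a sketch that uses the recursive strategy with a predefined separator group. The sample Markdown string and the column names doc and chunks are assumptions made for the example, and max_chunk_size is set to 10, the documented minimum for this strategy.

```esql
// Hypothetical Markdown input; \n is a newline escape inside the string.
ROW doc = "# Setup\n\nInstall the package.\n\n# Usage\n\nRun the command."
// The recursive strategy requires either separators or separator_group.
| EVAL chunks = CHUNK(doc, {"strategy": "recursive", "max_chunk_size": 10, "separator_group": "markdown"})
| MV_EXPAND chunks
```

For the sentence strategy, the settings map would instead look like {"strategy": "sentence", "max_chunk_size": 20, "sentence_overlap": 1}, which stays within the documented limits for that strategy.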

ROW result = CHUNK("It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief.", {"strategy": "word", "max_chunk_size": 10, "overlap": 1})
| MV_EXPAND result
		
result:keyword
It was the best of times, it was the worst
worst of times, it was the age of wisdom, it
, it was the age of foolishness, it was the epoch
epoch of belief.