ES|QL CHUNK function
`field` - The input to chunk. The input can be a single-valued or multi-valued field. If the field is multi-valued, each value is chunked separately.
`chunking_settings` - Options to customize chunking behavior. Defaults to `{"strategy":"sentence","max_chunk_size":300,"sentence_overlap":0}`.
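The query below is an illustrative sketch of both parameters. It assumes that `chunking_settings` can be omitted so that the defaults above apply, and it passes a multi-valued literal so that each of the two values is chunked on its own. `MV_EXPAND` then turns the multi-valued result into one row per chunk. The column names `texts` and `chunks` are arbitrary.

```esql
// Two values in one multi-valued column; each value is chunked separately
ROW texts = ["The first value. It has two sentences.", "The second value is a single sentence."]
// No chunking_settings: assumes the defaults (sentence strategy, 300 words, no overlap)
| EVAL chunks = CHUNK(texts)
// One output row per chunk
| MV_EXPAND chunks
| KEEP chunks
```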
Use `CHUNK` to split a text field into smaller chunks. `CHUNK` can be used on fields from the text family, such as `text` and `semantic_text`. By default it uses a sentence-based chunking strategy. The chunking strategy, the maximum chunk size (which in turn affects how many chunks are returned), and the overlap between chunks can be configured through `chunking_settings`.
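As a sketch of chunking an indexed field rather than a literal, the query below applies the `sentence` strategy with a one-sentence overlap. The index `my-docs` and its text field `body` are hypothetical placeholders. `max_chunk_size` is set to 50 words, which respects the minimum of 20 for the `sentence` strategy.

```esql
// Hypothetical index and field: replace my-docs and body with your own
FROM my-docs
| EVAL chunks = CHUNK(body, {"strategy": "sentence", "max_chunk_size": 50, "sentence_overlap": 1})
// One output row per chunk
| MV_EXPAND chunks
| KEEP chunks
| LIMIT 20
```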
| field | chunking_settings | result |
|---|---|---|
| keyword | named parameters | keyword |
| keyword | keyword | |
| text | named parameters | keyword |
| text | keyword | |
Supported named parameters for `chunking_settings`:
`separator_group` - (keyword) Sets a predefined list of separators based on the selected text type. Values may be `markdown` or `plaintext`. Only applicable to the `recursive` chunking strategy. When using the `recursive` chunking strategy, one of `separators` or `separator_group` must be specified.
`overlap` - (integer) The number of overlapping words for chunks. It is applicable only to the `word` chunking strategy. This value cannot be higher than half the `max_chunk_size` value.
`sentence_overlap` - (integer) The number of overlapping sentences for chunks. It is applicable only to the `sentence` chunking strategy. It can be either `1` or `0`.
`strategy` - (keyword) The chunking strategy to use. Default value is `sentence`.
`max_chunk_size` - (integer) The maximum size of a chunk in words. This value cannot be lower than `20` (for the `sentence` strategy) or `10` (for the `word` or `recursive` strategies). This value should not exceed the window size of any associated models that use the output of this function.
`separators` - (keyword) A list of strings used as possible split points when chunking text. Each string can be a plain string or a regular expression (regex) pattern. The system tries each separator in order, starting from the first item in the list. After splitting, it attempts to recombine smaller pieces into larger chunks that stay within the `max_chunk_size` limit, to reduce the total number of chunks generated. Only applicable to the `recursive` chunking strategy. When using the `recursive` chunking strategy, one of `separators` or `separator_group` must be specified.
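A sketch of the `recursive` strategy using the predefined `markdown` separator group. The document text is made up for illustration, and the `\n` escapes in the string literal produce line breaks. `max_chunk_size` is set to `10`, the minimum allowed for `recursive`. The exact chunk boundaries depend on the separators in the group, so no output is shown here.

```esql
// Made-up markdown text with headings and paragraphs
ROW doc = "# Title\n\nAn introductory paragraph with a few words.\n\n## Section\n\nA second paragraph under the section heading."
// recursive requires either separators or separator_group; markdown is one of the predefined groups
| EVAL chunks = CHUNK(doc, {"strategy": "recursive", "max_chunk_size": 10, "separator_group": "markdown"})
| MV_EXPAND chunks
```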
ROW result = CHUNK("It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief.", {"strategy": "word", "max_chunk_size": 10, "overlap": 1})
| MV_EXPAND result
| result:keyword |
|---|
| It was the best of times, it was the worst |
| worst of times, it was the age of wisdom, it |
| , it was the age of foolishness, it was the epoch |
| epoch of belief. |