﻿---
title: Token graphs
description: When a tokenizer converts a text into a stream of tokens, it also records the following: The position of each token in the stream, The positionLength,...
url: https://www.elastic.co/docs/manage-data/data-store/text-analysis/token-graphs
products:
  - Elasticsearch
applies_to:
  - Elastic Cloud Serverless: Generally available
  - Elastic Stack: Generally available
---

# Token graphs
When a [tokenizer](/docs/manage-data/data-store/text-analysis/anatomy-of-an-analyzer#analyzer-anatomy-tokenizer) converts a text into a stream of tokens, it also records the following:
- The `position` of each token in the stream
- The `positionLength`, the number of positions that a token spans

Using these, you can create a [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph), called a *token graph*, for a stream. In a token graph, each position represents a node. Each token represents an edge, or arc, pointing from its own position to the position where the token ends. For a token with the default `positionLength` of `1`, that is the next position.
![token graph qbf ex](https://www.elastic.co/docs/manage-data/images/elasticsearch-reference-token-graph-qbf-ex.svg)
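
To make the node and edge mapping concrete, here is a minimal Python sketch. It is an illustration only, not how Elasticsearch or Lucene represents token streams internally. The `Token` class, the `edges` helper, and the sample text "quick brown fox" are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    term: str
    position: int             # node the edge starts from
    position_length: int = 1  # number of positions the token spans


def edges(tokens):
    """Return a (start_node, end_node, term) tuple for each token."""
    return [(t.position, t.position + t.position_length, t.term) for t in tokens]


# Hypothetical stream for the text "quick brown fox".
stream = [Token("quick", 0), Token("brown", 1), Token("fox", 2)]

for start, end, term in edges(stream):
    print(f"{start} --{term}--> {end}")
```

Each token becomes one edge whose end node is `position + position_length`, so three single-position tokens produce a chain of four nodes.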


## Synonyms

Some [token filters](/docs/manage-data/data-store/text-analysis/anatomy-of-an-analyzer#analyzer-anatomy-token-filters) can add new tokens, like synonyms, to an existing token stream. These synonyms often span the same positions as existing tokens.
In the following graph, `quick` and its synonym `fast` both have a position of `0`. They span the same positions.
![token graph qbf synonym ex](https://www.elastic.co/docs/manage-data/images/elasticsearch-reference-token-graph-qbf-synonym-ex.svg)
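
As a rough sketch of what "spanning the same positions" means, the Python snippet below groups a token stream by position. The stream contents and the `terms_by_position` helper are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def terms_by_position(tokens):
    """Group (term, position) pairs so stacked tokens are easy to see."""
    grouped = defaultdict(list)
    for term, position in tokens:
        grouped[position].append(term)
    return dict(grouped)


# Hypothetical stream in which a synonym filter added "fast" next to "quick".
stream = [("quick", 0), ("fast", 0), ("brown", 1), ("fox", 2)]

stacked = terms_by_position(stream)
print(stacked[0])  # the original token and its synonym share position 0
```

Both `quick` and `fast` land in the same group, which corresponds to two parallel edges between the same pair of nodes in the graph.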


## Multi-position tokens

Some token filters can add tokens that span multiple positions. These can include tokens for multi-word synonyms, such as using "atm" as a synonym for "automatic teller machine".
However, only some token filters, known as *graph token filters*, accurately record the `positionLength` for multi-position tokens. These filters include:
- [`synonym_graph`](https://www.elastic.co/docs/reference/text-analysis/analysis-synonym-graph-tokenfilter)
- [`word_delimiter_graph`](https://www.elastic.co/docs/reference/text-analysis/analysis-word-delimiter-graph-tokenfilter)

Some tokenizers, such as the [`nori_tokenizer`](https://www.elastic.co/docs/reference/elasticsearch/plugins/analysis-nori-tokenizer), also accurately decompose compound tokens into multi-position tokens.
In the following graph, the phrase `domain name system` and its synonym, `dns`, both start at position `0`. However, `dns` has a `positionLength` of `3`, so it spans all three positions of the phrase. Other tokens in the graph have a default `positionLength` of `1`.
![token graph dns synonym ex](https://www.elastic.co/docs/manage-data/images/elasticsearch-reference-token-graph-dns-synonym-ex.svg)
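
The following Python sketch shows how a recorded `positionLength` changes where a token's edge ends. The dictionaries loosely mirror per-token analysis output, but the field names used here (`token`, `position`, `positionLength`) and the choice to omit `positionLength` when it is `1` are assumptions for this example, as is the `to_edges` helper.

```python
def to_edges(tokens):
    """Map each token dict to a (start_node, end_node, term) edge."""
    result = []
    for tok in tokens:
        start = tok["position"]
        # A missing positionLength is treated as the default of 1.
        length = tok.get("positionLength", 1)
        result.append((start, start + length, tok["token"]))
    return result


# Hypothetical stream for "domain name system" with the synonym "dns" added.
dns_stream = [
    {"token": "dns", "position": 0, "positionLength": 3},
    {"token": "domain", "position": 0},
    {"token": "name", "position": 1},
    {"token": "system", "position": 2},
]

dns_edges = to_edges(dns_stream)
for edge in dns_edges:
    print(edge)
```

The `dns` edge runs from node 0 to node 3, the same node where the `system` edge ends, so both routes through the graph rejoin at the same point.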


### Using token graphs for search

[Indexing](https://www.elastic.co/docs/manage-data/data-store/text-analysis/index-search-analysis) ignores the `positionLength` attribute and does not support token graphs containing multi-position tokens.
However, queries, such as the [`match`](https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-match-query) or [`match_phrase`](https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-match-query-phrase) query, can use these graphs to generate multiple sub-queries from a single query string.
<dropdown title="Example">
  A user runs a search for the following phrase using the `match_phrase` query:

  `domain name system is fragile`

  During [search analysis](https://www.elastic.co/docs/manage-data/data-store/text-analysis/index-search-analysis), `dns`, a synonym for `domain name system`, is added to the query string’s token stream. The `dns` token has a `positionLength` of `3`.
  ![token graph dns synonym ex](https://www.elastic.co/docs/manage-data/images/elasticsearch-reference-token-graph-dns-synonym-ex.svg)
  The `match_phrase` query uses this graph to generate sub-queries for the following phrases:
  ```text
  dns is fragile
  domain name system is fragile
  ```
  This means the query matches documents containing either `dns is fragile` *or* `domain name system is fragile`.
</dropdown>
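
One way to picture how a graph yields several phrases is to enumerate every path from the first node to the last. The Python sketch below does that for the example above. It is a simplified model under stated assumptions, not the query-building logic Elasticsearch actually uses. The `phrases` function and the tuple layout `(term, position, position_length)` are hypothetical.

```python
from collections import defaultdict


def phrases(tokens):
    """Enumerate every path through the token graph as a phrase.

    Each token is a (term, position, position_length) tuple.
    """
    outgoing = defaultdict(list)
    last_node = 0
    for term, position, length in tokens:
        outgoing[position].append((term, position + length))
        last_node = max(last_node, position + length)

    def walk(node):
        if node == last_node:
            yield []
            return
        for term, next_node in outgoing[node]:
            for rest in walk(next_node):
                yield [term] + rest

    return sorted(" ".join(path) for path in walk(0))


query_tokens = [
    ("domain", 0, 1),
    ("dns", 0, 3),  # multi-position synonym
    ("name", 1, 1),
    ("system", 2, 1),
    ("is", 3, 1),
    ("fragile", 4, 1),
]

for phrase in phrases(query_tokens):
    print(phrase)
```

With a correct `positionLength` of `3` on `dns`, the walk finds exactly two paths, which correspond to the two phrases listed in the example.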


### Invalid token graphs

The following token filters can add tokens that span multiple positions but only record a default `positionLength` of `1`:
- [`synonym`](https://www.elastic.co/docs/reference/text-analysis/analysis-synonym-tokenfilter)
- [`word_delimiter`](https://www.elastic.co/docs/reference/text-analysis/analysis-word-delimiter-tokenfilter)

This means these filters will produce invalid token graphs for streams containing such tokens.
In the following graph, `dns` is a multi-position synonym for `domain name system`. However, `dns` has the default `positionLength` value of `1`, resulting in an invalid graph.
![token graph dns invalid ex](https://www.elastic.co/docs/manage-data/images/elasticsearch-reference-token-graph-dns-invalid-ex.svg)

Avoid using invalid token graphs for search. Invalid graphs can cause unexpected search results.
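
To illustrate why an invalid graph is a problem, the Python sketch below reads every path out of two versions of the same stream: one where `dns` records a `positionLength` of `3`, and one where it keeps the default of `1`. This is a simplified model, not Elasticsearch's implementation. The `phrases` helper, the tuple layout `(term, position, position_length)`, and the trailing `is fragile` tokens are assumptions for this example.

```python
def phrases(tokens, node=0):
    """Return every phrase readable from `node` to the end of the graph."""
    end = max(pos + length for _, pos, length in tokens)
    if node == end:
        return [""]
    found = []
    for term, pos, length in tokens:
        if pos == node:
            for rest in phrases(tokens, pos + length):
                found.append(f"{term} {rest}".strip())
    return sorted(found)


tail = [("name", 1, 1), ("system", 2, 1), ("is", 3, 1), ("fragile", 4, 1)]

# Graph filter: "dns" spans all three positions of "domain name system".
valid = [("domain", 0, 1), ("dns", 0, 3)] + tail
# Non-graph filter: "dns" keeps the default positionLength of 1.
invalid = [("domain", 0, 1), ("dns", 0, 1)] + tail

print(phrases(valid))
print(phrases(invalid))
```

In this model, the invalid graph never produces `dns is fragile`. Instead, it produces `dns name system is fragile`, a phrase that no user intended to search for.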