---
title: Create a custom analyzer
description: When the built-in analyzers do not fulfill your needs, you can create a custom analyzer which uses the appropriate combination of: zero or more character...
url: https://www.elastic.co/docs/manage-data/data-store/text-analysis/create-custom-analyzer
products:
  - Elasticsearch
applies_to:
  - Elastic Cloud Serverless: Generally available
  - Elastic Stack: Generally available
---

# Create a custom analyzer
When the built-in analyzers do not meet your needs, you can create a `custom` analyzer that uses an appropriate combination of:
- zero or more [character filters](https://www.elastic.co/docs/reference/text-analysis/character-filter-reference)
- a [tokenizer](https://www.elastic.co/docs/reference/text-analysis/tokenizer-reference)
- zero or more [token filters](https://www.elastic.co/docs/reference/text-analysis/token-filter-reference)


## Configuration

The `custom` analyzer accepts the following parameters:
<definitions>
  <definition term="type">
    Analyzer type. Accepts [built-in analyzer types](https://www.elastic.co/docs/reference/text-analysis/analyzer-reference). For custom analyzers, use `custom` or omit this parameter.
  </definition>
  <definition term="tokenizer">
    A built-in or customized [tokenizer](https://www.elastic.co/docs/reference/text-analysis/tokenizer-reference). (Required)
  </definition>
  <definition term="char_filter">
    An optional array of built-in or customized [character filters](https://www.elastic.co/docs/reference/text-analysis/character-filter-reference).
  </definition>
  <definition term="filter">
    An optional array of built-in or customized [token filters](https://www.elastic.co/docs/reference/text-analysis/token-filter-reference).
  </definition>
  <definition term="position_increment_gap">
    When indexing an array of text values, Elasticsearch inserts a fake "gap" between the last term of one value and the first term of the next value, so that a phrase query doesn’t match two terms from different array elements. Defaults to `100`. See [`position_increment_gap`](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/position-increment-gap) for more information.
  </definition>
</definitions>
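
To make the effect of `position_increment_gap` more concrete, here is a minimal Python sketch of how term positions could be assigned across the values of an array. It is a simplified model, not Elasticsearch code: the function name `assign_positions`, the sample names, and the whitespace tokenization are assumptions made for illustration, and edge cases such as empty values are ignored.

```python
def assign_positions(values, gap=100):
    """Return (term, position) pairs for an array of text values.

    Simplified model: each term advances the position by 1, and `gap`
    extra positions are skipped between consecutive values.
    """
    positioned = []
    position = -1  # so that the first term lands on position 0
    for index, value in enumerate(values):
        if index > 0:
            position += gap  # the "fake" gap between array elements
        for term in value.lower().split():
            position += 1
            positioned.append((term, position))
    return positioned


names = ["John Abraham", "Lincoln Smith"]  # illustrative data
print(assign_positions(names))         # default gap of 100
print(assign_positions(names, gap=0))  # no gap between values
```

In this model, the default gap leaves `abraham` and `lincoln` 101 positions apart, so an exact phrase query for `abraham lincoln` would not match across the two values. With a gap of `0` the two terms are adjacent.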


## Example configuration

Here is an example that combines the following:
<definitions>
  <definition term="Character Filter">
    - [HTML Strip Character Filter](https://www.elastic.co/docs/reference/text-analysis/analysis-htmlstrip-charfilter)
  </definition>
  <definition term="Tokenizer">
    - [Standard Tokenizer](https://www.elastic.co/docs/reference/text-analysis/analysis-standard-tokenizer)
  </definition>
  <definition term="Token Filters">
    - [Lowercase Token Filter](https://www.elastic.co/docs/reference/text-analysis/analysis-lowercase-tokenfilter)
    - [ASCII-Folding Token Filter](https://www.elastic.co/docs/reference/text-analysis/analysis-asciifolding-tokenfilter)
  </definition>
</definitions>

```json

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom", <1>
          "tokenizer": "standard",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}


{
  "analyzer": "my_custom_analyzer",
  "text": "Is this <b>déjà vu</b>?"
}
```

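1. For `custom` analyzers, set `type` to `custom` or omit the parameter.

The first request body defines `my_custom_analyzer` in the index settings. The second passes sample text to the analyze API with that analyzer, which produces the following terms: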
```text
[ is, this, deja, vu ]
```
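
The order in which the components run can be illustrated with a small Python sketch: character filters transform the raw text first, the tokenizer then splits it into tokens, and token filters are applied in the order they are listed. This is a rough approximation for illustration only. The regular expressions and the helper names (`strip_html`, `standard_tokenize`, `ascii_fold`, `analyze`) are assumptions and do not reproduce the actual Elasticsearch implementations.

```python
import re
import unicodedata


def strip_html(text):
    # Crude stand-in for the html_strip character filter: drop anything tag-like.
    return re.sub(r"<[^>]+>", "", text)


def standard_tokenize(text):
    # Crude stand-in for the standard tokenizer: runs of word characters.
    # Normalize to NFC first so accented letters are single code points.
    return re.findall(r"\w+", unicodedata.normalize("NFC", text))


def ascii_fold(token):
    # Crude stand-in for asciifolding: decompose, then drop combining marks.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text, char_filters, tokenizer, token_filters):
    for char_filter in char_filters:      # 1. character filters, in order
        text = char_filter(text)
    tokens = tokenizer(text)              # 2. exactly one tokenizer
    for token_filter in token_filters:    # 3. token filters, in order
        tokens = [token_filter(token) for token in tokens]
    return tokens


terms = analyze(
    "Is this <b>déjà vu</b>?",
    char_filters=[strip_html],
    tokenizer=standard_tokenize,
    token_filters=[str.lower, ascii_fold],
)
print(terms)  # ['is', 'this', 'deja', 'vu']
```

Here each token filter maps one token to one token, which is enough for lowercasing and folding. Real token filters can also add or remove tokens.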

The previous example used a tokenizer, token filters, and a character filter with their default configurations, but you can also create configured versions of each and use them in a custom analyzer.
Here is a more complicated example that combines the following:
<definitions>
  <definition term="Character Filter">
    - [Mapping Character Filter](https://www.elastic.co/docs/reference/text-analysis/analysis-mapping-charfilter), configured to replace `:)` with `_happy_` and `:(` with `_sad_`
  </definition>
  <definition term="Tokenizer">
    - [Pattern Tokenizer](https://www.elastic.co/docs/reference/text-analysis/analysis-pattern-tokenizer), configured to split on spaces and punctuation characters
  </definition>
  <definition term="Token Filters">
    - [Lowercase Token Filter](https://www.elastic.co/docs/reference/text-analysis/analysis-lowercase-tokenfilter)
    - [Stop Token Filter](https://www.elastic.co/docs/reference/text-analysis/analysis-stop-tokenfilter), configured to use the pre-defined list of English stop words
  </definition>
</definitions>

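The first request body below defines the analyzer together with its configured components. The second tests it with the analyze API: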
```json

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": { <1>
          "char_filter": [
            "emoticons"
          ],
          "tokenizer": "punctuation",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "tokenizer": {
        "punctuation": { <2>
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emoticons": { <3>
          "type": "mapping",
          "mappings": [
            ":) => _happy_",
            ":( => _sad_"
          ]
        }
      },
      "filter": {
        "english_stop": { <4>
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}


{
  "analyzer": "my_custom_analyzer",
  "text": "I'm a :) person, and you?"
}
```

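1. Defines the custom analyzer `my_custom_analyzer`, which uses a custom tokenizer, character filter, and token filter defined later in the same request. The `type` parameter is omitted.
2. Defines the custom `punctuation` tokenizer.
3. Defines the custom `emoticons` character filter.
4. Defines the custom `english_stop` token filter.

The above example produces the following terms: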
```text
[ i'm, _happy_, person, you ]
```
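
As a rough illustration of what "configured" components mean, the sketch below models each one as a Python function built from its settings. It is not how Elasticsearch implements them: the factory names (`mapping_char_filter`, `pattern_tokenizer`, `stop_filter`) are hypothetical, the mapping rules are applied with simple sequential replacement, and the stop word set is a small illustrative subset rather than the full `_english_` list.

```python
import re


def mapping_char_filter(mappings):
    # Build a character filter from "key => value" rules, as in the request above.
    rules = [tuple(part.strip() for part in rule.split("=>")) for rule in mappings]

    def apply(text):
        for key, value in rules:
            text = text.replace(key, value)
        return text

    return apply


def pattern_tokenizer(pattern):
    # Split on the separator pattern and discard empty tokens.
    splitter = re.compile(pattern)
    return lambda text: [token for token in splitter.split(text) if token]


def stop_filter(stopwords):
    # Remove any token found in the stop word set.
    return lambda tokens: [token for token in tokens if token not in stopwords]


def lowercase(tokens):
    return [token.lower() for token in tokens]


emoticons = mapping_char_filter([":) => _happy_", ":( => _sad_"])
punctuation = pattern_tokenizer(r"[ .,!?]")
# Illustrative subset only; the real _english_ list contains more words.
english_stop = stop_filter({"a", "an", "and", "the", "is", "of"})

tokens = punctuation(emoticons("I'm a :) person, and you?"))
for token_filter in (lowercase, english_stop):
    tokens = token_filter(tokens)
print(tokens)  # ["i'm", '_happy_', 'person', 'you']
```

The lowercase step runs before the stop filter, mirroring the order of the `filter` array in the request, so a capitalized stop word would still be removed.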