Core Types

Each JSON field can be mapped to a specific core type. JSON itself already provides us with some typing, with its support for string, integer/long, float/double, boolean, and null.

The following sample tweet JSON document will be used to explain the core types:

{
    "tweet" {
        "user" : "kimchy",
        "message" : "This is a tweet!",
        "postDate" : "2009-11-15T14:12:12",
        "priority" : 4,
        "rank" : 12.3
    }
}

An explicit mapping for the above JSON tweet could be:

{
    "tweet" : {
        "properties" : {
            "user" : {"type" : "string", "index" : "not_analyzed"},
            "message" : {"type" : "string", "null_value" : "na"},
            "postDate" : {"type" : "date"},
            "priority" : {"type" : "integer"},
            "rank" : {"type" : "float"}
        }
    }
}
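
For illustration, such a mapping could be supplied when creating the index; the index name my_index below is a hypothetical choice, and the mapping itself is the one shown above:

PUT my_index
{
    "mappings" : {
        "tweet" : {
            "properties" : {
                "user" : {"type" : "string", "index" : "not_analyzed"},
                "message" : {"type" : "string", "null_value" : "na"},
                "postDate" : {"type" : "date"},
                "priority" : {"type" : "integer"},
                "rank" : {"type" : "float"}
            }
        }
    }
}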

String

The text-based string type is the most basic type and contains one or more characters. An example mapping can be:

{
    "tweet" : {
        "properties" : {
            "message" : {
                "type" : "string",
                "store" : true,
                "index" : "analyzed",
                "null_value" : "na"
            },
            "user" : {
                "type" : "string",
                "index" : "not_analyzed",
                "norms" : {
                    "enabled" : false
                }
            }
        }
    }
}

The above mapping defines a string message property/field within the tweet type. The field is stored in the index (so it can later be retrieved using selective loading when searching), and it is analyzed (broken down into searchable terms). If the message has a null value, then the value that will be stored is na. There is also a string field user which is indexed as-is (not broken down into tokens) and has norms disabled (so that matching this field is a binary decision and no match is scored better than another).
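
As a sketch of the selective loading mentioned above (the index name my_index is hypothetical), the stored message field can be requested explicitly at search time via the fields parameter:

GET my_index/tweet/_search
{
    "query" : { "match" : { "message" : "tweet" } },
    "fields" : [ "message" ]
}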

The following table lists all the attributes that can be used with the string type:

Attribute Description

index_name

Deprecated in 1.5.0; use copy_to instead. The name of the field that will be stored in the index. Defaults to the property/field name.

store

Set to true to actually store the field in the index, false to not store it. Since by default Elasticsearch stores all fields of the source document in the special _source field, this option is primarily useful when the _source field has been disabled in the type definition. Defaults to false.

index

Set to analyzed for the field to be indexed and searchable after being broken down into tokens using an analyzer. not_analyzed means that it is still searchable, but is not analyzed or broken down into tokens. no means that it won’t be searchable at all (as an individual field; it may still be included in _all). Setting to no disables include_in_all. Defaults to analyzed.

doc_values

Set to true to store field values in a column-stride fashion. Automatically set to true when the fielddata format is doc_values.

term_vector

Possible values are no, yes, with_offsets, with_positions, with_positions_offsets. Defaults to no.

boost

The boost value. Defaults to 1.0.

null_value

When there is a (JSON) null value for the field, use the null_value as the field value. Defaults to not adding the field at all.

norms: {enabled: <value>}

Boolean value if norms should be enabled or not. Defaults to true for analyzed fields, and to false for not_analyzed fields. See the section about norms.

norms: {loading: <value>}

Describes how norms should be loaded, possible values are eager and lazy (default). It is possible to change the default value to eager for all fields by configuring the index setting index.norms.loading to eager.

index_options

Allows to set the indexing options, possible values are docs (only doc numbers are indexed), freqs (doc numbers and term frequencies), and positions (doc numbers, term frequencies and positions). Defaults to positions for analyzed fields, and to docs for not_analyzed fields. It is also possible to set it to offsets (doc numbers, term frequencies, positions and offsets).

analyzer

The analyzer used to analyze the text contents when analyzed during indexing and when searching using a query string. Defaults to the globally configured analyzer.

index_analyzer

The analyzer used to analyze the text contents when analyzed during indexing.

search_analyzer

The analyzer used to analyze the field when part of a query string. Can be updated on an existing field.

include_in_all

Should the field be included in the _all field (if enabled). If index is set to no this defaults to false, otherwise, defaults to true or to the parent object type setting.

ignore_above

The analyzer will ignore strings larger than this size. Useful for generic not_analyzed fields that should ignore long text.

position_offset_gap

Position increment gap between field instances with the same field name. Defaults to 0.

The string type also supports custom indexing parameters associated with the indexed value. For example:

{
    "message" : {
        "_value":  "boosted value",
        "_boost":  2.0
    }
}

The mapping is required to disambiguate the meaning of the document. Otherwise, the structure would interpret "message" as a value of type "object". The key _value (or value) in the inner document specifies the real string content that should eventually be indexed. The _boost (or boost) key specifies the per field document boost (here 2.0).

Norms

Norms store various normalization factors that are later used (at query time) in order to compute the score of a document relatively to a query.

Although useful for scoring, norms also require quite a lot of memory (typically in the order of one byte per document per field in your index, even for documents that don’t have this specific field). As a consequence, if you don’t need scoring on a specific field, it is highly recommended to disable norms on it. In particular, this is the case for fields that are used solely for filtering or aggregations.

In case you would like to disable norms after the fact, it is possible to do so by using the PUT mapping API, like this:

PUT my_index/_mapping/my_type
{
  "properties": {
    "title": {
      "type": "string",
      "norms": {
        "enabled": false
      }
    }
  }
}

Note, however, that norms won’t be removed instantly; they are removed as old segments are merged into new segments as you continue indexing new documents. Any score computation on a field that has had norms removed might return inconsistent results, since some documents won’t have norms anymore while other documents might still have them.

Number

A number-based type supporting float, double, byte, short, integer, and long. It uses specific constructs within Lucene in order to support numeric values. The number types have the same ranges as the corresponding Java types. An example mapping can be:

{
    "tweet" : {
        "properties" : {
            "rank" : {
                "type" : "float",
                "null_value" : 1.0
            }
        }
    }
}
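
With this mapping, a document whose rank is JSON null will be indexed with the rank value 1.0 (the index name and document id below are hypothetical):

PUT my_index/tweet/1
{
    "user" : "kimchy",
    "rank" : null
}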

The following table lists all the attributes that can be used with a number type:

Attribute Description

type

The type of the number. Can be float, double, integer, long, short, byte. Required.

index_name

Deprecated in 1.5.0; use copy_to instead. The name of the field that will be stored in the index. Defaults to the property/field name.

store

Set to true to store the actual field in the index, false to not store it. Defaults to false (note that the JSON document itself is stored, and the field can be retrieved from it).

index

Set to no if the value should not be indexed. Setting to no disables include_in_all. If set to no, the field should either be stored in _source, have include_in_all enabled, or have store set to true for this to be useful.

doc_values

Set to true to store field values in a column-stride fashion. Automatically set to true when the fielddata format is doc_values.

precision_step

The precision step (influences the number of terms generated for each number value). Defaults to 16 for long and double, 8 for short, integer, and float, and 2147483647 for byte.

boost

The boost value. Defaults to 1.0.

null_value

When there is a (JSON) null value for the field, use the null_value as the field value. Defaults to not adding the field at all.

include_in_all

Should the field be included in the _all field (if enabled). If index is set to no this defaults to false, otherwise, defaults to true or to the parent object type setting.

ignore_malformed

If set to true, a malformed number is ignored instead of causing the whole document to be rejected. Defaults to false.

coerce

Try to convert strings to numbers and truncate fractions for integers. Defaults to true.

Token Count

The token_count type maps to the JSON string type but indexes and stores the number of tokens in the string rather than the string itself. For example:

{
    "tweet" : {
        "properties" : {
            "name" : {
                "type" : "string",
                "fields" : {
                    "word_count": {
                        "type" : "token_count",
                        "store" : "yes",
                        "analyzer" : "standard"
                    }
                }
            }
        }
    }
}

All the configuration that can be specified for a number can be specified for a token_count. The only extra configuration is the required analyzer field which specifies which analyzer to use to break the string into tokens. For best performance, use an analyzer with no token filters.

Technically the token_count type sums position increments rather than counting tokens. This means that even if the analyzer filters out stop words they are included in the count.
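
For illustration (the index name and document id are hypothetical), a document whose name produces four tokens can then be matched with a range query on the name.word_count sub-field:

PUT my_index/tweet/1
{
    "name" : "the quick brown fox"
}

GET my_index/tweet/_search
{
    "query" : {
        "range" : { "name.word_count" : { "gte" : 4 } }
    }
}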

Date

The date type is a special type which maps to JSON string type. It follows a specific format that can be explicitly set. All dates are UTC. Internally, a date maps to a number type long, with the added parsing stage from string to long and from long to string. An example mapping:

{
    "tweet" : {
        "properties" : {
            "postDate" : {
                "type" : "date",
                "format" : "YYYY-MM-dd"
            }
        }
    }
}

The date type will also accept a long number representing UTC milliseconds since the epoch, regardless of the string format it is configured to handle.
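
For example, given the mapping above, both of the following hypothetical documents are accepted: the first matches the yyyy-MM-dd format, and the second is interpreted as UTC milliseconds since the epoch:

PUT my_index/tweet/1
{ "postDate" : "2009-11-15" }

PUT my_index/tweet/2
{ "postDate" : 1258294332000 }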

The following table lists all the attributes that can be used with a date type:

Attribute Description

index_name

Deprecated in 1.5.0; use copy_to instead. The name of the field that will be stored in the index. Defaults to the property/field name.

format

The date format. Defaults to dateOptionalTime.

store

Set to true to store the actual field in the index, false to not store it. Defaults to false (note that the JSON document itself is stored, and the field can be retrieved from it).

index

Set to no if the value should not be indexed. Setting to no disables include_in_all. If set to no, the field should either be stored in _source, have include_in_all enabled, or have store set to true for this to be useful.

doc_values

Set to true to store field values in a column-stride fashion. Automatically set to true when the fielddata format is doc_values.

precision_step

The precision step (influences the number of terms generated for each number value). Defaults to 16.

boost

The boost value. Defaults to 1.0.

null_value

When there is a (JSON) null value for the field, use the null_value as the field value. Defaults to not adding the field at all.

include_in_all

Should the field be included in the _all field (if enabled). If index is set to no this defaults to false, otherwise, defaults to true or to the parent object type setting.

ignore_malformed

If set to true, a malformed date is ignored instead of causing the whole document to be rejected. Defaults to false.

Boolean

The boolean type maps to the JSON boolean type. It ends up storing within the index either T or F, with automatic translation to true and false respectively.

{
    "tweet" : {
        "properties" : {
            "hes_my_special_tweet" : {
                "type" : "boolean"
            }
        }
    }
}

The boolean type also supports passing the value as a number or a string, in which case 0, an empty string, false, off, and no are treated as false, and all other values as true.
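
As a sketch (the index name and document ids are hypothetical), the following documents would be indexed with the values false, false, and true respectively:

PUT my_index/tweet/1
{ "hes_my_special_tweet" : "off" }

PUT my_index/tweet/2
{ "hes_my_special_tweet" : 0 }

PUT my_index/tweet/3
{ "hes_my_special_tweet" : "yes" }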

The following table lists all the attributes that can be used with the boolean type:

Attribute Description

index_name

Deprecated in 1.5.0; use copy_to instead. The name of the field that will be stored in the index. Defaults to the property/field name.

store

Set to true to store the actual field in the index, false to not store it. Defaults to false (note that the JSON document itself is stored, and the field can be retrieved from it).

index

Set to no if the value should not be indexed. Setting to no disables include_in_all. If set to no, the field should either be stored in _source, have include_in_all enabled, or have store set to true for this to be useful.

boost

The boost value. Defaults to 1.0.

null_value

When there is a (JSON) null value for the field, use the null_value as the field value. Defaults to not adding the field at all.

Binary

The binary type is a base64 representation of binary data that can be stored in the index. The field is not stored by default and not indexed at all.

{
    "tweet" : {
        "properties" : {
            "image" : {
                "type" : "binary"
            }
        }
    }
}
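
For illustration, the field value is supplied as a base64-encoded string (the index name, document id, and content below are hypothetical):

PUT my_index/tweet/1
{
    "user" : "kimchy",
    "image" : "U29tZSBiaW5hcnkgY29udGVudA=="
}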

The following table lists all the attributes that can be used with the binary type:

index_name

Deprecated in 1.5.0; use copy_to instead. The name of the field that will be stored in the index. Defaults to the property/field name.

store

Set to true to store the actual field in the index, false to not store it. Defaults to false (note that the JSON document itself is already stored, so the binary field can be retrieved from there).

doc_values

Set to true to store field values in a column-stride fashion.

compress

Set to true to compress the stored binary value.

compress_threshold

Compression will only be applied to stored binary fields that are greater than this size. Defaults to -1.

Enabling compression on stored binary fields only makes sense on large and highly-compressible values. Otherwise per-field compression is usually not worth doing as the space savings do not compensate for the overhead of the compression format. Normally, you should not configure any compression and just rely on the block compression of stored fields (which is enabled by default and can’t be disabled).

Fielddata filters

Using fielddata filters, it is possible to control which field values are loaded into memory, which is particularly useful for faceting on string fields. Fielddata filters are explained in detail in the Fielddata section.

Fielddata filters can exclude terms which do not match a regex, or which don’t fall between a min and max frequency range:

{
    "tweet": {
        "type":      "string",
        "analyzer":  "whitespace",
        "fielddata": {
            "filter": {
                "regex": {
                    "pattern":        "^#.*"
                },
                "frequency": {
                    "min":              0.001,
                    "max":              0.1,
                    "min_segment_size": 500
                }
            }
        }
    }
}

These filters can be updated on an existing field mapping and will take effect the next time the fielddata for a segment is loaded. Use the Clear Cache API to reload the fielddata using the new filters.
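
A minimal sketch of such a call against a hypothetical index named my_index (with no parameters, this clears all caches for the index, including fielddata):

POST my_index/_cache/clear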

Similarity

Elasticsearch allows you to configure a similarity (scoring algorithm) per field. The similarity setting provides a simple way of choosing a similarity algorithm other than the default TF/IDF, such as BM25.

You can configure similarities via the similarity module.

Configuring Similarity per Field

Defining the Similarity for a field is done via the similarity mapping property, as this example shows:

{
   "book":{
      "properties":{
         "title":{
            "type":"string", "similarity":"BM25"
         }
      }
   }
}

The following similarities are configured out of the box:

default
The Default TF/IDF algorithm used by Elasticsearch and Lucene in previous versions.
BM25
The BM25 algorithm. See Okapi_BM25 for more details.
Copy to field

Adding the copy_to parameter to any field mapping will cause all values of this field to be copied to the fields specified in the parameter. In the following example, all values from the title and abstract fields will be copied to the meta_data field.

{
  "book" : {
    "properties" : {
      "title" : { "type" : "string", "copy_to" : "meta_data" },
      "abstract" : { "type" : "string", "copy_to" : "meta_data" },
      "meta_data" : { "type" : "string" }
    }
  }
}
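
For illustration (the index name is hypothetical), a single query against meta_data then searches the values copied from both title and abstract:

GET my_index/book/_search
{
    "query" : {
        "match" : { "meta_data" : "search relevance" }
    }
}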

Multiple fields are also supported:

{
  "book" : {
    "properties" : {
      "title" : { "type" : "string", "copy_to" : ["meta_data", "article_info"] }
    }
  }
}

Multi fields

The fields option allows mapping several core-type fields to a single JSON source field. This can be useful if a single field needs to be used in different ways, for example when a single field is to be used for both free-text search and sorting.

{
  "tweet" : {
    "properties" : {
      "name" : {
        "type" : "string",
        "index" : "analyzed",
        "fields" : {
          "raw" : {"type" : "string", "index" : "not_analyzed"}
        }
      }
    }
  }
}

In the above example the name field gets processed twice. The first time it is processed as an analyzed string, and this version is accessible under the field name name; this is the main field and is in fact just like any other field. The second time it is processed as a not-analyzed string and is accessible under the name name.raw.
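
A sketch of the use case mentioned earlier (the index name is hypothetical): free-text search on the analyzed name field combined with sorting on the not-analyzed name.raw field:

GET my_index/tweet/_search
{
    "query" : { "match" : { "name" : "smith" } },
    "sort" : [ { "name.raw" : "asc" } ]
}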

Include in All

The include_in_all setting is ignored on any field that is defined in the fields options. Setting include_in_all only makes sense on the main field, since it is the raw field value that is copied to the _all field; the tokens produced by the multi fields aren’t copied.

Updating a field

In essence, a field can’t be updated. However, multi fields can be added to existing fields. This allows, for example, having a different index_analyzer configuration in addition to the index_analyzer configuration already specified on the main field and other multi fields.

Also, the new multi field will only be applied to documents that have been added after the multi field was added; the new multi field doesn’t exist for existing documents.

Another important note is that new multi fields will be merged into the list of existing multi fields, so when adding new multi fields for a field, previously added multi fields don’t need to be specified.
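
As a sketch of adding such a multi field to the existing name field of the tweet type (the index name is hypothetical), using the PUT mapping API:

PUT my_index/_mapping/tweet
{
  "properties" : {
    "name" : {
      "type" : "string",
      "fields" : {
        "raw" : {"type" : "string", "index" : "not_analyzed"}
      }
    }
  }
}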

Accessing Fields

The multi fields defined in the fields option are prefixed with the name of the main field and can be accessed by their full path using the navigation notation (name.raw), or using the typed navigation notation (tweet.name.raw).

Deprecated in 1.0.0.

The path option described below is deprecated; use copy_to instead to set up custom _all fields.

In older releases, the path option allows controlling how fields are accessed. If the path option is set to full, then the multi field name is prefixed with the full path of the main field, but if the path option is set to just_name, the actual multi field name is used without any prefix. The default value for the path option is full.

The just_name setting, among other things, allows indexing content of multiple fields under the same name (commonly used to set up custom _all fields in the past). In the example below the content of both fields first_name and last_name can be accessed by using any_name or tweet.any_name.

{
  "tweet" : {
    "properties": {
      "first_name": {
        "type": "string",
        "index": "analyzed",
        "path": "just_name",
        "fields": {
          "any_name": {"type": "string","index": "analyzed"}
        }
      },
      "last_name": {
        "type": "string",
        "index": "analyzed",
        "path": "just_name",
        "fields": {
          "any_name": {"type": "string","index": "analyzed"}
        }
      }
    }
  }
}
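
For illustration (the index name is hypothetical), a query against any_name then matches text indexed from either first_name or last_name:

GET my_index/tweet/_search
{
  "query" : { "match" : { "any_name" : "john" } }
}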