WARNING: Version 1.6 of Elasticsearch has passed its EOL date.
This documentation is no longer being maintained and may be removed. If you are running this version, we strongly advise you to upgrade. For the latest information, see the current release documentation.
Core Types

Each JSON field can be mapped to a specific core type. JSON itself already provides us with some typing, with its support for string, integer/long, float/double, boolean, and null.
The following sample tweet JSON document will be used to explain the core types:
{
    "tweet" : {
        "user" : "kimchy",
        "message" : "This is a tweet!",
        "postDate" : "2009-11-15T14:12:12",
        "priority" : 4,
        "rank" : 12.3
    }
}
Explicit mapping for the above JSON tweet can be:
{
    "tweet" : {
        "properties" : {
            "user" : {"type" : "string", "index" : "not_analyzed"},
            "message" : {"type" : "string", "null_value" : "na"},
            "postDate" : {"type" : "date"},
            "priority" : {"type" : "integer"},
            "rank" : {"type" : "float"}
        }
    }
}
String

The text-based string type is the most basic type, and contains one or more characters. An example mapping can be:

{
    "tweet" : {
        "properties" : {
            "message" : {
                "type" : "string",
                "store" : true,
                "index" : "analyzed",
                "null_value" : "na"
            },
            "user" : {
                "type" : "string",
                "index" : "not_analyzed",
                "norms" : {
                    "enabled" : false
                }
            }
        }
    }
}
The above mapping defines a string message property/field within the tweet type. The field is stored in the index (so it can later be retrieved using selective loading when searching), and it gets analyzed (broken down into searchable terms). If the message has a null value, then the value that will be stored is na. There is also a string user field which is indexed as-is (not broken down into tokens) and has norms disabled (so that matching this field is a binary decision: no match is better than another).

The following table lists all the attributes that can be used with the string type:
Attribute | Description
---|---
index_name | [1.5.0] Deprecated in 1.5.0. Use copy_to instead. The name of the field that will be stored in the index. Defaults to the property/field name.
store | Set to true to actually store the field in the index, false to not store it. Defaults to false.
index | Set to analyzed for the field to be indexed and searchable after being broken down into tokens using an analyzer. not_analyzed means that the field is still searchable, but does not go through any analysis process. no means it is not indexed at all. Defaults to analyzed.
doc_values | Set to true to store the field values in a column-stride fashion on disk.
term_vector | Possible values are no, yes, with_offsets, with_positions, with_positions_offsets. Defaults to no.
boost | The boost value. Defaults to 1.0.
null_value | When there is a (JSON) null value for the field, use the null_value as the field value. Defaults to not adding the field at all.
norms.enabled | Boolean value if norms should be enabled or not. Defaults to true for analyzed fields, and to false for not_analyzed fields.
norms.loading | Describes how norms should be loaded, possible values are eager and lazy (default).
index_options | Allows to set the indexing options, possible values are docs (only doc numbers are indexed), freqs (doc numbers and term frequencies), and positions (doc numbers, term frequencies and term positions). Defaults to positions for analyzed fields, and to docs for not_analyzed fields.
analyzer | The analyzer used to analyze the text contents when analyzed during indexing and searching. Defaults to the globally configured analyzer.
index_analyzer | The analyzer used to analyze the text contents when analyzed during indexing.
search_analyzer | The analyzer used to analyze the field when part of a query string. Can be updated on an existing field.
include_in_all | Should the field be included in the _all field (if enabled). Defaults to true, or to the parent object type setting.
ignore_above | The analyzer will ignore strings larger than this size. Useful for generic not_analyzed fields that should ignore long text. This option is also useful for protecting against Lucene's term byte-length limit of 32766.
position_offset_gap | Position increment gap between field instances with the same field name. Defaults to 0.
The string type also supports custom indexing parameters associated with the indexed value. For example:

{
    "message" : {
        "_value" : "boosted value",
        "_boost" : 2.0
    }
}
The mapping is required to disambiguate the meaning of the document. Otherwise, the structure would interpret "message" as a value of type "object". The key _value (or value) in the inner document specifies the real string content that should eventually be indexed. The _boost (or boost) key specifies the per-field document boost (here 2.0).
Norms

Norms store various normalization factors that are later used (at query time) in order to compute the score of a document relative to a query.
Although useful for scoring, norms also require quite a lot of memory (typically in the order of one byte per document per field in your index, even for documents that don’t have this specific field). As a consequence, if you don’t need scoring on a specific field, it is highly recommended to disable norms on it. In particular, this is the case for fields that are used solely for filtering or aggregations.
In case you would like to disable norms after the fact, it is possible to do so by using the PUT mapping API, like this:

PUT my_index/_mapping/my_type
{
    "properties" : {
        "title" : {
            "type" : "string",
            "norms" : {
                "enabled" : false
            }
        }
    }
}
Note, however, that norms won't be removed instantly: they are removed as old segments are merged into new segments as you continue indexing new documents. Any score computation on a field that has had norms removed might return inconsistent results, since some documents won't have norms anymore while other documents might still have them.
Number

A number-based type supporting float, double, byte, short, integer, and long. It uses specific constructs within Lucene in order to support numeric values. The number types have the same ranges as the corresponding Java types. An example mapping can be:

{
    "tweet" : {
        "properties" : {
            "rank" : {
                "type" : "float",
                "null_value" : 1.0
            }
        }
    }
}
The following table lists all the attributes that can be used with a number type:
Attribute | Description
---|---
type | The type of the number. Can be float, double, integer, long, short, byte. Required.
index_name | [1.5.0] Deprecated in 1.5.0. Use copy_to instead. The name of the field that will be stored in the index. Defaults to the property/field name.
store | Set to true to actually store the field in the index, false to not store it. Defaults to false.
index | Set to no if the value should not be indexed. Defaults to not_analyzed.
doc_values | Set to true to store the field values in a column-stride fashion on disk.
precision_step | The precision step (influences the number of terms generated for each number value). Defaults to 16 for long and double, 8 for integer, short and float, and 2147483647 for byte.
boost | The boost value. Defaults to 1.0.
null_value | When there is a (JSON) null value for the field, use the null_value as the field value. Defaults to not adding the field at all.
include_in_all | Should the field be included in the _all field (if enabled). Defaults to true, or to the parent object type setting.
ignore_malformed | Ignores a malformed number. Defaults to false.
coerce | Try to convert strings to numbers and truncate fractions for integers. Defaults to true.
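As a rough illustration of the coerce rule for an integer field, here is a sketch of the documented behavior (this mirrors the rule above, not Elasticsearch's actual code):

```python
def coerce_integer(value):
    """Sketch of the documented coerce behavior for an integer field:
    convert numeric strings to numbers and truncate any fraction."""
    if isinstance(value, str):
        value = float(value)
    return int(value)  # truncation: 5.9 becomes 5

print(coerce_integer("5.9"))  # 5
print(coerce_integer(7.2))    # 7
```

With coerce disabled, sending "5.9" to an integer field would instead be rejected as malformed.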
Token Count

The token_count type maps to the JSON string type, but indexes and stores the number of tokens in the string rather than the string itself. For example:

{
    "tweet" : {
        "properties" : {
            "name" : {
                "type" : "string",
                "fields" : {
                    "word_count" : {
                        "type" : "token_count",
                        "store" : "yes",
                        "analyzer" : "standard"
                    }
                }
            }
        }
    }
}
All the configuration that can be specified for a number can be specified for a token_count. The only extra configuration is the required analyzer field, which specifies which analyzer to use to break the string into tokens. For best performance, use an analyzer with no token filters.

Technically, the token_count type sums position increments rather than counting tokens. This means that even if the analyzer filters out stop words, they are included in the count.
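To see why filtered stop words still count, consider this toy simulation (not the Lucene implementation): a stop-word filter drops the token but carries its position increment over to the next token, so summing increments recovers the original token count.

```python
STOP_WORDS = {"the", "is", "a"}

def analyze(text):
    """Toy analyzer: emit (term, position_increment) pairs.
    A filtered stop word bumps the increment of the following token."""
    tokens, gap = [], 1
    for word in text.lower().split():
        if word in STOP_WORDS:
            gap += 1  # token removed, but its position gap is preserved
        else:
            tokens.append((word, gap))
            gap = 1
    return tokens

def token_count(text):
    # token_count sums position increments, not surviving tokens
    return sum(inc for _, inc in analyze(text))

print(token_count("this is a tweet"))  # 4, though "is" and "a" were dropped
```

Only two terms survive analysis, yet the count is four, matching the behavior described above.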
Date

The date type is a special type which maps to the JSON string type. It follows a specific format that can be explicitly set. All dates are UTC. Internally, a date maps to a number type long, with an added parsing stage from string to long and from long to string. An example mapping:

{
    "tweet" : {
        "properties" : {
            "postDate" : {
                "type" : "date",
                "format" : "yyyy-MM-dd"
            }
        }
    }
}

Note that in the Joda-Time patterns used for date formats, lowercase yyyy is the calendar year; uppercase YYYY denotes the week-based year and can produce surprising results around year boundaries.
The date type will also accept a long number representing UTC milliseconds since the epoch, regardless of the format it can handle.
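For example, the postDate value used earlier could equally be sent as epoch milliseconds; a quick way to compute them with the standard library (illustrative only):

```python
from datetime import datetime, timezone

# 2009-11-15T14:12:12 UTC expressed as milliseconds since the epoch
dt = datetime(2009, 11, 15, 14, 12, 12, tzinfo=timezone.utc)
millis = int(dt.timestamp() * 1000)
print(millis)  # 1258294332000
```

Indexing the number 1258294332000 into a date field yields the same stored value as the string form.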
The following table lists all the attributes that can be used with a date type:
Attribute | Description
---|---
index_name | [1.5.0] Deprecated in 1.5.0. Use copy_to instead. The name of the field that will be stored in the index. Defaults to the property/field name.
format | The date format. Defaults to dateOptionalTime.
store | Set to true to actually store the field in the index, false to not store it. Defaults to false.
index | Set to no if the value should not be indexed. Defaults to not_analyzed.
doc_values | Set to true to store the field values in a column-stride fashion on disk.
precision_step | The precision step (influences the number of terms generated for each number value). Defaults to 16.
boost | The boost value. Defaults to 1.0.
null_value | When there is a (JSON) null value for the field, use the null_value as the field value. Defaults to not adding the field at all.
include_in_all | Should the field be included in the _all field (if enabled). Defaults to true, or to the parent object type setting.
ignore_malformed | Ignores a malformed date. Defaults to false.
numeric_resolution | The unit to use when a numeric value is passed in. Possible values include seconds and milliseconds (default).
Boolean

The boolean type maps to the JSON boolean type. It ends up storing within the index either T or F, with automatic translation to true and false respectively.

{
    "tweet" : {
        "properties" : {
            "hes_my_special_tweet" : {
                "type" : "boolean"
            }
        }
    }
}

The boolean type also supports passing the value as a number or a string (in this case 0, an empty string, false, off and no are false; all other values are true).
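That conversion rule can be sketched as follows (a mirror of the documented behavior above, not Elasticsearch code):

```python
def to_boolean(value):
    """Mirror the documented coercion: 0, an empty string, "false",
    "off" and "no" are false; every other value is true."""
    return value not in (0, "", "false", "off", "no")

print(to_boolean("no"))   # False
print(to_boolean("yes"))  # True
print(to_boolean(1))      # True
```

Note that strings like "yes" or "1" are true simply because they are not in the falsy set; there is no separate truthy list.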
The following table lists all the attributes that can be used with the boolean type:
Attribute | Description
---|---
index_name | [1.5.0] Deprecated in 1.5.0. Use copy_to instead. The name of the field that will be stored in the index. Defaults to the property/field name.
store | Set to true to actually store the field in the index, false to not store it. Defaults to false.
index | Set to no if the value should not be indexed. Defaults to not_analyzed.
boost | The boost value. Defaults to 1.0.
null_value | When there is a (JSON) null value for the field, use the null_value as the field value. Defaults to not adding the field at all.
Binary

The binary type is a base64 representation of binary data that can be stored in the index. The field is not stored by default and not indexed at all.

{
    "tweet" : {
        "properties" : {
            "image" : {
                "type" : "binary"
            }
        }
    }
}
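Since the field expects a base64 string, a document with binary content can be prepared like this (illustrative, standard library only):

```python
import base64
import json

raw = b"\x00\x01binary image bytes\xff"  # whatever binary payload you have
doc = {"image": base64.b64encode(raw).decode("ascii")}
print(json.dumps(doc))
```

The stored base64 string decodes back to the original bytes with base64.b64decode.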
The following table lists all the attributes that can be used with the binary type:

Attribute | Description
---|---
index_name | [1.5.0] Deprecated in 1.5.0. Use copy_to instead. The name of the field that will be stored in the index. Defaults to the property/field name.
store | Set to true to actually store the field in the index, false to not store it. Defaults to false.
doc_values | Set to true to store the field values in a column-stride fashion on disk.
compress | Set to true to compress the stored binary value.
compress_threshold | Compression will only be applied to stored binary fields that are greater than this size. Defaults to -1.
Enabling compression on stored binary fields only makes sense on large and highly-compressible values. Otherwise per-field compression is usually not worth doing as the space savings do not compensate for the overhead of the compression format. Normally, you should not configure any compression and just rely on the block compression of stored fields (which is enabled by default and can’t be disabled).
Fielddata filters

It is possible to control which field values are loaded into memory, which is particularly useful for aggregating on string fields, using fielddata filters, which are explained in detail in the Fielddata section.
Fielddata filters can exclude terms which do not match a regex, or which don't fall between a min and max frequency range:

{
    "tweet" : {
        "type" : "string",
        "analyzer" : "whitespace",
        "fielddata" : {
            "filter" : {
                "regex" : {
                    "pattern" : "^#.*"
                },
                "frequency" : {
                    "min" : 0.001,
                    "max" : 0.1,
                    "min_segment_size" : 500
                }
            }
        }
    }
}
These filters can be updated on an existing field mapping and will take effect the next time the fielddata for a segment is loaded. Use the Clear Cache API to reload the fielddata using the new filters.
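The frequency filter above keeps only terms whose per-segment document frequency falls within the given range. As a rough sketch of that rule (an approximation for illustration, not the Lucene implementation):

```python
def keep_term(doc_freq, segment_docs, min_ratio=0.001, max_ratio=0.1,
              min_segment_size=500):
    """Sketch of a fielddata frequency filter: segments smaller than
    min_segment_size are not filtered; otherwise keep terms whose
    document-frequency ratio lies within [min_ratio, max_ratio]."""
    if segment_docs < min_segment_size:
        return True
    ratio = doc_freq / segment_docs
    return min_ratio <= ratio <= max_ratio

print(keep_term(50, 10_000))    # True  (ratio 0.005 is in range)
print(keep_term(5_000, 10_000)) # False (ratio 0.5 is too frequent)
print(keep_term(3, 100))        # True  (segment too small to filter)
```

This shows why very rare terms (often noise) and very frequent terms (often stop words) are both excluded from fielddata.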
Similarity

Elasticsearch allows you to configure a similarity (scoring algorithm) per field. The similarity setting provides a simple way of choosing a similarity algorithm other than the default TF/IDF, such as BM25. You can configure similarities via the similarity module.

Configuring Similarity per Field

Defining the similarity for a field is done via the similarity mapping property, as this example shows:

{
    "book" : {
        "properties" : {
            "title" : {
                "type" : "string",
                "similarity" : "BM25"
            }
        }
    }
}
The following similarities are configured out-of-the-box:

- default - The default TF/IDF algorithm used by Elasticsearch and Lucene in previous versions.
- BM25 - The BM25 algorithm. See Okapi_BM25 for more details.
Copy to field

Adding the copy_to parameter to any field mapping will cause all values of this field to be copied to the fields specified in the parameter. In the following example, all values from the fields title and abstract will be copied to the field meta_data.

{
    "book" : {
        "properties" : {
            "title" : { "type" : "string", "copy_to" : "meta_data" },
            "abstract" : { "type" : "string", "copy_to" : "meta_data" },
            "meta_data" : { "type" : "string" }
        }
    }
}
Multiple target fields are also supported:

{
    "book" : {
        "properties" : {
            "title" : { "type" : "string", "copy_to" : ["meta_data", "article_info"] }
        }
    }
}
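At index time the effect is as if each source value were appended to the target fields. A toy simulation of that behavior (for intuition only, not the actual implementation):

```python
def apply_copy_to(doc, mapping):
    """Toy simulation of copy_to: append each field's value to every
    target field named in its mapping."""
    result = dict(doc)
    for field, options in mapping.items():
        targets = options.get("copy_to", [])
        if isinstance(targets, str):
            targets = [targets]  # copy_to accepts a single name or a list
        for target in targets:
            if field in doc:
                result.setdefault(target, []).append(doc[field])
    return result

mapping = {
    "title": {"copy_to": "meta_data"},
    "abstract": {"copy_to": "meta_data"},
}
doc = {"title": "ES in Action", "abstract": "Search basics"}
print(apply_copy_to(doc, mapping)["meta_data"])
# ['ES in Action', 'Search basics']
```

The meta_data field then matches queries against either source value, which is why copy_to is the recommended way to build custom _all-style fields.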
Multi fields

The fields option allows mapping several core type fields onto a single JSON source field. This can be useful if a single field needs to be used in different ways, for example when a single field is to be used for both free-text search and sorting.

{
    "tweet" : {
        "properties" : {
            "name" : {
                "type" : "string",
                "index" : "analyzed",
                "fields" : {
                    "raw" : {"type" : "string", "index" : "not_analyzed"}
                }
            }
        }
    }
}

In the above example, the field name gets processed twice. The first time it gets processed as an analyzed string, and this version is accessible under the field name name; this is the main field and is in fact just like any other field. The second time it gets processed as a not-analyzed string and is accessible under the name name.raw.
Include in All

The include_in_all setting is ignored on any field that is defined in the fields options. Setting include_in_all only makes sense on the main field, since it is the raw field value that is copied to the _all field; the tokens aren't copied.
Updating a field

In essence, a field can't be updated. However, multi fields can be added to existing fields. This allows, for example, having a different index_analyzer configuration in addition to the index_analyzer configuration already specified in the main and other multi fields.

Note that a new multi field will only be applied to documents that have been added after the multi field was added; the new multi field doesn't exist in existing documents.

Another important note is that new multi fields will be merged into the list of existing multi fields, so when adding new multi fields for a field, previously added multi fields don't need to be specified.
Accessing Fields

The multi fields defined in fields are prefixed with the name of the main field and can be accessed by their full path using the navigation notation: name.raw, or using the typed navigation notation: tweet.name.raw.

Deprecated in 1.0.0. The path option below is deprecated. Use copy_to instead for setting up custom _all fields.

In older releases, the path option allows controlling how fields are accessed. If the path option is set to full, then the full path of the main field is prefixed, but if the path option is set to just_name, the actual multi field name without any prefix is used. The default value for the path option is full.

The just_name setting, among other things, allows indexing the content of multiple fields under the same name (commonly used to set up custom _all fields in the past). In the example below, the content of both fields first_name and last_name can be accessed by using any_name or tweet.any_name.

{
    "tweet" : {
        "properties" : {
            "first_name" : {
                "type" : "string",
                "index" : "analyzed",
                "path" : "just_name",
                "fields" : {
                    "any_name" : {"type" : "string", "index" : "analyzed"}
                }
            },
            "last_name" : {
                "type" : "string",
                "index" : "analyzed",
                "path" : "just_name",
                "fields" : {
                    "any_name" : {"type" : "string", "index" : "analyzed"}
                }
            }
        }
    }
}