Data Types

This functionality is in beta and is subject to change. The design and code is less mature than official GA features and is being provided as-is with no warranties. Beta features are not subject to the support SLA of official GA features.

Most of Elasticsearch data types are available in Elasticsearch SQL, as indicated below:

Elasticsearch type

SQL type

SQL precision

Core types

null

null

0

boolean

boolean

1

byte

tinyint

3

short

smallint

5

integer

integer

10

long

bigint

19

double

double

15

float

real

7

half_float

float

16

scaled_float

float

19

keyword

varchar

based on ignore_above

text

varchar

2,147,483,647

binary

varbinary

2,147,483,647

date

timestamp

29

ip

varchar

39

Complex types

object

struct

0

nested

struct

0

Unsupported types

types not mentioned above

unsupported

0

Obviously, not all types in Elasticsearch have an equivalent in SQL and vice-versa hence why, Elasticsearch SQL uses the data type particularities of the former over the latter as ultimately Elasticsearch is the backing store.

In addition to the types above, Elasticsearch SQL also supports at runtime SQL-specific types that do not have an equivalent in Elasticsearch. Such types cannot be loaded from Elasticsearch (as it does not know about them) however can be used inside Elasticsearch SQL in queries or their results.

The table below indicates these types:

SQL type

SQL precision

interval_year

7

interval_month

7

interval_day

23

interval_hour

23

interval_minute

23

interval_second

23

interval_year_to_month

7

interval_day_to_hour

23

interval_day_to_minute

23

interval_day_to_second

23

interval_hour_to_minute

23

interval_hour_to_second

23

interval_minute_to_second

23

SQL and multi-fields

A core concept in Elasticsearch is that of an analyzed field, that is a full-text value that is interpreted in order to be effectively indexed. These fields are of type text and are not used for sorting or aggregations as their actual value depends on the analyzer used hence why Elasticsearch also offers the keyword type for storing the exact value.

In most case, and the default actually, is to use both types when for strings which Elasticsearch supports through multi fields, that is the ability to index the same string in multiple ways; for example index it both as text for search but also as keyword for sorting and aggregations.

As SQL requires exact values, when encountering a text field Elasticsearch SQL will search for an exact multi-field that it can use for comparisons, sorting and aggregations. To do that, it will search for the first keyword that it can find that is not normalized and use that as the original field exact value.

Consider the following string mapping:

{
    "first_name" : {
        "type" : "text",
        "fields" : {
            "raw" : {
                "type" : "keyword"
            }
        }
    }
}

The following SQL query:

SELECT first_name FROM index WHERE first_name = 'John'

is identical to:

SELECT first_name FROM index WHERE first_name.raw = 'John'

as Elasticsearch SQL automatically picks up the raw multi-field from raw for exact matching.