Introducing Field Aliases in Elasticsearch

In Elasticsearch 6.4, we introduced support for a new mapping type called alias. This data type allows users to define an alternate name for a field in an index, which can then be used in place of the target field in searches. Field aliases can be helpful in a few different situations.

Renaming fields

Changing the name of a field typically requires all documents containing the field to be reindexed, a potentially expensive undertaking. If you happen to be using time-based indexing, you can use field aliases to rename the field without having to reindex old data. The process works as follows:

  1. On each existing index, we add a field alias mapping the new field name to the old one.
  2. For newly-created indices, we start to use the new field name exclusively.
  3. Eventually the time-based indices containing the old name are dropped or archived, and the rename is complete.

With the field aliases in place, it’s possible to search across all the indices using the new field name while the rename is in progress.
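Step 1 of the process above can be sketched in Python. This is a minimal illustration rather than an official client: the helper alias_mapping_body, the index names, and the field names response_ms and resp_time are all hypothetical, and the script only prints the requests instead of sending them.

```python
import json


def alias_mapping_body(new_name: str, old_name: str) -> dict:
    """Build a mapping update that makes new_name an alias of old_name."""
    return {"properties": {new_name: {"type": "alias", "path": old_name}}}


# Hypothetical time-based indices that still use the old field name.
existing_indices = ["logs-2019-04-13", "logs-2019-04-14"]

for index in existing_indices:
    body = alias_mapping_body("response_ms", "resp_time")
    # Print the request in Console form; a real script would send it
    # with whichever HTTP or Elasticsearch client you already use.
    print(f"PUT {index}/_mapping")
    print(json.dumps(body, indent=2))
```

Keeping the body construction in one small function makes it easy to apply the same alias to every old index in a loop.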

Uniting indices under a common schema

There are many benefits to using a common schema like the Elastic Common Schema (ECS) for all your data: you can easily run searches that span multiple indices, make use of predefined dashboards and machine learning jobs, and more. Migrating existing indices to conform to the common schema can be challenging, however, as they may contain fields with custom names and formats. Field aliases can help in this process: given an index with non-standard field names, we can add field aliases with the names of standard schema fields. We can then search across all indices using the new names, allowing us to use shared dashboards and other tools.
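As a rough sketch of what this might look like, the Python snippet below builds a mapping update that exposes a custom field under a dotted, ECS-style name. The helper nested_alias_mapping and the custom field src_address are assumptions made for illustration; source.ip is used as an example of a standard field name. The dotted name is expanded into nested object mappings, with the alias at the leaf.

```python
import json


def nested_alias_mapping(alias_name: str, target_path: str) -> dict:
    """Expand a dotted alias name into nested object mappings
    that end in an alias field pointing at target_path."""
    *parents, leaf = alias_name.split(".")
    mapping = {leaf: {"type": "alias", "path": target_path}}
    # Wrap the leaf in one object mapping per parent, innermost first.
    for parent in reversed(parents):
        mapping = {parent: {"properties": mapping}}
    return {"properties": mapping}


# src_address is a hypothetical custom field in an existing index.
body = nested_alias_mapping("source.ip", "src_address")
print(json.dumps(body, indent=2))
```

The alias only changes the name used in searches. If the custom field also differs in type or format from the standard one, an alias alone will not reconcile that.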

Now that you have a sense of why field aliases are useful, let’s dive into an example to better understand how they work.

Field aliases in action

Imagine we’re part of a group of engineers who help maintain the International Space Station. We’re developing an application that collects telemetry data from the station in order to monitor and analyze its daily function. The data is stored in Elasticsearch indices that look something like this:

PUT station_telemetry_2019-04-15
{
  "mappings": {
    "properties": {
      "active_batteries": {
        "type": "integer"
      },
      "cabin_temperature": {
        "type": "float"
      }
    }
  }
}

Because of the large number of readings, the data is organized into time-based indices with a new index per day. Indices are dropped when they grow older than one month.

We get word that an American analyst will be joining the project, and it dawns on us that none of the field names specify units! Remembering how dangerous mismatched units can be, we decide to rename the ambiguous fields. We want to ensure that newly-created indices use updated field names like cabin_temperature_celsius. However, our analysis regularly extends to the past week of data, which will also include indices with the original field name.

As we saw in the section on 'renaming fields', this is a situation in which field aliases can be helpful. On each existing index, we add a new alias field to the mapping called cabin_temperature_celsius that refers to the original field cabin_temperature:

PUT station_telemetry_2019-04-15/_mapping
{
  "properties": {
    "cabin_temperature_celsius": {
      "type": "alias",
      "path": "cabin_temperature"
    }
  }
}

We also make sure that going forward, new indices use the name cabin_temperature_celsius from the start. Now we can search across indices using the new name:

GET station_telemetry_*/_search
{
  "query": {
    "range": {
      "active_batteries": {
        "gte": 4
      }
    }
  },
  "aggs": {
    "average_temp": {
      "avg": {
        "field": "cabin_temperature_celsius"
      }
    }
  }
}

Calling the field capabilities API shows that cabin_temperature_celsius can be searched as a float across all indices. This is true even though it is mapped as a field alias in older indices, and as a concrete field in the newly-created ones:

GET station_telemetry_*/_field_caps?fields=cabin_temperature_celsius

{
  "fields" : {
    "cabin_temperature_celsius" : {
      "float" : {
        "type" : "float",
        "searchable" : true,
        "aggregatable" : true
      }
    }
  }
}
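A response like this can also be checked programmatically. The field capabilities API returns one entry per type under each field, so a field that resolves consistently across indices has exactly one entry. The Python sketch below assumes the response has already been parsed into a dictionary; the helper name field_types is hypothetical.

```python
def field_types(field_caps_response: dict, field: str) -> list:
    """List the types a field resolves to across the queried indices."""
    return sorted(field_caps_response["fields"].get(field, {}))


# The response shown above, parsed into a Python dictionary.
response = {
    "fields": {
        "cabin_temperature_celsius": {
            "float": {
                "type": "float",
                "searchable": True,
                "aggregatable": True,
            }
        }
    }
}

types = field_types(response, "cabin_temperature_celsius")
if len(types) == 1:
    print(f"consistent type: {types[0]}")
else:
    print(f"conflicting or missing types: {types}")
```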

We are now able to update our dashboards to use the new field names that include units, helping avoid confusion for our new colleague. After enough time, the indices containing the original field will be dropped, leaving only the new field names.

Searching vs. indexing

When processing a search request, Elasticsearch checks each field to see if it matches the name of an alias. If it does, then the field is resolved to its target before executing the search request. Aliases can be used in almost all parts of the search request, including queries, aggregations, and sort fields, as well as docvalue fields and scripts. As we saw above, the field capabilities API supports aliases, which can be helpful in figuring out how to search across different indices.
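To make the resolution step concrete, here is a toy Python model of it. It is a conceptual sketch only and does not reflect how Elasticsearch implements this internally; the function resolve_field and the flat properties dictionary are assumptions for illustration.

```python
def resolve_field(name: str, properties: dict) -> str:
    """Return the concrete field that a name refers to.

    properties is a flat dictionary of field mappings, as found
    under "properties" in an index mapping.
    """
    field = properties.get(name)
    if field is not None and field.get("type") == "alias":
        # An alias points at a concrete field via its path.
        return field["path"]
    # Concrete or unmapped names are used unchanged.
    return name


# Mapping of an older index after the alias has been added.
old_index_properties = {
    "active_batteries": {"type": "integer"},
    "cabin_temperature": {"type": "float"},
    "cabin_temperature_celsius": {"type": "alias", "path": "cabin_temperature"},
}

print(resolve_field("cabin_temperature_celsius", old_index_properties))
print(resolve_field("active_batteries", old_index_properties))
```

In this model the aggregation on cabin_temperature_celsius runs against cabin_temperature in the older index, while the query on active_batteries is left as it is.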

It’s important to note that field aliases cannot be used when indexing documents: a document’s source can only contain concrete fields. For this reason, field aliases don’t appear in the result of APIs like GET, which return the document source untouched.
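One practical consequence is that application code reading the source of search hits still sees whichever concrete name each index uses. The Python sketch below shows one way to handle both names during the transition; the helper cabin_temperature and the sample documents are hypothetical.

```python
def cabin_temperature(source: dict):
    """Read the temperature from a document source,
    whichever concrete field name the index uses."""
    if "cabin_temperature_celsius" in source:
        return source["cabin_temperature_celsius"]
    # Older indices store the value under the original name;
    # the alias does not appear in the source.
    return source.get("cabin_temperature")


# Hypothetical sources from an older and a newer index.
old_doc = {"active_batteries": 5, "cabin_temperature": 21.5}
new_doc = {"active_batteries": 6, "cabin_temperature_celsius": 22.0}

for doc in (old_doc, new_doc):
    print(cabin_temperature(doc))
```

Once the older indices have been dropped, the fallback branch can be removed.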

If the new field names need to be used when ingesting documents or loading their source, then field aliases may not be a good fit. However, when it’s most important to be able to search using these new names, the alias mapping type can provide a lighter-weight alternative to rewriting the data.

Learn more

The field alias documentation contains more detail on this new data type, including important information about its limitations. We hope you get the chance to try out field aliases with Elasticsearch either online or by downloading it, and look forward to your feedback on GitHub and the Discuss forums!