December 13, 2016 Engineering

Using Painless in Kibana scripted fields

By Tanya Bragin

Kibana provides powerful ways to search and visualize data stored in Elasticsearch. For the purpose of visualizations, Kibana looks for fields defined in Elasticsearch mappings and presents them as options to the user building a chart. But what happens if you forget to define an important value as a separate field in your schema? Or what if you want to combine two fields and treat them as one? This is where Kibana scripted fields come into play.

Scripted fields have actually been around since the early days of Kibana 4. At the time they were introduced, the only way to define them relied on Lucene Expressions, a scripting language in Elasticsearch which deals exclusively with numeric values. As a result, the power of scripted fields was limited to a subset of use cases. In 5.0, Elasticsearch introduced Painless, a safe and powerful scripting language that allows operating on a variety of data types, and as a result, scripted fields in Kibana 5.0 are that much more powerful.

In the rest of this blog, we'll walk you through how to create scripted fields for common use cases. We'll do so using a dataset from the Kibana Getting Started tutorial and an instance of Elasticsearch and Kibana running in Elastic Cloud, which you can spin up for free.

The following video walks you through how to spin up a personal Elasticsearch and Kibana instance in Elastic Cloud and load a sample dataset into it. 

How scripted fields work

Elasticsearch allows you to specify scripted fields on every request. Kibana improves on this by allowing you to define a scripted field once in the Management section, so it can be used in multiple places in the UI going forward. Note that while Kibana stores scripted fields alongside its other configuration in the .kibana index, this configuration is Kibana-specific, and Kibana scripted fields are not exposed to API users of Elasticsearch.
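To make the per-request mechanism concrete, here is a minimal Python sketch of the kind of search body that carries a scripted field. The helper name scripted_field_request and the field name kilobytes are our own illustrative choices, and the "inline" key reflects the 5.x request format (later releases use "source"); treat it as a sketch of the request shape, not a capture of what Kibana actually sends.

```python
import json


def scripted_field_request(name, source, lang="painless"):
    """Build a search body asking Elasticsearch to evaluate one scripted field.

    `name` is the label of the computed field in each hit; `source` is the
    script text. Both are supplied by the caller (hypothetical helper).
    """
    return {
        "query": {"match_all": {}},
        "script_fields": {
            # 5.x uses the "inline" key for the script text.
            name: {"script": {"lang": lang, "inline": source}}
        },
    }


body = scripted_field_request(
    "kilobytes", "doc['bytes'].value / 1024", lang="expression"
)
print(json.dumps(body, indent=2))
```

Sending a body like this on every search is exactly the repetition that defining the field once in Kibana spares you.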

When you go to define a scripted field in Kibana, you'll be given a choice of scripting language, allowing you to pick from all the languages installed on the Elasticsearch nodes that have dynamic scripting enabled. By default that is "expression" and "painless" in 5.0 and just "expression" in 2.x. You can install other scripting languages and enable dynamic scripting for them, but it is not recommended because they cannot be sufficiently sandboxed and have been deprecated.

Scripted fields operate on one Elasticsearch document at a time, but can reference multiple fields in that document. As a result, it is appropriate to use scripted fields to combine or transform fields within a single document, but not to perform calculations based on multiple documents (e.g. time-series math). Both Painless and Lucene expressions operate on fields stored in doc_values, so string data must be indexed as data type keyword. Scripted fields based on Painless also cannot operate directly on _source.
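As an illustration of the keyword requirement, the Python sketch below builds a 5.x-style mapping in which a string field is indexed both as analyzed text and as a keyword sub-field that scripts can read from doc_values. The mapping type name log and the field name url are assumptions made for the example, as is the helper name.

```python
def text_with_keyword_subfield(ignore_above=256):
    """Return a field definition with an analyzed text field plus a
    .keyword sub-field; the keyword part is what scripts can read."""
    return {
        "type": "text",
        "fields": {
            "keyword": {"type": "keyword", "ignore_above": ignore_above}
        },
    }


# Hypothetical index mapping: type "log" with a single string field "url".
mapping = {
    "mappings": {
        "log": {"properties": {"url": text_with_keyword_subfield()}}
    }
}
```

With a mapping like this, a script would reference doc['url.keyword'] rather than doc['url'].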

Once scripted fields are defined in "Management", users can interact with them the same way as with other fields in the rest of Kibana. Scripted fields automatically show up in the Discover field list and are available in Visualize for the purposes of creating visualizations. Kibana simply passes scripted field definitions to Elasticsearch at query time for evaluation. The resulting dataset is combined with other results coming back from Elasticsearch and presented to the user in a table or a chart.
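For charts, the script typically takes the place of a field name inside an aggregation. The Python sketch below builds a terms aggregation over a script; the helper name and the aggregation name are hypothetical, and the exact request Kibana generates may differ in its details.

```python
def terms_agg_on_script(agg_name, source, size=5):
    """Build a search body that buckets documents by a script's return value."""
    return {
        "size": 0,  # only the buckets are needed, not the individual hits
        "aggs": {
            agg_name: {
                "terms": {
                    # The script replaces the usual "field" entry (5.x "inline" key).
                    "script": {"lang": "painless", "inline": source},
                    "size": size,
                }
            }
        },
    }


agg_body = terms_agg_on_script(
    "route", "doc['geo.dest.keyword'].value + ':' + doc['geo.src.keyword'].value"
)
```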

There are a couple of known limitations when working with scripted fields at the time of writing this blog. You can apply most Elasticsearch aggregations available in Kibana visual builder to scripted fields, with the most notable exception of the significant terms aggregation. You can also filter on scripted fields via the filter bar in Discover, Visualize, and Dashboard, although you have to take care to write proper scripts that return well-defined values, as we show below. It is also important to refer to the "Best practices" section below to ensure you do not destabilize your environment when using scripted fields.

The following video shows how to use Kibana to create scripted fields.

Scripted field examples

This section presents a few examples of Lucene expressions and Painless scripted fields in Kibana in common scenarios. As mentioned above, these examples were developed on top of a dataset from the Kibana Getting Started tutorial and assume you are using Elasticsearch and Kibana 5.1.1, as there are a couple of known issues related to filtering and sorting on certain types of scripted fields in earlier versions.

For the most part, scripted fields should work out of the box, as Lucene expressions and Painless are enabled by default in Elasticsearch 5.0. The only exceptions are scripts that require regex-based parsing of fields, for which you need to add the following setting to elasticsearch.yml to turn on regex matching for Painless: script.painless.regex.enabled: true

Perform a calculation on a single field

  • Example: Calculate kilobytes from bytes
  • Language: expressions
  • Return type: number
 doc['bytes'].value / 1024

Note: Keep in mind that Kibana scripted fields work on a single document at a time only, so there is no way to do time-series math in a scripted field.

Date math resulting in number

  • Example: Parse date into hour-of-day
  • Language: expressions
  • Return type: number

Lucene expressions provide a whole host of date manipulation functions out-of-the-box. However, since Lucene expressions only return numerical values, we'll have to use Painless to return a string-based day-of-week (below).

 doc['@timestamp'].date.hourOfDay

Note: The script above will return 0-23

doc['@timestamp'].date.dayOfWeek

Note: The script above will return 1-7, where 1 is Monday
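If you want to sanity-check the values you see in Kibana, a small Python sketch can compute the same two numbers from an epoch-millisecond timestamp. It assumes timestamps are interpreted in UTC and that day-of-week follows the ISO convention (Monday is 1, Sunday is 7); the function name is ours.

```python
from datetime import datetime, timezone


def hour_and_day_of_week(epoch_millis):
    """Return (hour of day, ISO day of week) for a UTC epoch-millis timestamp."""
    ts = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return ts.hour, ts.isoweekday()


# 1970-01-01T00:00:00Z fell on a Thursday.
print(hour_and_day_of_week(0))  # (0, 4)
```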

Combine two string values

  • Example: Combine source and destination or first and last name
  • Language: painless
  • Return type: string
 doc['geo.dest.keyword'].value + ':' + doc['geo.src.keyword'].value

Note: Because scripted fields need to operate on fields in doc_values, we are using .keyword versions of strings above.

Introducing logic

  • Example: Return label "big download" for any document with bytes over 10000
  • Language: painless
  • Return type: string
 if (doc['bytes'].value > 10000) { 
    return "big download";
}
return "";

Note: When introducing logic, ensure that every execution path has a well-defined return statement and a well-defined return value (not null). For instance, the scripted field above will fail with a compile error when used in Kibana filters without the return statement at the end or if the statement returns null. Also keep in mind that breaking up logic into functions is not supported within Kibana scripted fields.

Return substring

  • Example: Return the part after the last slash in the URL
  • Language: painless
  • Return type: string
 def path = doc['url.keyword'].value;
if (path != null) {
    int lastSlashIndex = path.lastIndexOf('/');
    if (lastSlashIndex >= 0) {
        return path.substring(lastSlashIndex + 1);
    }
}
return "";

Note: Whenever possible, avoid using regular expressions to extract substrings, as indexOf() operations are less resource-intensive and less error-prone.
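To check the expected output of this kind of script against a few sample URLs before saving it in Kibana, the same logic can be mirrored outside Elasticsearch. The Python sketch below is such a mirror, not Painless itself; it assumes a slash at any position (including index 0) counts as a match, and the function name is hypothetical.

```python
def last_path_segment(url):
    """Return the text after the last '/', or "" when there is nothing to return."""
    if url is None:
        return ""
    last_slash = url.rfind("/")  # -1 when the string contains no slash
    if last_slash >= 0:
        return url[last_slash + 1:]
    return ""


print(last_path_segment("/images/logo.png"))  # logo.png
```

Note that every path returns a string, never None, which matches the advice above about well-defined return values.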

Match a string using regex, and take action on a match

  • Example: Return a string "error" if a substring "error" is found in field "referer", otherwise return a string "no error".
  • Language: painless
  • Return type: string
if (doc['referer.keyword'].value =~ /error/) {
    return "error";
} else {
    return "no error";
}

Note: Simplified regex syntax is useful for conditionals based on a regex match. 

Match a string and return that match

  • Example: Return domain, the string after the last dot in the "host" field.
  • Language: painless
  • Return type: string
def m = /^.*\.([a-z]+)$/.matcher(doc['host.keyword'].value);
if ( m.matches() ) {
   return m.group(1)
} else {
   return "no match"
}

Note: Defining an object via the regex matcher() functions allows you to extract groups of characters that matched the regex and return them. 

Match a number and return that match

  • Example: Return the first octet of the IP address (stored as a string) and treat it as a number.
  • Language: painless
  • Return type: number
 def m = /^([0-9]+)\..*$/.matcher(doc['clientip.keyword'].value);
if ( m.matches() ) {
   return Integer.parseInt(m.group(1))
} else {
   return 0
}

Note: It is important to return the right data type in a script. A regex match returns a string, even if a number is matched, so you should explicitly convert it to an integer on return.
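The string-to-number conversion is easy to get wrong, so it can help to prototype the pattern outside Elasticsearch first. The Python sketch below mirrors the logic with the same regular expression; it is an offline stand-in under the assumption that the whole string must match, and the names are ours.

```python
import re

# Same pattern as the Painless example: leading digits, a dot, then anything.
FIRST_OCTET = re.compile(r"^([0-9]+)\..*$")


def first_octet(client_ip):
    """Return the first octet as an int, or 0 when the input does not match."""
    m = FIRST_OCTET.fullmatch(client_ip)
    # group(1) is a string, so convert it explicitly before returning.
    return int(m.group(1)) if m else 0


print(first_octet("192.168.1.10"))  # 192
```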

Date math resulting in strings

  • Example: Parse date into day-of-week as a string
  • Language: painless
  • Return type: string
LocalDateTime.ofInstant(Instant.ofEpochMilli(doc['@timestamp'].value), ZoneId.of('Z')).getDayOfWeek().getDisplayName(TextStyle.FULL, Locale.getDefault())

Note: Since Painless supports Java's native types, it provides access to native classes and methods around those types, such as LocalDateTime, which are useful for performing more advanced date math.

Best practices

As you can see, the Painless scripting language provides powerful ways of extracting useful information out of arbitrary fields stored in Elasticsearch via Kibana scripted fields. However, with great power comes great responsibility.

Below we outline a few best practices around using Kibana scripted fields.

  • Always use a development environment to experiment with scripted fields. Because scripted fields are immediately active after you save them in the Management section of Kibana (e.g. they appear in the Discover screen for that index pattern for all users), you should not develop scripted fields directly in production. We recommend that you try your syntax first in a development environment, evaluate the impact of scripted fields on realistic data sets and data volumes in staging, and only then promote them to production. 
  • Once you gain confidence that the scripted field provides value to your users, consider modifying your ingest to extract the field at index time for new data. This will save Elasticsearch processing at query time and will result in faster response times for Kibana users. You can also use the _reindex API in Elasticsearch to re-index existing data.
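As a rough illustration of the last point, the Python sketch below builds a _reindex request body that computes a field once, at index time, instead of on every query. The index names logs-old and logs-new, the new field name kilobytes, and the helper name are all hypothetical, and the "inline" key again reflects the 5.x format.

```python
def reindex_with_script(source_index, dest_index, source):
    """Build a _reindex body that runs a Painless script on each copied document."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {"lang": "painless", "inline": source},
    }


# Dividing by 1024.0 keeps the result a floating-point number.
reindex_body = reindex_with_script(
    "logs-old",
    "logs-new",
    "ctx._source.kilobytes = ctx._source.bytes / 1024.0",
)
```

Unlike scripted fields, a reindex script works on ctx._source, so the computed value is stored with the document and no longer costs anything at query time.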