Rollup searchedit

Enables searching rolled-up data using the standard query DSL.

This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

Requestedit

GET <target>/_rollup_search

Descriptionedit

The rollup search endpoint is needed because, internally, rolled-up documents utilize a different document structure than the original data. The rollup search endpoint rewrites standard query DSL into a format that matches the rollup documents, then takes the response and rewrites it back to what a client would expect given the original query.

Path parametersedit

<target>

(Required, string) Comma-separated list of data streams and indices used to limit the request. Wildcard expressions (*) are supported.

This target can include both rollup and non-rollup indices.

Rules for the <target> parameter:

  • At least one data stream, index, or wildcard expression must be specified. This target can include a rollup or non-rollup index. For data streams, the stream’s backing indices can only serve as non-rollup indices. Omitting the <target> parameter or using _all is not permitted.
  • Multiple non-rollup indices may be specified.
  • Only one rollup index may be specified. If more than one are supplied, an exception occurs.
  • Wildcard expressions may be used, but, if they match more than one rollup index, an exception occurs. However, you can use an expression to match multiple non-rollup indices or data streams.

Request bodyedit

The request body supports a subset of features from the regular Search API. It supports:

Functionality that is not available:

  • size: Because rollups work on pre-aggregated data, no search hits can be returned and so size must be set to zero or omitted entirely.
  • highlighter, suggestors, post_filter, profile, explain: These are similarly disallowed.

Examplesedit

Historical-only search exampleedit

Imagine we have an index named sensor-1 full of raw data, and we have created a rollup job with the following configuration:

PUT _rollup/job/sensor
{
  "index_pattern": "sensor-*",
  "rollup_index": "sensor_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "timestamp",
      "fixed_interval": "1h",
      "delay": "7d"
    },
    "terms": {
      "fields": [ "node" ]
    }
  },
  "metrics": [
    {
      "field": "temperature",
      "metrics": [ "min", "max", "sum" ]
    },
    {
      "field": "voltage",
      "metrics": [ "avg" ]
    }
  ]
}

This rolls up the sensor-* pattern and stores the results in sensor_rollup. To search this rolled up data, we need to use the _rollup_search endpoint. However, you’ll notice that we can use regular query DSL to search the rolled-up data:

GET /sensor_rollup/_rollup_search
{
  "size": 0,
  "aggregations": {
    "max_temperature": {
      "max": {
        "field": "temperature"
      }
    }
  }
}

The query is targeting the sensor_rollup data, since this contains the rollup data as configured in the job. A max aggregation has been used on the temperature field, yielding the following response:

{
  "took" : 102,
  "timed_out" : false,
  "terminated_early" : false,
  "_shards" : ... ,
  "hits" : {
    "total" : {
        "value": 0,
        "relation": "eq"
    },
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_temperature" : {
      "value" : 202.0
    }
  }
}

The response is exactly as you’d expect from a regular query + aggregation; it provides some metadata about the request (took, _shards, etc), the search hits (which is always empty for rollup searches), and the aggregation response.

Rollup searches are limited to functionality that was configured in the rollup job. For example, we are not able to calculate the average temperature because avg was not one of the configured metrics for the temperature field. If we try to execute that search:

GET sensor_rollup/_rollup_search
{
  "size": 0,
  "aggregations": {
    "avg_temperature": {
      "avg": {
        "field": "temperature"
      }
    }
  }
}
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "There is not a rollup job that has a [avg] agg with name [avg_temperature] which also satisfies all requirements of query.",
        "stack_trace": ...
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "There is not a rollup job that has a [avg] agg with name [avg_temperature] which also satisfies all requirements of query.",
    "stack_trace": ...
  },
  "status": 400
}

Searching both historical rollup and non-rollup dataedit

The rollup search API has the capability to search across both "live" non-rollup data and the aggregated rollup data. This is done by simply adding the live indices to the URI:

GET sensor-1,sensor_rollup/_rollup_search 
{
  "size": 0,
  "aggregations": {
    "max_temperature": {
      "max": {
        "field": "temperature"
      }
    }
  }
}

Note the URI now searches sensor-1 and sensor_rollup at the same time

When the search is executed, the rollup search endpoint does two things:

  1. The original request is sent to the non-rollup index unaltered.
  2. A rewritten version of the original request is sent to the rollup index.

When the two responses are received, the endpoint rewrites the rollup response and merges the two together. During the merging process, if there is any overlap in buckets between the two responses, the buckets from the non-rollup index are used.

The response to the above query looks as expected, despite spanning rollup and non-rollup indices:

{
  "took" : 102,
  "timed_out" : false,
  "terminated_early" : false,
  "_shards" : ... ,
  "hits" : {
    "total" : {
        "value": 0,
        "relation": "eq"
    },
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_temperature" : {
      "value" : 202.0
    }
  }
}