Doc counts and overlapping jobs

There is an issue with doc counts that is related to the grouping limitation described above. Imagine you have two Rollup jobs saving to the same index, where one job is a "subset" of the other.

For example, you might have jobs with these two groupings:

PUT _xpack/rollup/job/sensor-all
{
  "groups": {
    "date_histogram": {
      "field": "timestamp",
      "interval": "1h",
      "delay": "7d"
    },
    "terms": {
      "fields": ["node"]
    }
  },
  "metrics": [
    {
      "field": "price",
      "metrics": ["avg"]
    }
  ],
  ...
}

and

PUT _xpack/rollup/job/sensor-building
{
  "groups": {
    "date_histogram": {
      "field": "timestamp",
      "interval": "1h",
      "delay": "7d"
    },
    "terms": {
      "fields": ["node", "building"]
    }
  },
  ...
}

The first job, sensor-all, contains the groupings and metrics that apply to all data in the index. The second job rolls up a subset of the data (from different buildings) that also includes a building identifier. You did this because combining them into a single job would run into the limitation described in the previous section.

This mostly works, but it can sometimes return incorrect doc_counts when you search. All metrics will be valid, however.

The issue arises from the composite aggregation limitation described before, combined with a search-time optimization. Imagine you try to run the following aggregation:

"aggs" : {
  "nodes": {
    "terms": {
      "field": "node"
    }
  }
}

This aggregation could be serviced by either the sensor-all or the sensor-building job, since they both group on the node field. The RollupSearch API therefore searches both of them and merges the results. This produces correct doc_counts and correct metrics, so there is no problem here.
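The job-selection step can be illustrated with a deliberately simplified Python model. It is an assumption made for illustration only and is not the actual RollupSearch implementation. Each job is reduced to its grouping fields and its per-field metrics. The `JOBS` dict and the `can_service` and `candidate_jobs` helpers are hypothetical. A job counts as a candidate if it grouped on every terms field the query uses and defines every metric the query asks for. Date histogram compatibility checks are ignored.

```python
# Hypothetical, simplified model of the two jobs above: only grouping fields
# and per-field metrics are kept. Date histogram settings are ignored.
JOBS = {
    "sensor-all": {
        "group_fields": {"timestamp", "node"},
        "metrics": {"price": {"avg"}},
    },
    "sensor-building": {
        "group_fields": {"timestamp", "node", "building"},
        "metrics": {},
    },
}


def can_service(job, terms_fields, metrics=None):
    """True if `job` grouped on every terms field and defines every metric.

    `metrics` maps a field name to the set of metric names the query needs.
    """
    metrics = metrics or {}
    if not set(terms_fields) <= job["group_fields"]:
        return False
    return all(
        needed <= job["metrics"].get(field, set())
        for field, needed in metrics.items()
    )


def candidate_jobs(terms_fields, metrics=None):
    """Names of all jobs that could answer the query, sorted for stable output."""
    return sorted(
        name for name, job in JOBS.items() if can_service(job, terms_fields, metrics)
    )


# The "nodes" terms aggregation only needs the node field: both jobs qualify.
print(candidate_jobs({"node"}))              # ['sensor-all', 'sensor-building']
# Also requiring the building field leaves only one job.
print(candidate_jobs({"node", "building"}))  # ['sensor-building']
```

When only the node field is requested, both jobs qualify. This matches the behaviour described above for the simple "nodes" aggregation.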

The problem appears with an aggregation that can only be serviced by sensor-building, like this one:

"aggs" : {
  "nodes": {
    "terms": {
      "field": "node"
    },
    "aggs": {
      "building": {
        "terms": {
          "field": "building"
        }
      }
    }
  }
}

Now we run into a problem. The RollupSearch API correctly identifies that only the sensor-building job has all the components required to answer the aggregation, and it searches that job exclusively. Unfortunately, because of the composite aggregation limitation, that job only rolled up documents that have both a "node" and a "building" field. As a result, the doc_counts for the "nodes" aggregation do not include any document that lacks either of the [node, building] fields.

  • The doc_count for the "nodes" aggregation will be incorrect, because it only counts documents for nodes that also have a building
  • The doc_count for the "building" aggregation will be correct
  • Any metrics, on any level, will be correct
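The undercount can be reproduced with a small Python simulation. As described above, it assumes that a grouping skips any document missing a value for one of its grouping fields. The documents and the `rollup` helper are hypothetical. The simulation only counts documents and does not model metrics or the date histogram.

```python
from collections import Counter

# Hypothetical raw sensor documents: some carry a building, some do not.
DOCS = [
    {"node": "a", "building": "b1"},
    {"node": "a", "building": "b2"},
    {"node": "a"},                     # no building field
    {"node": "b", "building": "b1"},
    {"node": "b"},                     # no building field
    {"node": "c"},                     # no building field
]


def rollup(docs, group_fields):
    """Count documents per composite key.

    A document is skipped unless it has a value for every grouping field,
    mirroring the grouping limitation described above.
    """
    counts = Counter()
    for doc in docs:
        if all(field in doc for field in group_fields):
            counts[tuple(doc[field] for field in group_fields)] += 1
    return counts


sensor_all = rollup(DOCS, ["node"])
sensor_building = rollup(DOCS, ["node", "building"])

# Answering "nodes" from sensor-building alone: sum its counts over buildings.
nodes_from_building_job = Counter()
for (node, _building), count in sensor_building.items():
    nodes_from_building_job[node] += count

true_counts = Counter(doc["node"] for doc in DOCS)
print(dict(true_counts))              # {'a': 3, 'b': 2, 'c': 1}
print(dict(nodes_from_building_job))  # {'a': 2, 'b': 1}  (undercounted)
print(sensor_all[("a",)])             # 3, sensor-all still sees every doc
```

Node "c" never has a building, so it disappears entirely when the "nodes" counts come from sensor-building alone. The per-(node, building) counts remain accurate.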

Workarounds

There are two main workarounds if you find yourself with a schema like the above.

The easiest and most robust method is to use separate indices to store your rollups. The limitations arise because you have several document schemas cohabiting in a single index, which makes it difficult for rollups to summarize the data correctly. If you create several rollup jobs and store each in its own index, these difficulties do not arise. This approach does, however, prevent you from searching across several different rollup indices at the same time.
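If you adopt separate indices, it can help to check your job definitions for accidental sharing. The sketch below assumes the job configs are available as Python dicts that mirror the job JSON. `index_pattern` and `rollup_index` are rollup job settings, but the index names and the `shared_rollup_indices` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical job configs, trimmed to the settings relevant here. The index
# names are made up; each job writes to its own rollup index.
JOBS = {
    "sensor-all": {
        "index_pattern": "sensor-*",
        "rollup_index": "sensor_rollup_all",
    },
    "sensor-building": {
        "index_pattern": "sensor-*",
        "rollup_index": "sensor_rollup_building",
    },
}


def shared_rollup_indices(jobs):
    """Return {rollup_index: [job names]} for every index used by 2+ jobs."""
    by_index = defaultdict(list)
    for name, config in jobs.items():
        by_index[config["rollup_index"]].append(name)
    return {index: sorted(names) for index, names in by_index.items() if len(names) > 1}


print(shared_rollup_indices(JOBS))  # {}
```

An empty result means that no two jobs write to the same rollup index.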

The other workaround is to include an "off-target" aggregation in the query, which pulls in the "superset" job and corrects the doc counts. The RollupSearch API determines the best job to search for each "leaf node" in the aggregation tree. So if we include a metric aggregation on price, which is only defined in the sensor-all job, it will "pull in" that other job:

"aggs" : {
  "nodes": {
    "terms": {
      "field": "node"
    },
    "aggs": {
      "building": {
        "terms": {
          "field": "building"
        }
      },
      "avg_price": {
        "avg": { "field": "price" }
      }
    }
  }
}

Adding the avg_price aggregation is what fixes the doc counts.

Because only the sensor-all job defines an avg metric on the price field, the RollupSearch API is forced to pull in that additional job for searching, and it will merge and correct the doc_counts as appropriate. This sort of workaround applies to any additional aggregation, metric or bucketing, although it can be tedious to look through the jobs and determine the right one to add.
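The per-leaf selection can also be sketched in Python. This is a simplified model under stated assumptions, not the actual RollupSearch logic. Each leaf of the aggregation tree records the terms fields on its path plus an optional metric. Every job able to answer a leaf is selected, and the jobs searched are the union over all leaves. The `JOBS` dict and all helper names are hypothetical.

```python
# Hypothetical, simplified model of the two jobs: grouping fields plus the
# metrics defined per field. Date histogram settings are ignored.
JOBS = {
    "sensor-all": {
        "group_fields": {"timestamp", "node"},
        "metrics": {"price": {"avg"}},
    },
    "sensor-building": {
        "group_fields": {"timestamp", "node", "building"},
        "metrics": {},
    },
}


def leaves(aggs, path=()):
    """Yield (terms_fields_on_path, metric) for every leaf of an aggs tree.

    `metric` is a (field, metric_name) pair for metric leaves and None for
    terms leaves. Only "terms" buckets and single-field metrics are handled.
    """
    for body in aggs.values():
        if "terms" in body:
            fields = path + (body["terms"]["field"],)
            children = body.get("aggs")
            if children:
                yield from leaves(children, fields)
            else:
                yield set(fields), None
        else:
            metric_name, params = next(iter(body.items()))
            yield set(path), (params["field"], metric_name)


def can_service(job, fields, metric):
    """True if the job grouped on all path fields and defines the metric."""
    if not fields <= job["group_fields"]:
        return False
    if metric is None:
        return True
    field, name = metric
    return name in job["metrics"].get(field, set())


def jobs_to_search(aggs):
    """Union, over all leaves, of the jobs able to answer that leaf."""
    selected = set()
    for fields, metric in leaves(aggs):
        selected |= {name for name, job in JOBS.items() if can_service(job, fields, metric)}
    return sorted(selected)


BUILDING_ONLY = {
    "nodes": {
        "terms": {"field": "node"},
        "aggs": {"building": {"terms": {"field": "building"}}},
    }
}
WITH_AVG = {
    "nodes": {
        "terms": {"field": "node"},
        "aggs": {
            "building": {"terms": {"field": "building"}},
            "avg_price": {"avg": {"field": "price"}},
        },
    }
}

print(jobs_to_search(BUILDING_ONLY))  # ['sensor-building']
print(jobs_to_search(WITH_AVG))       # ['sensor-all', 'sensor-building']
```

Without avg_price, only sensor-building qualifies. Adding avg_price introduces a leaf that only sensor-all can answer, so both jobs end up in the search. How the real API then merges the doc counts is not modeled here.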

Status

We realize this is an onerous limitation, and that it somewhat breaks the rollup contract of "pick the fields to rollup, we do the rest". We are actively working to fix the composite aggregation limitation and the related issues in Rollup. The documentation will be updated when the fix is implemented.