Create an infrastructure threshold rule

edit

Create an infrastructure threshold rule

edit

Based on the resources listed on the Inventory page within the Metrics app, you can create a threshold rule to notify you when a metric has reached or exceeded a value for a specific resource or a group of resources within your infrastructure.

Additionally, each rule can be defined using multiple conditions that combine metrics and thresholds to create precise notifications and reduce false positives.

  1. To access this page, go to Observability > Metrics.
  2. On the Inventory page or the Metrics Explorer page, click Alerts > Infrastructure.
  3. Select Create inventory alert.

When you select Create inventory alert, the parameters you configured on the Inventory page will automatically populate the rule. You can use the Inventory first to view which nodes in your infrastructure you’d like to be notified about and then quickly create a rule in just a few clicks.

Inventory conditions

edit

Conditions for each rule can be applied to specific metrics relating to the inventory type you select. You can choose the aggregation type, the metric, and by including a warning threshold value, you can be alerted on multiple threshold values based on severity scores. When creating the rule, you can still get notified if no data is returned for the specific metric or if the rule fails to query Elasticsearch.

In this example, Kubernetes Pods is the selected inventory type. The conditions state that you will receive a critical alert for any pods within the ingress-nginx namespace with a memory usage of 95% or above and a warning alert if memory usage is 90% or above.

Inventory rule

Before creating a rule, you can preview whether the conditions would have triggered the alert in the last hour, day, week, or month.

Preview rules

Action types

edit

You can extend your rules by connecting them to actions that use the following supported built-in integrations.

Action types

When configuring an action type, you can define precisely when the alert is triggered by selecting a specific threshold condition: Alert, Warning, or Recovered (a value that was once above a threshold has now dropped below it).

Configure when an alert is triggered

Action variables

edit

This section details the variables that infrastructure threshold rules will send to your actions.

Basic variables
edit
The default infrastructure threshold rule message detailing basic variables

The default message for an infrastructure threshold rule displays the basic variables you can use.

  • context.group: This variable resolves to the group that the rule conditions detected. For inventory rules, this is the name of a monitored host, pod, container, etc. For metric threshold rules, this is the value of the field specified in the Create alert per field or * if the rule is configured to aggregate your entire infrastructure.
  • context.alertState: Depending on why the action is triggered, this variable resolves to ALERT, NO DATA, or ERROR. ALERT means the rule condition is detected, NO DATA means that no data was returned for the time period that was queried, and ERROR indicates an error when querying the data.
  • context.reason: This variable describes why the rule is in its current state. For each of the conditions, it includes the detected value of the monitored metric and a description of the threshold.
  • context.timestamp: The timestamp of when the rule was detected.
Advanced variables
edit
The default infrastructure threshold rule message detailing advanced variables

Instead of using context.reason to provide all the information you need, there may be cases when you’d like to customize your action message. Infrastructure threshold rules provide advanced context variables that have a tree structure.

These variables must use the structure of {{context.[Variable Name].condition[Number]}}. For example, {{context.value.condition0}}, {{context.value.condition1}}, and so on. This is required even if your rule has only one condition (accessible with .condition0). Using just {{context.[Variable Name]}} evaluates to a blank line or [object Object] depending on the action type.

  • context.value.condition[X]: This variable resolves to the detected value of conditions 0, 1, 2, etc.
  • context.value.threshold[X]: This variable resolves to the threshold values of conditions 0, 1, 2, and so on.
  • context.value.metric[X]: This variable resolves to the monitored metric of conditions 0, 1, 2, and so on.

Settings

edit

With infrastructure threshold rules, it’s not possible to set an explicit index pattern as part of the configuration. The index pattern is instead inferred from Metrics indices on the Settings page of the Metrics app.

With each execution of the rule check, the Metrics indices setting is checked, but it is not stored when the rule is created.

The Timestamp field that is set under Settings determines which field is used for timestamps in queries.