Create an SLO

Important

For Observability serverless projects, The Editor role or higher is required to create SLOs. To learn more, refer to Assign user roles and privileges.

For Elastic Stack, to create and manage SLOs, you need an appropriate license, an Elasticsearch cluster with both transform and ingest node roles present, and SLO access must be configured.

To create an SLO, find SLOs in the main menu or use the global search field:

If you’re creating your first SLO, you’ll see an introductory page. Click the Create SLO button.
If you’ve created SLOs before, click the Create new SLO button in the upper-right corner of the page.

From here, complete the following steps:

Define your service-level indicator (SLI).
Set your objectives.
Describe your SLO.

Note

For Elastic Stack, the cluster must include one or more nodes with both ingest and transform roles. The roles can exist on the same node or be distributed across separate nodes. On Elastic Cloud Hosted deployments, this is handled by the hot nodes, which serve as both ingest and transform nodes.

Define your SLI

The type of SLI to use depends on the location of your data:

Custom KQL: Create an SLI based on raw logs coming from your services.
Timeslice metric: Create an SLI based on a custom equation that uses multiple aggregations.
Custom metric: Create an SLI to define custom equations from metric fields in your indices.
Histogram metric: Create an SLI based on histogram metrics.
APM latency and APM availability: Create an SLI based on services using application performance monitoring (APM).

Custom KQL

Create an indicator based on any of your Elasticsearch indices or data views. You define two queries: one that yields the good events from your index, and one that yields the total events from your index.

Example: You can define a custom KQL indicator based on the service-logs index with the good query defined as nested.field.response.latency <= 100 and nested.field.env : “production” and the total query defined as nested.field.env : “production”.

When defining a custom KQL SLI, set the following fields:

Index: The data view or index pattern you want to base the SLI on. For example, service-logs.
Timestamp field: The timestamp field used by the index.
Query filter: A KQL filter to specify relevant criteria by which to filter the index documents.
Good query: The query yielding events that are considered good or successful. For example, nested.field.response.latency <= 100 and nested.field.env : “production”.
Total query: The query yielding all events to take into account for computing the SLI. For example, nested.field.env : “production”.
Group by: The field used to group the data based on the values of the specific field. For example, you could group by the url.domain field, which would create individual SLOs for each value of the selected field.

Custom metric

Create an indicator to define custom equations from metric fields in your indices.

Example: You can define Good events as the sum of the field processor.processed with a filter of "processor.outcome: \"success\"", and the Total events as the sum of processor.processed with a filter of "processor.outcome: *".

When defining a custom metric SLI, set the following fields:

Source
- Index: The data view or index pattern you want to base the SLI on. For example, my-service-*.
- Timestamp field: The timestamp field used by the index.
- Query filter: A KQL filter to specify relevant criteria by which to filter the index documents. For example, 'field.environment : "production" and service.name : "my-service"'.
Good events
- Metric [A-Z]: The field that is aggregated using the sum aggregation for good events. For example, processor.processed.
- Filter [A-Z]: The filter to apply to the metric for good events. For example, "processor.outcome: \"success\"".
- Equation: The equation that calculates the good metric. For example, A.
Total events
- Metric [A-Z]: The field that is aggregated using the sum aggregation for total events. For example, processor.processed.
- Filter [A-Z]: The filter to apply to the metric for total events. For example, "processor.outcome: *".
- Equation: The equation that calculates the total metric. For example, A.
Group by: The field used to group the data based on the values of the specific field. For example, you could group by the url.domain field, which would create individual SLOs for each value of the selected field.

Timeslice metric

Create an indicator based on a custom equation that uses statistical aggregations and a threshold to determine whether a slice is good or bad. Supported aggregations include Average, Max, Min, Sum, Cardinality, Last value, Std. deviation, Doc count, and Percentile. The equation supports basic math and logic.

Note

This indicator requires you to use the Timeslices budgeting method.

Example: You can define an indicator to determine whether a Kubernetes StatefulSet is healthy. First you set the query filter to orchestrator.cluster.name: "elastic-k8s" AND kubernetes.namespace: "my-ns" AND data_stream.dataset: "kubernetes.state_statefulset". Then you define an equation that compares the number of ready (healthy) replicas to the number of observed replicas: A == B ? 1 : 0, where A retrieves the last value of kubernetes.statefulset.replicas.ready and B retrieves the last value of kubernetes.statefulset.replicas.observed. The equation returns 1 if the condition A == B is true (indicating the same number of replicas) or 0 if it’s false. If the value is less than 1, you can determine that the Kubernetes StatefulSet is unhealthy.

When defining a timeslice metric SLI, set the following fields:

Source
- Index: The data view or index pattern you want to base the SLI on. For example, metrics-*:metrics-*.
- Timestamp field: The timestamp field used by the index.
- Query filter: A KQL filter to specify relevant criteria by which to filter the index documents. For example, orchestrator.cluster.name: "elastic-k8s" AND kubernetes.namespace: "my-ns" AND data_stream.dataset: "kubernetes.state_statefulset".
Metric definition
- Aggregation [A-Z]: The type of aggregation to use.
- Field [A-Z]: The field to use in the aggregation. For example, kubernetes.statefulset.replicas.ready.
- Filter [A-Z]: The filter to apply to the metric.
- Equation: The equation that calculates the total metric. For example, A == B ? 1 : 0.
- Comparator: The type of comparison to perform.
- Threshold: The value to use along with the comparator to determine if the slice is good or bad.

Histogram metric

Histograms record data in a compressed format and can record latency and delay metrics. You can create an SLI based on histogram metrics using a range aggregation or a value_count aggregation for both the good and total events. Filtering with KQL queries is supported on both event types.

When using a range aggregation, both the from and to thresholds are required for the range and the events are the total number of events within that range. The range includes the from value and excludes the to value.

Example: You can define your Good events using the processor.latency field with a filter of "processor.outcome: \"success\"", and your Total events using the processor.latency field with a filter of "processor.outcome: *".

When defining a histogram metric SLI, set the following fields:

Source
- Index: The data view or index pattern you want to base the SLI on. For example, my-service-*.
- Timestamp field: The timestamp field used by the index.
- Query filter: A KQL filter to specify relevant criteria by which to filter the index documents. For example, field.environment : "production" and service.name : "my-service".
Good events
- Aggregation: The type of aggregation to use for good events, either Value count or Range.
- Field: The field used to aggregate events considered good or successful. For example, processor.latency.
- From: (range aggregation only) The starting value of the range for good events. For example, 0.
- To: (range aggregation only) The ending value of the range for good events. For example, 100.
- KQL filter: The filter for good events. For example, "processor.outcome: \"success\"".
Total events
- Aggregation: The type of aggregation to use for total events, either Value count or Range.
- Field: The field used to aggregate total events. For example, processor.latency.
- From: (range aggregation only) The starting value of the range for total events. For example, 0.
- To: (range aggregation only) The ending value of the range for total events. For example, 100.
- KQL filter: The filter for total events. For example, "processor.outcome : *".
Group by: The field used to group the data based on the values of the specific field. For example, you could group by the url.domain field, which would create individual SLOs for each value of the selected field.

APM latency and APM availability

There are two types of SLI you can create based on services using application performance monitoring (APM): APM latency and APM availability.

Use APM latency to create an indicator based on latency data received from your instrumented services and a latency threshold.

Example: You can define an indicator on an APM service named banking-service for the production environment, and the transaction name POST /deposit with a latency threshold value of 300ms.

Use APM availability to create an indicator based on the availability of your instrumented services. Availability is determined by calculating the percentage of successful transactions (event.outcome : "success") out of the total number of successful and failed transactions—unknown outcomes are excluded.

Example: You can define an indicator on an APM service named search-service for the production environment, and the transaction name POST /search.

When defining either an APM latency or APM availability SLI, set the following fields:

Service name: The APM service name.
Service environment: Either all or the specific environment.
Transaction type: Either all or the specific transaction type.
Transaction name: Either all or the specific transaction name.
Threshold (APM latency only): The latency threshold in milliseconds (ms) to consider the request as good.
Query filter: An optional query filter on the APM data.

Synthetics availability

Create an indicator based on the availability of your synthetic monitors. Availability is determined by calculating the percentage of checks that are successful (monitor.status : "up") out of the total number of checks.

Example: You can define an indicator based on a HTTP monitor being "up" for at least 99% of the time.

When defining a Synthetics availability SLI, set the following fields:

Monitor name — The name of one or more synthetic monitors.
Project — The ID of one or more projects containing synthetic monitors.
Tags — One or more tags assigned to synthetic monitors.
Query filter — An optional KQL query used to filter the Synthetics checks on some relevant criteria.

Note

Synthetics availability SLIs are automatically grouped by monitor and location.

Set your objectives

After defining your SLI, you need to set your objectives. To set your objectives, complete the following:

Select your budgeting method
Set your time window
Set your target/SLO percentage

Set your time window and duration

Select the durations over which you want to compute your SLO. You can select either a rolling or calendar aligned time window:


Rolling	Uses data from a specified duration that depends on when the SLO was created, for example the last 30 days.
Calendar aligned	Uses data from a specified duration that aligns with calendar, for example weekly or monthly.

Select your budgeting method

You can select either an occurrences or a timeslices budgeting method:


Occurrences	Uses the number of good events and the number of total events to compute the SLI.
Timeslices	Breaks the overall time window into smaller slices of a defined duration, and uses the number of good slices over the number of total slices to compute the SLI.

Create an SLO

Define your SLI

Custom KQL

Custom metric

Timeslice metric

Histogram metric

APM latency and APM availability

Synthetics availability

Set your objectives

Set your time window and duration

Select your budgeting method

Set your target/SLO (%)

Describe your SLO

SLO burn rate alert rule

Add an SLO Overview panel to a custom dashboard