Create a service-level objective (SLO) | Elastic Observability [8.8]

IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.


Create a service-level objective (SLO)

This functionality is in beta and is subject to change. The design and code are less mature than official GA features and are being provided as-is with no warranties. Beta features are not subject to the support SLA of official GA features.

Before creating an SLO, you need to configure your SLO access.

To create an SLO, go to Observability → SLOs:

If you’re creating your first SLO, you’ll see an introductory page. Click the Create SLO button.
If you’ve created SLOs before, click the Create new SLO button in the upper-right corner of the page.

From here, complete the following:

Define your SLI

The type of SLI to use depends on the location of your data. If you’re creating an SLO based on raw logs coming from your services, you can use a custom KQL SLI. If you’re creating an SLO based on services using application performance monitoring (APM), you can use an APM latency or APM availability SLI. See the following table for more on each type of SLI:

Custom KQL

This indicator can be based on any Elasticsearch index or index pattern you have. You define two queries: one that yields the good events and one that yields the total events from your index.

Example: You could define a custom KQL indicator based on the service-logs index with the good query defined as nested.field.response.latency <= 100 and nested.field.env : "production" and the total query defined as nested.field.env : "production".

APM latency

This indicator is based on the APM data that we receive from your instrumented services and a latency threshold.

Example: You could define an indicator on an APM service named banking-service for the production environment, and the transaction name POST /deposit with a latency threshold value of 300ms.

APM availability

This indicator is based on the APM data that we receive from your instrumented services.

Example: You could define an indicator on an APM service named search-service for the prod environment, and the transaction name POST /search.

Custom KQL

When defining a custom KQL SLI, you can set the following fields:

Index — The index or index pattern you want to base the SLI upon.
Timestamp field — The timestamp field used by the index.
Query filter — A filter to apply on the index.
Good query — The KQL query yielding the good events to take into account for computing the SLI.
Total query — The KQL query yielding all events to take into account for computing the SLI.
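
To illustrate how the good query and total query combine into an SLI value, here is a minimal sketch. It is not how Elastic evaluates KQL: plain Python predicates stand in for the two queries, and the event list, the field names `env` and `latency`, and the function names are all hypothetical.

```python
# Illustrative only: Python predicates stand in for the KQL queries,
# and the sample events are made up.
events = [
    {"env": "production", "latency": 80},
    {"env": "production", "latency": 250},
    {"env": "production", "latency": 95},
    {"env": "staging", "latency": 40},
]


def total_query(event):
    # Stand-in for: nested.field.env : "production"
    return event["env"] == "production"


def good_query(event):
    # Stand-in for:
    # nested.field.response.latency <= 100 and nested.field.env : "production"
    return event["latency"] <= 100 and event["env"] == "production"


def compute_sli(events, good, total):
    """Return good events / total events, or None when nothing matches."""
    total_count = sum(1 for e in events if total(e))
    good_count = sum(1 for e in events if good(e))
    return good_count / total_count if total_count else None


sli = compute_sli(events, good_query, total_query)
print(f"SLI: {sli:.2%}")
```

In this sample, two of the three production events meet the latency condition, so the staging event is ignored and the SLI is two thirds.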

APM latency or availability

When defining an APM latency or APM availability SLI, you can set the following fields:

Service name — The APM service name.
Service environment — Either all or the specific environment.
Transaction type — Either all or the specific transaction type.
Transaction name — Either all or the specific transaction name.
Threshold (APM latency only) — The latency threshold in milliseconds (ms) to consider the request as good.
Query filter — An optional query filter on the APM data.
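
As a rough sketch of what these two indicator types measure, the following computes a latency SLI and an availability SLI over a handful of made-up transactions. The `Transaction` class and both functions are hypothetical. The sketch also assumes that a request at exactly the threshold counts as good, and that availability is the share of successful transactions; this page does not confirm either detail.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical record; not the APM document schema.
    name: str
    duration_ms: float
    success: bool


def latency_sli(transactions, threshold_ms):
    """Share of transactions at or under the threshold (assumed inclusive)."""
    if not transactions:
        return None
    good = sum(1 for t in transactions if t.duration_ms <= threshold_ms)
    return good / len(transactions)


def availability_sli(transactions):
    """Share of successful transactions (assumed definition of 'good')."""
    if not transactions:
        return None
    good = sum(1 for t in transactions if t.success)
    return good / len(transactions)


sample = [
    Transaction("POST /deposit", 120, True),
    Transaction("POST /deposit", 310, True),
    Transaction("POST /deposit", 280, False),
    Transaction("POST /deposit", 450, True),
]

print(latency_sli(sample, threshold_ms=300))
print(availability_sli(sample))
```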

Set your objectives

After defining your SLI, you need to set your objectives. To set your objectives, complete the following:

Select your budgeting method

You can select either an occurrences or a timeslices budgeting method:

Occurrences

Uses the number of good events and the number of total events to compute the SLO.

Example: You have a 30-day rolling SLO with a 95% target, and over the past 30 days there were 1,355,700 total events. The error budget is 100 - 95 = 5%, or 1,355,700 * 0.05 = 67,785 bad events tolerated before violating the SLO.

If we had 1,300,000 good events over the same period, the observed value is Good Events / Total Events = 1,300,000 / 1,355,700 ≈ 0.9589, or 95.89%.
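
The occurrences arithmetic can be checked with a short sketch. The function name and return keys are hypothetical, and the "budget consumed" figure is an extra derived value that this page does not define. It is included only to show how far the observed bad events eat into the allowance.

```python
def occurrences_summary(total_events, good_events, target_pct):
    """Summarize an occurrences-based SLO from raw event counts."""
    # Use percentage arithmetic to avoid float error from (1 - 0.95).
    allowed_bad = total_events * (100 - target_pct) / 100
    bad_events = total_events - good_events
    return {
        "allowed_bad": allowed_bad,
        "bad_events": bad_events,
        "observed": good_events / total_events,
        # Assumed definition: fraction of the allowance already used.
        "budget_consumed": bad_events / allowed_bad,
    }


summary = occurrences_summary(
    total_events=1_355_700, good_events=1_300_000, target_pct=95
)
print(f"Allowed bad events: {summary['allowed_bad']:,.0f}")
print(f"Observed SLI:       {summary['observed']:.2%}")
print(f"Budget consumed:    {summary['budget_consumed']:.1%}")
```

With these numbers, the allowance is 67,785 bad events and the observed value is 95.89%, so the SLO is still met.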

Timeslices

Breaks the overall time window into smaller slices of a defined duration and uses the number of good slices over the number of total slices to compute the SLO.

Timeslice target (%) - The target for an individual timeslice, which determines whether the slice is good or bad.
Timeslice window (in minutes) - The duration of each timeslice.

Example: A 30-day rolling SLO defined with 5-minute slices has a total of 30 * 24 * 12 = 8640 slices. If the SLO target is 98%, the error budget is 100 - 98 = 2%, or 8640 * 0.02 = 172.8, which allows 172 whole bad slices before the SLO is violated.
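
The timeslices arithmetic can be sketched the same way. The function names are hypothetical. The sketch assumes that a slice meeting its timeslice target exactly counts as good and that an empty slice counts as good; this page does not specify either case.

```python
import math


def timeslice_budget(window_days, slice_minutes, target_pct):
    """Return (total slices, whole bad slices allowed) for a rolling window."""
    total_slices = window_days * 24 * 60 // slice_minutes
    # Round down: a fractional slice cannot be spent.
    allowed_bad = math.floor(total_slices * (100 - target_pct) / 100)
    return total_slices, allowed_bad


def is_good_slice(good_events, total_events, timeslice_target_pct):
    """Classify one slice against the timeslice target (inclusive, assumed)."""
    if total_events == 0:
        return True  # Assumption: a slice with no events is not counted as bad.
    return good_events * 100 >= timeslice_target_pct * total_events


total, allowed = timeslice_budget(window_days=30, slice_minutes=5, target_pct=98)
print(total, allowed)
print(is_good_slice(97, 100, timeslice_target_pct=95))
print(is_good_slice(90, 100, timeslice_target_pct=95))
```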

Set your time window

Select the duration over which you want to compute your SLO. The time window uses the data from the defined rolling period, for example, the last 30 days.

Set your target/SLO (%)

The SLO target, expressed as a percentage.

Describe your SLO

After setting your objectives, give your SLO a name, a short description, and add any relevant tags.

SLO burn rate alert rule

When the Create an SLO burn rate alert rule checkbox is selected, the Create rule window opens immediately after you click the Create SLO button. Here you can define your SLO burn rate alert rule. For more information, see Create an SLO burn rate rule.
