Agent Skills for Elastic Observability

Elastic Observability covers a wide set of capabilities: configuring OpenTelemetry instrumentation, writing ES|QL queries to search logs and metrics, defining SLOs with the correct indicator types and equation syntax, triaging noisy alert storms, and stitching together service health from multiple signals. SREs are now looking to automate more of this work with AI agents.

Elastic's Agent Skills are open source packages that give your AI coding agent native Elastic expertise. If you're already using Elastic Agent Builder, you get AI agents that work natively with your Observability data. Agent Skills bring that same platform expertise directly to your AI coding agent, so you can stop debugging AI-generated errors and start shipping production-ready code with the full depth of Elastic.

Skills can be used for specialized tasks across the Elastic stack — Elasticsearch, Kibana, Elastic Security, Elastic Observability, and more. Each skill lives in its own folder with a SKILL.md file containing metadata and instructions the agent follows.
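To make the folder layout concrete, here is a minimal Python sketch of how a tool might separate a SKILL.md file into its metadata and its instructions. It assumes the metadata sits in a simple `key: value` frontmatter block delimited by `---` lines; the `example-skill` content and the `split_skill_file` helper are hypothetical and are not taken from the elastic/agent-skills repository.

```python
from typing import Dict, Tuple


def split_skill_file(text: str) -> Tuple[Dict[str, str], str]:
    """Split a SKILL.md document into (metadata, instructions).

    Assumption: metadata is a flat `key: value` block between two `---` lines.
    Real skills may use richer YAML, which needs a proper YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: everything is instructions

    metadata: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the instruction body.
            return metadata, "\n".join(lines[i + 1:]).strip()
        if ":" in line:
            key, value = line.split(":", 1)
            metadata[key.strip()] = value.strip()

    return {}, text  # unterminated frontmatter: treat as plain instructions


# Hypothetical skill file, used only for this sketch.
EXAMPLE = """---
name: example-skill
description: Hypothetical skill used only for this sketch.
---
# Instructions

1. Do the first step.
"""

metadata, instructions = split_skill_file(EXAMPLE)
print(metadata["name"])
print(instructions.splitlines()[0])
```

The point of the split is that an agent can read the short metadata to decide whether a skill is relevant, and only load the longer instructions when it is.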

Observability is releasing five skills that together cover the core workflows SREs and developers perform daily. Running Elastic Observability today involves a wide surface area, and each task in it requires domain expertise and familiarity with specific APIs, index patterns, and Kibana workflows. For teams managing dozens of services across multiple environments, this work is repetitive, error-prone, and time-consuming.

This article walks through the current Observability skill set, shows an end-to-end workflow, and highlights where these skills are useful in day-to-day operations.

Why this matters for observability teams

Modern observability work is usually ad hoc and cross-cutting. In one hour, you may instrument a new service, inspect logs for an incident, check error-budget status, and validate service health across several signals.

Each step often needs different APIs, index patterns, and Kibana workflows. Agent Skills package this task knowledge into reusable units so an agent can execute these steps consistently.

The observability skills

The observability set currently focuses on five connected workflows:

Instrument applications: Adds the Elastic Distributions of OpenTelemetry (EDOT) to Python, Java, or .NET services (tracing, metrics, logs) or helps migrate from the classic Elastic APM agents to EDOT, with correct OTLP endpoints and configuration.
Search logs: Provides visibility into Elastic Streams, the data routing and processing layer for observability data.
Manage SLOs: Creates and manages Service-Level Objectives in Elastic Observability via the Kibana API, from data exploration through SLO definition, creation, and lifecycle management.
Assess service health: Provides a unified view of service health by combining signals from APM, infrastructure metrics, logs, SLOs, and alerts into a single assessment.
Observe LLM applications: Monitors and troubleshoots LLM-powered applications by tracking token usage, latency, error rates, and model performance across inference calls.
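As an illustration of the detail the SLO workflow has to get right, the Python sketch below assembles a request body for a custom KQL SLO. The field names (`indicator.type` of `sli.kql.custom`, `timeWindow`, `budgetingMethod`, `objective.target`) reflect our understanding of Kibana's SLO creation API and should be checked against the API reference for your version. The `build_kql_slo` helper, the index pattern, and the KQL filters are hypothetical, and nothing here is sent to Kibana.

```python
import json
from typing import Any, Dict


def build_kql_slo(
    name: str,
    index: str,
    good_kql: str,
    total_kql: str,
    target: float,
    window: str = "30d",
) -> Dict[str, Any]:
    """Build a request body for a custom KQL SLO (sketch, not sent anywhere).

    Assumption: the body shape follows Kibana's SLO creation API as we
    understand it; verify field names against your Kibana version.
    """
    if not 0 < target < 1:
        raise ValueError("target must be a fraction between 0 and 1, e.g. 0.99")
    return {
        "name": name,
        "description": f"{name} (built from a sketch)",
        "indicator": {
            "type": "sli.kql.custom",
            "params": {
                "index": index,
                "good": good_kql,    # events that count as successful
                "total": total_kql,  # all events in scope
                "timestampField": "@timestamp",
            },
        },
        "timeWindow": {"duration": window, "type": "rolling"},
        "budgetingMethod": "occurrences",
        "objective": {"target": target},
    }


# Hypothetical service, index pattern, and filters for illustration only.
payload = build_kql_slo(
    name="cart availability",
    index="logs-*",
    good_kql='service.name: "cart" and not log.level: "error"',
    total_kql='service.name: "cart"',
    target=0.99,
)
print(json.dumps(payload, indent=2))
```

A common mistake this guards against is passing the target as a percentage (99) rather than a fraction (0.99), which is exactly the kind of field expectation a skill encodes.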

What Agent Skills are

Agent Skills are self-contained folders with instructions, scripts, and resources that an AI agent loads dynamically for a specific task. Elastic publishes official skills in elastic/agent-skills, based on the Agent Skills standard.

At a practical level, this means:

You describe the goal.
The agent selects the relevant skill or you specify it.
The skill applies the consistent steps and API patterns that Elastic recommends for that job.

Practical example: from incident question to root cause

As an SRE, you're notified that a specific customer is experiencing errors. Support has been trying to troubleshoot but needs help, and provides a transaction ID to investigate.

You've loaded Elastic's Agent Skills into Claude. You ask Claude:

Find out why transaction with id 01ba6cf8e60253bdeb26026caa3278a1 is having issues over the last 24 hours.

Claude, with the Elastic Observability skills added, investigates that specific transaction in Elastic:

It uses the logs-search skill to narrow down likely causes.
It identifies the root cause.
It recommends a potential remediation.
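For a sense of what the log search step might run under the hood, here is a small Python sketch that builds an ES|QL query for the transaction from the prompt. This is not the skill's actual implementation. The `build_transaction_query` helper, the `logs-*` index pattern, and the `transaction.id` field name are assumptions; depending on how your data is mapped, the ID may live in a different field such as `trace.id`.

```python
import re


def build_transaction_query(
    transaction_id: str,
    hours: int = 24,
    index: str = "logs-*",             # assumed index pattern
    id_field: str = "transaction.id",  # assumed field name; may differ in your data
    limit: int = 100,
) -> str:
    """Return an ES|QL query string for recent log lines tied to one ID."""
    # Only accept lowercase hex IDs so the value can be embedded safely.
    if not re.fullmatch(r"[0-9a-f]+", transaction_id):
        raise ValueError("transaction_id must be a lowercase hex string")
    if hours <= 0 or limit <= 0:
        raise ValueError("hours and limit must be positive")
    return (
        f"FROM {index}\n"
        f"| WHERE @timestamp >= NOW() - {hours} hours "
        f'AND {id_field} == "{transaction_id}"\n'
        "| SORT @timestamp DESC\n"
        f"| LIMIT {limit}"
    )


query = build_transaction_query("01ba6cf8e60253bdeb26026caa3278a1")
print(query)
```

The value of the skill is that you do not have to write or remember this query yourself; the sketch only shows the sort of time-bounded, ID-filtered search that sits behind an outcome-focused prompt.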

How to get started

Install Elastic skills with the skills CLI:

npx skills add elastic/agent-skills

Install a specific skill directly:

npx skills add elastic/agent-skills --skill logs-search

Then run your agent and give it an outcome-focused request, for example:

My cart service is experiencing some slowness, are there any errors over the last 3 hours? Please give me a summary of these logs.

The key shift is that the request is outcome-first. The skill captures implementation details such as API order, field expectations, and verification steps.

What is next

The planned scope includes broader workflow coverage. As skills mature, teams can combine them into repeatable operating patterns that still support ad hoc investigation.

If you want to try this model now, get Elastic's Agent Skills and start with one service, adding one workflow at a time:

Assess service health.
Run guided log investigation for one real incident.
Add SLO management after baseline telemetry quality is in place.
Understand how well your LLM is performing for your developers.

This gives you a concrete way to evaluate agent-assisted observability work without changing your full operating model in one step.