Jeffrey Rengifo

Contextual AI: Stop pinging the SRE: three MCP tools that turn Elastic Agent Builder into your team's runbook

Build three MCP tools in Elastic Agent Builder that read endpoint health, recent deploys and SLO burn rate directly in your editor. Encode your platform team's runbook once; every developer gets self-serve production context without pinging an SRE.


A developer asks their editor, "Is it safe to merge this PR?" and gets a real answer in seconds, not after a 10–15 minute dashboard hunt or a Slack ping to an SRE. This post shows how to build three MCP tools in Elastic Agent Builder that read endpoint health, recent deploys, and SLO burn rate, and that encode the platform team's interpretation rules (error rate thresholds, deploy warm-up windows, and burn rate limits) directly in the tool descriptions. The result is contextual AI: an agent that reasons over production signals using the runbook the platform team wrote once.

Prerequisites for Elastic Agent Builder MCP tools

  • An Elastic Cloud deployment running Elastic Stack 9.3+ (or an Elastic Cloud Serverless project), with Agent Builder enabled.
  • An APM-ingested service. If your cluster does not already have APM data, the companion notebook includes instructions to generate synthetic traffic using elastic/apm-integration-testing with the opbeans-node demo app.
  • An MCP-compatible client: Claude Code, Cursor, or VS Code with an MCP extension.
  • Basic familiarity with ES|QL syntax.
  • Node.js 18+ (for the mcp-remote bridge).

If you are new to MCP or need to set up the Elastic MCP server for the first time, see "Connect Agent Builder tools to any AI agent with Elastic MCP server" for the full setup walkthrough. This article assumes the MCP server is already configured.

The problem: why developers fly blind

A developer is about to merge a pull request. The change looks simple: increasing the timeout for the downstream recommendations service call from 2 seconds to 5 seconds. But before hitting the merge button, a question lingers: is the service healthy enough to absorb this change right now?

To answer that question today, the developer has two options:

  1. Check dashboards manually. Open the APM UI, look at error rates, scan latency charts, find the SLO page, and look for recent deploys. This takes 10-15 minutes and requires knowing what to look for and how to interpret it.
  2. Ask an SRE. Ping the platform team on Slack: "Hey, is checkout healthy? I want to merge something." This creates an interruption, adds latency to the decision, and doesn't scale.

The core problem is not the data. Elastic already collects everything: traces, metrics, error logs, deploy markers, and SLO budgets. The problem is that correlating multiple signals requires mental overhead and domain knowledge that most developers don't have.

An SRE knows that a p99 spike after a deploy is normal for 5 minutes, that an error rate under 0.5% is acceptable during a release window, and that merging when the SLO budget is below 20% is risky. That knowledge lives in runbooks, tribal memory, and experience.

What if the platform engineer could encode that knowledge into tools that any developer can query from their editor?

How MCP tools in Elastic Agent Builder encode your runbook

The key insight is this: a tool is not just a query; it is a query plus interpretation. A dashboard shows you a p99 of 450ms. A well-designed tool tells you "p99 is 450ms, which is within normal range for this service, and has been stable since the last deploy 2 hours ago."

The difference is that the tool description carries the domain knowledge. When a platform engineer creates a tool in Agent Builder, they write descriptions like: "Error rate above 1% typically indicates a regression. If this coincides with a recent deploy, the deploy is the likely cause." That description becomes part of the context the AI agent uses when reasoning across multiple tool results.

This is what we mean by contextual AI: the AI agent does not just fetch data; it reasons over it using the interpretation rules that the platform team encoded.

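The architecture has three layers: Elasticsearch stores the signals, Agent Builder hosts the tools that query and interpret them, and the MCP server exposes those tools to the AI agent in the developer's editor.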

The platform engineer authors the tools once. Every developer on the team benefits from within their own editor, without needing to learn ES|QL or understand APM data models.

Setting up the Elastic Agent Builder sample environment

The full end-to-end setup (traffic generation with opbeans-node, deploy annotations, SLO creation, and the three Agent Builder tools) is available as a runnable notebook at this repository: notebook.ipynb. The sections below focus on the ES|QL queries and tool descriptions: the why behind each tool, not the mechanics of posting them.
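For readers who want a feel for the mechanics without opening the notebook, here is a minimal sketch of how one tool definition could be registered over HTTP. The endpoint path /api/agent_builder/tools and the kbn-xsrf header are assumptions based on the usual Kibana API conventions, and build_create_tool_request is a hypothetical helper, not something taken from the notebook. The sketch only builds the request; it does not send it.

```python
import json
import urllib.request


def build_create_tool_request(kibana_url: str, api_key: str, tool: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that registers one tool.

    Assumption: tools are created at {kibana_url}/api/agent_builder/tools.
    Check the Agent Builder API reference for your Stack version.
    """
    return urllib.request.Request(
        url=f"{kibana_url.rstrip('/')}/api/agent_builder/tools",
        data=json.dumps(tool).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
            "kbn-xsrf": "true",  # Kibana APIs generally expect this header
        },
    )


# A shortened tool definition; the description is abbreviated for the example.
tool = {
    "id": "get_recent_deploys",
    "type": "esql",
    "description": "Returns the deployment history for a service over the last 24 hours.",
    "tags": ["apm", "deploys", "change-tracking"],
    "configuration": {
        "query": (
            "FROM observability-annotations "
            "| WHERE service.name == ?serviceName AND @timestamp >= NOW() - 24 hours "
            "| SORT @timestamp DESC "
            "| KEEP @timestamp, service.version, service.environment, message "
            "| LIMIT 10"
        ),
        "params": {
            "serviceName": {
                "type": "keyword",
                "description": "The APM service name to check deploy history for",
            }
        },
    },
}

request = build_create_tool_request("https://your-kibana-url", "your-base64-api-key", tool)
# To actually create the tool: urllib.request.urlopen(request)
```

Keeping the definition as plain data makes it easy to store the three tools in version control next to the runbook they encode.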

Building Tool 1: get_endpoint_health

This tool answers the question: "How is this service performing right now?" It returns error rate, latency percentiles (p50, p95, p99), and throughput for a given service within a time window. The query aggregates across all of the service's transactions; it does not filter by individual endpoint.

Here is the full tool configuration as created in Agent Builder:

{
  "id": "get_endpoint_health",
  "type": "esql",
  "description": "Returns the current health of a service endpoint: error rate, latency percentiles (p50/p95/p99), and throughput. Use this tool to assess whether a service is healthy before making changes. Interpretation guide: error rate below 0.5% is healthy, 0.5-1% is elevated (check for recent deploys), above 1% indicates a problem. For latency, compare p99 against the service baseline: checkout is typically under 500ms, product-search under 200ms. A sudden p99 spike within 15 minutes of a deploy suggests the deploy caused a regression.",
  "tags": ["apm", "reliability", "health"],
  "configuration": {
    "query": "FROM traces-apm-* | WHERE service.name == ?serviceName AND @timestamp >= NOW() - ?timeWindow AND transaction.duration.us IS NOT NULL | STATS total_transactions = COUNT(*), error_count = SUM(CASE(event.outcome == \"failure\", 1, 0)), p50_latency_ms = PERCENTILE(transaction.duration.us, 50) / 1000, p95_latency_ms = PERCENTILE(transaction.duration.us, 95) / 1000, p99_latency_ms = PERCENTILE(transaction.duration.us, 99) / 1000 BY service.name | EVAL error_rate_pct = ROUND(TO_DOUBLE(error_count) / total_transactions * 100, 2) | EVAL throughput_per_min = ROUND(TO_DOUBLE(total_transactions) / ?windowMinutes, 1)",
    "params": {
      "serviceName": {
        "type": "keyword",
        "description": "The APM service name to check (e.g., opbeans-node)"
      },
      "timeWindow": {
        "type": "keyword",
        "description": "Time window to analyze, in ES|QL duration format (e.g., 30 minutes, 1 hour, 6 hours)"
      },
      "windowMinutes": {
        "type": "integer",
        "description": "Time window in minutes, used to calculate throughput per minute"
      }
    }
  }
}

The query uses the traces-apm-* data stream, which contains raw transaction data. We filter with transaction.duration.us IS NOT NULL to select only transaction events (excluding spans). Using traces-apm-* is more portable than the pre-aggregated metrics-apm.transaction.1m-* stream, which only populates after sustained traffic.

Notice the description field. It is not just "returns health metrics." It includes interpretation rules: what error rate thresholds mean, what latency baselines look like, and how to correlate spikes with deploys. This is the runbook encoded in the tool.
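To make the error rate rules concrete, here is a minimal Python sketch of the same arithmetic and thresholds. It is only an illustration: in practice the agent applies these rules by reading the description, not by running code, and the function names are hypothetical. How to treat the exact boundary values (0.5% and 1%) is an assumption, since the description does not say.

```python
def error_rate_pct(error_count: int, total_transactions: int) -> float:
    """Percentage of failed transactions, rounded to two decimals."""
    if total_transactions == 0:
        return 0.0  # no traffic in the window, so nothing failed
    # Python's / is floating-point division, so small rates are not truncated to 0.
    return round(error_count / total_transactions * 100, 2)


def classify_error_rate(pct: float) -> str:
    """Apply the thresholds from the get_endpoint_health description."""
    if pct < 0.5:
        return "healthy"
    if pct <= 1.0:
        return "elevated: check for recent deploys"
    return "problem"


# Hypothetical counts: 12 failures out of 4,000 transactions is 0.3%.
print(classify_error_rate(error_rate_pct(12, 4000)))
```

The zero-traffic guard is a design choice for the sketch; a real tool might prefer to report "no data" rather than a 0% error rate.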

Building Tool 2: get_recent_deploys

This tool answers: "What has been deployed recently?" Deploy history is critical context because most production issues correlate with code changes. The agent needs it to decide whether current metrics are normal or reflect a recent deployment.

Deploy annotations are stored in the observability-annotations index. Here is the full tool configuration:

{
  "id": "get_recent_deploys",
  "type": "esql",
  "description": "Returns the deployment history for a service over the last 24 hours, including version numbers, timestamps, and deploy messages. Use this tool to understand the deployment timeline when assessing service health. Key patterns: if a deploy happened within the last 15 minutes, elevated error rates or latency may be expected (warm-up period). If metrics degraded immediately after a deploy, the deploy is the likely cause. Multiple deploys in a short window (under 2 hours) increase risk because it becomes harder to isolate which change caused an issue.",
  "tags": ["apm", "deploys", "change-tracking"],
  "configuration": {
    "query": "FROM observability-annotations | WHERE service.name == ?serviceName AND @timestamp >= NOW() - 24 hours | SORT @timestamp DESC | KEEP @timestamp, service.version, service.environment, message | LIMIT 10",
    "params": {
      "serviceName": {
        "type": "keyword",
        "description": "The APM service name to check deploy history for"
      }
    }
  }
}

Again, the description encodes domain knowledge: the 15-minute warm-up window, the correlation between deploys and metric changes, and the risk of multiple rapid deploys. This is how a platform engineer transfers their intuition into something an AI agent can reason with.
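The two timing rules in that description (the 15-minute warm-up and the 2-hour rapid-deploy window) can be sketched as a small Python function. This is a hedged illustration of the rules, not code from the article or from Agent Builder; assess_deploys and its return keys are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

WARM_UP = timedelta(minutes=15)           # from the tool description
RAPID_DEPLOY_WINDOW = timedelta(hours=2)  # from the tool description


def assess_deploys(deploy_times: list[datetime], now: datetime) -> dict:
    """Flag a deploy still in warm-up and deploys that landed close together."""
    ordered = sorted(deploy_times, reverse=True)  # newest first
    in_warm_up = bool(ordered) and (now - ordered[0]) <= WARM_UP
    # Compare each deploy with the one right before it in time.
    rapid_deploys = any(
        (newer - older) < RAPID_DEPLOY_WINDOW
        for newer, older in zip(ordered, ordered[1:])
    )
    return {"in_warm_up": in_warm_up, "rapid_deploys": rapid_deploys}


# Hypothetical timeline: two deploys 80 minutes apart, the latest 10 minutes ago.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
deploys = [
    datetime(2025, 1, 1, 11, 50, tzinfo=timezone.utc),
    datetime(2025, 1, 1, 10, 30, tzinfo=timezone.utc),
]
print(assess_deploys(deploys, now))
```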

Building Tool 3: get_slo_status

This tool answers: "How much error budget do we have left?" SLO budget is the platform team's quantified way of expressing risk tolerance. If the budget is nearly spent, even a small change could cause a violation.

Unlike the previous tools that query APM data, this one queries the internal SLO indices where Elastic stores pre-computed SLI data. The query calculates the current burn rate, that is, how fast the service is consuming error budget relative to the allowed threshold:

{
  "id": "get_slo_status",
  "type": "esql",
  "description": "Returns the current SLO burn rate for a service over the last hour. The response includes: SLI value (current performance), error budget target, and burn rate percentage. The burn rate tells you how fast the service is consuming error budget relative to the allowed threshold. Interpretation: a burn rate below 100% means the service is consuming budget slower than the limit (sustainable). Between 100% and 200%, the service is burning budget faster than planned (proceed with caution). Above 200%, the service is burning budget at more than double the allowed rate (delay non-critical changes). Above 500%, investigate immediately. Note: this measures the current burn rate over the last hour, not cumulative budget consumption over the full SLO window. A temporarily high burn rate does not mean the overall budget is exhausted.",
  "tags": ["slo", "reliability", "budget"],
  "configuration": {
    "query": "FROM .slo-observability.sli-v* | WHERE slo.id == ?sloId AND @timestamp >= NOW() - 1 hour | STATS sli_value = AVG(slo.numerator) / AVG(slo.denominator) BY slo.id, slo.name | EVAL error_budget_target = 0.995 | EVAL burn_rate_pct = ROUND((1 - sli_value) / (1 - error_budget_target) * 100, 1)",
    "params": {
      "sloId": {
        "type": "keyword",
        "description": "The SLO identifier. Use the SLO ID for the service you are evaluating."
      }
    }
  }
}

Note on the SLI index: the version suffix in .slo-observability.sli-v* depends on your Stack release (e.g., v3.6 in Stack 9.3). Verify with GET _cat/indices/.slo-observability.*?v and adjust the pattern if your cluster uses a different version.

The burn rate interpretation rules in the description are the most valuable part. A raw number like "burn rate 85%" means nothing to a developer without context. The tool description translates that into actionable guidance: "below 100% means sustainable, above 200% means delay non-critical changes."
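As a worked example of the burn rate formula in the query, here is a minimal Python sketch. It assumes the same hard-coded 99.5% target as the query, takes good and total event counts as inputs, and uses hypothetical function names. How to treat the exact boundary values is an assumption.

```python
def burn_rate_pct(good_events: int, total_events: int, slo_target: float = 0.995) -> float:
    """Observed error rate divided by allowed error rate, as a percentage."""
    sli_value = good_events / total_events
    return round((1 - sli_value) / (1 - slo_target) * 100, 1)


def interpret_burn_rate(pct: float) -> str:
    """Apply the bands from the get_slo_status description."""
    if pct < 100:
        return "sustainable"
    if pct <= 200:
        return "proceed with caution"
    if pct <= 500:
        return "delay non-critical changes"
    return "investigate immediately"


# Hypothetical hour: 9,957 good events out of 10,000 gives an SLI of 0.9957.
# Observed error rate 0.43% against an allowed 0.5% is a burn rate of about 86%.
rate = burn_rate_pct(9957, 10000)
print(rate, interpret_burn_rate(rate))
```

With a 99.5% target the allowed error rate is 0.5%, so a burn rate of 100% corresponds to exactly 0.5% of events failing over the hour.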

Connecting to your editor via MCP

With all three tools created in Agent Builder, they are automatically available through the MCP server endpoint. Configure your MCP client to connect.

Claude Code configuration

Add the Elastic MCP server to your Claude Code settings:

{
  "mcpServers": {
    "elastic-agent-builder": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://your-kibana-url/api/agent_builder/mcp",
        "--header",
        "Authorization: ApiKey your-base64-api-key"
      ]
    }
  }
}

Cursor configuration

For Cursor, add the server in Settings > MCP Servers:

{
  "mcpServers": {
    "elastic-agent-builder": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://your-kibana-url/api/agent_builder/mcp",
        "--header",
        "Authorization: ApiKey your-base64-api-key"
      ]
    }
  }
}

Once connected, your editor's AI agent will discover all three tools automatically. You can verify by asking: "What Elastic tools do you have available?" The agent should list get_endpoint_health, get_recent_deploys, and get_slo_status.

API key permissions: the API key needs the feature_agentBuilder.read Kibana privilege and read access to the indices the three queries use (traces-apm-*, observability-annotations, .slo-observability.*). For production use, set the key to expire after 30 to 90 days and follow the principle of least privilege.
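A least-privilege key along those lines might be described with a request body like the one this Python sketch builds. It targets the Elasticsearch create API key endpoint (POST /_security/api_key). The role name mcp_runbook_reader, the default space, and the exact shape of the Kibana application privilege are assumptions to verify against your deployment, and the index patterns mirror the ones the three queries read from.

```python
import json


def build_api_key_body(name: str, expiration: str = "90d", space: str = "default") -> dict:
    """Request body for POST /_security/api_key (a sketch, not a verified policy)."""
    return {
        "name": name,
        "expiration": expiration,  # the article suggests 30 to 90 days
        "role_descriptors": {
            "mcp_runbook_reader": {  # hypothetical role name
                "indices": [
                    {
                        "names": [
                            "traces-apm-*",
                            "observability-annotations",
                            ".slo-observability.*",
                        ],
                        "privileges": ["read"],
                    }
                ],
                "applications": [
                    {
                        # Assumption: Kibana feature privileges are granted
                        # through the kibana-.kibana application.
                        "application": "kibana-.kibana",
                        "privileges": ["feature_agentBuilder.read"],
                        "resources": [f"space:{space}"],
                    }
                ],
            }
        },
    }


print(json.dumps(build_api_key_body("agent-builder-mcp"), indent=2))
```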

The scenario: "Is it safe to merge this PR?"

A developer on the team has a pull request that increases the timeout for the downstream recommendations service call from 2 seconds to 5 seconds in opbeans-node. Before merging, they ask the agent:

Developer: "I'm about to merge PR #42, which increases the recommendations service timeout from 2s to 5s in opbeans-node. Is it safe to merge right now?"

The agent begins its multi-signal reasoning chain. Here is what happens.

Step 1: the agent calls get_endpoint_health

The agent first checks the current health of the service.

Step 2: the agent calls get_recent_deploys

Next, it checks for recent deployments.

Step 3: the agent calls get_slo_status

Finally, it reads the SLO burn rate for the service.

The agent's response

After correlating all three results, the agent produces a recommendation.

The agent pulled the current p99, checked recent deploys, and read the SLO burn rate. It combined those signals with the timeout change in the PR, flagged the merge as risky, and recommended next steps.

Conclusion: when to use MCP tools instead of pinging an SRE

With Elasticsearch, Agent Builder, and MCP, a developer can answer questions like "is it safe to merge this PR?" from inside their editor, in seconds, without pinging an SRE. Elasticsearch holds the signals: traces, deploy markers, and SLO budgets. Agent Builder is where the platform team encodes how to read those signals: the thresholds, the warm-up windows, the correlation rules. MCP is what carries those tools into the developer's editor.

The query pulls the data. The description tells the agent how to read it. The platform engineer writes the runbook once, and every developer on the team gets to use it.

Next steps: extend Elastic Agent Builder MCP tools to CI/CD
