How to trace MCP server tool calls with OpenTelemetry and Elastic APM

An MCP server is just a Node process, which means OpenTelemetry instrumentation is one --import flag away. What is new is what happens after the traces land in Elastic APM. The same Claude Desktop session that produced them can query them back through the Elastic Agent Builder MCP. The agent analyzes its own tool-call latency, identifies slow tools, and explains failures without leaving the chat. Observability stops being a dashboard that the human checks after the fact and becomes the context the agent uses while working. This post walks through the OTel semantic conventions for MCP, the wrapper pattern for tool spans, and how the loop closes on the Elastic side.

Prerequisites

Elastic Cloud Hosted (9.3+) or Elastic Cloud Serverless
Claude Desktop
An MCP server instrumented with the Elastic Distribution of OpenTelemetry (EDOT). We cover how to instrument one below.

The observability gap in MCP servers

MCP (Model Context Protocol) servers are increasingly used as infrastructure for AI-powered applications, giving AI models access to databases, APIs, internal tools, and business data. Yet they ship with no built-in observability. The MCP SDK does not instrument tool calls, so their latency, errors, and performance baselines are invisible to developers.

The gap shows up in three concrete ways:

When a tool call takes 3 seconds, you don't know if the bottleneck is in your business logic, a downstream API, or the data layer.
When a tool call fails, you get the error message but no context about what the server was doing before it failed.
When you add a new tool, you have no baseline to compare its performance against.

These are the same problems any backend service faces. The answer is the same one backend developers have used for years: distributed tracing with OpenTelemetry.

MCP servers are ordinary processes, and from an instrumentation perspective there is nothing special about them. You add the OTel SDK, define spans around your tool handlers, and ship traces to your backend. The only new part is knowing which span names and attributes to use so your traces are meaningful and consistent.

What we built

For this article, we use the @modelcontextprotocol/server-everything package, the official reference MCP server published by Anthropic. It ships with a set of tools that cover the common patterns you will find in real-world MCP servers: simple request/response, parameterized calls, long-running operations, and calls that return structured data.

The server exposes several tools. In this article we use three of them:

echo: receives a string and returns it unchanged. A minimal request/response tool, useful for verifying that the instrumentation pipeline works end to end.
get-sum: receives two numbers and returns their sum. Represents a parameterized tool with simple business logic.
trigger-long-running-operation: starts a multi-step operation that takes several seconds to complete. Simulates tools that call downstream APIs or run expensive computations.

We instrumented it with EDOT Node.js (@elastic/opentelemetry-node), Elastic's distribution of the OpenTelemetry Node.js SDK. EDOT Node.js replaces the individual OTel SDK, exporter, and instrumentation packages you would otherwise install and wire together. On the ingest side, Elastic's elasticapm connector (part of the EDOT Collector, and applied for you by Elastic's managed OTLP endpoint) produces the data the Kibana APM UI needs to build its service maps, transaction groupings, and latency charts. Without that processing, raw OTLP data arrives in Elasticsearch but the APM views have nothing to build from.

The architecture is a loop: Claude calls tools on the instrumented MCP server, which sends its telemetry to Elastic, and Claude then uses a second MCP server to analyze that telemetry. The observability data becomes something the AI can reason about, not just something that sits in a dashboard waiting for a human to check it.

Both MCP servers are active simultaneously in Claude Desktop. The instrumented MCP generates telemetry. The Agent Builder MCP lets us query it.

To connect Claude Desktop to both servers, the claude_desktop_config.json looks like this:

{
  "mcpServers": {
    "everything": {
      "command": "node",
      "args": [
        "--import",
        "/path/to/node_modules/@elastic/opentelemetry-node/import.mjs",
        "/path/to/everything/dist/index.js",
        "stdio"
      ],
      "env": {
        "OTEL_SERVICE_NAME": "everything-mcp-server",
        "OTEL_EXPORTER_OTLP_ENDPOINT": "https://<your-otlp-endpoint>",
        "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=ApiKey <your-api-key>",
        "OTEL_LOG_LEVEL": "none"
      }
    },
    "elastic-agent-builder": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://<your-kibana-url>/api/agent_builder/mcp",
        "--header",
        "Authorization:ApiKey <your-api-key>"
      ]
    }
  }
}

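The --import /path/to/node_modules/@elastic/opentelemetry-node/import.mjs flag is all it takes for zero-code auto-instrumentation. Setting OTEL_LOG_LEVEL to none silences the SDK's diagnostic logging, a sensible precaution for a stdio MCP server, where stdout carries the JSON-RPC messages and any other output written there can corrupt the stream.

Auto-instrumentation, however, only captures operations that go through instrumented libraries, such as outgoing HTTP requests and database queries. MCP tool calls are application logic, and application logic needs manual spans.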

What traces MCP tool calls generate

OpenTelemetry's semantic conventions include an official set for MCP. Following them means your traces are consistent, searchable by name across tools and teams, and compatible with any OTel-aware backend, including Elastic APM.

Span naming follows the pattern {mcp.method.name} {target}. For a tool call, this becomes tools/call echo or tools/call get-sum. This is what you will see as the transaction name in Kibana APM.
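To make the naming rule concrete, here is a minimal sketch of it as a function. `mcpSpanName` is a hypothetical helper, not part of any SDK, and the fallback to the bare method name for spans without a target (such as `initialize`) is our reading of the convention rather than something this post relies on.

```javascript
// Hypothetical helper: builds a span name following the OTel MCP
// convention "{mcp.method.name} {target}".
// Assumption: when there is no target, the name is just the method name.
function mcpSpanName(methodName, target) {
  if (typeof methodName !== 'string' || methodName.length === 0) {
    throw new TypeError('methodName must be a non-empty string');
  }
  return target ? `${methodName} ${target}` : methodName;
}

console.log(mcpSpanName('tools/call', 'echo'));    // tools/call echo
console.log(mcpSpanName('tools/call', 'get-sum')); // tools/call get-sum
console.log(mcpSpanName('initialize'));            // initialize
```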

Key attributes on each span:

Attribute	Value	Purpose
`mcp.method.name`	`tools/call`	The MCP protocol method
`gen_ai.tool.name`	`echo`	The specific tool invoked
`gen_ai.operation.name`	`execute_tool`	Operation type, from the GenAI semantic conventions
`error.type`	Error class name (for example, `TypeError`)	Set only on failure
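The same attribute set can be built in one place, so every tool reports identical keys. This sketch uses a hypothetical `toolCallAttributes` helper. For thrown values that are not `Error` instances it falls back to `_OTHER`, the generic value OpenTelemetry defines for `error.type` when nothing more specific is available.

```javascript
// Hypothetical helper: returns the attributes from the table above for a
// tools/call span. Pass the caught error (if any) to add error.type.
function toolCallAttributes(toolName, error) {
  const attributes = {
    'mcp.method.name': 'tools/call',
    'gen_ai.tool.name': toolName,
    'gen_ai.operation.name': 'execute_tool',
  };
  if (error !== undefined && error !== null) {
    // error.type is set only on failure: the class name for Error objects,
    // otherwise the generic fallback value "_OTHER".
    attributes['error.type'] =
      error instanceof Error ? error.constructor.name : '_OTHER';
  }
  return attributes;
}

console.log(toolCallAttributes('get-sum'));
console.log(toolCallAttributes('echo', new TypeError('bad input'))['error.type']); // TypeError
```

On success, the object can be passed through the `attributes` option of `tracer.startActiveSpan(name, { attributes }, fn)`; after a failure, copy `error.type` onto the span with `span.setAttribute`.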

The wrapper pattern that creates these spans looks like this:

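import { trace, SpanStatusCode } from '@opentelemetry/api';

const tracer = trace.getTracer('everything-mcp-server', '1.0.0');

// Wraps a tool handler (sync or async) in a span that follows the OTel MCP
// semantic conventions. Awaiting fn() keeps the span open until the handler
// settles, so the recorded duration and errors cover the whole tool call.
function withToolSpan(toolName, fn) {
  return tracer.startActiveSpan(`tools/call ${toolName}`, async (span) => {
    span.setAttribute('mcp.method.name', 'tools/call');
    span.setAttribute('gen_ai.tool.name', toolName);
    span.setAttribute('gen_ai.operation.name', 'execute_tool');

    try {
      return await fn();
    } catch (err) {
      // Record the exception and mark the span as failed before rethrowing.
      const error = err instanceof Error ? err : new Error(String(err));
      span.recordException(error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
      span.setAttribute('error.type', error.constructor.name);
      throw err;
    } finally {
      // End the span exactly once, on success or failure.
      span.end();
    }
  });
}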

Each tool handler wraps its logic in withToolSpan. The result is a named span in Elastic APM for every tool invocation, with duration, status, and error details attached.
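To check the wrapper without an exporter, one option is to inject the tracer instead of reading it from a module-level constant. The sketch below assumes only the subset of the OTel tracer and span API it calls (`startActiveSpan`, `setAttribute`, `recordException`, `setStatus`, `end`). `makeWithToolSpan` and `createRecordingTracer` are hypothetical names, the stub is a test double rather than an OTel SDK class, and the constant 2 mirrors the numeric value of `SpanStatusCode.ERROR` in `@opentelemetry/api`.

```javascript
// Numeric value of SpanStatusCode.ERROR in @opentelemetry/api.
const STATUS_ERROR = 2;

// Hypothetical factory: same span shape as withToolSpan, but the tracer is
// injected so tests can pass a stub instead of the real OTel tracer.
function makeWithToolSpan(tracer) {
  return function withToolSpan(toolName, fn) {
    return tracer.startActiveSpan(`tools/call ${toolName}`, async (span) => {
      span.setAttribute('mcp.method.name', 'tools/call');
      span.setAttribute('gen_ai.tool.name', toolName);
      span.setAttribute('gen_ai.operation.name', 'execute_tool');
      try {
        // Awaiting keeps the span open until an async handler settles.
        return await fn();
      } catch (err) {
        const error = err instanceof Error ? err : new Error(String(err));
        span.recordException(error);
        span.setStatus({ code: STATUS_ERROR, message: error.message });
        span.setAttribute('error.type', error.constructor.name);
        throw err;
      } finally {
        span.end();
      }
    });
  };
}

// Hypothetical test double that records every span it starts.
function createRecordingTracer() {
  const spans = [];
  return {
    spans,
    startActiveSpan(name, callback) {
      const span = {
        name, attributes: {}, status: null, exceptions: [], ended: false,
        setAttribute(key, value) { this.attributes[key] = value; return this; },
        recordException(e) { this.exceptions.push(e); },
        setStatus(status) { this.status = status; return this; },
        end() { this.ended = true; },
      };
      spans.push(span);
      return callback(span);
    },
  };
}

// Usage: wrap a get-sum style handler, then a handler that fails.
const tracer = createRecordingTracer();
const withToolSpan = makeWithToolSpan(tracer);

withToolSpan('get-sum', async () => 1337 + 42)
  .then((sum) => console.log(sum, tracer.spans[0].name, tracer.spans[0].ended))
  // prints: 1379 tools/call get-sum true
  .then(() => withToolSpan('echo', async () => { throw new RangeError('bad input'); }))
  .catch((err) => console.log(err.message, tracer.spans[1].attributes['error.type']));
  // prints: bad input RangeError
```

Because the tracer is a parameter, production code can pass `trace.getTracer(...)` while tests pass the stub, and the assertions never depend on an exporter or network connection.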

Security note: The OTel spec defines two optional attributes for tool calls: gen_ai.tool.call.arguments and gen_ai.tool.call.result. Both are flagged as potentially containing sensitive data. The get-env tool in the everything server is a good example of why this matters: it returns all environment variables, which may include API keys and credentials. Capture these attributes only if you have confirmed the data is safe to store in your observability backend, and consider masking or filtering at the SDK level before export.
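One way to follow that advice, sketched under assumptions: serialize the arguments yourself and mask values whose keys look sensitive before setting gen_ai.tool.call.arguments on the span. The key pattern and the `redactToolArguments` name are hypothetical; tune the pattern to your own tools. Key-based masking errs toward over-masking (for example, any key containing "pass"), and it will not catch secrets embedded inside free-text values.

```javascript
// Hypothetical list of key fragments treated as sensitive (case-insensitive).
const SENSITIVE_KEY = /(pass(word)?|secret|token|api[_-]?key|authorization|credential)/i;

// Returns a JSON string to attach as gen_ai.tool.call.arguments, with values
// under sensitive-looking keys replaced at any nesting depth.
function redactToolArguments(args, mask = '[REDACTED]') {
  const redact = (value) => {
    if (Array.isArray(value)) return value.map(redact);
    if (value !== null && typeof value === 'object') {
      return Object.fromEntries(
        Object.entries(value).map(([key, v]) =>
          [key, SENSITIVE_KEY.test(key) ? mask : redact(v)]),
      );
    }
    return value;
  };
  return JSON.stringify(redact(args));
}

console.log(redactToolArguments({ message: 'hello', apiKey: 'abc123' }));
// {"message":"hello","apiKey":"[REDACTED]"}
```

Inside the wrapper, the result would be attached with `span.setAttribute('gen_ai.tool.call.arguments', redactToolArguments(args))`, and only after you have decided that capturing arguments is acceptable at all.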

Kibana APM: exploring MCP server performance

After starting Claude Desktop with both MCP servers configured and triggering a few tool calls, the everything-mcp-server service appears in Kibana under Observability > Applications > Service Inventory.

Transactions view

Kibana groups traces by transaction name. Because we follow the semantic conventions, each tool gets its own row: tools/call echo, tools/call get-sum, tools/call trigger-long-running-operation. You can immediately see latency, throughput, and error rate per tool without any configuration.

This is where the value of consistent span naming becomes concrete. If you have five different developers adding tools to an MCP server and everyone follows tools/call {toolName}, the APM UI stays organized automatically.

Trace waterfall

Clicking on a specific trace shows the waterfall view. For a single tool call, the waterfall is straightforward: one span covering the full execution. If your tool handler makes downstream HTTP requests or database queries that are auto-instrumented, those appear as child spans. You can see exactly how much time was spent in business logic versus waiting for external calls.
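The business-logic-versus-external-calls split is essentially the parent span's self time: its duration minus the time covered by its child spans. A minimal sketch of that arithmetic, assuming spans are given as hypothetical `{ start, end }` objects in milliseconds. Overlapping children (for example, parallel HTTP calls) are merged so the overlapped time is not subtracted twice. Kibana performs its own breakdown; this only illustrates the idea.

```javascript
// Self time of a parent span = its duration minus the union of its
// children's intervals, clipped to the parent's own window.
function selfTimeMs(parent, children) {
  const clipped = children
    .map((c) => ({ start: Math.max(c.start, parent.start), end: Math.min(c.end, parent.end) }))
    .filter((c) => c.end > c.start)
    .sort((a, b) => a.start - b.start);

  let covered = 0;
  let runStart = null;
  let runEnd = null;
  for (const c of clipped) {
    if (runEnd === null || c.start > runEnd) {
      // Disjoint from the current run: close it and start a new one.
      if (runEnd !== null) covered += runEnd - runStart;
      runStart = c.start;
      runEnd = c.end;
    } else {
      // Overlapping child (e.g. parallel calls): extend the current run.
      runEnd = Math.max(runEnd, c.end);
    }
  }
  if (runEnd !== null) covered += runEnd - runStart;
  return (parent.end - parent.start) - covered;
}

// A 3000 ms tool call with two overlapping HTTP calls and one later DB query.
const toolSpan = { start: 0, end: 3000 };
const childSpans = [
  { start: 100, end: 1100 },  // HTTP call A
  { start: 600, end: 1600 },  // HTTP call B, overlaps A
  { start: 2000, end: 2500 }, // DB query
];
console.log(selfTimeMs(toolSpan, childSpans)); // 1000
```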

Latency distribution

The latency chart shows p50, p95, and p99 distribution across all executions of a given tool. This makes it easy to distinguish between tools that are consistently fast and those that have occasional outliers. The trigger-long-running-operation tool, for example, shows a wide distribution depending on how many steps were requested: a useful baseline for understanding expected execution time ranges before setting alerts.
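To make the percentile terminology concrete, the sketch below groups hypothetical span records by transaction name, as the Transactions view does per tool, and computes p50, p95, and p99 with the nearest-rank method. The record shape (`{ name, durationMs }`) and the sample durations are made up for illustration. Elasticsearch percentile aggregations are approximate, so Kibana's numbers will not necessarily match this exact method.

```javascript
// Nearest-rank percentile: the smallest value with at least p% of samples <= it.
// Integer arithmetic (p * n) / 100 avoids floating-point drift in the rank.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return undefined;
  const rank = Math.ceil((p * sortedValues.length) / 100);
  return sortedValues[Math.max(rank, 1) - 1];
}

// Group durations by span name and summarize each group.
function latencyByTool(spans) {
  const groups = new Map();
  for (const { name, durationMs } of spans) {
    if (!groups.has(name)) groups.set(name, []);
    groups.get(name).push(durationMs);
  }
  const summary = {};
  for (const [name, durations] of groups) {
    const sorted = [...durations].sort((a, b) => a - b);
    summary[name] = {
      count: sorted.length,
      p50: percentile(sorted, 50),
      p95: percentile(sorted, 95),
      p99: percentile(sorted, 99),
    };
  }
  return summary;
}

// Made-up sample data, not measurements.
const sample = [
  { name: 'tools/call echo', durationMs: 2 },
  { name: 'tools/call echo', durationMs: 3 },
  { name: 'tools/call echo', durationMs: 40 },
  { name: 'tools/call trigger-long-running-operation', durationMs: 3000 },
  { name: 'tools/call trigger-long-running-operation', durationMs: 10000 },
];
console.log(latencyByTool(sample));
// tools/call echo: count 3, p50 3, p95 40, p99 40
// tools/call trigger-long-running-operation: count 2, p50 3000, p95 10000, p99 10000
```

With so few samples, p95 and p99 collapse onto the slowest call, which is why a percentile baseline only becomes meaningful after a tool has run many times.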

Error tracking

Failed tool calls appear in the Errors panel with their full stack trace, the span attributes attached at the time of failure, and a count of how many times the error has occurred. If you record the exception with span.recordException(err), Kibana links the error directly to the trace that produced it.

Closing the loop with the Agent Builder MCP

The Elastic Agent Builder MCP lets Claude query its own trace data from Elasticsearch in the same chat session that produced the traces. The Agent Builder MCP can query any Elasticsearch index the API key has access to, and APM traces are stored under .ds-traces-apm.otel-default-*. Granting the Agent Builder API key read access to those indices is what closes the loop: the agent that executed the tool calls can now reason about how they performed.

Here is what this looks like in practice. To generate traces, let's ask in Claude Desktop: "Use the echo tool to say hello, then use get-sum to add 1337 and 42, then run a long-running operation with 3 steps."

Claude executes three tool calls on the instrumented MCP. Three spans land in Elastic APM.

Now, without leaving the chat, let's try querying the traces by asking: "Search the APM trace data from the last 10 minutes. What tool calls were made, and how long did each one take?"

Claude uses the Agent Builder MCP to run a query against the traces index. It returns the tool names, durations, and status from the actual trace data, then synthesizes an answer in natural language. You can confirm the answer against the same service's data in Kibana APM.

You can go further by asking:

"Which of those tool calls had the highest p95 latency?"
"Did any tool calls fail? If so, what was the error message?"
"Compare the latency of the echo tool vs get-sum across all calls in the last hour."

Each of these questions translates into an ES|QL query via the Agent Builder's platform.core.execute_esql tool, run against the APM trace indices.
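The kind of ES|QL the agent runs can also be written by hand, for example to double-check an answer in Discover. This sketch builds one such query as a string with a hypothetical `buildToolLatencyQuery` helper. The default index is the data stream behind the `.ds-traces-apm.otel-default-*` backing indices mentioned above, and the field names (`name`-style OTel fields, `duration` in nanoseconds, `attributes.gen_ai.tool.name`) are assumptions about an OTel-native trace mapping. Check the mapping of your own traces data stream before relying on them, since classic APM mappings use different fields.

```javascript
// Hypothetical helper: builds an ES|QL query summarizing tool-call latency.
// Every default below is an ASSUMPTION about the trace mapping; override as needed.
function buildToolLatencyQuery({
  index = 'traces-apm.otel-default',
  toolField = 'attributes.gen_ai.tool.name',
  durationField = 'duration', // assumed to be in nanoseconds
  lookback = '1 hour',
} = {}) {
  return [
    `FROM ${index}`,
    `| WHERE @timestamp > NOW() - ${lookback} AND ${toolField} IS NOT NULL`,
    // Convert nanoseconds to milliseconds before aggregating.
    `| EVAL duration_ms = ${durationField} / 1000000.0`,
    `| STATS calls = COUNT(*), p50_ms = PERCENTILE(duration_ms, 50), p95_ms = PERCENTILE(duration_ms, 95) BY ${toolField}`,
    `| SORT p95_ms DESC`,
  ].join('\n');
}

console.log(buildToolLatencyQuery({ lookback: '10 minutes' }));
```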

Why Elastic for MCP observability

The Agent Builder closes the loop: this is the part that is specific to the Elastic ecosystem. Because APM data lives in Elasticsearch, and Elasticsearch is queryable via the Agent Builder MCP, you can bring your AI agent's own observability data back into the conversation. Your AI can reflect on its own performance and spot anomalies.

APM UI built for distributed tracing: Kibana's APM interface is designed for exactly this kind of data, with named transactions, trace waterfalls, latency percentiles, error tracking with stack traces, and service maps.

Managed OTLP endpoint: Elastic accepts OTLP directly. You point OTEL_EXPORTER_OTLP_ENDPOINT at your Elastic OTLP endpoint, and traces flow in without a collector of your own.

EDOT simplifies the setup: with the EDOT Collector or Elastic's managed OTLP endpoint, the elasticapm connector is already in place, so the APM UI views work without any additional configuration.

Conclusion

MCP servers do not need special observability tooling. They are ordinary server processes, and OpenTelemetry is the standard way to instrument them. The OTel MCP semantic conventions, although still evolving, give you a consistent naming scheme that scales across tools and teams.

What makes the Elastic setup interesting is not the instrumentation itself. It is the second MCP. When your observability data lives in Elasticsearch, you can query it from the same AI session that generated it. That feedback loop is new, and it opens up use cases that dashboards alone cannot cover: real-time anomaly questions, automated triage, and AI-assisted incident investigation from the chat interface your team is already using.


Frequently asked questions

How do I add tracing to an MCP server? Use the OpenTelemetry SDK and follow the official OTel MCP semantic conventions. With the Elastic Distribution of OpenTelemetry (EDOT) for Node.js, a single --import flag enables auto-instrumentation for HTTP and database calls. Tool-call spans need to be added manually using a wrapper that sets mcp.method.name, gen_ai.tool.name, and gen_ai.operation.name.

Why are my MCP tool calls slow and how do I find the bottleneck? Without tracing, an MCP tool call is a black box: you see the result but not where the time went. Instrument the server with OpenTelemetry, ship traces to Elastic APM, and use the trace waterfall view in Kibana to see exactly how much time was spent in business logic versus downstream HTTP or database calls.

Can I send MCP server traces to Elastic APM without a custom collector? Yes. Elastic accepts OTLP directly. Set OTEL_EXPORTER_OTLP_ENDPOINT to your Elastic OTLP endpoint and OTEL_EXPORTER_OTLP_HEADERS to an Authorization header carrying your API key, and traces flow in. The elasticapm connector that the Kibana APM UI needs for service maps and transaction grouping runs on the Elastic side, in the EDOT Collector or the managed OTLP endpoint.

What span names and attributes should I use for MCP tool calls? Follow the OpenTelemetry MCP semantic conventions: name spans {mcp.method.name} {target} (for example, tools/call echo), and set mcp.method.name, gen_ai.tool.name, and gen_ai.operation.name=execute_tool. On failure, set error.type to the error class name. Consistent naming means the Elastic APM transactions view groups your tool calls automatically.

How is this different from sending MCP traces to Datadog or Grafana? Any OTel-compatible backend can receive the traces. The Elastic-specific part is the Agent Builder MCP, which lets the same AI agent that generated the traces query them back from Elasticsearch in natural language. That feedback loop, where the AI reasons about its own tool-call performance, is not available with backends that do not expose their data through an MCP server.

Should I capture MCP tool call arguments and results in my traces? The OTel spec defines gen_ai.tool.call.arguments and gen_ai.tool.call.result as optional and warns they may contain sensitive data. Tools like get-env, which returns environment variables, illustrate the risk: API keys and credentials can land in your observability backend. Capture these only when the data is safe to store, and consider masking at the SDK level before export.

Does this work for MCP servers written in languages other than Node.js? The OpenTelemetry MCP semantic conventions are language-agnostic. EDOT is available for Node.js, Java, Python, .NET, and other languages, and any of them can send OTLP to Elastic APM. The wrapper pattern shown in this post translates directly: open a span around the tool handler, set the standard attributes, record exceptions on failure.