Context length exceeded in Elastic Agent Builder conversations

A context_length_exceeded error occurs when a conversation exceeds the maximum context length supported by the LLM. It typically happens when tool responses return large amounts of data that consume the available token budget.

Broad questions or data spread across many indices can also cause slow responses or incomplete answers, even before hitting the context limit.

Symptoms

In the UI, you might encounter messages such as:

This conversation exceeded the maximum context length. This typically occurs when tools return a large response. Try again with a different request or start a new conversation.
Something in the query caused the model to freeze mid-thought. Performance debugging can be broad - try narrowing your question.

The equivalent API error is errCode: context_length_exceeded. For example:

		{
  "error": {
    "code": "agentExecutionError",
    "message": "The request exceeded the model's maximum context length...",
    "meta": {
      "errCode": "context_length_exceeded"
    }
  }
}
		
	

You might also experience:

Slow agent responses
Incomplete or failed answers to broad questions
Agent timing out during data retrieval

Causes

These symptoms share a root cause: the agent is retrieving more data than it can efficiently process. Common causes include:

Broad queries with built-in agents: Built-in agents search across many indices and fields. If you ask a vague question, the agent may retrieve more data than can fit in the context window.
Index search tools with large indices: Index search tools allow the LLM to decide what to retrieve. If your indices contain large documents, many fields, or high document counts, these tools can return more data than can fit in the context window.
Aggregation-style questions: Questions like "summarize all errors from last week" or "compare metrics across all services" force the agent to retrieve data from many documents at once.
Long conversations: Each message adds to the context. A long back-and-forth conversation can exhaust the token budget even if individual tool responses are small.

Diagnosis

To identify which factor is contributing to the issue:

Check which tools your agent uses. Built-in agents and index search tools can return large responses for broad queries.
Review the indices your agent queries. Large documents, many fields, or high document counts increase the risk of exceeding context limits with index search tools.
Consider your prompt patterns. Broad or aggregation-style questions require more data retrieval.
Check conversation length. Long conversations accumulate context from previous messages. You can view token usage after each response to monitor consumption.

Resolution

There are a few quick fixes you can try to mitigate this issue, or you can create custom agents with custom tools that target only the data you need.

Quick fixes

Write more targeted prompts: Narrow your chat questions to reduce the scope of data retrieval. Specific questions retrieve less data than exploratory questions.
Add index metadata descriptions: Help agents select more appropriate indices by adding _meta.description fields to your index mappings.
Start a new conversation: If you've been working in a long conversation, begin a fresh one. You can optionally provide a brief summary of relevant context from the previous conversation.
Switch to a model with a larger context window: Some LLMs support larger context windows that can accommodate bigger tool responses. Refer to Model configuration in Elastic Agent Builder for options.

Use custom agents with custom tools

The most effective long-term solution is to create a custom agent with ES|QL tools that retrieve only the data you need.

Create custom ES|QL tools with targeted queries

ES|QL tools give you precise control over what data is retrieved. Instead of letting the agent dynamically decide what to retrieve, you define exactly which fields to include and how many results to return.

For example, try creating purpose-built tools that:
- Return only identifier fields (like IDs and names) for initial searches
- Retrieve full details only for specific records
- Filter or aggregate data before retrieval
Always include a LIMIT clause in your ES|QL queries to cap the number of results.

Tip

Learn more in creating custom tools.
Create a custom agent

Create a new agent to use your custom tools. Refer to Custom agents in Elastic Agent Builder for instructions.
Assign tools and refine instructions

Assign your custom ES|QL tools to the custom agent. Update the agent's system prompt to guide how it uses the tools.