Token usage in Elastic Agent Builder

Availability: Serverless Elasticsearch (Preview) · Serverless Observability (Unavailable) · Serverless Security (Unavailable) · Stack (Preview, 9.2.0)

When working with Elastic Agent Builder, total token usage typically exceeds the visible conversation text. Because Elastic Agent Builder uses an agentic framework, a single user request often triggers multiple model calls to process reasoning steps, run tools, and interpret results.

Token counts include:

  • Input Tokens: These accumulate throughout the session. They include the user's current query, the conversation history from previous rounds, system prompts, and the results returned from any tools used during execution.
  • Output Tokens: These include the final response visible to the user, as well as all internal reasoning steps, tool calls, and intermediate results generated by the model.

Note: Each conversation round includes all previous rounds as context. This means token usage at each step depends on the entire conversation size, not only the current message.
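
To make the growth pattern concrete, here is a rough sketch of how input tokens might accumulate across rounds. It is not how Elastic Agent Builder computes usage: the `Round` fields and `estimate_input_tokens` are hypothetical names, the token counts are made up, and the model is simplified to one model call per round with earlier tool results and outputs carried forward as history. In practice a single round can trigger several model calls, so real totals are likely higher.

```python
from dataclasses import dataclass


@dataclass
class Round:
    """Hypothetical per-round token counts (illustrative only)."""
    user_tokens: int          # tokens in the user's message for this round
    tool_result_tokens: int   # tokens returned by tools during this round
    output_tokens: int        # reasoning, tool calls, and final answer generated by the model


def estimate_input_tokens(rounds, system_prompt_tokens):
    """Estimate input tokens per round, assuming the full history is resent each round."""
    history = 0
    per_round = []
    for r in rounds:
        # Input = system prompt + everything from earlier rounds + this round's new content.
        per_round.append(system_prompt_tokens + history + r.user_tokens + r.tool_result_tokens)
        # Assumption: this round's messages, tool results, and outputs all become history.
        history += r.user_tokens + r.tool_result_tokens + r.output_tokens
    return per_round


rounds = [Round(50, 400, 300), Round(30, 0, 150), Round(40, 200, 250)]
estimates = estimate_input_tokens(rounds, system_prompt_tokens=500)
print(estimates)  # [950, 1280, 1670]
```

Even though the second user message is shorter than the first, its estimated input is larger, because the first round's content is sent again as context.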

For more information on billing and token costs, refer to Elastic pricing.

At the end of each round, the total token usage is displayed after the agent response:

Screenshot of the token usage display, showing input and output token counts

To view the raw JSON response including detailed token information, click the View JSON button. This opens a modal with the complete, raw response data:

Screenshot of the JSON raw response modal
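
If you copy the raw JSON out of the modal, a small script can pull out the token figures without relying on a particular response layout. The sketch below walks any JSON document and collects numeric fields whose key contains "token". The sample payload and its field names (`model_usage`, `input_tokens`, `output_tokens`) are placeholders invented for illustration, not the documented response schema, so check the actual JSON before depending on specific keys.

```python
import json


def find_token_fields(node, path=""):
    """Yield (path, value) for every numeric field whose key mentions 'token'."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if "token" in key.lower() and is_number:
                yield child, value
            else:
                # Keep descending; nested objects or lists may hold token fields.
                yield from find_token_fields(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_token_fields(item, f"{path}[{i}]")


# Hypothetical payload: the real response structure may differ.
raw = '{"response": {"message": "done"}, "model_usage": {"input_tokens": 1280, "output_tokens": 150}}'
token_fields = dict(find_token_fields(json.loads(raw)))
print(token_fields)
```

Searching by key name rather than by a fixed path keeps the script usable if the response nests usage data differently than assumed here.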