Token usage in Elastic Agent Builder
Applies to: Serverless Elasticsearch, Serverless Observability, Serverless Security, Stack
When working with Elastic Agent Builder, total token usage is typically higher than the visible conversation text suggests. Because Elastic Agent Builder uses an agentic framework, a single user request often triggers multiple model calls to process reasoning steps, run tools, and interpret results.
Token counts include:
- Input Tokens: These accumulate throughout the session. They include the user's current query, the conversation history from previous rounds, system prompts, and the results returned from any tools used during execution.
- Output Tokens: These include the final response visible to the user, as well as all internal reasoning steps, tool calls, and intermediate results generated by the model.
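As a rough illustration of why one request can cost more than its visible text, the sketch below sums input and output tokens over the several model calls that a single round might trigger. The `ModelCall` class, the call labels, and all numbers are hypothetical and chosen only for illustration; they do not reflect Agent Builder's internal data structures or its JSON response format.

```python
from dataclasses import dataclass


@dataclass
class ModelCall:
    """One model invocation inside a single round (hypothetical structure)."""
    purpose: str
    input_tokens: int
    output_tokens: int


def round_usage(calls):
    """Sum token counts over every model call made during one round."""
    return {
        "model_calls": len(calls),
        "input_tokens": sum(c.input_tokens for c in calls),
        "output_tokens": sum(c.output_tokens for c in calls),
    }


# Illustrative numbers only: one user request that needs a tool call.
calls = [
    ModelCall("reason and choose a tool", input_tokens=1200, output_tokens=80),
    ModelCall("interpret tool results", input_tokens=1600, output_tokens=60),
    ModelCall("write final response", input_tokens=1700, output_tokens=250),
]

usage = round_usage(calls)
print(usage)
```

In this made-up example, only the 250 output tokens of the last call correspond to text the user sees, while the round as a whole accounts for 4,500 input tokens and 390 output tokens.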
Each conversation round includes all previous rounds as context. This means token usage at each step depends on the entire conversation size, not only the current message.
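The growth described above can be sketched with a simplified model in which every round resends the system prompt and all earlier messages. The `Round` class, the `input_tokens_per_round` function, and the token counts are assumptions made for illustration; the model ignores tool results and intermediate model calls, which would add further input tokens in practice.

```python
from dataclasses import dataclass


@dataclass
class Round:
    """One conversation round (hypothetical, simplified)."""
    user_tokens: int      # tokens in the user's message for this round
    response_tokens: int  # tokens in the agent's visible reply


def input_tokens_per_round(rounds, system_prompt_tokens=0):
    """Return the input token count for each round, assuming each round
    resends the system prompt plus all previous messages as context."""
    totals = []
    history = 0  # tokens carried over from earlier rounds
    for r in rounds:
        totals.append(system_prompt_tokens + history + r.user_tokens)
        history += r.user_tokens + r.response_tokens
    return totals


# Three rounds of identical size, with a 500-token system prompt.
rounds = [Round(50, 200), Round(50, 200), Round(50, 200)]
per_round = input_tokens_per_round(rounds, system_prompt_tokens=500)
print(per_round)  # [550, 800, 1050]
```

Even though each user message is the same size, the input token count rises every round because the accumulated history is sent again as context.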
For more information on billing and token costs, refer to Elastic pricing.
At the end of each round, the total token usage is displayed after the agent response.
To view the raw JSON response, including detailed token information, click the View JSON button. This opens a modal with the complete raw response data.