Creating reliable agents with structured outputs in Elasticsearch

Explore what structured outputs are and how to leverage them with Elasticsearch to ground agents in relevant context and enforce reliable data contracts.


We’re quickly moving from simple chatbots to agents that can take real, consequential actions on your systems. To make sure these agents are dependable, we can’t rely purely on free-form text anymore. The ability to generate predictable, machine-readable outputs has become an important layer in building reliable AI agents. Structured outputs are also a key part of context engineering, the set of strategies that ensure LLMs are grounded in the most relevant information for their task. Together, these patterns help turn LLMs from simple conversational tools into reliable components that you can safely integrate into larger systems. In this piece, we’ll walk through what structured outputs are and how they can be leveraged to produce reliable output that honors key contracts. If you’re new to context engineering, check out our article here.

Structured outputs

Structured outputs are LLM responses that conform to predefined schemas or data structures, instead of free-form text. Rather than receiving unpredictable responses, developers can specify exactly how a response should be formatted.

For example, if you give an LLM access to your indices in Elasticsearch and ask it to “analyze this Elasticsearch index,” it’ll respond with a narrative explanation that’s likely to change each time you send the same prompt. With structured outputs, you can instead request a response with specific fields like indexName, documentCount, and healthStatus, each with defined types and validation rules.

That structured format can be validated immediately against a schema, with no extra step of parsing free-form text.
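
A structured response to that prompt might look like this (values are illustrative):

```json
{
  "indexName": "products",
  "documentCount": 120000,
  "healthStatus": "green"
}
```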

The largest model providers are quickly taking note of how important structured outputs are becoming: Google, OpenAI, and Anthropic have each released support for structured outputs in their respective APIs. OpenAI has gone further, releasing models trained to better understand and adhere to complex schemas. Below is OpenAI’s evaluation of how well its models follow complex JSON schemas.

In this evaluation, gpt-4o-2024-08-06 with structured outputs achieves a perfect score, while the earlier gpt-4-0613 without structured outputs scores below 40%.

How this affects multi-agent systems

Imagine a scenario where a system of agents passed around unstructured, free-form data. Each agent would need custom parsing logic to understand the responses from other agents, which not only bloats your token usage but is bound to break in practice. This is compounded by the fact that LLM outputs are probabilistic and therefore unpredictable. How can we trust this kind of system to take real actions? Structured outputs define a contract between agents, replacing ambiguity with reliable, predictable behavior.

The importance of standardization for AI agents and MCP

OpenAPI revolutionized REST API development by giving developers a shared, predictable way to describe endpoints, parameters, and responses. Structured outputs bring that same idea to AI agents by providing contracts that standardize how data is exchanged between agents and systems.

These contracts ensure that:

  • Downstream systems can parse responses reliably: When an agent needs to perform an action like updating a database, calling an API, or triggering a workflow, the receiving system must be able to trust the shape and integrity of the data.
  • Type safety is maintained: Structured outputs enable compile-time or runtime validation, catching errors before they propagate throughout the system and turn into bigger problems.
  • Integration is predictable: With defined schemas, integrating agents into existing infrastructure follows patterns developers already know from traditional API development.
  • Multi-agent systems can understand each other: When multiple agents need to collaborate, structured outputs provide a common language for exchanging information.

The Model Context Protocol (MCP) extends this by standardizing how agents exchange context between models, tools, and applications. When agents communicate via MCP, structured outputs ensure that the context being shared maintains its structure across systems.

MCP is responsible for the transport and the lifecycle of context, while structured outputs define the shape and constraints of the data within the context.

Together, they enable:

  • Composable agents that can be reused or replaced
  • Clear contracts between models, tools, and applications
  • More reliable automation, especially when agents need to trigger real-world actions
  • Scalable multi-agent architectures

Other emerging protocols, like the Agent2Agent (A2A) protocol, also emphasize schemas and contracts to enable reliable communication directly between agents.

Alongside structured outputs, MCP and protocols like A2A bring to agents what OpenAPI brought to microservices: a shared contract that turns ad-hoc integrations into reliable systems.

Technologies for creating schemas

Now, how do we actually implement structured outputs? Luckily, popular ecosystems like Python and JavaScript already have mature schema and validation libraries that make implementing structured outputs easier. You can use these tools today to control the shape of data your LLM returns, validate it at runtime, or reject it if the model hallucinates. In this section, we’ll look at the most common tools developers reach for and what happens under the hood.

Zod and the JavaScript ecosystem

In the JavaScript and TypeScript space, Zod has become the go-to library for schema definition and validation, due to its efficiency, ease of use, and integration with popular AI orchestration frameworks like Vercel’s AI SDK and Mastra. To see this in action, we’ll look at my colleague Carly’s example. Carly used Zod alongside the AI SDK to create a schema that forces the LLM to return itinerary data in a validated, type-safe format.

The schema, shown in parts below, does three things:

  1. Ensures that the LLM returns valid JSON.
  2. Ensures that the data has the correct types, constraints, and nesting.
  3. Generates application-ready data that doesn’t need extra processing.

Let’s walk through the most important parts of the schema.

Trip information
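
A minimal sketch of the top-level fields, based on Carly’s example (the descriptions are illustrative):

```typescript
import { z } from 'zod';

const itinerarySchema = z.object({
  title: z.string().describe('Name of the trip'),
  location: z.string().describe('Destination for the trip'),
  // hotel, flights, and excursions are defined in the sections below
});
```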

These fields are simple strings, but the takeaway is that the LLM is not free to make up the structure: title and location must be included and must be strings, or the response will be rejected.

Hotel details
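
Continuing the same schema, the hotel portion might look like:

```typescript
// Inside itinerarySchema
hotel: z.object({
  name: z.string(),
  checkIn: z.string().date(),  // ISO 8601 date, e.g. "2025-06-05"
  checkOut: z.string().date(),
  price: z.object({
    amount: z.number(),        // numeric, usable directly in calculations
    currency: z.string(),
  }),
}),
```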

Take note that amount is defined as a number type and dates use ISO formats, which means the output can be used right away for calculations, sorting, or storage without any extra parsing.

Flight information
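
And the flights portion:

```typescript
// Inside itinerarySchema
flights: z.array(
  z.object({
    flightNo: z.string().max(8),       // capped at 8 characters, e.g. "BA2490"
    departure: z.string().datetime(),  // ISO 8601 datetime, includes the time
    arrival: z.string().datetime(),
  })
),
```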

Flights are an array of objects because trips usually involve multiple legs. We cap flightNo to 8 characters and use datetime() instead of date() to include departure times.

When we run this, the model should generate a JSON object that looks something like:
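
```json
{
  "title": "Long weekend in Lisbon",
  "location": "Lisbon, Portugal",
  "hotel": {
    "name": "Hotel Avenida",
    "checkIn": "2025-06-05",
    "checkOut": "2025-06-09",
    "price": { "amount": 640, "currency": "EUR" }
  },
  "flights": [
    {
      "flightNo": "BA502",
      "departure": "2025-06-05T08:25:00Z",
      "arrival": "2025-06-05T11:05:00Z"
    },
    {
      "flightNo": "BA503",
      "departure": "2025-06-09T12:40:00Z",
      "arrival": "2025-06-09T15:20:00Z"
    }
  ],
  "excursions": [
    { "name": "Sintra day trip", "date": "2025-06-07" }
  ]
}
```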

If the model spits out invalid JSON, breaks a specified constraint, or doesn’t include a required field, the request fails immediately instead of silently pushing through bad data.

Pydantic and the Python ecosystem

For Python developers, Pydantic plays a similar role to Zod in JavaScript/TypeScript, giving you runtime validation and strongly typed structured outputs.

Let’s use the same travel itinerary example, but this time we’ll use Pydantic models and LangChain’s support for structured outputs.
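
A sketch of the same itinerary schema in Pydantic, bound to the model with LangChain’s with_structured_output (the model name and prompt are illustrative):

```python
from datetime import date, datetime

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class Price(BaseModel):
    amount: float
    currency: str


class Hotel(BaseModel):
    name: str
    checkIn: date   # field names mirror the Zod schema so the JSON matches
    checkOut: date
    price: Price


class Flight(BaseModel):
    flightNo: str = Field(max_length=8)  # capped at 8 characters
    departure: datetime
    arrival: datetime


class Excursion(BaseModel):
    name: str
    date: date


class TravelItinerary(BaseModel):
    title: str
    location: str
    hotel: Hotel
    flights: list[Flight]
    excursions: list[Excursion]


# Bind the schema to the model; the framework validates the response for us
llm = ChatOpenAI(model="gpt-4o")
structured_llm = llm.with_structured_output(TravelItinerary)

itinerary = structured_llm.invoke("Plan a long weekend in Lisbon in early June.")
print(type(itinerary))  # <class '__main__.TravelItinerary'>
```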

This approach feels very similar to the Zod example: you define a schema once, then rely on the framework to handle validation at runtime. The main difference is that Pydantic gives you back actual Python objects instead of plain validated JSON. What you get is a TravelItinerary instance with nested models and properly typed fields, which tends to fit more cleanly into Python-based agent pipelines.

When we run this, the model generates structured data that maps onto the Pydantic models, and the resulting JSON should be identical to what we generated with Zod. Under the hood, that JSON is automatically converted into a TravelItinerary object with nested Hotel, Flight, and Excursion instances. Again, if the model produces invalid data, breaks a constraint, or omits a required field, validation fails right away.

Under the hood: JSON schemas

At the API level, all of these approaches ultimately convert to JSON Schema. Libraries like Zod and Pydantic exist to make defining those schemas intuitive and developer-friendly.

Working directly with raw JSON schemas can still be useful when you need language-agnostic contracts shared across teams or services, but the tradeoff is that you lose native types, composability, and much of the developer experience that the libraries provide.
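
For example, the hotel portion of the itinerary schema above corresponds roughly to this raw JSON Schema:

```json
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "checkIn": { "type": "string", "format": "date" },
    "checkOut": { "type": "string", "format": "date" },
    "price": {
      "type": "object",
      "properties": {
        "amount": { "type": "number" },
        "currency": { "type": "string" }
      },
      "required": ["amount", "currency"]
    }
  },
  "required": ["name", "checkIn", "checkOut", "price"]
}
```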

Combining Elasticsearch with structured outputs

Controlling what the LLM outputs is still only half the battle. Next, we need to know how to make these outputs useful in real systems. Elasticsearch is a natural fit here because it’s designed to work equally well with both structured and unstructured data. This mirrors modern agent architectures where unstructured data provides rich context to power reasoning and retrieval, and structured outputs act as contracts that applications can rely on. Elasticsearch is central to this loop.

Here’s an example of how Elasticsearch fits into this approach:

1. Unstructured inputs

User queries, documents, chat history, logs, and tool traces are ingested into an Elasticsearch index. To capture both exact text matching and semantic meaning, we index this data with a mix of text fields and vector embeddings.
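
A minimal sketch with the Python Elasticsearch client, using a text field for exact matching plus a semantic_text field for embeddings (index and field names are illustrative; semantic_text assumes an inference endpoint is available, which recent Elasticsearch versions provide by default):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your deployment

es.indices.create(
    index="agent-context",
    mappings={
        "properties": {
            "content": {"type": "text"},                    # keyword/text search
            "content_semantic": {"type": "semantic_text"},  # semantic/vector search
            "source": {"type": "keyword"},
            "timestamp": {"type": "date"},
        }
    },
)

es.index(
    index="agent-context",
    document={
        "content": "User asked for a long weekend in Lisbon with a mid-range hotel...",
        "content_semantic": "User asked for a long weekend in Lisbon with a mid-range hotel...",
        "source": "chat-history",
        "timestamp": "2025-06-01T09:30:00Z",
    },
)
```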

2. Elasticsearch as the context engine

The moment an AI agent needs relevant context, it can query Elasticsearch using several types of search (a sketch of a hybrid query follows the list):

  • Semantic/vector search: To search by the underlying meaning of a query.
  • Keyword/text search: For exact matches and filters.
  • Geospatial search: To search by location.
  • Hybrid search: To search using a mix of the above.
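
Here is a sketch of a hybrid query that combines keyword and semantic search over the index above in a single bool query:

```python
query_text = "mid-range hotels in Lisbon"

response = es.search(
    index="agent-context",
    query={
        "bool": {
            "should": [
                # Exact/keyword matching on the text field
                {"match": {"content": query_text}},
                # Semantic matching on the semantic_text field
                {"semantic": {"field": "content_semantic", "query": query_text}},
            ]
        }
    },
    size=5,
)

# Collect the retrieved passages to use as context for the LLM
context = [hit["_source"]["content"] for hit in response["hits"]["hits"]]
```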

3. LLM reasoning

The retrieved context is passed back to the LLM to ground its response in the most relevant data instead of relying on its training data alone.

4. Structured output generation

The model is restricted by the schema we created using either Zod or Pydantic, and produces a validated JSON object instead of free-form text.
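
Tying these steps together, the retrieved context can be passed straight into the schema-bound model from the Pydantic example (the prompt wording is illustrative):

```python
context_block = "\n".join(context)  # hits retrieved from Elasticsearch above

itinerary = structured_llm.invoke(
    "Using only the context below, plan the trip the user asked for.\n\n"
    f"Context:\n{context_block}"
)
# itinerary is a validated TravelItinerary instance, not free-form text
```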

5. Structured indexing

The validated output is indexed back into Elasticsearch using explicit mappings, making it easier to query, aggregate, and analyze.
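
A sketch of writing the validated output back with explicit mappings (a subset of the itinerary fields shown; the rest are mapped dynamically):

```python
es.indices.create(
    index="itineraries",
    mappings={
        "properties": {
            "title": {"type": "text"},
            "location": {"type": "keyword"},
            "hotel": {
                "properties": {
                    "name": {"type": "text"},
                    "checkIn": {"type": "date"},
                    "checkOut": {"type": "date"},
                    "price": {
                        "properties": {
                            "amount": {"type": "float"},
                            "currency": {"type": "keyword"},
                        }
                    },
                }
            },
        }
    },
)

# mode="json" serializes dates and nested models into JSON-safe types
es.index(index="itineraries", document=itinerary.model_dump(mode="json"))
```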

6. Reuse and automate

Now that we’ve added structure to the data, it becomes easy to query, filter, aggregate, or use it as input for downstream systems and workflows.
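
For example, once itineraries are indexed with explicit mappings, a standard aggregation can compute average hotel spend per destination:

```python
response = es.search(
    index="itineraries",
    size=0,  # we only want the aggregation, not the documents
    aggs={
        "by_location": {
            "terms": {"field": "location"},
            "aggs": {
                "avg_hotel_price": {"avg": {"field": "hotel.price.amount"}}
            },
        }
    },
)
```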

This loop lets agents use Elasticsearch both as the retrieval layer and as a memory store that enables context-driven reasoning, automation, and long-term learning.

Limitations

Marius Schroder’s article on structured prompting notes some limitations that apply equally to structured outputs.

He mentions that:

  • A schema can guarantee format but not correctness: The model can still output garbage where the JSON is structurally valid but filled with the wrong data. For example, an itinerary schema might require a valid ISO date, a numeric price, and a flight number under 8 characters. The model could still return a flight on February 30th (an impossible date that simple format checks may not catch) or assign a 10 dollar price to a five-star hotel. The structure is valid, but the facts are wrong: schemas validate the shape of data, not the truth.
  • Complex or deeply nested schemas can still be a point of failure: You can still run into parsing failures or token limits; if the output is large enough, the model may cut off parts of it.
  • Not great for creative scenarios: For open-ended creative tasks, free-form text may be the better choice, since a rigid schema can handcuff the LLM.

Conclusion

This article dives into the importance of providing structured outputs in multi-agent systems, the most common tools developers reach for, and how Elasticsearch can be a natural extension. If you want to learn more, be sure to check out these resources below.

Resources

  1. What is context engineering? | Carly Richmond
  2. LangChain: Structured output
  3. A Hands-On Guide to Anthropic’s New Structured Output Capabilities | Thomas Reed
  4. OpenAI: Introducing structured outputs in the API
  5. OpenAI: Structured outputs
  6. Structured Prompting in real projects — checklist & best practices | Marius Schroder
  7. Improving Structured Outputs in the Gemini API
