Agent Builder is now generally available (GA). Get started with an Elastic Cloud trial, and check out the Agent Builder documentation here.
When working with large knowledge bases in Elasticsearch, finding information is only half the battle. Engineers often need to synthesize results from multiple documents, generate summaries, and trace answers back to their sources. Model Context Protocol (MCP) provides a standardized way to connect Elasticsearch with large language model–powered (LLM-powered) applications to accomplish this. While Elastic offers official solutions, like Elastic Agent Builder (which includes an MCP endpoint among its features), building a custom MCP server gives you full control over search logic, result formatting, and how retrieved content is passed to an LLM for synthesis, summaries, and citations.
In this article, we’ll explore the benefits of building a custom Elasticsearch MCP server and show how to create one in TypeScript that connects Elasticsearch to LLM-powered applications.
Why build a custom Elasticsearch MCP server?
Elastic provides two official MCP server options:
- Elastic Agent Builder MCP server for Elasticsearch 9.2+
- Elasticsearch MCP server for older versions (Python)
If you need more control over how your MCP server interacts with Elasticsearch, building your own custom server gives you the flexibility to tailor it exactly to your needs. For example, Agent Builder's MCP endpoint is limited to Elasticsearch Query Language (ES|QL) queries, while a custom server allows you to use the full Query DSL. You also gain control over how results are formatted before being passed to the LLM and can integrate additional processing steps, like the OpenAI-powered summarization we'll implement in this tutorial.
By the end of this article, you’ll have an MCP server in TypeScript that searches for information stored in an Elasticsearch index, summarizes it, and provides citations. We'll use Elasticsearch for retrieval, OpenAI's gpt-4o-mini model to summarize and generate citations, and Claude Desktop as the MCP client and UI to take in user queries and give responses. The end result is an internal knowledge assistant that helps engineers discover and synthesize best practices across their organization’s technical docs.

Prerequisites:
- Node.js 20+
- Elasticsearch
- OpenAI API key
- Claude Desktop
What is MCP?
MCP is an open standard, created by Anthropic, that provides secure, bidirectional connections between LLMs and external systems, like Elasticsearch. You can read more about the current state of MCP in this article.
The MCP landscape is evolving every day, with servers available for a wide range of use cases. On top of that, it’s easy to build your own custom MCP server, as we’ll show in this article.
MCP clients
There’s a long list of available MCP clients, each with its own characteristics and limitations. For simplicity and popularity, we’ll use Claude Desktop as our MCP client. It will serve as the chat interface where users can ask questions in natural language, and it will automatically invoke the tools exposed by our MCP server to search documents and generate summaries.

Creating an Elasticsearch MCP server
Using the TypeScript SDK, we can easily create a server that understands how to query our Elasticsearch data based on a user query input.
Here are the steps we'll follow to integrate the Elasticsearch MCP server with the Claude Desktop client:
1. Configure the MCP server for Elasticsearch
2. Set up the dataset
3. Build the MCP server and define its tools
4. Load the MCP server into Claude Desktop
5. Test it out
Configure MCP server for Elasticsearch
To begin, let's initialize a node application:
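A standard initialization looks like this:

```shell
npm init -y
```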
This will create a package.json file, and with it, we can start installing the necessary dependencies for this application.
- @elastic/elasticsearch will give us access to the Elasticsearch Node.js library.
- @modelcontextprotocol/sdk provides the core tools to create and manage an MCP server, register tools, and handle communication with MCP clients.
- openai allows interaction with OpenAI models to generate summaries or natural language responses.
- zod helps define and validate structured schemas for input and output data in each tool.
- ts-node, @types/node, and typescript will be used during development to type-check and compile the code.
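Install the runtime dependencies, and the tooling as dev dependencies:

```shell
npm install @elastic/elasticsearch @modelcontextprotocol/sdk openai zod
npm install --save-dev ts-node @types/node typescript
```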
Set up the dataset
To provide the data that Claude Desktop can query using our MCP server, we’ll use a mock internal knowledge base dataset. Here’s what a document from this dataset will look like:
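The exact mapping is up to you; a document along these lines (the field names here are illustrative) works well:

```json
{
  "title": "API Authentication Best Practices",
  "content": "All internal services should use OAuth 2.0 with short-lived access tokens. Role-based access control (RBAC) should be enforced at the API gateway...",
  "category": "security",
  "author": "Platform Team",
  "updated_at": "2025-01-15"
}
```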
To ingest the data, we prepared a script that creates an index in Elasticsearch and loads the dataset into it. You can find it here.
MCP server
Create a file named index.ts and add the following code to import the dependencies and handle environment variables:
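A sketch of those imports and the environment handling follows; the variable names ELASTICSEARCH_ENDPOINT, ELASTICSEARCH_API_KEY, and OPENAI_API_KEY are our own choice, so adjust them to your deployment:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { Client } from "@elastic/elasticsearch";
import OpenAI from "openai";
import { z } from "zod";

// Connection settings come from the environment; fail fast if any are missing.
const ELASTICSEARCH_ENDPOINT = process.env.ELASTICSEARCH_ENDPOINT;
const ELASTICSEARCH_API_KEY = process.env.ELASTICSEARCH_API_KEY;
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;

if (!ELASTICSEARCH_ENDPOINT || !ELASTICSEARCH_API_KEY || !OPENAI_API_KEY) {
  throw new Error(
    "ELASTICSEARCH_ENDPOINT, ELASTICSEARCH_API_KEY, and OPENAI_API_KEY must be set"
  );
}
```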
Also, let’s initialize the clients to handle the Elasticsearch and OpenAI calls:
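Continuing in index.ts, and assuming the environment variables from the previous step, a minimal initialization looks like:

```typescript
// Elasticsearch client used by the search tool.
const esClient = new Client({
  node: ELASTICSEARCH_ENDPOINT,
  auth: { apiKey: ELASTICSEARCH_API_KEY },
});

// OpenAI client used for summarization and citations.
const openai = new OpenAI({ apiKey: OPENAI_API_KEY });
```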
To make our implementation more robust and ensure structured input and output, we'll define schemas using zod. This allows us to validate data at runtime, catch errors early, and make the tool responses easier to process programmatically:
Learn more about structured outputs here.
Now let’s initialize the MCP server:
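The name and version below are arbitrary identifiers shown to MCP clients:

```typescript
const server = new McpServer({
  name: "elasticsearch-knowledge-base",
  version: "1.0.0",
});
```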
Defining the MCP tools
With everything configured, we can start writing the tools that will be exposed by our MCP server. This server exposes two tools:
- search_docs: Searches for documents in Elasticsearch using full-text search.
- summarize_and_cite: Summarizes and synthesizes information from previously retrieved documents to answer a user question. This tool also adds citations referencing the source documents.
Together, these tools form a simple “retrieve-then-summarize” workflow, where one tool fetches relevant documents and the other uses those documents to generate a summarized, cited response.
Tool response format
Each tool can accept arbitrary input parameters, but it must respond with the following structure:
- content: The tool's response in an unstructured format. This field typically carries text, images, audio, links, or embeddings; in this application, it returns formatted text with the information generated by each tool.
- structuredContent: An optional field that returns each tool's results in a structured format, which is useful for programmatic consumption. Although this MCP server doesn't rely on it, it comes in handy if you want to build other tools or process the results programmatically.
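As a concrete illustration (the document values here are made up), a tool response combining both fields looks like this:

```typescript
// The shape every MCP tool response follows: human-readable `content`
// plus an optional machine-readable `structuredContent`.
const exampleToolResponse = {
  content: [
    {
      type: "text" as const,
      text: "Found 1 document:\n1. API Authentication Best Practices (score: 8.10)",
    },
  ],
  structuredContent: {
    results: [
      { id: "doc-1", title: "API Authentication Best Practices", score: 8.1 },
    ],
  },
};
```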
With that structure in mind, let’s dive into each tool in detail.
search_docs tool
This tool performs a full-text search in the Elasticsearch index to retrieve the most relevant documents based on the user query. It highlights key matches and provides a quick overview with relevance scores.
We configure fuzziness: "AUTO" to get typo tolerance that scales with the length of the token being analyzed. We also set title^2 to boost the score of documents where the match occurs in the title field.
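Putting that together, a sketch of the tool registration might look like the following (the index name internal-knowledge-base matches our ingest script, and the result formatting is one reasonable choice among many):

```typescript
server.tool(
  "search_docs",
  "Full-text search over the internal knowledge base",
  { query: z.string(), size: z.number().int().min(1).max(20).default(5) },
  async ({ query, size }) => {
    const result = await esClient.search({
      index: "internal-knowledge-base",
      size,
      query: {
        multi_match: {
          query,
          fields: ["title^2", "content"], // boost title matches
          fuzziness: "AUTO",              // typo tolerance scaled to token length
        },
      },
      highlight: { fields: { content: {} } },
    });

    const hits = result.hits.hits.map((hit: any) => ({
      id: hit._id,
      title: hit._source.title,
      content: hit._source.content,
      score: hit._score,
      highlights: hit.highlight?.content ?? [],
    }));

    // Human-readable overview with relevance scores for the `content` field.
    const text = hits
      .map((h: any, i: number) => `${i + 1}. ${h.title} (score: ${h.score?.toFixed(2)})`)
      .join("\n");

    return {
      content: [
        { type: "text" as const, text: `Found ${hits.length} documents:\n${text}` },
      ],
      structuredContent: { results: hits },
    };
  }
);
```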
summarize_and_cite tool
This tool generates a summary based on documents retrieved in the previous search. It uses OpenAI’s gpt-4o-mini model to synthesize the most relevant information to answer the user’s question, providing responses derived directly from the search results. In addition to the summary, it also returns citation metadata for the source documents used.
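A sketch of that tool follows; the prompt wording and the [n] citation convention are our own choices:

```typescript
server.tool(
  "summarize_and_cite",
  "Summarize previously retrieved documents and cite their sources",
  {
    question: z.string(),
    documents: z
      .array(z.object({ id: z.string(), title: z.string(), content: z.string() }))
      .min(1),
  },
  async ({ question, documents }) => {
    // Number each document so the model can cite it inline as [1], [2], ...
    const context = documents
      .map((d, i) => `[${i + 1}] ${d.title}\n${d.content}`)
      .join("\n\n");

    const completion = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "system",
          content:
            "Answer using only the provided documents. Cite sources inline as [n].",
        },
        { role: "user", content: `Question: ${question}\n\nDocuments:\n${context}` },
      ],
    });

    const summary =
      completion.choices[0]?.message?.content ?? "No summary generated.";
    const citations = documents.map((d, i) => `[${i + 1}] ${d.title} (id: ${d.id})`);

    return {
      content: [
        {
          type: "text" as const,
          text: `${summary}\n\nSources:\n${citations.join("\n")}`,
        },
      ],
      structuredContent: { summary, citations },
    };
  }
);
```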
Finally, we need to start the server using stdio. This means the MCP client will communicate with our server by reading and writing to its standard input and output streams. stdio is the simplest transport option and works well for local MCP servers launched as subprocesses by the client. Add the following code at the end of the file:
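A minimal startup routine looks like this; note that logging goes to stderr, because stdout is reserved for the MCP protocol messages:

```typescript
async function main() {
  const transport = new StdioServerTransport();
  await server.connect(transport);
  // Log to stderr only: stdout carries the MCP protocol stream.
  console.error("Elasticsearch MCP server running on stdio");
}

main().catch((err) => {
  console.error("Fatal error:", err);
  process.exit(1);
});
```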
Now compile the project using the following command:
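Assuming your tsconfig.json sets "outDir" to "dist", the build is simply:

```shell
npx tsc
```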
This will create a dist folder, and inside it, an index.js file.
Load the MCP server into Claude Desktop
Follow this guide to configure the MCP server with Claude Desktop. In the Claude configuration file, we need to set the following values:
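A configuration entry along these lines works (the server key elasticsearch-kb is arbitrary, the path is a placeholder you must replace, and the env var names must match the ones used in your code):

```json
{
  "mcpServers": {
    "elasticsearch-kb": {
      "command": "node",
      "args": ["/absolute/path/to/your/project/dist/index.js"],
      "env": {
        "ELASTICSEARCH_ENDPOINT": "https://your-deployment.es.io:443",
        "ELASTICSEARCH_API_KEY": "<your-elasticsearch-api-key>",
        "OPENAI_API_KEY": "<your-openai-api-key>"
      }
    }
  }
}
```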
The args value should point to the compiled file in the dist folder. You also need to set the environment variables in the configuration file with the exact same names defined in the code.
Test it out
Before executing each tool, click on Search and Tools to make sure that the tools are enabled. Here you can also enable or disable each one:

Finally, let’s test the MCP server from the Claude Desktop chat and start asking questions:

For the question “Search for documents about authentication methods and role-based access control”, the search_docs tool is executed and returns the following results:
The response is, “Great! I found 5 relevant documents about authentication methods and role-based access control. Here's what was found:”
The tool call returns the source documents as part of its response payload, which are later used to generate citations.

It’s also possible to chain multiple tools in a single interaction. In this case, Claude Desktop analyzes the user's question and determines that it needs to first call search_docs to retrieve relevant documents and then pass those results to summarize_and_cite to generate the final answer, all without requiring separate prompts from the user:

In this case, for the query “What are the main recommendations to improve authentication and access control across our systems? Include references.”, we obtained the following results:
As in the previous step, we can see the response from each tool for this question:

Note: If a submenu appears asking whether you approve the use of each tool, select Always allow or Allow once.

Conclusion
MCP servers represent a significant step toward standardizing how LLM applications connect to tools, both local and remote. Though full compatibility across the ecosystem is still in the works, it’s moving fast in that direction.
In this article, we learned how to build a custom MCP server in TypeScript that connects Elasticsearch to LLM-powered applications. Our server exposes two tools: search_docs, which retrieves relevant documents using the full Query DSL, and summarize_and_cite, which generates summaries with citations using OpenAI's gpt-4o-mini, with Claude Desktop serving as the client UI.
The future of compatibility between different client and server providers looks promising. Next steps include adding more functionalities and flexibility to your agent. There’s a practical article on how you can parameterize your queries using search templates to gain precision and flexibility.




