Software developers are increasingly using machine learning models to improve the relevance of data presented to their users. This is particularly true for applications using natural language interfaces, for example: search, question/answer, completion, and chat.
The Elasticsearch Relevance Engine (ESRE) is a collection of tools from Elastic that combines machine learning models, data transformation and storage (including vectors), and data search and retrieval. ESRE also includes tools for data security, and tools to integrate with other software, including various data sources and large language models (LLMs).
Read on to learn about the components of ESRE, or jump directly to Examples for example applications and implementations.
Machine learning models
Machine learning models enable your applications to understand natural language data and enrich or transform that data (at index time and at query time).
Uses of machine learning models include:
- Generating vector embeddings
- Extracting information from unstructured text, such as named entities or the answers to questions
- Classifying text, such as its language or its sentiment (positive or negative)
To perform these operations, you must deploy one or more trained models.
Elastic provides the following relevant features:
- The Elastic Sparse Encoder trained model for general purpose semantic search, without fine-tuning
- Interfaces to deploy and manage third party trained models for vector search and natural language processing
- Cloud infrastructure on which to deploy these models
Elastic Sparse Encoder model
The Elastic Sparse Encoder model is a machine learning model, built and trained by Elastic, which enables general-purpose semantic search for English language data.
At index time, the Elastic Sparse Encoder model enriches each document with an additional text expansion field that uses weighted tokens to capture the relationships between words and their meanings. At query time, when using a text expansion query, the sparse encoder model applies the same transformation to users' query text. The result is semantic search: relevance is based on meaning and intention, rather than strict keyword matching on the original document fields.
The Elastic Sparse Encoder model is a zero-shot, out-of-domain machine learning model, which means it does not require additional training or fine-tuning on your own data. Use this Elastic model to get started with semantic search without needing to identify and manage additional models.
Deploy the model, create a pipeline with an inference processor, and ingest (or re-index) your data through the pipeline.
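The pipeline step above can be sketched as a request body for the ingest pipeline API. The pipeline name, the source field name (content), and the model id (.elser_model_1) are illustrative assumptions; substitute the id of the model you have deployed:

```python
# A sketch of an ingest pipeline that runs the Elastic Sparse Encoder
# inference processor at index time. Names are illustrative assumptions.
# Send with: PUT _ingest/pipeline/sparse-encoder-pipeline

sparse_encoder_pipeline = {
    "processors": [
        {
            "inference": {
                # The deployed trained model to run on each document
                "model_id": ".elser_model_1",
                # Map our document field onto the model's expected input field
                "field_map": {"content": "text_field"},
                # Weighted tokens are written under this field
                "target_field": "ml",
                "inference_config": {
                    "text_expansion": {"results_field": "tokens"}
                },
            }
        }
    ]
}
```

Documents indexed (or re-indexed) through this pipeline gain an ml.tokens field of weighted tokens, which the text expansion query can later search.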
Third party model management
Many public and private trained models are available to enrich your data, each solving a different problem. For example, Hugging Face catalogs thousands of available models.
To use a third party model with Elastic, you must import and deploy the model, and then create an ingest pipeline with an inference processor to perform data transformation.
Elastic provides the following interfaces to manage the trained models you are using:
- A Kibana UI at Machine Learning > Model Management > Trained Models
- Elasticsearch APIs, grouped under the trained models APIs
- The Eland language client, implemented in Python
Documentation: Deploy trained models
Elastic Cloud ML instances
Elastic Cloud includes infrastructure on which to deploy and run trained models. When creating an Elastic Cloud deployment, enable Machine Learning instances, and optionally enable Autoscaling.
Documentation: Set up machine learning features
Elastic provides the capabilities to store data of various types, including unstructured text and dense vectors (embeddings). Use Elastic to store your data before and after transformation by a machine learning model.
Elastic stores data as documents (with fields) within indices, and supports many field types, including dense vectors. Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.
Use the Index and Document APIs or the Kibana dev tools console to manage data manually. Some ingestion tools, such as the web crawler and connectors, manage indices and documents on your behalf.
Use the Reindex API to re-index data already stored in Elasticsearch, for example, to run the data through a machine learning ingest pipeline.
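The re-indexing step can be sketched as a Reindex API request body; the index and pipeline names are illustrative assumptions:

```python
# A sketch of a _reindex request body that replays an existing index
# through a machine learning ingest pipeline. Names are illustrative.
# Send with: POST _reindex

reindex_body = {
    "source": {"index": "my-index"},
    "dest": {
        "index": "my-index-enriched",
        # Every re-indexed document passes through this ingest pipeline,
        # so its processors (e.g. an inference processor) run on the data
        "pipeline": "sparse-encoder-pipeline",
    },
}
```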
Vector field types
To store dense vectors for vector search, use the dense vector field type. At query time, use a kNN query to retrieve this information.
To store sparse vectors for text expansion (for example, when using the Elastic Sparse Encoder model), use the rank feature field type. At query time, use a text expansion query to retrieve this information.
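The two field types above can be sketched in one index mapping. The field names and the embedding dimension (384) are illustrative assumptions that depend on your documents and your embedding model:

```python
# A sketch of index mappings covering both vector field types:
# rank_features for sparse text-expansion tokens, dense_vector for kNN.
# Send with: PUT my-index

index_mappings = {
    "mappings": {
        "properties": {
            "content": {"type": "text"},
            # Weighted tokens written by the sparse encoder model
            "ml.tokens": {"type": "rank_features"},
            # Embedding written by a dense text-embedding model
            "content_vector": {
                "type": "dense_vector",
                "dims": 384,               # must match the model's output size
                "index": True,             # required for kNN search
                "similarity": "cosine",    # similarity metric for kNN
            },
        }
    }
}
```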
Elastic provides tools to transform your data, regardless of how it is stored (within Elastic or outside).
Ingest pipelines are general purpose transformation tools, while inference processors enable the use of machine learning models within these pipelines.
After deploying machine learning models, use these ingestion tools to apply ML transformations to your data as you index or re-index your documents. Extract text, classify documents, or create embeddings and store this data within additional fields.
An ingest pipeline enables you to "pipe" incoming data through a series of processors that transform the data before storage. Use an ingest pipeline to enrich documents with additional fields, including fields generated by machine learning models. Use an inference processor to employ a trained model within your pipeline.
Documentation: Ingest pipelines
An inference processor is a pipeline task that uses a deployed, trained model to transform incoming data during indexing or re-indexing.
Documentation: Inference processor
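As a second sketch, an inference processor can also run a third party model, such as a dense text-embedding model imported through Eland. The model id shown is a hypothetical Hugging Face model name in Eland's imported form:

```python
# A sketch of an inference processor running a third party text-embedding
# model during indexing. The model id and field names are illustrative.

embedding_pipeline = {
    "processors": [
        {
            "inference": {
                # Hypothetical imported Hugging Face embedding model
                "model_id": "sentence-transformers__all-minilm-l6-v2",
                # Map our document field onto the model's expected input field
                "field_map": {"content": "text_field"},
                # The embedding is written under this field
                "target_field": "content_embedding",
            }
        }
    ]
}
```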
Search and retrieval
After using machine learning models to enrich your documents with additional fields or embeddings, choose from a variety of retrieval methods that take advantage of this additional data.
Use Elastic for semantic search with dense vectors (kNN) or sparse vectors (Elastic Sparse Encoder), and combine these results with those from BM25 text search, optionally boosting on additional NLP fields.
Perform any of these retrieval methods through the same API endpoint:
Text (BM25 + NLP)
Use a full-text query to search documents enriched by machine learning models.
Elasticsearch provides a domain-specific language (DSL) for describing a full-text search query. Use this query DSL to design full-text queries, targeting the various fields of your documents.
Depending on your use case, use the additional fields you have added through natural language processing to improve the relevance of your results.
Use the _search API with the query request body parameter to specify a search query using Elasticsearch's Query DSL. For example, the match query is the standard query for performing a full-text search, including options for fuzzy matching.
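A match query with fuzzy matching can be sketched as follows; the field name and query text are illustrative assumptions:

```python
# A sketch of a full-text match query with fuzzy matching enabled.
# Send with: POST my-index/_search

match_query = {
    "query": {
        "match": {
            "content": {
                "query": "how do I deploy a trained model",
                # Tolerate small typos in the user's query text
                "fuzziness": "AUTO",
            }
        }
    }
}
```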
Text expansion (Sparse Encoder)
Use a text expansion query to perform semantic search on documents enriched by the Elastic Sparse Encoder model.
As described above, the Elastic Sparse Encoder model enriches each document at index time with a weighted-token text expansion field. At query time, the text expansion query uses the sparse encoder model to apply the same transformation to users' query text, so relevance is based on meaning and intention rather than strict keyword matching on the original document fields.
Use the _search API with the query.text_expansion request body parameter to query the text expansion field using the sparse encoder model.
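A text expansion query can be sketched as follows; the token field name, model id, and query text are illustrative assumptions matching a typical sparse encoder setup:

```python
# A sketch of a text expansion query against the weighted-token field
# produced by the sparse encoder model at index time.
# Send with: POST my-index/_search

text_expansion_query = {
    "query": {
        "text_expansion": {
            # The rank features field holding the document tokens
            "ml.tokens": {
                "model_id": ".elser_model_1",
                # The model expands this text into weighted query tokens
                "model_text": "how do I cancel my subscription?",
            }
        }
    }
}
```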
Use a k-nearest neighbor (kNN) search to retrieve documents containing indexed vectors, such as those added through an inference processor.
This type of search finds the k nearest vectors to a query vector, as measured by a similarity metric. You will receive the top k documents that are closest in meaning to the query, sorted by their similarity to the query vector.
Use the _search API with the knn request body parameter to specify the kNN query to run.
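A kNN search request body can be sketched as follows; the field name and the (truncated) query vector are illustrative assumptions:

```python
# A sketch of a kNN search request body. In practice the query vector is
# generated by the same embedding model used at index time and has the
# full dimensionality of the indexed vectors.
# Send with: POST my-index/_search

knn_search = {
    "knn": {
        "field": "content_vector",
        "query_vector": [0.12, -0.53, 0.91],  # truncated for illustration
        "k": 10,                # number of nearest neighbors to return
        "num_candidates": 100,  # candidates examined per shard
    }
}
```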
Elasticsearch allows you to combine any of the above retrieval methods within a single search request.
Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set. RRF requires no tuning, and the different relevance indicators do not have to be related to each other to achieve high-quality results.
Use the _search API with the rank request body parameter, together with any combination of the query, knn, and sub_searches request body parameters, to specify the multiple queries whose results RRF should combine.
As an alternative to RRF, use the _search API with the query and knn request body parameters, without rank, to combine vector and text search. Use the boost parameter to manage the weight of each query type. This is known as linear combination.
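Both hybrid approaches can be sketched as request bodies; field names, query texts, and the truncated vectors are illustrative assumptions:

```python
# Sketches of two hybrid retrieval request bodies.
# Send either with: POST my-index/_search

# Reciprocal rank fusion: a BM25 query and a kNN query, with RRF merging
# the two ranked result lists (default RRF settings shown).
rrf_search = {
    "query": {"match": {"content": "trained model deployment"}},
    "knn": {
        "field": "content_vector",
        "query_vector": [0.12, -0.53, 0.91],  # truncated for illustration
        "k": 10,
        "num_candidates": 100,
    },
    "rank": {"rrf": {}},
}

# Linear combination: omit "rank" and weight each query type with boost.
linear_search = {
    "query": {
        "match": {"content": {"query": "trained model deployment", "boost": 0.9}}
    },
    "knn": {
        "field": "content_vector",
        "query_vector": [0.12, -0.53, 0.91],  # truncated for illustration
        "k": 10,
        "num_candidates": 100,
        "boost": 0.1,
    },
}
```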
Security and data privacy
Whether implementing an internal knowledge base or integrating with an external LLM service, you may be concerned about the privacy and access of your private application data.
Use Elastic’s security features to manage which people and systems have access.
Use role-based access control, or rely on document- or field-level security for more granular controls.
Role-based access control (RBAC)
Role-based access control enables you to authorize users by assigning privileges to roles and assigning roles to users or groups.
You can use built-in roles or define your own roles using the Elasticsearch role management APIs or the Roles management page in Kibana.
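A custom role definition can be sketched as a role management API request body; the role name, index pattern, and privileges are illustrative assumptions:

```python
# A sketch of a custom role granting read access to a set of indices.
# Send with: PUT _security/role/search_reader

search_reader_role = {
    "cluster": [],  # no cluster-level privileges
    "indices": [
        {
            "names": ["my-index*"],   # index patterns this role covers
            "privileges": ["read"],   # read-only access to those indices
        }
    ],
}
```

Assign the role to users or groups through role mappings or user management to complete the RBAC setup.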
Document and field level security
Document level security restricts the documents that users have read access to, while field level security restricts the fields that users have read access to. In particular, these solutions restrict which documents or fields can be accessed from document-based read APIs.
Implement document and field level security using the Elasticsearch role management APIs: add a query (for document level security) or a field_security grant (for field level security) to a role's indices privileges.
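A role that layers both restrictions on top of read access can be sketched as follows; the role name, index name, restriction query, and granted fields are illustrative assumptions:

```python
# A sketch of a role combining document and field level security.
# Send with: PUT _security/role/restricted_reader

restricted_reader_role = {
    "indices": [
        {
            "names": ["my-index"],
            "privileges": ["read"],
            # Document level security: only documents matching this query
            # are visible to users holding this role
            "query": {"term": {"department": "engineering"}},
            # Field level security: only these fields are readable
            "field_security": {"grant": ["title", "content"]},
        }
    ]
}
```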
Application development tools
Elastic also provides a variety of tools for general purpose application development and integrations.
Ingest data from a variety of sources, build a search experience using your preferred programming language, avoid query injection attacks, and ship and view analytics related to user behavior.
Also use these tools to integrate with third party services, including LangChain and OpenAI or other large language models (LLMs).
Use Elastic ingestion tools to index and synchronize data from various sources, including applications, databases, web pages, and content services.
Or implement your own integrations using Elasticsearch’s Index and Document APIs.
Language clients provide Elasticsearch APIs in various programming languages, packaged as libraries.
Add the relevant library to your application to build custom integrations in your preferred programming language.
Documentation: Elasticsearch clients
Elastic Search UI provides state management and components for React applications. Use Search UI to quickly prototype a search experience or build a production-quality UI.
Search UI relies on various "connector" libraries to interface with Elasticsearch and other search engines. Use the Elasticsearch connector for the greatest compatibility with Elasticsearch queries, including semantic search and vector search.
Behavioral analytics is a general purpose analytics platform to analyze user behavior. Send event data, such as search queries and clicks, to Elasticsearch.
Use default dashboards to analyze these events, or create your own visualizations. Use this analysis to improve your search relevance and other application functions.
Documentation: Behavioral analytics
A search application is an Elasticsearch endpoint that corresponds to one or more indices and restricts queries to predefined templates.
Use a search application with an untrusted client, like a web application, where you may be exposed to query injection attacks or other abuses.
Documentation: Search Applications