In a previous post, we talked about synonyms and their importance for providing a great search experience. Using synonyms improves search results by:
- Finding documents that use similar words to the search query
- Making domain-specific vocabulary more user-friendly, so users find results using familiar words
- Correcting common misspellings or typos
Search results need to evolve over time. New items go on sale, new trends change what users search for, and new terms become part of a search domain. Our search experience must evolve as well.
As part of evolving our search experience, it's important to keep our synonyms updated. A new synonyms API has been introduced in Elasticsearch® to help manage synonyms and update them seamlessly.
This API simplifies your workflow for updating synonyms and integrates better with your processes and tools.
Previous synonym updating process
As explained in detail in this blog post, synonyms in Elasticsearch are defined using the synonym and synonym graph token filters. These token filters are then included as part of the analysis for your text fields.
We can already update synonyms for search analyzers by configuring synonym files in the synonym token filters — for example:
PUT /synonym_test
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "whitespace",
            "filter": ["my_synonyms"]
          }
        },
        "filter": {
          "my_synonyms": {
            "type": "synonym",
            "synonyms_path": "my_synonyms.txt",
            "updateable": true
          }
        }
      }
    }
  }
}
synonyms_path defines the path (absolute, or relative to the Elasticsearch config directory) where the synonyms file is stored. The synonyms file contains the synonym rules and must be distributed to all the Elasticsearch nodes in the cluster.
To update the synonyms, we need to update the synonyms file on every cluster node and then reload the search analyzers using the reload search analyzers API for each index that uses the synonym file for its synonym token filters.
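The reload step can be scripted. The following is a minimal Python sketch using only the standard library. The base URL (an unsecured local cluster at http://localhost:9200) and the helper names reload_request and reload_all are assumptions made for illustration. The sketch does not copy the synonyms file to the nodes, which still has to happen before reloading.

```python
import urllib.request

# Assumption: an unsecured local cluster; adjust the URL and add auth as needed.
ES_URL = "http://localhost:9200"


def reload_request(index, base_url=ES_URL):
    """Build a reload search analyzers request for one index."""
    return urllib.request.Request(
        f"{base_url}/{index}/_reload_search_analyzers", method="POST"
    )


def reload_all(indices, base_url=ES_URL):
    """Call the reload search analyzers API once per index (hypothetical helper)."""
    for index in indices:
        with urllib.request.urlopen(reload_request(index, base_url)) as response:
            print(index, response.status)
```

Calling reload_all(["synonym_test"]) would send one request per listed index, so the caller has to know every index whose token filters read the updated file.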
Why add a synonyms API?
There are a few steps involved in the current way of updating synonyms:
- We need to upload the synonyms file to each node in the Elasticsearch cluster. Elastic Cloud users can upload a custom bundle for doing this.
- Our synonym token filters must be configured with the correct path (the path can be absolute or relative to the Elasticsearch config directory).
- The synonym files must be updated on every node and kept in sync.
- The reload search analyzers API must be invoked for every index that uses the synonyms file.
This is doable, but it involves infrastructure work like uploading files, keeping them up to date and in sync, and understanding where each synonym file is used.
Enter the synonyms API
Using the synonyms API provides a number of advantages over the previous file-based synonym update method:
- Provides an API-based mechanism for defining synonyms
- Provides an automatic reloading mechanism for the analysis process
- Allows for fine-grained synonym management: you can replace all rules in a synonyms set or update individual synonym rules
Define synonyms sets
A synonyms set is a group of synonyms to be applied. You can add as many synonyms sets as you need.
Each synonyms set defines synonyms using synonym rules. Each rule defines a group of words that are synonyms, and the explicit equivalence between them, using the Solr format.
Creating a synonyms set is done using the create or update synonyms set API:
PUT _synonyms/my-synonyms-set
{
  "synonyms_set": [
    {
      "id": "pc",
      "synonyms": "pc => personal computer"
    },
    {
      "id": "computer",
      "synonyms": "computer,laptop"
    }
  ]
}
This API request creates a new synonyms set with identifier my-synonyms-set, which defines two synonym rules:
- One synonym rule with an identifier "pc" that expands the word "pc" into "personal computer," but not the other way round
- One synonym rule with an identifier "computer" that specifies that "computer" and "laptop" are equivalent
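To make the difference between the two rule styles concrete, here is a simplified Python sketch. It is not how Elasticsearch parses or applies synonyms: it ignores tokenization, multi-word matching, and filter options, and the function names parse_rule and expand are hypothetical. It only illustrates that a rule with "=>" maps in one direction, while a comma-separated rule treats all its terms as equivalent.

```python
def parse_rule(rule):
    """Split one Solr-format rule into (inputs, outputs)."""
    if "=>" in rule:
        # Explicit mapping: terms on the left are replaced by terms on the right.
        left, right = rule.split("=>", 1)
        inputs = [t.strip() for t in left.split(",") if t.strip()]
        outputs = [t.strip() for t in right.split(",") if t.strip()]
    else:
        # Equivalent synonyms: every term maps to every term in the rule.
        inputs = [t.strip() for t in rule.split(",") if t.strip()]
        outputs = list(inputs)
    return inputs, outputs


def expand(term, rules):
    """Return the terms a query term maps to, or the term itself if no rule matches."""
    results = []
    for rule in rules:
        inputs, outputs = parse_rule(rule)
        if term in inputs:
            for output in outputs:
                if output not in results:
                    results.append(output)
    return results or [term]


rules = ["pc => personal computer", "computer,laptop"]
print(expand("pc", rules))                 # ['personal computer']
print(expand("personal computer", rules))  # ['personal computer']
print(expand("laptop", rules))             # ['computer', 'laptop']
```

In this sketch "personal computer" is left unchanged because the "pc" rule only applies from left to right.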
Configuring the synonyms set
Once created, your synonyms sets can be used as part of the synonym or synonym graph token filters.
Use the synonyms_set configuration option to specify the identifier of the synonyms set created in the previous step:
PUT /synonym_set_test
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "whitespace",
            "filter": ["my_synonyms"]
          }
        },
        "filter": {
          "my_synonyms": {
            "type": "synonym",
            "synonyms_set": "my-synonyms-set",
            "updateable": true
          }
        }
      }
    }
  }
}
Your synonyms are ready to be used! The analyzer will retrieve the synonyms defined in the configured synonyms set and apply them to the fields you use it on.
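Because the filter is marked as updateable, the analyzer can only be used as a search analyzer, not as an index-time analyzer. As a hedged illustration, the Python sketch below builds an update mapping request that adds a text field using synonym_analyzer at search time. The field name title, the choice of the standard analyzer for indexing, the base URL, and the helper name mapping_request are all assumptions, not part of the original example.

```python
import json
import urllib.request

# Assumption: an unsecured local cluster; adjust the URL and add auth as needed.
ES_URL = "http://localhost:9200"


def mapping_request(index, field, base_url=ES_URL):
    """Build a request that adds a text field searched with synonym_analyzer."""
    body = {
        "properties": {
            field: {
                "type": "text",
                "analyzer": "standard",  # index-time analyzer (assumed)
                "search_analyzer": "synonym_analyzer",  # applies the synonyms set
            }
        }
    }
    return urllib.request.Request(
        f"{base_url}/{index}/_mapping",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# "title" is a hypothetical field name used only for this sketch.
request = mapping_request("synonym_set_test", "title")
```

Passing the request to urllib.request.urlopen would send it. Queries against that field would then be analyzed with the synonyms from my-synonyms-set.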
Updating a synonyms set
You can update a synonyms set by updating all its synonym rules:
PUT _synonyms/my-synonyms-set
{
  "synonyms_set": [
    {
      "id": "pc",
      "synonyms": "pc => personal computer"
    },
    {
      "id": "computer",
      "synonyms": "computer, pc, laptop, desktop"
    }
  ]
}
Or, you can manage individual synonym rules instead. As every rule has an identifier, you can create, delete, or update individual synonym rules:
PUT _synonyms/my-synonyms-set/computer
{
  "synonyms": "computer, pc, laptop, desktop"
}
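The same per-rule endpoint can also be used to read or delete a single rule. The following Python sketch builds the three requests with the standard library only. The helper name rule_request and the base URL are assumptions for illustration, and nothing is sent until a request is passed to urllib.request.urlopen.

```python
import json
import urllib.request

# Assumption: an unsecured local cluster; adjust the URL and add auth as needed.
ES_URL = "http://localhost:9200"


def rule_request(method, synonyms_set, rule_id, synonyms=None, base_url=ES_URL):
    """Build a request against one synonym rule (hypothetical helper)."""
    url = f"{base_url}/_synonyms/{synonyms_set}/{rule_id}"
    if synonyms is None:
        # GET and DELETE carry no body.
        return urllib.request.Request(url, method=method)
    body = json.dumps({"synonyms": synonyms}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Create or update the "computer" rule, read it back, and delete the "pc" rule.
update_rule = rule_request(
    "PUT", "my-synonyms-set", "computer", "computer, pc, laptop, desktop"
)
get_rule = rule_request("GET", "my-synonyms-set", "computer")
delete_rule = rule_request("DELETE", "my-synonyms-set", "pc")
```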
And that's it! The indices that use your synonyms set will automatically reload the analyzers. Your updated synonyms will be accessible to your search experience with no further steps to perform.
Try it out!
Managing synonyms for your search experience has never been easier! Instead of maintaining files and then reloading the analyzers of every index that uses them, you can now use the new synonyms API to define and update synonyms, with the affected analyzers reloaded automatically.
Check it out! Create an Elastic Cloud cluster today and start defining synonyms.
We’d love to hear your feedback — join the conversation in our Discuss forums or community Slack channel.
The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.