Enhancing chatbot capabilities with NLP and vector search in Elasticsearch

Conversational interfaces have been around for a while and are becoming increasingly popular as a means of assisting with various tasks, such as customer service, information retrieval, and task automation. Typically accessed through voice assistants or messaging apps, these interfaces simulate human conversation in order to help users resolve their queries more efficiently.

As technology advances, chatbots are used to handle more complex tasks, and to handle them quickly, while still providing a personalized experience for users. Natural language processing (NLP) enables chatbots to process the user's language, identify the intent behind their message, and extract relevant information from it. For example, Named Entity Recognition extracts key information from a text by classifying entities into a set of categories, Sentiment Analysis identifies the emotional tone, and Question Answering extracts the answer to a query from a given context. The goal of NLP is to enable algorithms to process human language and perform tasks that historically only humans were capable of, such as finding relevant passages among large amounts of text, summarizing text, and generating new, original content.

These advanced NLP capabilities are built upon a technology known as vector search. Elastic has native support for vector search, performing exact and approximate k-nearest neighbor (kNN) search, and for NLP, enabling the use of custom or third-party models directly in Elasticsearch.

In this blog post, we will explore how vector search and NLP work to enhance chatbot capabilities and demonstrate how Elasticsearch facilitates the process. Let's begin with a brief overview of vector search.

Although humans can comprehend the meaning and context of written language, machines cannot do the same. This is where vectors come in. By converting text into vector representations (numerical representations of the meaning of the text), machines can overcome this limitation. Instead of relying on keywords and frequency-based lexical matching, as traditional search does, vectors enable text data to be processed using operations defined for numerical values.

This allows vector search to locate data that shares similar concepts or contexts: given a query vector, distances in the "embedding space" represent similarity, so the more similar the data, the closer together the corresponding vectors will be.
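To make this concrete, Elasticsearch exposes vector functions such as cosineSimilarity in Painless, which a script_score query can use to rank documents by their proximity to a query vector. The index name, field name, and tiny three-dimensional vectors below are purely illustrative:

GET my-index/_search
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        // hypothetical field name; ranks each document by its cosine similarity to the query vector
        "source": "cosineSimilarity(params.query_vector, 'my_vector') + 1.0",
        "params": { "query_vector": [0.1, -0.4, 0.8] }
      }
    }
  }
}

Adding 1.0 to the cosine similarity keeps scores non-negative, since Elasticsearch does not allow negative scores.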

Vector search is not only utilized in NLP applications, but it’s also used in various other domains where unstructured data is involved, including image and video processing.

In a chatbot flow, there can be several approaches to users' queries, and as a result, there are different ways to improve information retrieval for a better user experience. Since each alternative has its own set of advantages and possible disadvantages, it is essential to take into account the available data and resources, as well as the training time (when applicable) and expected accuracy. In the following section, we will cover these aspects for question-answering NLP models.

Question-answering

A question-answering (QA) model is a type of NLP model that is designed to answer questions asked in natural language. When users have questions that require inferring answers from multiple resources, without a pre-existing target answer available in the documents, generative QA models can be useful. However, these models can be computationally expensive and require large amounts of data for domain-related training, which may make them less practical in some situations, even though this method can be particularly valuable for handling out-of-domain questions.

On the other hand, when users have questions on a specific topic, and the actual answer is present in the document, extractive QA models can be used. These models directly extract the answer from the source document, providing transparent and verifiable results, making them a more practical option for businesses or organizations that want to provide a simple and efficient way of answering questions.

The example below demonstrates the use of a pre-trained extractive QA model, available on Hugging Face and deployed into Elasticsearch, to extract answers from a given context:

POST _ml/trained_models/deepset__minilm-uncased-squad2/deployment/_infer
{
    "docs": [{"text_field": "Canvas is a data visualization and presentation application within Kibana. With Canvas, live data can be pulled directly from Elasticsearch and combined with colors, images, text, and other customized options to create dynamic, multi-page displays."}],
    "inference_config": {"question_answering": {"question": "What is Kibana Canvas?"}}
}


{
  "predicted_value": "a data visualization and presentation application",
  "start_offset": 10,
  "end_offset": 59,
  "prediction_probability": 0.28304219431376443
}

For more details on these steps, see the documentation on how to deploy trained models and how to add a model to an inference ingest pipeline.
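As a minimal sketch of the second step, an inference processor can run a deployed model, such as the NER model used later in this post, over documents at ingest time; the pipeline name and field names below are illustrative assumptions:

PUT _ingest/pipeline/ner-pipeline
{
  "processors": [
    {
      "inference": {
        // hypothetical pipeline and field names; field_map routes the document's
        // content field to the model's expected text_field input
        "model_id": "dslim__bert-base-ner",
        "target_field": "ml.ner",
        "field_map": { "content": "text_field" }
      }
    }
  ]
}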

There are various ways to handle user queries and retrieve information, and using multiple language models and data sources can be an effective alternative when dealing with unstructured data. To illustrate this, consider the data processing of a chatbot employed to respond to queries with answers based on data extracted from selected documents.

The data processing for our chatbot can be divided into three parts:

  • Vector processing: This part converts documents into vector representations.
  • User input processing: This part extracts relevant information from the user query and performs semantic search and hybrid retrieval.
  • Optimization: This part includes monitoring and is crucial for ensuring the chatbot's reliability, optimal performance, and great user experience.

Vector processing

For the processing part, the first step is to determine the component parts of each document and then convert each element into a vector representation; these representations can be created for a wide range of data formats.

There are various methods that can be used to compute embeddings, including pre-trained models and libraries.
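For instance, once a text embedding model is deployed in Elasticsearch, the same _infer API shown earlier returns the embedding of a text as its predicted_value. The model ID below is just one example of a compatible Hugging Face model:

POST _ml/trained_models/sentence-transformers__msmarco-minilm-l-12-v3/deployment/_infer
{
  // any deployed text embedding model ID works here
  "docs": [{"text_field": "Canvas is a data visualization and presentation application within Kibana."}]
}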

It's important to note that the effectiveness of search and retrieval on these representations depends on the existing data and the quality and relevance of the method used.

Once the vectors are computed, they are stored in Elasticsearch using the dense_vector field type.

PUT <target>
{
  "mappings": {
    "properties": {
      "doc_part_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "doc_part" : {
        "type" : "keyword"
      }
    }
  }
}
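Each document part can then be indexed together with its embedding. The values below are placeholders matching the three-dimensional mapping above; in practice, dims must match the output size of the embedding model, which is typically in the hundreds (for example, 384 or 768):

POST <target>/_doc
{
  // placeholder values; real embeddings typically have hundreds of dimensions
  "doc_part": "kibana-canvas-overview",
  "doc_part_vector": [0.12, -0.47, 0.81]
}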

Chatbot user input processing

For the user part, after receiving a question, it's useful to extract all possible information from it before proceeding. This helps to understand the user's intention, and in this case, we are using a Named Entity Recognition (NER) model to assist with that. NER is the process of identifying and classifying named entities into predefined entity categories.

POST _ml/trained_models/dslim__bert-base-ner/deployment/_infer
{
  "docs": { "text_field": "How many people work for Elastic?"}
}


{
  "predicted_value": "How many people work for [Elastic](ORG&Elastic)?",
  "entities": [
    {
      "entity": "Elastic",
      "class_name": "ORG",
      "class_probability": 0.4993975435876747,
      "start_pos": 25,
      "end_pos": 32
    }
  ]
}

Although not a necessary step, by using structured data, the NER result above, or the output of another NLP model to categorize the user's query, we can restrict the kNN search using a filter. This helps to improve performance and accuracy by reducing the amount of data that needs to be processed.

    "filter": {
      "term": {
        "org": "Elastic"
      }
    }

Semantic search and hybrid retrieval

Since the prompt originates from user queries and the chatbot needs to process human language with its variability and ambiguity, semantic search is a great fit. In Elasticsearch, you can perform semantic search in a single step by passing the query string and the ID of the embedding model into a query_vector_builder object. This will vectorize the query and perform kNN search to retrieve top k matches that are closest in meaning to the query:

POST /<target>/_search
{
  "knn": {
    "field": "doc_part_vector",
    "k": 5,
    "num_candidates": 20,
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "<text-embedding-model-id>",
        "model_text": "<query_string>"
      }
    }
  }
}

For an end-to-end example, see the documentation on how to deploy a text embedding model and use it for semantic search.

Elasticsearch uses the Lucene implementation of Okapi BM25, a sparse model, to rank text queries for relevance, while dense models are used for semantic search. To combine the strengths of both, vector matches and matches obtained from the text query, you can perform a hybrid retrieval:

POST <target>/_search
{
  "query": {
    "match": {
      "content": {
        "query": "<query_string>"
      }
    }
  },
  "knn": {
    "field": "doc_part_vector",
    "k": 5,
    "num_candidates": 20,
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "<text-embedding-model-id>",
        "model_text": "<query_string>"
      }
    },
    "filter": {
      "term": {
        "org": "Elastic"
      }
    }
  }
}

Combining both sparse and dense models often yields the best results.

Sparse models generally perform better on short queries and specific terminology, while dense models leverage context and associations. If you want to learn more about how these methods compare and complement each other, here we benchmark BM25 against two dense models that have been specifically trained for retrieval.

The most relevant result can usually be the first answer given to the user; the _score field is a number that indicates the relevance of each returned document.
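For illustration only, a truncated response to the hybrid query above might look like this, with hits sorted by _score in descending order:

{
  "hits": {
    "hits": [
      {
        "_score": 0.92,
        "_source": {
          "doc_part": "kibana-canvas-overview",
          "content": "Canvas is a data visualization and presentation application within Kibana. ..."
        }
      }
    ]
  }
}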

Chatbot optimization

To help improve the user experience, performance, and reliability of your chatbot, in addition to applying hybrid scoring, you can incorporate the following approaches:

Sentiment Analysis: To provide awareness of user comments and reactions as the dialog unfolds, you can incorporate a sentiment analysis model:

POST _ml/trained_models/distilbert-base-uncased-finetuned-sst-2-english/deployment/_infer
{
  "docs": { "text_field": "That was not my question!"}
}


{
  "predicted_value": "NEGATIVE",
  "prediction_probability": 0.980080439016437
}

GPT's capabilities: As an alternative way to enhance the overall experience, you can combine Elasticsearch's search relevance with OpenAI's GPT question-answering capabilities, using the Chat Completion API to return model-generated responses to the user with the top k documents as context. Prompt: "answer this question <user_question> using only this document <top_search_result>"
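A minimal sketch of such a call, assuming an OpenAI API key is available; the model name and single-message layout are illustrative choices, and the <user_question> and <top_search_result> placeholders are filled in from the search step:

POST https://api.openai.com/v1/chat/completions
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "answer this question <user_question> using only this document <top_search_result>"
    }
  ]
}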

Observability: Ensuring the performance of any chatbot is crucial, and monitoring is an essential component in achieving this. In addition to logs that capture chatbot interactions, it's important to track response time, latency, and other relevant chatbot metrics. By doing so, you can identify patterns and trends, and even detect anomalies. Elastic Observability tools enable you to collect and analyze this information.

Summary

This blog post covers what NLP and vector search are and delves into an example of a chatbot employed to respond to user queries by considering data extracted from the vector representation of documents.

As demonstrated, using NLP and vector search, chatbots are capable of performing complex tasks that go beyond structured, targeted data. This includes making recommendations and answering specific product or business-related queries using multiple data sources and formats as context, while also providing a personalized user experience.

Use cases range from providing customer service by assisting customers with their inquiries to helping developers with their queries, by providing step-by-step guidance, suggesting recommendations, or even automating tasks. Depending on the goal and existing data, other models and methods can also be utilized to achieve even better results and improve the overall user experience.

Here are some links on the topic that may be useful:

  1. How to deploy natural language processing (NLP): Getting started

  2. Overview of image similarity search in Elasticsearch

  3. ChatGPT and Elasticsearch: OpenAI meets private data

  4. Monitor OpenAI API and GPT models with OpenTelemetry and Elastic

  5. 5 reasons IT leaders need vector search to improve search experiences

By incorporating NLP and native vector search in Elasticsearch, you can take advantage of its speed, scalability, and search capabilities to create highly efficient and effective chatbots capable of handling large amounts of data, whether structured or unstructured.

Ready to get started? Begin a free trial of Elastic Cloud.

In this blog post, we may have used or we may refer to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use.

Elastic, Elasticsearch and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.
