Overview of NLP with Elastic machine learning

Serverless Stack

Natural language processing (NLP) is the use of software to understand natural language in speech or written text.

NLP in the Elastic Stack

Elastic offers several ways to use natural language processing.

You can integrate NLP models from different providers such as Cohere, HuggingFace, or OpenAI and use them as a service through the semantic_text workflow. You can also use ELSER (the retrieval model trained by Elastic) and E5 in the same way.
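As a minimal sketch of the semantic_text workflow, the snippet below builds the request body that creates an index with one `semantic_text` field. The index name `my-index`, the field name `content`, and the endpoint ID `my-elser-endpoint` are hypothetical placeholders. The code only constructs and prints JSON; it does not contact a cluster. Whether `inference_id` can be omitted depends on your Elasticsearch version, so treat the fallback to a default endpoint as an assumption to verify.

```python
import json


def semantic_text_mapping(field_name, inference_id=None):
    """Build an index mapping with a single semantic_text field.

    inference_id is only added when given. Assumption: recent versions
    fall back to a default inference endpoint when it is left out.
    """
    field = {"type": "semantic_text"}
    if inference_id is not None:
        field["inference_id"] = inference_id
    return {"mappings": {"properties": {field_name: field}}}


# Hypothetical names, for illustration only.
body = semantic_text_mapping("content", inference_id="my-elser-endpoint")
print("PUT /my-index")
print(json.dumps(body, indent=2))
```

Sending this body with any HTTP client or the official Python client would create the index; documents indexed into `content` are then embedded by the referenced inference endpoint at ingest time.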

The inference API enables you to use the same services with a more complex workflow, for greater control over your configuration settings. This tutorial walks you through the process of using the various services with the inference API.
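To illustrate the extra control the inference API gives you, here is a rough sketch that assembles the request for creating an inference endpoint. The endpoint ID `my-cohere-endpoint`, the placeholder API key, and the model name are assumptions chosen for illustration. The helper `inference_endpoint_request` is hypothetical and not part of any Elastic client library; it only builds the method, path, and body.

```python
import json


def inference_endpoint_request(task_type, inference_id, service, service_settings):
    """Return (method, path, body) for creating an inference endpoint."""
    path = f"/_inference/{task_type}/{inference_id}"
    body = {"service": service, "service_settings": service_settings}
    return "PUT", path, body


# Hypothetical endpoint ID and placeholder credentials; the model name
# is an assumption and should be checked against the provider's catalog.
method, path, body = inference_endpoint_request(
    task_type="text_embedding",
    inference_id="my-cohere-endpoint",
    service="cohere",
    service_settings={"api_key": "<YOUR_API_KEY>", "model_id": "embed-english-v3.0"},
)
print(method, path)
print(json.dumps(body, indent=2))
```

Keeping the task type and endpoint ID in the path and the provider details in the body mirrors how the API separates what the endpoint does from which service backs it.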

You can upload and manage NLP models using the Eland client and the Elastic Stack. Find the list of recommended and compatible models here. Refer to Examples to learn more about how to use machine learning models deployed in your cluster.

You can store embeddings in your Elasticsearch vector database if you generate dense vector or sparse vector model embeddings outside of Elasticsearch.
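The following sketch shows one way to prepare an index mapping and a document for embeddings generated outside of Elasticsearch. The field names `dense_embedding` and `sparse_embedding`, the three-dimensional vector, and the token weights are made-up illustration values. The code builds JSON only and makes no network calls.

```python
import json


def vector_index_mapping(dims):
    """Mapping with one dense_vector and one sparse_vector field."""
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "dense_embedding": {"type": "dense_vector", "dims": dims},
                "sparse_embedding": {"type": "sparse_vector"},
            }
        }
    }


def vector_document(text, dense, sparse, dims):
    """Build a document, checking the dense vector length against dims."""
    if len(dense) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(dense)}")
    return {
        "text": text,
        "dense_embedding": list(dense),
        # Sparse embeddings are stored as a token -> weight object.
        "sparse_embedding": dict(sparse),
    }


# Toy values; real embeddings come from your external model.
mapping = vector_index_mapping(dims=3)
doc = vector_document(
    "hello world", [0.1, 0.2, 0.3], {"hello": 1.4, "world": 0.9}, dims=3
)
print(json.dumps(mapping, indent=2))
print(json.dumps(doc, indent=2))
```

Checking the vector length on the client side surfaces a dimension mismatch before the document is sent, rather than as an indexing error.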

What is NLP?

Classically, NLP was performed using linguistic rules, dictionaries, regular expressions, and machine learning for specific tasks such as automatic categorization or summarization of text. In recent years, however, deep learning techniques have taken over much of the NLP landscape. Deep learning capitalizes on the availability of large-scale data sets, cheap computation, and techniques for learning at scale with less human involvement. Pre-trained language models that use a transformer architecture have been particularly successful. For example, BERT is a pre-trained language model that Google released in 2018. Since then, it has become the inspiration for most of today’s NLP techniques.

The Elastic Stack machine learning features are structured around BERT and transformer models. These features support BERT’s tokenization scheme (called WordPiece) and transformer models that conform to the standard BERT model interface. For the current list of supported architectures, refer to Compatible third party models.
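To give a feel for WordPiece tokenization, the sketch below implements the greedy longest-match-first splitting that WordPiece applies to a single word, where pieces that continue a word carry a `##` prefix. It is a simplified illustration, not the tokenizer Elasticsearch ships: the vocabulary is a tiny made-up set, and steps such as lowercasing and punctuation splitting are left out.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into WordPiece tokens by greedy longest match."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # marks a continuation piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece fits: the whole word maps to the unknown token.
            return [unk_token]
        tokens.append(match)
        start = end
    return tokens


# Toy vocabulary, for illustration only.
toy_vocab = {"un", "##aff", "##able", "play", "##ing", "##ed"}

print(wordpiece_tokenize("unaffable", toy_vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenize("playing", toy_vocab))    # ['play', '##ing']
print(wordpiece_tokenize("xyz", toy_vocab))        # ['[UNK]']
```

Because a rare word is broken into known pieces, a model with a fixed vocabulary can still represent words it never saw whole during training.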

To incorporate transformer models and make predictions, Elasticsearch uses libtorch, which is an underlying native library for PyTorch. Trained models must be in a TorchScript representation for use with Elastic Stack machine learning features.

As in the cases of classification and regression, after you deploy a model to your cluster, you can use it to make predictions (also known as inference) against incoming data. You can perform the following NLP operations:

To delve deeper into Elastic’s machine learning research and development, explore the ML research section within Search Labs.