Elasticsearch open inference API adds support for OpenAI chat completions

We are thrilled to unveil our latest innovation in Elasticsearch: the integration of OpenAI Chat Completions in Elastic’s inference APIs. This new feature marks another milestone in our journey of integrating cutting-edge AI capabilities within Elasticsearch, offering additional easy-to-use features like generating human-like text completions. 

The Essence of Continuous Innovation at Elastic

Elastic invests heavily in everything AI, and we’ve recently released a number of new features and exciting integrations.

The new completion task type inside our inference API, with OpenAI as the first backing provider, is already available in our stateless offering on Elastic Cloud. It will soon be available to everyone in our next release.

Using the new completion API

In this short guide, we’ll show a simple example of how to use the new completion task type in the inference API during document ingestion. Please refer to the Elastic Search Labs GitHub repository for more in-depth guides and interactive notebooks.

For the following guide to work, you’ll need an active OpenAI account and an API key. Refer to OpenAI’s quickstart guide for the steps to follow. You can choose from a variety of OpenAI’s models; in the following example, we use `gpt-3.5-turbo`.

In Kibana, you can use the Dev Tools console to run the next steps against Elasticsearch without even needing to set up an IDE.

First, configure the model that will perform the completions:

PUT _inference/completion/openai_chat_completions
{
    "service": "openai",
    "service_settings": {
        "api_key": <api-key>,
        "model_id": "gpt-3.5-turbo"
    }
}

After running this command, you should see a `200 OK` status indicating that the model is properly set up for performing inference on arbitrary text.
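
For reference, the response body echoes the configuration of the inference endpoint you just created. It looks roughly like the following sketch; the exact field names can vary between releases:

{
    "model_id": "openai_chat_completions",
    "task_type": "completion",
    "service": "openai",
    "service_settings": {
        "model_id": "gpt-3.5-turbo"
    },
    "task_settings": {}
}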

You’re now able to call the configured model to perform inference on arbitrary text input:

POST _inference/completion/openai_chat_completions
{
    "input": "What is Elastic?"
}

You’ll get a response with status code `200 OK` looking similar to the following:

{
    "completion": [
        {
            "result": "Elastic is a software company that provides a range of products and solutions for search, logging, security, and analytics. Its flagship product, Elasticsearch, is a distributed, RESTful search and analytics engine that is used for full-text search, structured search, and analytics. Elastic also offers other products such as Logstash for log collection and parsing, Kibana for data visualization and dashboarding, and Beats for lightweight data shippers. These products can be combined to create powerful data analysis and monitoring solutions for organizations of all sizes."
        }
    ]
}
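
The `input` field also accepts an array of strings, so you can batch several prompts into a single request. A minimal sketch, assuming each entry in the `completion` array corresponds to one input:

POST _inference/completion/openai_chat_completions
{
    "input": [
        "What is Elastic?",
        "What is an ingest pipeline?"
    ]
}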

The next command creates an example document we’ll summarize using the model we’ve just configured:

POST _bulk
{ "index" : { "_index" : "docs" } }
{"content": "You know, for search (and analysis) Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack. Logstash and Beats facilitate collecting, aggregating, and enriching your data and storing it in Elasticsearch. Kibana enables you to interactively explore, visualize, and share insights into your data and manage and monitor the stack. Elasticsearch is where the indexing, search, and analysis magic happens. Elasticsearch provides near real-time search and analytics for all types of data. Whether you have structured or unstructured text, numerical data, or geospatial data, Elasticsearch can efficiently store and index it in a way that supports fast searches. You can go far beyond simple data retrieval and aggregate information to discover trends and patterns in your data. And as your data and query volume grows, the distributed nature of Elasticsearch enables your deployment to grow seamlessly right along with it. While not every problem is a search problem, Elasticsearch offers speed and flexibility to handle data in a wide variety of use cases: Add a search box to an app or website Store and analyze logs, metrics, and security event data Use machine learning to automatically model the behavior of your data in real time Use Elasticsearch as a vector database to create, store, and search vector embeddings Automate business workflows using Elasticsearch as a storage engine Manage, integrate, and analyze spatial information using Elasticsearch as a geographic information system (GIS) Store and process genetic data using Elasticsearch as a bioinformatics research tool We’re continually amazed by the novel ways people use search. But whether your use case is similar to one of these, or you’re using Elasticsearch to tackle a new problem, the way you work with your data, documents, and indices in Elasticsearch is the same."}

To summarize multiple documents, we’ll set up a summarization ingest pipeline that uses the script, inference, and remove processors.

PUT _ingest/pipeline/summarization_pipeline
{
    "processors": [
        {
            "script": {
                "source": "ctx.prompt = 'Please summarize the following text: ' + ctx.content"
            }
        },
        {
            "inference": {
                "model_id": "openai_chat_completions",
                "input_output": {
                    "input_field": "prompt",
                    "output_field": "summary"
                }
            }
        },
        {
            "remove": {
                "field": "prompt"
            }
        }
    ]
}

This pipeline simply prefixes the content with the instruction “Please summarize the following text: ” in a temporary field, so the configured model knows what to do with the text. Of course, you can change this instruction to anything you like, which unlocks a variety of other popular use cases:

  • Question and Answering

  • Translation

  • …and many more!

The pipeline deletes the temporary field after performing inference.
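
If you want to verify the pipeline’s behavior before reindexing anything, you can dry-run it with the simulate pipeline API, for example against a short stand-in document:

POST _ingest/pipeline/summarization_pipeline/_simulate
{
    "docs": [
        {
            "_source": {
                "content": "Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack."
            }
        }
    ]
}

The response shows the document as it would leave the pipeline: with the generated `summary` field and without the temporary `prompt` field.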

We now send our document(s) through the summarization pipeline by calling the reindex API:

POST _reindex
{
    "source": {
        "index": "docs",
        "size": 50
    },
    "dest": {
        "index": "docs_summaries",
        "pipeline": "summarization_pipeline"
    }
}
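
Each reindexed document triggers a call to the model, so this can take a while for larger source indices. As with any reindex, you can run it asynchronously and poll the returned task instead:

POST _reindex?wait_for_completion=false
{
    "source": {
        "index": "docs",
        "size": 50
    },
    "dest": {
        "index": "docs_summaries",
        "pipeline": "summarization_pipeline"
    }
}

The response contains a task ID that you can check with the tasks API (`GET _tasks/<task-id>`).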

Your document is now summarized and ready to be searched:

POST docs_summaries/_search
{
    "query": {
        "match_all": { }
    }
}

That’s basically it! With a few simple API calls, you’ve created a powerful summarization pipeline that can be used with any ingestion mechanism. Summarization comes in handy in a lot of use cases, for example summarizing large pieces of text before generating semantic embeddings, or condensing large documents when you’re only interested in the gist. This can reduce your storage costs and improve time-to-value. By the way, if you want to extract text from binary documents, take a look at our open-code data extraction service!
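
Speaking of ingestion mechanisms: reindexing is only one option. To apply the pipeline at index time instead, you can set it as the destination index’s default pipeline, which is standard index-settings behavior rather than anything specific to this feature:

PUT docs_summaries/_settings
{
    "index": {
        "default_pipeline": "summarization_pipeline"
    }
}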

Exciting future ahead

But we won’t stop here. We’re already working on integrating Cohere’s chat as another provider for our `completion` task. We’re also actively exploring new retrieval and ingestion use cases in combination with the completion API. Bookmark Elastic Search Labs now to stay up to date!

Ready to build RAG into your apps? Want to try different LLMs with a vector database?
Check out our sample notebooks for LangChain, Cohere, and more on GitHub, and join the Elasticsearch Engineer training starting soon!