Cohere

Cohere builds large language models and makes them accessible through a set of APIs. Cohere’s embedding models, such as embed-english-v3.0 and embed-multilingual-v3.0, transform text chunks into vector representations. These models can be accessed through the Embed API. This API features an embedding_types parameter that lets users request compressed embeddings (for example, int8 or binary instead of float) to save on storage costs.
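To make the storage trade-off concrete, here is a minimal sketch that builds an Embed request body and estimates index size per embedding type. The field names follow Cohere's v1 REST API as commonly documented, so check them against the current reference. The helper names (build_embed_request, storage_bytes), the 1024-dimension figure, and the 4-bytes-per-float assumption are illustrative and not part of the original text.

```python
import json

# Assumed bytes per dimension for each embedding type.
# Binary types pack 8 dimensions into one byte.
BYTES_PER_DIM = {
    "float": 4.0,
    "int8": 1.0,
    "uint8": 1.0,
    "binary": 0.125,
    "ubinary": 0.125,
}


def build_embed_request(texts, model="embed-english-v3.0",
                        input_type="search_document",
                        embedding_types=("int8",)):
    """Return a JSON-serializable body for an Embed request (hypothetical helper)."""
    return {
        "model": model,
        "texts": list(texts),
        "input_type": input_type,
        "embedding_types": list(embedding_types),
    }


def storage_bytes(num_vectors, dims, embedding_type):
    """Estimate raw vector storage, ignoring index overhead."""
    return int(num_vectors * dims * BYTES_PER_DIM[embedding_type])


if __name__ == "__main__":
    body = build_embed_request(["Elasticsearch stores dense vectors."])
    print(json.dumps(body, indent=2))
    # Compare one million 1024-dimensional vectors across types.
    for kind in ("float", "int8", "binary"):
        print(kind, storage_bytes(1_000_000, 1024, kind), "bytes")
```

Under these assumptions, int8 cuts raw vector storage to a quarter of float, and binary to one thirty-second, before any index overhead.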

Cohere’s generative models, such as command-r and command-r-plus, receive user instructions and generate useful text. These models can be accessed through the Chat API, enabling users to create multi-turn conversational experiences. This API features a documents parameter that allows users to provide the model with their own documents directly in the message; the model can use these to ground its outputs.
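As an illustration of the documents parameter, the sketch below assembles a Chat request body that carries grounding documents alongside the user message. The field names (message, documents, chat_history) reflect Cohere's v1 Chat API as commonly documented and should be verified against the current reference. The helper build_chat_request and the sample document are hypothetical.

```python
import json


def build_chat_request(message, documents, model="command-r", chat_history=None):
    """Return a JSON-serializable body for a grounded Chat request (hypothetical helper).

    documents: list of dicts with string fields, e.g. {"title": ..., "snippet": ...}
    chat_history: optional list of earlier turns for multi-turn conversations
    """
    body = {
        "model": model,
        "message": message,
        "documents": [dict(doc) for doc in documents],
    }
    if chat_history:
        body["chat_history"] = list(chat_history)
    return body


if __name__ == "__main__":
    docs = [
        {"title": "Vector search", "snippet": "Elasticsearch can store dense vectors."},
    ]
    history = [
        {"role": "USER", "message": "What is Elasticsearch?"},
        {"role": "CHATBOT", "message": "A distributed search and analytics engine."},
    ]
    request = build_chat_request("Can it store embeddings?", docs, chat_history=history)
    print(json.dumps(request, indent=2))
```

Passing documents in the request keeps retrieval outside the model: whatever the search layer returns can be forwarded as-is, and the response can then cite those documents.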

Cohere’s reranking models, such as rerank-english-v3.0 and rerank-multilingual-v3.0, improve search results by reordering retrieved results according to their relevance to the query. These models can be accessed through the Rerank API and offer a “low lift, last mile” improvement to search algorithms.

Together, these models can be used to build state-of-the-art retrieval-augmented generation (RAG) systems: transform your text into embeddings with Embed v3, store them with Elasticsearch, rerank retrieved results for maximum relevancy, and dynamically pass retrieved documents to the Chat API for grounded conversation.
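The embed, search, rerank, chat flow described above can be sketched as a small pipeline. This is a minimal outline under stated assumptions, not a reference implementation: each stage is injected as a plain callable so that real Cohere and Elasticsearch clients can be swapped in, and the names apply_rerank and answer_with_rag are hypothetical. The rerank result shape (index plus relevance_score) follows Cohere's Rerank API as commonly documented.

```python
def apply_rerank(documents, results, top_n=None):
    """Reorder documents by descending relevance score.

    results: items shaped like {"index": int, "relevance_score": float},
    where index points into the original documents list.
    """
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    return [documents[r["index"]] for r in ranked]


def answer_with_rag(question, embed, search, rerank, chat, top_n=3):
    """Run one RAG turn using four injected stages (all hypothetical callables)."""
    query_vector = embed(question)           # text -> vector (Embed API)
    candidates = search(query_vector)        # vector -> documents (Elasticsearch)
    results = rerank(question, candidates)   # relevance scores (Rerank API)
    grounded = apply_rerank(candidates, results, top_n)
    return chat(question, grounded)          # grounded answer (Chat API)


if __name__ == "__main__":
    # Stub stages stand in for real API calls.
    corpus = ["doc about cats", "doc about vectors", "doc about search"]
    scores = [
        {"index": 0, "relevance_score": 0.05},
        {"index": 1, "relevance_score": 0.91},
        {"index": 2, "relevance_score": 0.64},
    ]
    reply = answer_with_rag(
        "How are vectors stored?",
        embed=lambda text: [0.0, 1.0],
        search=lambda vector: corpus,
        rerank=lambda query, docs: scores,
        chat=lambda query, docs: f"Answer grounded in: {docs}",
        top_n=2,
    )
    print(reply)
```

Keeping the stages as separate callables mirrors the division of labor in the text: retrieval can over-fetch cheaply, and the reranker trims the candidates before they reach the generative model.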

Available Now!

8.13 and Serverless: Cohere embedding support

Serverless: Cohere reranking support

Tutorials

Notebooks

Semantic Search using the Inference API with the Cohere service

Example Chatbot Application

Configure the chatbot application to explore building RAG with Elasticsearch, Cohere, and LangChain.