Retrieval Phase

When a question is received, the application first searches the Elasticsearch index for relevant documents. It does this by generating a sparse vector embedding for the question and then searching the index for the embeddings closest to it, each of which is associated with a passage of a document.
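To make the idea of "closest sparse embeddings" concrete, here is a minimal, self-contained sketch. It assumes a sparse embedding can be pictured as a mapping from tokens to weights and that relevance is a dot product over shared tokens. The tokens, weights, passage IDs, and the helper names sparse_dot and search are all made up for illustration; the real vectors come from the ELSER model, and Elasticsearch's actual scoring is more involved than this.

```python
def sparse_dot(query_vec, passage_vec):
    """Dot product over the tokens that both sparse vectors share."""
    return sum(
        weight * passage_vec[token]
        for token, weight in query_vec.items()
        if token in passage_vec
    )


def search(query_vec, index, k=2):
    """Return the IDs of the k best-scoring passages, skipping zero scores."""
    scored = [(sparse_dot(query_vec, vec), pid) for pid, vec in index.items()]
    scored.sort(reverse=True)
    return [pid for score, pid in scored[:k] if score > 0]


# Hypothetical index: one sparse vector per passage (weights are invented).
index = {
    "passage-1": {"vacation": 1.5, "policy": 1.0, "days": 0.5},
    "passage-2": {"expense": 1.5, "report": 1.0, "policy": 0.5},
    "passage-3": {"office": 1.0, "parking": 1.5},
}

# Hypothetical sparse embedding of the user's question.
question_vec = {"vacation": 1.0, "days": 1.0, "policy": 0.5}

top_passages = search(question_vec, index)
```

Because most token weights are absent from any given vector, only the overlapping tokens contribute to the score, which is what makes this representation "sparse".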

As in the ingest phase, the Elasticsearch index is managed through the ElasticsearchStore integration with LangChain:

store = ElasticsearchStore(
    es_connection=elasticsearch_client,
    index_name=INDEX,
    strategy=ElasticsearchStore.SparseVectorRetrievalStrategy(model_id=ELSER_MODEL),
)

Generating an embedding for the question and then searching for it are both abstracted away by the invoke() method of LangChain's retriever interface, which performs these tasks and returns the list of the most relevant documents found:

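# Embed the question and search the index in a single call
docs = store.as_retriever().invoke(question)
for doc in docs:
    # Send each passage to the client as a server-sent event marked as a source
    doc_source = {**doc.metadata, 'page_content': doc.page_content}
    yield f'data: {SOURCE_TAG} {json.dumps(doc_source)}\n\n'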

You can see here how the returned passages are sent to the client as sources. The React application will show these as "Search Results" below the answer.
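The shape of these source events can be sketched in isolation. The example below is an illustration only: the value given to SOURCE_TAG is a placeholder, the helper names format_source_event and parse_source_event are hypothetical, and the parsing step is written in Python purely to show the round trip (in the application, the React client does the reading).

```python
import json

# Assumption: placeholder value; the application defines its own SOURCE_TAG.
SOURCE_TAG = "[SOURCE]"


def format_source_event(metadata, page_content):
    """Build one server-sent event line carrying a source passage as JSON."""
    doc_source = {**metadata, "page_content": page_content}
    return f"data: {SOURCE_TAG} {json.dumps(doc_source)}\n\n"


def parse_source_event(event):
    """Return the decoded source dict, or None if the event is not a source."""
    prefix = f"data: {SOURCE_TAG} "
    if not event.startswith(prefix):
        return None
    return json.loads(event[len(prefix):].strip())


# Hypothetical passage with made-up metadata.
event = format_source_event(
    {"name": "Vacation Policy"},
    "Employees accrue vacation days monthly.",
)
source = parse_source_event(event)
```

Tagging each event lets the client tell source passages apart from the other messages arriving on the same stream.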

The strategy argument passed to the ElasticsearchStore class must match the strategy used during ingest. In this example, SparseVectorRetrievalStrategy creates and searches sparse vectors generated by Elastic's ELSER model. Another option worth evaluating is ApproxRetrievalStrategy, which uses dense vector embeddings.