Jina Reranker v3

Modern search pipelines rarely return their final answers directly from retrieval. Even when your search system is powered by high-quality indexes or semantic retrieval methods, the top results often include candidates that are relevant, but not necessarily the best possible answers.

This is where reranking becomes essential.

A reranker takes a query and a set of candidate passages, reads them together, and produces a new ordering that reflects deeper semantic relevance. Instead of scoring documents in isolation, it compares them against one another, determining which passages most directly answer the user’s intent.

In this tutorial, we focus on how to use Jina Reranker v3 inside Elasticsearch. Our goal is simple: given a query and an array of documents, show how Elasticsearch can call the jina-reranker-v3 model and return the documents reordered by semantic relevance.

Everything happens inside Dev Tools → Console, using Elasticsearch’s open Inference API.

Step 1 — Register the jina-reranker-v3 in Elasticsearch

Elasticsearch integrates with external AI models through the Inference API. To use jina-reranker-v3, we define an inference endpoint and supply our Jina API key, which you can find on the Jina AI website.
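A minimal registration looks like the following, assuming your Elasticsearch version includes the jinaai inference service (the endpoint name jina_rerank_v3 is our choice; replace the placeholder with your own key):

```json
PUT _inference/rerank/jina_rerank_v3
{
  "service": "jinaai",
  "service_settings": {
    "api_key": "<YOUR_JINA_API_KEY>",
    "model_id": "jina-reranker-v3"
  }
}
```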

Once this completes successfully, Elasticsearch can send text and documents directly to the Jina model and receive reranked results.

Step 2 — Index some documents for demonstration

(Any text corpus will work — we only need documents stored in Elasticsearch so we can pass them to the reranker.)

Here we index a small multilingual literary dataset. There is no embedding logic and no vector fields; we keep the index intentionally simple:
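For example, a bulk request like this is enough (the index name demo_books and the passages are illustrative; the first sentence is the English passage we will revisit later, and dynamic mapping handles the fields):

```json
POST demo_books/_bulk
{"index": {"_id": "1"}}
{"id": "1", "lang": "en", "title": "Suffering and truth", "content": "Pain can lead a person back to truth, for suffering often reveals what comfort hides."}
{"index": {"_id": "2"}}
{"id": "2", "lang": "en", "title": "Guilt", "content": "Guilt casts a long shadow over the conscience, long after the deed is done."}
{"index": {"_id": "3"}}
{"id": "3", "lang": "en", "title": "Festival", "content": "The festival opens with a parade through the old town."}
```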

The _inference/rerank API expects a query string and an input array of candidate documents. Each element of input is a plain-text passage to be ranked. The response returns a rerank array, where each entry contains the original index of the document, its semantic relevance score, and optionally the document text.
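A minimal call looks like this; the query and passages are illustrative, and jina_rerank_v3 is the endpoint we registered in Step 1:

```json
POST _inference/rerank/jina_rerank_v3
{
  "query": "Does suffering reveal hidden truth?",
  "input": [
    "Pain can lead a person back to truth, for suffering often reveals what comfort hides.",
    "Guilt casts a long shadow over the conscience, long after the deed is done.",
    "The festival opens with a parade through the old town."
  ]
}
```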
If everything works, you should see results similar to the following:
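The exact scores depend on the model, but the response has this general shape (the values below are placeholders, not real model output):

```json
{
  "rerank": [
    {
      "index": 0,
      "relevance_score": 0.91,
      "text": "Pain can lead a person back to truth, for suffering often reveals what comfort hides."
    },
    {
      "index": 1,
      "relevance_score": 0.42,
      "text": "Guilt casts a long shadow over the conscience, long after the deed is done."
    },
    {
      "index": 2,
      "relevance_score": 0.03,
      "text": "The festival opens with a parade through the old town."
    }
  ]
}
```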

Conceptually, this is all the reranker does: given a query and a small set of candidate texts, it decides which ones are most relevant and returns them in order. It does not care how you obtained those candidates—whether they came from BM25, vector search, filters, or a hand-picked list—so you can drop it into almost any existing search flow.

So far we have called the reranker directly on ad-hoc passages. The rest of the tutorial builds a full retrieval pipeline, which needs two inference endpoints: one for embeddings and one for reranking.

1.1 Create the embedding model endpoint

This endpoint allows Elasticsearch to send text to jina-embeddings-v3 and store the returned vectors.
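Registration mirrors the reranker endpoint, again assuming the jinaai inference service; the endpoint name jina_embeddings is our choice:

```json
PUT _inference/text_embedding/jina_embeddings
{
  "service": "jinaai",
  "service_settings": {
    "api_key": "<YOUR_JINA_API_KEY>",
    "model_id": "jina-embeddings-v3"
  }
}
```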

If it runs successfully, you should see a 200 OK confirmation.

1.2 Create the reranker model endpoint

Next, we register the jina-reranker-v3 endpoint.
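This is the same registration as in Step 1, so if you already created jina_rerank_v3 there, you can skip this call:

```json
PUT _inference/rerank/jina_rerank_v3
{
  "service": "jinaai",
  "service_settings": {
    "api_key": "<YOUR_JINA_API_KEY>",
    "model_id": "jina-reranker-v3"
  }
}
```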

Step 3 — Create an index with text + vector fields

We will now create a small index that holds:

  • metadata (id, lang, title)
  • the document text (content)
  • the dense vector (embedding) generated by the Jina embedding endpoint

Run:
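A sketch of the mapping, assuming an index named books and the 1024-dimensional vectors discussed below:

```json
PUT books
{
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "lang": { "type": "keyword" },
      "title": { "type": "text" },
      "content": { "type": "text" },
      "embedding": {
        "type": "dense_vector",
        "dims": 1024,
        "similarity": "cosine"
      }
    }
  }
}
```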

jina-embeddings-v3 can generate vectors in several sizes, but for this tutorial, we use 1024 dimensions because it offers the strongest retrieval accuracy and aligns with the model’s default configuration, allowing us to focus on the search flow rather than tuning embedding parameters. In practical deployments, you can choose a smaller size, such as 256 or 512, if you need to optimize for storage or latency, but 1024 provides a clean baseline for demonstrating the pipeline end-to-end.

Step 4 — Add an ingest pipeline to generate embeddings automatically

Every time you index a document, Elasticsearch can call the jina_embeddings endpoint, get the vector back, and store it directly in the embedding field.

Create the pipeline:
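A minimal pipeline using the inference processor might look like this; the pipeline name jina_embedding_pipeline is our choice, and jina_embeddings is the endpoint created in section 1.1:

```json
PUT _ingest/pipeline/jina_embedding_pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": "jina_embeddings",
        "input_output": {
          "input_field": "content",
          "output_field": "embedding"
        }
      }
    }
  ]
}
```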

Step 5 — Index a small multilingual dataset

We will index four short paragraphs in four languages. They express the same central idea (suffering revealing truth), making it easy to see how vector search and reranking behave.

Make sure to include the pipeline parameter:
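A sketch of the bulk request, assuming the books index and jina_embedding_pipeline from the previous steps. The English sentence is the passage discussed later in this tutorial; the French, German, and Russian passages are illustrative translations of it:

```json
POST books/_bulk?pipeline=jina_embedding_pipeline
{"index": {"_id": "1"}}
{"id": "1", "lang": "en", "title": "Suffering and truth", "content": "Pain can lead a person back to truth, for suffering often reveals what comfort hides."}
{"index": {"_id": "2"}}
{"id": "2", "lang": "fr", "title": "La souffrance et la vérité", "content": "La douleur peut ramener une personne à la vérité, car la souffrance révèle souvent ce que le confort cache."}
{"index": {"_id": "3"}}
{"id": "3", "lang": "de", "title": "Leiden und Wahrheit", "content": "Schmerz kann einen Menschen zur Wahrheit zurückführen, denn das Leiden offenbart oft, was die Bequemlichkeit verbirgt."}
{"index": {"_id": "4"}}
{"id": "4", "lang": "ru", "title": "Страдание и истина", "content": "Боль может вернуть человека к истине, ибо страдание часто открывает то, что скрывает комфорт."}
```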

These four passages express the same central idea, but do so with entirely different surface forms. For BM25, they have almost no overlapping keywords. For dense retrieval and reranking, they form a clean semantic cluster. This contrast will allow us to see precisely where each retrieval method excels, where it fails, and how the reranker resolves ambiguities that neither keyword search nor vector search can solve alone.

Step 6 — Compare BM25 and vector search for a philosophical query

Now that the index is populated, we can see how classic keyword search and dense retrieval behave on a more abstract question.

Imagine a user asking a question along the lines of: "Does suffering reveal the truth that comfort hides?"

We will first run a BM25-only query on the content field:
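A simple match query is enough here; the query phrasing is illustrative:

```json
GET books/_search
{
  "query": {
    "match": {
      "content": "suffering reveals hidden truth"
    }
  }
}
```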

BM25 will focus on literal overlap: “suffering”, “truth”, maybe “hidden”. Depending on how Elasticsearch tokenizes those terms, it may give reasonable results, but it is very sensitive to exact words and to the language of the documents. Russian or French passages might rank lower simply because they share no tokens with the English query, even though they express the same idea.

To contrast that, we run a vector search using jina-embeddings-v3. First, we ask the embedding endpoint for a search embedding of the query:
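The request goes to the jina_embeddings endpoint from section 1.1; the query string is the same illustrative one used above:

```json
POST _inference/text_embedding/jina_embeddings
{
  "input": "suffering reveals hidden truth"
}
```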

This returns a 1024-dimensional vector. Copy the embedding array from the response and plug it into a kNN search:
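A sketch of the kNN request; replace the `[ ... ]` placeholder with the full 1024-value array you copied from the embedding response:

```json
GET books/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [ ... ],
    "k": 4,
    "num_candidates": 10
  },
  "_source": ["lang", "title", "content"]
}
```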

Now the model is free to ignore literal wording and focus purely on semantic proximity. Documents that talk about suffering, truth, conscience, and inner struggle in any language tend to surface near the top. This already feels much more “intelligent” than BM25, but notice that the ranking is still driven by a single similarity score per document: it does not consider how the documents relate to each other or which one is the best answer among several good ones.

That is exactly what reranking will fix.

Step 7 — Build a hybrid retriever with Reciprocal Rank Fusion (RRF)

In practice, we rarely want to choose between BM25 and vector search; we want both. Elasticsearch provides an RRF retriever that merges multiple retrieval strategies into a single ranked list using reciprocal rank fusion.

Instead of writing two separate queries and merging results manually, we can express the hybrid search in one _search call using the retriever syntax.

First, get the query embedding again (as in Step 6). Then plug the vector into this request:
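A sketch of the hybrid request, combining the BM25 match query and the kNN clause from Step 6 under one rrf retriever (again, paste the real embedding in place of `[ ... ]`):

```json
GET books/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": {
              "match": { "content": "suffering reveals hidden truth" }
            }
          }
        },
        {
          "knn": {
            "field": "embedding",
            "query_vector": [ ... ],
            "k": 10,
            "num_candidates": 50
          }
        }
      ],
      "rank_window_size": 10
    }
  }
}
```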

You will typically see a mix of languages near the top: English, Russian, German, French, depending on how the words “suffering”, “truth”, and “conscience” are represented in the embeddings and in the text. This hybrid list is the candidate set we will pass to the reranker.

Step 8 — Add Jina Reranker v3 as a final ranking layer

Right now, the hybrid search still produces a single score per document, with no notion of “this passage is good, but that one is the clear best answer.” A reranker is designed to make exactly that judgment.

Elasticsearch exposes this as the text_similarity_reranker retriever, which wraps another retriever (here, our rrf hybrid retriever) and sends its top hits to a rerank inference endpoint.

Because we already registered jina_rerank_v3 as a rerank endpoint, we can now write:
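A sketch of the full request: the text_similarity_reranker wraps the rrf retriever from Step 7, sends the content field of the top hits to the jina_rerank_v3 endpoint, and uses the query text as inference_text (paste the real embedding in place of `[ ... ]`):

```json
GET books/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "rrf": {
          "retrievers": [
            {
              "standard": {
                "query": {
                  "match": { "content": "suffering reveals hidden truth" }
                }
              }
            },
            {
              "knn": {
                "field": "embedding",
                "query_vector": [ ... ],
                "k": 10,
                "num_candidates": 50
              }
            }
          ]
        }
      },
      "field": "content",
      "inference_id": "jina_rerank_v3",
      "inference_text": "suffering reveals hidden truth",
      "rank_window_size": 10
    }
  }
}
```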

When we run the hybrid RRF search without reranking, the English passage “Pain can lead a person back to truth, for suffering often reveals what comfort hides” appears at the top, followed by the English “shadow of guilt” line and then the Russian and French variants. This makes sense: the query is in English, so BM25 heavily favors English documents that share literal terms like “suffering” and “truth”, while the vector retriever pulls in the other languages but does not fully control the final ordering.

Once we add jina-reranker-v3 as a second stage, the ranking changes in a way that feels more deliberate. The reranker moves all four “suffering reveals truth” sentences – French, German, Russian, and English – to the top of the list and pushes the more tangential “guilt and necessity” passages further down. Instead of simply blending lexical and vector scores, the model reads the query and the candidate passages together in one context, then learns which ones most directly answer the question. The result is a ranking that is multilingual, semantically coherent, and clearly optimized for the user’s intent rather than for individual scoring heuristics.

Ready to build state-of-the-art search experiences?

Sufficiently advanced search isn’t achieved with the efforts of one. Elasticsearch is powered by data scientists, ML ops, engineers, and many more who are just as passionate about search as you are. Let’s connect and work together to build the magical search experience that will get you the results you want.

Try it yourself