Glossary

Rate Limit

A restriction on the number of API requests within a given time period. Rate limits protect the service from overload and ensure fair usage. You need to account for them when planning how to integrate an embedding API into production workflows.
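One common way to account for rate limits on the client side is to retry with exponential backoff. The sketch below is a minimal illustration, not any particular provider's client: `RateLimitError` and `call_with_backoff` are hypothetical names, and the assumption is that the calling code raises `RateLimitError` when the API reports that the limit was hit (typically HTTP 429).

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised by the caller's API wrapper on a rate-limit response."""


def call_with_backoff(request, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call request(); on RateLimitError, wait and retry with a doubling delay."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Delay doubles on each attempt and is capped at max_delay.
            delay = min(base_delay * 2 ** attempt, max_delay)
            sleep(delay)
```

The `sleep` parameter is injectable so the retry schedule can be tested without real waiting. Production clients often add random jitter to the delay so that many workers do not retry at the same moment.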

Recall@k

The fraction of all relevant items that appear among the top-k results. Recall@10 of 0.8 means 80% of relevant items were found in the top 10. Recall is the primary metric for first-stage retrieval, where the goal is to catch as many relevant items as possible.
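The definition can be written as a short function. This is a minimal sketch; the name `recall_at_k` and the document IDs are illustrative, and returning 0.0 when there are no relevant items is an assumption (some evaluations skip such queries instead).

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the first k ranked results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # assumption: a query with no relevant items scores 0
    hits = len(relevant.intersection(ranked_ids[:k]))
    return hits / len(relevant)


ranked = ["d3", "d7", "d1", "d9", "d4"]      # system output, best first
relevant = {"d1", "d2", "d3", "d4", "d8"}    # ground-truth relevant items
print(recall_at_k(ranked, relevant, k=5))    # 3 of 5 relevant items found -> 0.6
```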

Relevance Score

A numerical score assigned by a reranker to each query-document pair, indicating how relevant the document is to the query. Results are sorted by relevance score to produce the final ranked list. Unlike cosine similarity scores, relevance scores are not normalized: they are meaningful for ranking candidates within a single query but cannot be compared across different queries or models, and do not represent absolute measures of semantic similarity.
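As a small illustration of how such scores are used, the sketch below sorts the candidates for a single query by score. The function name `rank_by_relevance` and the score values are made up for the example; the scores are deliberately unbounded and include a negative value to show that only their order matters.

```python
def rank_by_relevance(documents, scores, top_n=None):
    """Sort one query's candidates by relevance score, highest first."""
    if len(documents) != len(scores):
        raise ValueError("need exactly one score per document")
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Illustrative scores for the candidates of a single query.
print(rank_by_relevance(["doc_a", "doc_b", "doc_c"], [-1.2, 3.4, 0.5]))
```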

Representation Learning

A field of machine learning focused on automatically learning data representations that make downstream tasks easier, rather than relying on hand-engineered features. Methods range from autoencoders to contrastive learning, with embedding models being a prominent application. The key insight is that useful structure, such as semantic meaning or visual similarity, can be discovered directly from data without explicit feature specification.

Reranker

A model that takes a list of search results and reorders them by more carefully evaluating each result's relevance to the query. Rerankers are used after first-stage retrieval to improve precision. Because they evaluate the query and each document together as a pair, rather than comparing independently computed embeddings, they can make more nuanced relevance judgments than bi-encoder models. Jina Reranker is designed for this purpose.

Reranking

Re-scoring and reordering an initial set of search results to improve their quality. This is the second stage in a typical two-stage retrieval pipeline: the first stage finds candidates quickly; the reranker ensures the most relevant ones appear at the top.
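The two-stage pattern can be sketched with two scoring functions passed in as parameters. Everything here is a toy stand-in: `two_stage_search`, `word_overlap`, and `jaccard` are hypothetical names, and in a real pipeline the first stage would usually be a vector or keyword index and the second a reranker model.

```python
def two_stage_search(query, corpus, first_stage_score, rerank_score,
                     candidates=100, top_n=10):
    """Shortlist with a cheap scorer, then reorder the shortlist with a costlier one."""
    # Stage 1: fast scoring over the whole corpus, keeping a generous shortlist.
    shortlist = sorted(corpus, key=lambda doc: first_stage_score(query, doc),
                       reverse=True)[:candidates]
    # Stage 2: careful scoring over the shortlist only.
    return sorted(shortlist, key=lambda doc: rerank_score(query, doc),
                  reverse=True)[:top_n]


def word_overlap(query, doc):
    """Toy first-stage scorer: number of shared words."""
    return len(set(query.split()) & set(doc.split()))


def jaccard(query, doc):
    """Toy second-stage scorer: shared words divided by all distinct words."""
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / len(q | d)


corpus = [
    "rate limits for the embedding api",
    "embedding api pricing",
    "how to bake bread",
    "api",
]
results = two_stage_search("embedding api rate limits", corpus,
                           word_overlap, jaccard, candidates=3, top_n=2)
```

Only the shortlist is passed to the second scorer, which is what keeps the more expensive stage affordable.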

Retrieval

Finding and returning relevant information from a database or index in response to a query. In the embedding world, retrieval means converting the query into a vector and searching for the most similar document vectors.
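A brute-force version of vector retrieval fits in a few lines. This is a minimal sketch using only the standard library and tiny hand-written vectors; in practice the vectors would come from an embedding model and the search would usually go through a vector index. The names `cosine_similarity` and `retrieve` are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def retrieve(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, similarity) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


doc_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = retrieve([1.0, 0.1], doc_vecs, k=2)
```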

Retrieval-Augmented Generation (RAG)

A technique that improves the accuracy of large language models by first retrieving relevant information from a knowledge base and then providing that information to the model along with the user's question. Instead of relying solely on what the model learned during training, RAG grounds the response in retrieved, up-to-date data. Embedding models and rerankers are core components of RAG pipelines.
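The "providing that information to the model" step usually amounts to assembling a prompt from the retrieved passages. The sketch below shows one possible layout; the function name `build_rag_prompt`, the instruction wording, and the sample passages are all assumptions, and the retrieval step and the language model call are left out.

```python
def build_rag_prompt(question, passages):
    """Combine retrieved passages and the user's question into one prompt string."""
    # Number the passages so the model (and the reader) can refer to them.
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Illustrative passages standing in for the output of a retrieval step.
prompt = build_rag_prompt(
    "What does a rate limit restrict?",
    ["A rate limit restricts the number of API requests in a time period.",
     "Rerankers reorder search results by relevance."],
)
```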