Language Model
A machine learning model trained to understand and generate human language. Language models learn patterns from large amounts of text and can be adapted for many tasks, including generating embeddings. A model's architecture and training approach determine the quality of the embeddings it produces.
Late Chunking
A technique where a long document is first processed through the model as a whole, using the full context window, and then divided into chunks after the model has produced context-aware token representations. This preserves cross-chunk context that would be lost if the document were split before encoding. Jina Embeddings support late chunking for improved long-document retrieval.
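The order of operations can be sketched in a few lines. This is a toy illustration, not a real model: `encode_tokens` here is a stand-in for a long-context transformer encoder, and the vectors are made-up numbers. What matters is the sequence: encode the whole document first, then split and pool.

```python
# Illustrative late-chunking sketch (toy vectors, no real model).

def encode_tokens(tokens):
    # Stand-in for a transformer encoder. In late chunking, every token
    # vector is produced with the FULL document as context.
    return [[float(len(t)), float(i)] for i, t in enumerate(tokens)]

def mean_pool(vectors):
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def late_chunk(tokens, chunk_size):
    # 1) Encode the whole document first (context-aware token vectors).
    token_vecs = encode_tokens(tokens)
    # 2) Only then split into chunks and pool each chunk's token vectors.
    chunks = []
    for start in range(0, len(token_vecs), chunk_size):
        chunks.append(mean_pool(token_vecs[start:start + chunk_size]))
    return chunks

doc = "late chunking keeps context across chunk boundaries".split()
chunk_embeddings = late_chunk(doc, chunk_size=3)
print(len(chunk_embeddings))  # 7 tokens -> 3 chunks
```

Naive chunking would call the encoder separately on each 3-token slice, so no token vector could reflect anything outside its own chunk.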
Late Interaction
A retrieval strategy, used by ColBERT and similar models, where query and document are encoded independently into multiple vectors, but matching is performed by comparing individual token-level vectors from each side. This combines the efficiency of independent encoding with the precision of token-level matching.
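The matching step is often implemented as a MaxSim sum: for each query token vector, take the best match over all document token vectors, then add those maxima up. A minimal sketch with hand-picked 2-D vectors (real models use hundreds of dimensions and normalized vectors):

```python
# MaxSim scoring in the style of ColBERT-like late interaction.
# Query and document are each a list of token vectors.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_vecs, doc_vecs):
    # For each query token, keep only its best-matching document token,
    # then sum those per-token maxima.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.5, 0.5]]  # has a good match for each query token
doc_b = [[0.2, 0.2], [0.1, 0.3]]  # weak matches only

print(maxsim_score(query, doc_a))  # 1.5
print(maxsim_score(query, doc_b))  # 0.5
```

Because document token vectors are computed independently of the query, they can be precomputed and indexed; only the cheap max/sum step happens at query time.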
Latency
The time elapsed between submitting a request and receiving a response. In a retrieval pipeline, latency accumulates across query embedding, ANN search, reranking, and network round trips. In production systems, latency is measured at percentiles rather than as an average, which masks tail latency; p99, for example, captures the worst-case experience of 1 in 100 requests.
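A small sketch of why percentiles matter, using the nearest-rank definition of a percentile and fabricated latency numbers:

```python
import math

def percentile(samples, p):
    # Nearest-rank percentile: the smallest sample such that at least
    # p percent of all samples are at or below it.
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# 95 fast requests and 5 slow ones: the average hides the tail.
latencies_ms = [10.0] * 95 + [200.0] * 5
print(sum(latencies_ms) / len(latencies_ms))  # 19.5  (looks fine)
print(percentile(latencies_ms, 50))           # 10.0  (median)
print(percentile(latencies_ms, 99))           # 200.0 (what tail users feel)
```

The average suggests a healthy system, while p99 reveals that 1 in 20 requests here is twenty times slower than the median.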
Latent Representation
A hidden, compressed representation of data that a model learns internally. Unlike features you can directly observe — word count, sentence length — latent representations capture abstract properties like topic, intent, or tone that the model discovers during training. Embeddings are a form of latent representation.
Lexical Search
Traditional keyword-based search that matches documents based on the words they contain. Lexical search is fast and precise for exact matches but misses relevant documents that use different terminology. It is often combined with semantic search in hybrid approaches.
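The core mechanism is an inverted index mapping each term to the documents that contain it. This toy sketch (with made-up documents and plain term-overlap scoring rather than a real ranking function like BM25) shows both the strength and the blind spot:

```python
from collections import defaultdict

docs = {
    "d1": "postgres index tuning guide",
    "d2": "how to speed up database queries",
    "d3": "postgres query planner internals",
}

# Build an inverted index: term -> set of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def lexical_search(query):
    # Score by exact term overlap. A document using different words for
    # the same concept ("queries" vs "query") scores zero on that term.
    scores = defaultdict(int)
    for term in query.split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(lexical_search("postgres query"))  # [('d3', 2), ('d1', 1)]
```

Note that d2 is arguably relevant but never surfaces, because "queries" does not exactly match "query" — the vocabulary-mismatch problem that semantic search addresses in hybrid setups.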
Linear Probe
A simple evaluation method: train a basic linear classifier on top of frozen embeddings to test how much useful information they contain. If the probe achieves high accuracy, the embeddings encode relevant features in a readily accessible way.
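The procedure can be shown end to end with synthetic "frozen embeddings" whose first dimension happens to encode the label; in practice the vectors would come from the model under evaluation. The probe itself is just logistic regression trained by gradient descent, with the embeddings never updated:

```python
import math, random

# Toy frozen embeddings: 2-D vectors, label linearly readable from dim 0.
random.seed(0)
data = [([random.uniform(1, 2), random.uniform(-1, 1)], 1) for _ in range(50)] \
     + [([random.uniform(-2, -1), random.uniform(-1, 1)], 0) for _ in range(50)]

# Train a linear (logistic) probe; only w and b change, never the embeddings.
w, b = [0.0, 0.0], 0.0
for _ in range(200):
    for x, y in data:
        p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
        g = p - y  # gradient of the log loss w.r.t. the logit
        w[0] -= 0.1 * g * x[0]
        w[1] -= 0.1 * g * x[1]
        b -= 0.1 * g

accuracy = sum(
    (1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b))) > 0.5) == (y == 1)
    for x, y in data
) / len(data)
print(accuracy)  # high accuracy -> the feature is readily accessible
```

The point of keeping the classifier linear is diagnostic: a deep probe could *compute* the feature itself, whereas a linear probe can only succeed if the embeddings already expose it.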
Listwise Ranking
A ranking approach where the model considers the entire list of candidates together when determining the order. This can capture relationships between results, such as avoiding redundancy, but is more complex and computationally expensive than pointwise scoring.
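One concrete way a listwise criterion differs from pointwise scoring is redundancy avoidance. The MMR-style greedy selection below is an illustrative stand-in, not a trained listwise model: each pick trades the candidate's own relevance against its similarity to everything already selected, a comparison a pointwise scorer cannot make because it sees one item at a time.

```python
def similarity(a, b):
    # Toy similarity: Jaccard overlap of the word sets.
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def rerank_listwise(candidates, k, trade_off=0.7):
    # Greedy MMR-style selection over (text, relevance) pairs.
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        best = max(
            pool,
            key=lambda c: trade_off * c[1]
            - (1 - trade_off)
            * max((similarity(c[0], s[0]) for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return [text for text, _ in selected]

candidates = [
    ("intro to vector search", 0.9),
    ("introduction to vector search", 0.88),  # near-duplicate of the top hit
    ("vector database benchmarks", 0.7),
]
print(rerank_listwise(candidates, k=2))
```

Pointwise scoring would rank the near-duplicate second; the listwise criterion skips it in favor of the less relevant but non-redundant benchmarks result.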
Long-Context Embedding
An embedding model designed to handle inputs significantly longer than the typical 512-token limit, often supporting 8,192 tokens or more. Long-context models can embed entire documents in a single pass, reducing the need for chunking and preserving cross-passage context.
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning technique that adapts a model by adding small, trainable matrices to its layers instead of updating all parameters. LoRA dramatically reduces the resources needed for fine-tuning and allows multiple specialized adaptations to share the same base model. It is increasingly used to customize embedding models for domain-specific tasks.
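The arithmetic behind the savings is easy to show. In this sketch (toy sizes, plain-Python matrices instead of a real tensor library), a frozen d-by-d weight `W` is adapted through two small factors `A` (r-by-d) and `B` (d-by-r), with `B` zero-initialized so the adaptation starts as a no-op:

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

d, r = 64, 4  # hidden size and LoRA rank (r << d)

random.seed(0)
W = [[random.gauss(0, 0.02) for _ in range(d)] for _ in range(d)]  # frozen
A = [[random.gauss(0, 0.02) for _ in range(d)] for _ in range(r)]  # trainable, r x d
B = [[0.0] * r for _ in range(d)]                                  # trainable, d x r

# Effective weight: W + B @ A. Only A and B receive gradient updates;
# with B zero-initialized, the model starts out identical to the base.
delta = matmul(B, A)
W_eff = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]

full = d * d            # parameters if we fine-tuned W directly
lora = r * d + d * r    # parameters LoRA actually trains
print(full, lora)       # 4096 vs 512: an 8x reduction at rank 4
```

Because the base `W` is never modified, several task-specific (A, B) pairs can be stored cheaply and swapped over the same shared base model.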
Loss Function
A mathematical formula that measures how far the model's current behavior is from the desired behavior. During training, the model adjusts its parameters to minimize the loss. Different loss functions encode different learning objectives.
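A concrete instance makes the definition tangible. Mean squared error is one of the simplest loss functions: predictions close to the targets yield a small loss, distant ones a large loss, and training pushes parameters in whichever direction shrinks it (embedding models typically use contrastive losses instead, but the principle is the same):

```python
def mse_loss(predictions, targets):
    # Mean squared error: average of squared prediction errors.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

targets = [1.0, 0.0, 1.0]
print(mse_loss([0.9, 0.2, 0.8], targets))  # small: close to desired behavior
print(mse_loss([0.1, 0.9, 0.2], targets))  # large: far from desired behavior
```

Swapping this formula for, say, a contrastive loss changes what "good" means to the optimizer, which is why the choice of loss function encodes the learning objective.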