Glossary

This glossary describes essential terms and concepts to help you understand Elasticsearch and its related technologies.

Language Model

A machine learning model trained to understand and generate human language. Language models learn patterns from large amounts of text and can be adapted for many tasks, including generating embeddings. A model's architecture and training approach determine the quality of the embeddings it produces.

Late Chunking

A technique where a long document is first processed through the model as a whole, using the full context window, and then divided into chunks after the model has produced context-aware token representations. This preserves cross-chunk context that would be lost if the document were split before encoding. Jina Embeddings support late chunking for improved long-document retrieval.
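The pooling step can be sketched as follows. This is a minimal illustration, not any particular library's API: it assumes you already have per-token embeddings produced by encoding the whole document in one pass, and it simply mean-pools them within each chunk boundary afterwards.

```python
import numpy as np

def late_chunk(token_embeddings, chunk_boundaries):
    """Pool context-aware token embeddings into chunk vectors.

    token_embeddings: (num_tokens, dim) array produced by encoding the
    WHOLE document once, so each token vector already reflects the full
    context window.
    chunk_boundaries: list of (start, end) token index pairs.
    """
    return np.stack([
        token_embeddings[start:end].mean(axis=0)
        for start, end in chunk_boundaries
    ])

# Toy example: 6 "token" embeddings of dimension 2, split into two chunks.
tokens = np.arange(12, dtype=float).reshape(6, 2)
chunks = late_chunk(tokens, [(0, 3), (3, 6)])
print(chunks.shape)  # one vector per chunk: (2, 2)
```

The key contrast with naive chunking is the order of operations: encode first, split second, so each chunk vector is built from token representations that have already attended to the rest of the document.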

Late Interaction

A retrieval strategy, used by ColBERT and similar models, where query and document are encoded independently into multiple vectors, but matching is performed by comparing individual token-level vectors from each side. This combines the efficiency of independent encoding with the precision of token-level matching.
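The ColBERT scoring rule, often called MaxSim, can be sketched in a few lines. The vectors below are toy values, not real model output; the score is the sum, over query token vectors, of each one's best match among the document token vectors.

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late interaction: for each query token vector, take
    its maximum similarity against all document token vectors, then sum
    those maxima into a single relevance score."""
    sim = query_vecs @ doc_vecs.T   # (q_tokens, d_tokens) similarity matrix
    return sim.max(axis=1).sum()    # best document match per query token

# Toy unit vectors, so the dot product equals cosine similarity.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.7071, 0.7071]])
score = maxsim_score(q, d)  # 1.0 + 0.7071
```

Because query and document are encoded independently, document vectors can be precomputed and indexed; only the cheap similarity matrix is computed at query time.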

Latency

The time elapsed between submitting a request and receiving a response. In a retrieval pipeline, latency accumulates across query embedding, ANN search, reranking, and network round trips. Production systems measure latency at percentiles rather than averages, which mask tail latency; the p99 value, for example, captures the worst-case experience for 1 in 100 requests.
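A quick numeric illustration of why averages mislead, using made-up per-request timings in which 1 request in 10 is pathologically slow:

```python
import numpy as np

# Hypothetical latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = np.array(
    [12, 15, 14, 13, 480, 16, 15, 14, 13, 12] * 10, dtype=float
)

mean = latencies_ms.mean()          # 60.4 ms: looks tolerable
p99 = np.percentile(latencies_ms, 99)  # 480.0 ms: the tail users actually feel
```

The mean blends the slow requests into the fast ones, while the p99 percentile surfaces the experience of the unluckiest users directly.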

Latent Representation

A hidden, compressed representation of data that a model learns internally. Unlike directly observable features such as word count or sentence length, latent representations capture abstract properties like topic, intent, or tone that the model discovers during training. Embeddings are a form of latent representation.

Linear Probe

A simple evaluation method: train a basic linear classifier on top of frozen embeddings to test how much useful information they contain. If the probe achieves high accuracy, the embeddings encode relevant features in a readily accessible way.
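A self-contained sketch of the idea, with synthetic Gaussian clusters standing in for the frozen embeddings an actual model would produce: only the linear layer's weights are trained, and the embeddings themselves are never updated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "embeddings": two clusters standing in for the vectors an
# embedding model would produce for two classes (synthetic data).
X = np.vstack([rng.normal(-2, 1, (50, 4)), rng.normal(2, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

# The probe: a single linear layer trained with logistic loss via
# gradient descent. X is treated as fixed input, never fine-tuned.
w, b = np.zeros(4), 0.0
for _ in range(200):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # sigmoid predictions
    w -= 0.5 * (X.T @ (p - y) / len(y))  # gradient step on weights
    b -= 0.5 * (p - y).mean()            # gradient step on bias

accuracy = ((X @ w + b > 0) == y).mean()
```

High probe accuracy means the class information is linearly separable in the embedding space, i.e. readily accessible without further transformation.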

Listwise Ranking

A ranking approach where the model considers the entire list of candidates together when determining the order. This can capture relationships between results, such as avoiding redundancy, but is more complex and computationally expensive than pointwise scoring.
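One concrete listwise objective is the ListNet-style loss, sketched below as an illustration (the scores are toy values): it compares a softmax distribution over the model's scores for the whole candidate list against a softmax of the ground-truth relevance scores, so the loss depends on every candidate jointly rather than on one score at a time.

```python
import numpy as np

def listnet_loss(predicted_scores, true_scores):
    """ListNet-style listwise loss: cross-entropy between the softmax
    of predicted scores and the softmax of true relevance scores,
    computed over the entire candidate list at once."""
    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()
    return -(softmax(true_scores) * np.log(softmax(predicted_scores))).sum()

true = np.array([3.0, 1.0, 0.0])                  # ideal relevance order
good = listnet_loss(np.array([2.9, 1.1, 0.1]), true)  # near-correct order
bad = listnet_loss(np.array([0.1, 1.1, 2.9]), true)   # reversed order
```

The reversed ranking incurs a higher loss than the near-correct one, and each candidate's contribution depends on the scores of all the others, which is what distinguishes listwise from pointwise training.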

Long-Context Embedding

An embedding model designed to handle inputs significantly longer than the typical 512-token limit, often supporting 8,192 tokens or more. Long-context models can embed entire documents in a single pass, reducing the need for chunking and preserving cross-passage context.

LoRA (Low-Rank Adaptation)

A parameter-efficient fine-tuning technique that adapts a model by adding small, trainable matrices to its layers instead of updating all parameters. LoRA dramatically reduces the resources needed for fine-tuning and allows multiple specialized adaptations to share the same base model. It is increasingly used to customize embedding models for domain-specific tasks.
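The core trick can be shown in a few lines of numpy. This is a structural sketch of a single adapted layer, not a training loop: the frozen weight matrix W is augmented by a low-rank product A @ B, and only A and B would be trained.

```python
import numpy as np

dim, rank = 64, 4
rng = np.random.default_rng(1)

W = rng.normal(size=(dim, dim))          # frozen pretrained weight
A = rng.normal(size=(dim, rank)) * 0.01  # trainable low-rank factor
B = np.zeros((rank, dim))                # zero init: adapter starts as a no-op

x = rng.normal(size=(dim,))
y = x @ (W + A @ B)  # adapted forward pass; equals x @ W until B is trained

# Parameter savings: two thin matrices vs one full one.
lora_params = A.size + B.size  # 2 * dim * rank = 512
full_params = W.size           # dim * dim = 4096
```

Because only A and B change, several task-specific adapter pairs can be stored cheaply and swapped in and out of the same frozen base model.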

Loss Function

A mathematical formula that measures how far the model's current behavior is from the desired behavior. During training, the model adjusts its parameters to minimize the loss. Different loss functions encode different learning objectives.
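Mean squared error, one of the simplest loss functions, makes the idea concrete: predictions far from the targets yield a large loss, near-perfect predictions a small one, and training nudges parameters in whichever direction shrinks it.

```python
import numpy as np

def mse(predictions, targets):
    """Mean squared error: the average squared gap between the model's
    outputs and the desired outputs."""
    return ((predictions - targets) ** 2).mean()

targets = np.array([1.0, 2.0, 3.0])
far = mse(np.array([3.0, 0.0, 5.0]), targets)    # poor predictions -> 4.0
close = mse(np.array([1.1, 2.0, 2.9]), targets)  # near-perfect -> ~0.0067
```

Swapping in a different formula, such as a contrastive or listwise loss, changes what "desired behavior" means and therefore what the model learns.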
