Glossary

This glossary describes essential terms and concepts to help you understand Elasticsearch and its related technologies.

Backbone (or Backbone Model)

The base model architecture on which a specialized model is built. An embedding model's backbone might be a pre-trained BERT or a decoder-only language model. The backbone provides general language understanding, which is then refined for the specific task through further training.

Batch Processing

Sending multiple inputs to a model in a single request rather than one at a time. The model can process them together, reducing per-item overhead. Most embedding APIs support batch processing for high-throughput use cases.
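The mechanics of batching can be sketched in a few lines. This is a generic chunking helper, not any particular API's client; the `batched` name and the batch size are illustrative choices:

```python
def batched(items, batch_size):
    """Yield successive fixed-size batches from a list of inputs."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Ten inputs sent as three requests instead of ten.
texts = [f"document {i}" for i in range(10)]
batches = list(batched(texts, 4))  # sizes 4, 4, 2
```

Each batch would then be passed to the model in a single request, amortizing network and scheduling overhead across its items.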

Benchmark

A standardized test used to evaluate and compare model performance. Benchmarks provide a level playing field for objective comparison. They typically include predefined datasets, evaluation metrics, and established baselines.

BERT (Bidirectional Encoder Representations from Transformers)

A foundational encoder-only model published by Google in 2018. BERT reads text in both directions simultaneously, giving it a deep understanding of context. It became the basis for many embedding models, and a large share of modern embedding architectures still build on BERT or its successors.

Bi-Encoder

An architecture where the query and document are encoded independently by the same model, producing separate embeddings that are compared using a similarity metric. Bi-encoders are fast because document embeddings can be precomputed and indexed. They are the standard architecture for first-stage retrieval.
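The key property — documents encoded once offline, only the query encoded at search time — can be sketched as follows. The `toy_encode` function is a stand-in for a real bi-encoder model (it just counts characters and normalizes), so only the structure of the pipeline is meaningful:

```python
import numpy as np

def toy_encode(text: str) -> np.ndarray:
    """Stand-in for a real bi-encoder (e.g. a sentence-embedding model).
    Maps text to a fixed-size L2-normalized vector via character counts."""
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Documents are encoded once, offline, and indexed.
docs = ["the cat sat on the mat", "stock prices fell sharply"]
doc_embeddings = np.stack([toy_encode(d) for d in docs])

# At query time, only the query is encoded; scoring is a dot product.
query_embedding = toy_encode("a cat on a mat")
scores = doc_embeddings @ query_embedding
best = int(np.argmax(scores))  # index of the most similar document
```

Because the document embeddings never change between queries, they can live in a vector index, which is what makes bi-encoders fast enough for first-stage retrieval.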

Big O Notation

A standard way of describing how an algorithm's time or memory cost grows as input size increases, independent of hardware or implementation. O(1) means constant cost regardless of scale; O(log n) grows slowly; O(n) grows proportionally; O(n²) grows quadratically. It gives engineers a common language for reasoning about whether an algorithm will hold up at scale.
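The difference between O(n) and O(log n) is easy to see by counting comparisons. The sketch below contrasts linear search with binary search over the same sorted list (function names are illustrative):

```python
def linear_search_steps(sorted_list, target):
    """O(n): comparisons grow in proportion to list size."""
    steps = 0
    for item in sorted_list:
        steps += 1
        if item == target:
            break
    return steps

def binary_search_steps(sorted_list, target):
    """O(log n): each comparison halves the remaining search space."""
    lo, hi, steps = 0, len(sorted_list) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            break
        elif sorted_list[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

data = list(range(1_000_000))
lin = linear_search_steps(data, 999_999)  # ~n comparisons
log = binary_search_steps(data, 999_999)  # ~log2(n) comparisons
```

On a million items the linear scan needs a million comparisons for the worst-case element, while binary search needs about twenty — the practical meaning of "holding up at scale."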

Binary Quantization

An aggressive quantization method that converts each embedding dimension to a single bit by thresholding at zero: positive values become 1, others become 0. This reduces storage by up to 32x compared to 32-bit floats and enables similarity comparisons via Hamming distance, which is orders of magnitude faster than the floating-point dot product. The accuracy drop is larger than with scalar quantization, but the approach is viable in retrieval pipelines where binary vectors are used for first-stage candidate selection and full-precision embeddings handle rescoring.
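The thresholding step and the Hamming-distance comparison can be sketched with NumPy. The embeddings here are random toy vectors, not real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((3, 8)).astype(np.float32)  # toy 8-dim embeddings

# Threshold at zero: positive values become 1, others become 0.
binary = (embeddings > 0).astype(np.uint8)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Count of differing bit positions between two binary vectors."""
    return int(np.count_nonzero(a != b))

query = (rng.standard_normal(8) > 0).astype(np.uint8)
distances = [hamming(query, vec) for vec in binary]
nearest = int(np.argmin(distances))  # first-stage candidate
```

In production the bits would be packed (e.g. with `np.packbits`) to realize the 32x storage saving, and the `nearest` candidates would be rescored with the full-precision embeddings.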

BM25

A lexical search algorithm that ranks documents based on how often query terms appear, adjusted for document length and overall word frequency. BM25 is the standard baseline for text search and remains effective for queries where exact keyword matching matters. It is widely used alongside embedding-based approaches in hybrid search systems.
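A minimal BM25 scorer makes the "adjusted for document length and word frequency" part concrete. This is a simplified sketch with whitespace tokenization and standard default parameters (k1=1.2, b=0.75), not a production implementation:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document against the query with the BM25 formula."""
    N = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    avgdl = sum(len(toks) for toks in tokenized) / N  # average doc length
    df = Counter()  # document frequency per term
    for toks in tokenized:
        for term in set(toks):
            df[term] += 1
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rarer terms get higher weight (inverse document frequency).
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates, and long documents are penalized.
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            )
            score += idf * norm
        scores.append(score)
    return scores

docs = ["the quick brown fox", "the lazy dog", "quick quick fox jumps"]
scores = bm25_scores(["quick", "fox"], docs)
```

The saturation via k1 means a term appearing ten times does not score ten times higher than one appearing once, and b controls how strongly document length normalization applies.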

BPE (Byte Pair Encoding)

A subword tokenization method that starts with individual characters and iteratively merges the most frequent pairs into new tokens. The result is a vocabulary that represents common words as single tokens and handles rare words through subword decomposition. Widely used in modern language models.
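The merge-learning loop can be sketched in plain Python. This is a minimal illustration of the training side of BPE (no end-of-word markers or other refinements used by real tokenizers):

```python
from collections import Counter

def bpe_merges(corpus_words, num_merges):
    """Learn BPE merges: repeatedly fuse the most frequent adjacent pair."""
    # Each word starts as a tuple of individual characters.
    vocab = Counter(tuple(word) for word in corpus_words)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged = best[0] + best[1]
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)  # replace the pair with the new token
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

corpus = ["low"] * 5 + ["lower"] * 2 + ["lowest"] * 3
merges = bpe_merges(corpus, 2)
```

On this toy corpus the first merges fuse "l"+"o" and then "lo"+"w", so the common stem "low" becomes a single token while rarer suffixes stay decomposed.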
