Glossary

This glossary describes essential terms and concepts to help you understand Elasticsearch and its related technologies.

Scalar Quantization

A quantization approach that maps each floating-point value in an embedding to a lower-precision number (such as an 8-bit integer) using a linear transformation. It can reduce embedding storage by up to 4x with minimal impact on search quality.
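
As a rough illustration, here is a minimal NumPy sketch using per-vector min/max calibration (one of several possible schemes; production implementations add refinements such as percentile-based bounds):

```python
import numpy as np

def scalar_quantize(embedding):
    """Linearly map float32 values onto the 256 levels of an int8."""
    lo, hi = float(embedding.min()), float(embedding.max())
    scale = (hi - lo) / 255.0                         # width of one quantization bucket
    codes = np.round((embedding - lo) / scale) - 128  # shift into [-128, 127]
    return codes.astype(np.int8), scale, lo

def scalar_dequantize(codes, scale, lo):
    """Approximately recover the original floats."""
    return (codes.astype(np.float32) + 128.0) * scale + lo

embedding = np.random.randn(768).astype(np.float32)
codes, scale, lo = scalar_quantize(embedding)
print(embedding.nbytes, "->", codes.nbytes)  # 3072 -> 768 bytes: a 4x reduction
```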

Second-Stage Ranking

The phase where a reranker scores and reorders the candidate set returned by the first-stage retrieval. While first-stage retrieval operates over millions of vectors, second-stage ranking evaluates only hundreds of candidates, making the higher computational cost per query-document pair practical. This is typically the final step before results are returned to the user or passed as context to a generator in a RAG pipeline, and it is the step that most directly determines result quality.
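
A sketch of this stage using the sentence-transformers CrossEncoder as one possible reranker (the model name and example texts are illustrative):

```python
from sentence_transformers import CrossEncoder

# Hypothetical candidate set returned by first-stage retrieval (e.g., vector search).
query = "how do I reduce embedding storage costs?"
candidates = [
    "Scalar quantization maps floats to 8-bit integers.",
    "Elasticsearch supports dense vector fields.",
    "Our office is closed on public holidays.",
]

# A cross-encoder scores each (query, document) pair jointly -- expensive per pair,
# but affordable over hundreds of candidates rather than millions of vectors.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])

# Reorder candidates by descending relevance score.
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
```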

Segmentation

Dividing text into meaningful units, such as sentences, paragraphs, or sections, before embedding or indexing. Unlike fixed-size chunking, segmentation respects linguistic boundaries, ensuring that each unit passed to the embedding model carries coherent meaning. Common approaches range from rule-based methods using punctuation and whitespace to NLP-based sentence boundary detection and semantic segmentation, which uses embedding similarity to detect topic shifts. Poor segmentation fragments context across boundaries, directly degrading retrieval quality.
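
A minimal sketch of the semantic variant, assuming a hypothetical embed function that returns one vector per sentence and a threshold tuned for your embedding model:

```python
import numpy as np

def semantic_segments(sentences, embed, threshold=0.6):
    """Group consecutive sentences, starting a new segment whenever the
    cosine similarity between adjacent sentence embeddings drops below
    the threshold (a likely topic shift)."""
    vectors = [embed(s) for s in sentences]  # embed() is a placeholder here
    segments, current = [], [sentences[0]]
    for prev, vec, sent in zip(vectors, vectors[1:], sentences[1:]):
        cos = np.dot(prev, vec) / (np.linalg.norm(prev) * np.linalg.norm(vec))
        if cos < threshold:
            segments.append(current)  # topic shift: close the current segment
            current = []
        current.append(sent)
    segments.append(current)
    return segments
```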

Self-Attention

A form of attention where each token in a sequence attends to all other tokens in the same sequence, including itself, to build context-aware representations. Unlike cross-attention, which relates two different sequences, self-attention operates within a single input. In practice, transformers apply this in parallel across multiple heads, each capturing different types of relationships between tokens.
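
A minimal single-head sketch in NumPy (random weights for illustration; real transformers learn these projections and run many heads in parallel):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention: every token attends to every token."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v      # project tokens to queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise attention scores, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each token's row
    return weights @ v                       # context-aware token representations

seq_len, d_model = 5, 16
x = np.random.randn(seq_len, d_model)
w_q, w_k, w_v = (np.random.randn(d_model, d_model) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)  # shape (5, 16), one vector per token
```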

Semantic Similarity

How close in meaning two pieces of content are, regardless of whether they share the same words. "The dog sat on the mat" and "a canine rested on the rug" are semantically similar despite having no words in common. Embedding models are designed to capture this kind of meaning-level similarity.

Sentence Embedding

An embedding that represents the meaning of an entire sentence as a single vector. Unlike word-level embeddings, sentence embeddings capture the combined meaning of all words in context, including how they relate to each other.
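
For example, using the sentence-transformers library (the model name is one common choice, not the only one):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode([
    "The dog sat on the mat.",
    "A canine rested on the rug.",
])
print(vectors.shape)  # (2, 384): one fixed-size vector per sentence
```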

SentencePiece

A language-independent tokenization library that treats input as a raw stream of characters, including spaces. Unlike BPE and WordPiece, it doesn't require pre-tokenization into words, making it useful for languages that don't use spaces, such as Chinese or Japanese.
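
A small usage sketch (the file names and vocabulary size are illustrative):

```python
import sentencepiece as spm

# Train a small model directly on a raw text corpus; no pre-tokenization needed.
spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="sp", vocab_size=8000
)

sp = spm.SentencePieceProcessor(model_file="sp.model")
# Spaces are part of the character stream, marked with the "▁" symbol.
print(sp.encode("Hello world", out_type=str))  # e.g. ['▁He', 'llo', '▁world']
```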

Sequence Length

The actual number of tokens in a given input, as opposed to the model's maximum context length. A model with a context length of 8,192 tokens might receive an input with a sequence length of 500. Shorter sequences are padded internally; longer ones are truncated.
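
For example, with a Hugging Face tokenizer (the model is illustrative; bert-base-uncased has a 512-token context length):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

batch = tokenizer(
    ["A short sentence.", "A much longer document ..."],
    padding=True,     # pad shorter inputs to the longest sequence in the batch
    truncation=True,  # cut off anything beyond the model's context length
    max_length=512,
)
print([len(ids) for ids in batch["input_ids"]])  # both padded to the same length
```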

Similarity Score

A number representing how similar two pieces of content are, based on comparing their embeddings with a distance metric. Higher scores mean greater similarity. The scale depends on the metric: cosine similarity ranges from -1 to 1; other metrics use different ranges. Comparing embedding vectors can tell you about relative similarity, not absolute relevance.
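
A minimal cosine similarity in NumPy:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ranges from -1 to 1."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.707
```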

Sliding Window

A chunking strategy where a fixed-size window moves across a document with overlap between consecutive chunks. A 512-token window with 128-token overlap means each chunk shares 128 tokens with the next. The overlap ensures information at chunk boundaries isn't lost.
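
A minimal sketch of the strategy, operating on an already-tokenized document:

```python
def sliding_window(tokens, size=512, overlap=128):
    """Yield fixed-size chunks that share `overlap` tokens with their neighbor."""
    stride = size - overlap              # advance 384 tokens per step
    for start in range(0, len(tokens), stride):
        yield tokens[start:start + size]
        if start + size >= len(tokens):  # last window reached the end
            break

chunks = list(sliding_window(list(range(1000))))
print([len(c) for c in chunks])  # [512, 512, 232]; adjacent chunks share 128 tokens
```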

Sparse Vector

A vector where the vast majority of values are zero, with only a few non-zero entries. Traditional keyword search methods like BM25 produce sparse vectors: each position corresponds to a word in the vocabulary, and only words that appear in the document have non-zero values. Sparse vectors work well for exact word matching, but don't capture meaning the way dense vectors do.
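
One common in-memory representation keeps only the non-zero entries (the weights below are illustrative, not real BM25 output):

```python
# A sparse vector stored as {term: weight}, omitting the zeros.
doc_vector = {"elasticsearch": 2.1, "quantization": 1.7, "storage": 0.9}
query_vector = {"quantization": 1.2, "compression": 0.8}

def sparse_dot(a, b):
    """Dot product of two sparse vectors: only shared terms contribute."""
    return sum(weight * b[term] for term, weight in a.items() if term in b)

print(sparse_dot(doc_vector, query_vector))  # 1.7 * 1.2 = 2.04
```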

Student Model

In knowledge distillation, the smaller model being trained to approximate the teacher's behavior. The student benefits from the teacher's learned knowledge without needing the same computational resources, making it suitable for deployment where resources are limited.
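
A sketch of a typical distillation loss in PyTorch (the temperature value is illustrative):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature**2
```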

Subword Tokenization

A tokenization approach that breaks words into smaller, meaningful pieces rather than treating each word as a whole unit. This lets models handle rare or unfamiliar words by composing them from known parts. Common subword methods include BPE, WordPiece, and SentencePiece.
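
For example, with a WordPiece tokenizer (exact splits depend on the vocabulary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # WordPiece vocab
# A rare word is composed from known subword pieces ("##" marks a continuation).
print(tokenizer.tokenize("embeddings"))  # e.g. ['em', '##bed', '##ding', '##s']
```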
