Glossary

This glossary describes essential terms and concepts to help you understand Elasticsearch and its related technologies.

Parameter-Efficient Fine-Tuning (PEFT)

A family of techniques that adapt large pretrained models to specific domains or tasks without updating all original weights. Methods like LoRA introduce small trainable components alongside frozen base parameters, while adapters insert lightweight trainable layers between existing ones. This dramatically reduces GPU memory usage and training time compared to full fine-tuning, and produces compact adapter weights that can be swapped onto a shared base model for efficient deployment.
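The low-rank update behind LoRA can be sketched in a few lines. This is an illustrative NumPy sketch, not any library's API: the frozen weight W stays untouched while two small matrices A and B hold all trainable parameters, and the zero initialization of B means the adapted model starts out identical to the base model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4  # r is the low-rank bottleneck size

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x, alpha=8.0):
    # Base path plus scaled low-rank update; only A and B would be trained.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B = 0 at initialization, the output matches the base model exactly.
assert np.allclose(lora_forward(x), W @ x)
```

Only r × (d_in + d_out) parameters train here instead of d_in × d_out, which is where the memory savings come from.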

Passage

A section of a longer document, typically a paragraph or a few sentences, used as the unit of indexing in retrieval systems. Splitting documents into passages allows retrieval to return the specific excerpt most relevant to a query rather than an entire document. The chunking strategy, whether by paragraph, fixed token count, or semantic boundary, directly affects retrieval quality, and adjacent passages are often created with overlapping content to avoid splitting relevant context across chunk boundaries.
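A fixed-size chunker with overlap can be sketched as below. Whitespace splitting stands in for real tokenization here (an assumption for brevity; a production system would use the embedding model's own tokenizer), and each chunk repeats the last `overlap` tokens of its predecessor.

```python
def chunk_text(text, chunk_size=64, overlap=16):
    # Whitespace split as a stand-in for a real tokenizer (assumption).
    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the final window already covers the end of the text
    return chunks

words = " ".join(str(i) for i in range(10))
chunks = chunk_text(words, chunk_size=5, overlap=2)
# The last 2 tokens of each chunk reappear at the start of the next one.
assert chunks[0].split()[-2:] == chunks[1].split()[:2]
```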

Passage Embedding

A dense vector representation of a passage, typically a paragraph or section of a document. In retrieval systems, documents are split into passages and embedded separately because most embedding models have token limits and perform better on shorter, focused inputs. This allows search to return the most relevant passage directly, improving both retrieval precision and the quality of context passed to downstream models.

Passage Retrieval

Retrieving specific passages or sections rather than whole documents. Passage-level retrieval is more precise because it identifies the exact section that addresses the query, rather than returning an entire document that may contain the answer somewhere within it.
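A toy end-to-end sketch of passage retrieval, with a bag-of-words vectorizer standing in for a real embedding model (an assumption for illustration; real systems use dense embeddings and an index such as Elasticsearch rather than a linear scan):

```python
import numpy as np

passages = [
    "Elasticsearch stores data in indices made of shards.",
    "Replicas provide redundancy if a node fails.",
    "Pooling combines token vectors into one embedding.",
]

# Vocabulary over the corpus; unknown query words are simply ignored.
vocab = sorted({w.lower().strip(".") for p in passages for w in p.split()})

def embed(text):
    v = np.zeros(len(vocab))
    for w in text.lower().split():
        w = w.strip(".")
        if w in vocab:
            v[vocab.index(w)] += 1.0
    return v / (np.linalg.norm(v) or 1.0)  # unit-normalize

def retrieve(query, k=1):
    # Score every passage against the query and return the top k.
    q = embed(query)
    scores = [(float(q @ embed(p)), p) for p in passages]
    return [p for _, p in sorted(scores, reverse=True)[:k]]

print(retrieve("what happens if a node fails"))
```

The query matches the one passage that answers it, rather than any larger document containing it.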

Pointwise Scoring

A reranking approach where each query-document pair is scored independently, without considering other results. The final ranking is obtained by sorting these independent scores. Most cross-encoder rerankers use pointwise scoring.
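The independence of pointwise scoring makes the whole procedure a map followed by a sort. In this sketch, a simple word-overlap function stands in for a cross-encoder (an assumption; real rerankers call a model per query-document pair):

```python
def rerank_pointwise(query, documents, score_fn):
    # Each pair is scored on its own; no document sees any other.
    scored = [(score_fn(query, d), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored]

def overlap_score(query, doc):
    # Toy stand-in for a cross-encoder: shared-word count.
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = ["the cat sat", "dogs bark loudly", "a cat and a dog"]
print(rerank_pointwise("cat dog", docs, overlap_score))
# → ['a cat and a dog', 'the cat sat', 'dogs bark loudly']
```

Because scores are independent, pairs can be batched or parallelized freely, which is part of why most cross-encoder rerankers work this way.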

Pooling

The method used to combine the individual token-level outputs of a model into a single vector. Transformer models produce one output vector per token; pooling reduces these into one embedding for the entire input. The pooling strategy affects the quality of the resulting embedding.
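Mean pooling, one of the most common strategies, can be sketched as follows. The attention mask excludes padding tokens from the average; the toy 2-dimensional vectors are purely illustrative:

```python
import numpy as np

def mean_pool(token_vectors, attention_mask):
    # Average token vectors, counting only real (mask = 1) positions.
    mask = np.asarray(attention_mask, dtype=float)[:, None]
    summed = (token_vectors * mask).sum(axis=0)
    return summed / mask.sum()

tokens = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [0.0, 0.0]])   # last row is a padding token
print(mean_pool(tokens, [1, 1, 0]))  # → [2. 3.]
```

Other common choices include taking the [CLS] token's vector or element-wise max pooling; which works best depends on how the model was trained.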

Positive Pair

Two pieces of content that should be treated as similar during training. For example, a question and its correct answer, or two paraphrases of the same sentence. The model is trained to produce embeddings that are close together for positive pairs.
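How a positive pair shapes training can be sketched with a minimal contrastive objective (an InfoNCE-style loss with a single negative, written from scratch here for illustration): the loss is low when the anchor's embedding is close to its positive and far from the negative.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(anchor, positive, negative, temperature=0.1):
    # Cross-entropy where the positive is the "correct class".
    sims = np.array([cosine(anchor, positive), cosine(anchor, negative)])
    logits = sims / temperature
    return float(-logits[0] + np.log(np.exp(logits).sum()))

a = np.array([1.0, 0.0])
loss_good = contrastive_loss(a, np.array([0.9, 0.1]), np.array([0.0, 1.0]))
loss_bad = contrastive_loss(a, np.array([0.0, 1.0]), np.array([0.9, 0.1]))
assert loss_good < loss_bad  # loss drops when the positive pair is close
```

Minimizing this loss over many pairs is what pulls positive-pair embeddings together in the vector space.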

Pre-Training

The initial, large-scale training phase, where a model learns general language understanding from massive amounts of text. The model develops a broad base of knowledge about language structure, word meanings, and the world. This foundation is later refined through fine-tuning for specific tasks.

Precision@k

The fraction of results in the top k that are actually relevant. Precision@10 of 0.6 means 6 out of 10 top results are relevant.
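The metric is a one-line computation; here is a small sketch with hypothetical document IDs:

```python
def precision_at_k(retrieved, relevant, k):
    # Fraction of the top-k retrieved items that are in the relevant set.
    top_k = retrieved[:k]
    return sum(1 for doc in top_k if doc in relevant) / k

retrieved = ["d1", "d7", "d3", "d9", "d2"]  # ranked result list (toy IDs)
relevant = {"d1", "d2", "d3"}               # ground-truth relevant docs
print(precision_at_k(retrieved, relevant, 5))  # → 0.6 (3 of the top 5)
```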

Prefix Instruction

A short text instruction prepended to the input sequence to shift an embedding model's representation toward a specific task. By becoming part of the input, the prefix influences attention patterns and the resulting vector. Models like E5 and Instructor are explicitly designed to support this, using prefixes such as 'query:' or 'Represent this sentence for classification:' to adapt a single model to retrieval, clustering, or categorization without task-specific fine-tuning.
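In practice the prefix is just string concatenation before the model call, as in E5's `query:` / `passage:` convention (the embedding model invocation itself is omitted here):

```python
def with_prefix(text, role):
    # E5-style role prefixes; the prefixed string is what gets embedded.
    prefixes = {"query": "query: ", "passage": "passage: "}
    return prefixes[role] + text

print(with_prefix("how do shards work", "query"))
# → query: how do shards work
```

Using the prefix the model was trained with matters: embedding a query without its expected prefix typically degrades retrieval quality.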
