Adapter
A small, trainable module inserted into a pre-trained model to adapt it to a new task without modifying the original weights. Adapters add only a fraction of the model's parameters. Multiple adapters can be swapped in and out for different tasks while the base model stays frozen.
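A common adapter design is the bottleneck: project the hidden state down to a small dimension, apply a nonlinearity, project back up, and add the result to the input. The pure-Python sketch below is illustrative only (the class name, dimensions, and zero-initialization convention are assumptions, not a specific library's API):

```python
import random

random.seed(0)

class Adapter:
    """Bottleneck adapter: down-project, apply a nonlinearity, up-project,
    then add the result back to the input (residual connection)."""

    def __init__(self, d_model, d_bottleneck):
        # Only these two small matrices are trainable; the base model's
        # weights are never touched.
        self.down = [[random.gauss(0, 0.02) for _ in range(d_bottleneck)]
                     for _ in range(d_model)]
        # Up-projection starts at zero, so the adapter computes an identity
        # function before any training.
        self.up = [[0.0] * d_model for _ in range(d_bottleneck)]

    def __call__(self, x):
        # h = ReLU(x @ down)
        h = [max(0.0, sum(x[i] * self.down[i][j] for i in range(len(x))))
             for j in range(len(self.up))]
        # delta = h @ up
        delta = [sum(h[j] * self.up[j][k] for j in range(len(h)))
                 for k in range(len(x))]
        return [xi + di for xi, di in zip(x, delta)]

# With d_model=768 and d_bottleneck=64, an adapter adds 2*768*64 ≈ 98k
# parameters per layer, a small fraction of a transformer layer's weights.
adapter = Adapter(4, 2)
print(adapter([1.0, 2.0, 3.0, 4.0]))  # identity at initialization
```

Because only the two projection matrices train, swapping tasks means swapping these small matrices while the frozen base model is shared.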
API (Application Programming Interface)
A standardized way for software to communicate with other software. Embedding models are typically accessed through an API: you send text, you get back vectors. The API abstracts away the complexity of running the model.
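The contract of an embedding API, text in and fixed-size vectors out, can be sketched with a toy client. Everything here is hypothetical: a real client would send HTTP requests to a remote service, and the hash-based "model" below merely stands in for one so the interface is visible:

```python
import hashlib
import math

class ToyEmbeddingClient:
    """Hypothetical stand-in for an embedding API client. A real client
    would POST text to a remote service; here a hash fakes the model so
    the contract (text in, unit-length vectors out) is visible."""

    def embed(self, texts, dim=8):
        vectors = []
        for text in texts:
            # Deterministic bytes derived from the text play the role of
            # model outputs.
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            raw = [b / 255.0 for b in digest[:dim]]
            # Normalize to unit length, a common convention for embeddings.
            norm = math.sqrt(sum(v * v for v in raw))
            vectors.append([v / norm for v in raw])
        return vectors

client = ToyEmbeddingClient()
vecs = client.embed(["the bank of the river", "interest rates rose"])
# Each input string maps to one fixed-size vector.
```

The caller never sees tokenization, batching, or GPU inference; that is the abstraction the definition above describes.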
Approximate Nearest Neighbor (ANN)
A class of algorithms that find vectors close to a query vector without guaranteeing exact results, trading a small, tunable amount of recall for dramatically faster search. This makes it practical to search through millions or billions of vectors in milliseconds. Common approaches include HNSW, which builds a hierarchical graph structure, and IVF, which partitions the vector space into clusters. The accuracy tradeoff is measured by recall, typically the fraction of true nearest neighbors returned.
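The IVF idea and the recall metric can both be shown in a small sketch. This is a toy index, not a production implementation: centroids are picked at random, and the dataset sizes and `nprobe` value are illustrative assumptions:

```python
import math
import random

random.seed(0)

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy dataset: 1000 random 8-dimensional vectors.
data = [[random.random() for _ in range(8)] for _ in range(1000)]

# IVF-style index: assign every vector to its nearest of 16 centroids.
centroids = random.sample(data, 16)
clusters = [[] for _ in centroids]
for i, v in enumerate(data):
    c = min(range(len(centroids)), key=lambda j: dist(v, centroids[j]))
    clusters[c].append(i)

def ann_search(q, k=10, nprobe=4):
    # Probe only the nprobe clusters nearest the query instead of
    # scanning the whole dataset; raising nprobe raises recall and cost.
    order = sorted(range(len(centroids)), key=lambda j: dist(q, centroids[j]))
    candidates = [i for j in order[:nprobe] for i in clusters[j]]
    return sorted(candidates, key=lambda i: dist(q, data[i]))[:k]

def exact_search(q, k=10):
    # Brute force over every vector: the ground truth.
    return sorted(range(len(data)), key=lambda i: dist(q, data[i]))[:k]

q = [random.random() for _ in range(8)]
approx, true = set(ann_search(q)), set(exact_search(q))
recall = len(approx & true) / len(true)  # fraction of true neighbors found
```

The approximate search visits only a few clusters' worth of vectors, and `recall` quantifies exactly what that shortcut cost.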
Asymmetric Search
Search that compares items of different types, such as matching a short query against long documents. This is the more common real-world scenario than symmetric search, where the query and the corpus items have similar form and length. Asymmetric encoding aims to make documents less "spread out" in embedding space and queries more so, but it reduces size bias rather than eliminating it.
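The query-versus-documents setup can be sketched without a learned model. The bag-of-words "embedding" below is a toy stand-in for a real embedding model, used only to show a short query being scored against longer documents:

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a learned embedding: L2-normalized term counts.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {word: v / norm for word, v in counts.items()}

def cosine(a, b):
    # Cosine similarity between two sparse unit vectors.
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())

documents = [
    "the bank of the river was muddy after the rain",
    "the bank raised interest rates this quarter",
]
query = "river bank"  # a few words matched against full sentences

best = max(documents, key=lambda d: cosine(embed(query), embed(d)))
print(best)  # the river document overlaps more with the query
```

A real asymmetric setup would encode the query and the documents with differently tuned encoders (or prompts) rather than the same `embed` function; the toy keeps only the shape of the problem.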
Attention Mechanism
The core component of transformer models that computes weighted relationships between all tokens in a sequence simultaneously. Each token queries the others to determine how much attention to pay to each, building context-aware representations. In the sentence "The bank of the river," attention helps the model link "bank" to "river," resolving it as a riverbank rather than a financial institution.
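The standard formulation is scaled dot-product attention, softmax(QK^T / sqrt(d)) V. A minimal pure-Python sketch (single head, no learned projections, illustrative inputs):

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Q, K, V are lists of vectors (rows)."""
    d = len(K[0])
    out = []
    for q in Q:
        # Score the query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        # Softmax turns scores into weights that sum to 1.
        weights = softmax(scores)
        # Output is the weight-averaged mix of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs: it aligns with the
# first key, so the output leans toward the first value vector.
result = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]],
                   [[1.0, 0.0], [0.0, 1.0]])
```

Every token attends to every other in one pass, which is what lets "bank" pick up signal from "river" regardless of the distance between them.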