Quantization
Reducing the numerical precision of model weights or activations, for example converting 32-bit floating-point values to 8-bit integers, to shrink model size and accelerate computation. Quantization can be applied after training or incorporated during training for better accuracy retention. In vector search, embeddings are also quantized, using scalar or binary quantization, to reduce index size and improve retrieval speed; the accuracy impact depends on the precision level and quantization method.
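A minimal sketch of scalar quantization for embedding vectors, using NumPy; the symmetric int8 scheme and per-vector scale factors are illustrative assumptions, not a specific library's implementation.

```python
import numpy as np

def scalar_quantize(embeddings: np.ndarray):
    """Quantize float32 embeddings to int8 with a per-vector scale factor."""
    # Symmetric scheme: map the largest absolute value in each vector to 127.
    scales = np.abs(embeddings).max(axis=1, keepdims=True) / 127.0
    quantized = np.round(embeddings / scales).astype(np.int8)
    return quantized, scales

def dequantize(quantized: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Approximately reconstruct the original float32 vectors."""
    return quantized.astype(np.float32) * scales

vectors = np.random.rand(1000, 384).astype(np.float32)  # e.g. 384-dim embeddings
q, s = scalar_quantize(vectors)
print(vectors.nbytes, q.nbytes)  # int8 storage is roughly 4x smaller
```

The reconstruction error introduced by rounding is what drives the accuracy impact mentioned above; binary quantization pushes the trade-off further by keeping only the sign of each dimension.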
Query
The input a user provides to a search system to express their information need. In embedding-based search, the query is converted into a vector and compared against document embeddings. Queries range from a few keywords to full natural language questions.
Query Embedding
A dense vector representation of a search query, used to retrieve semantically similar passages at inference time. In asymmetric search, query and document embeddings are generated differently, for example with model-specific prefixes such as 'query:' and 'passage:' in E5, because queries and documents differ in form and tend to occupy different regions of the vector space: queries are short and express an information need, while documents are longer and contain the information. This contrasts with symmetric search, where both inputs are embedded identically.
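A minimal sketch of asymmetric query and passage embedding, assuming the sentence-transformers library and the intfloat/e5-base-v2 checkpoint; the example texts are illustrative.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/e5-base-v2")

# Asymmetric search: prepend the prefixes the model was trained with.
query_emb = model.encode(
    "query: how does quantization reduce index size?",
    normalize_embeddings=True,
)
doc_embs = model.encode(
    [
        "passage: Quantization stores vectors at lower numerical precision.",
        "passage: Query expansion adds related terms to the original query.",
    ],
    normalize_embeddings=True,
)

# Cosine similarity ranks passages against the query embedding.
print(util.cos_sim(query_emb, doc_embs))
```

In a symmetric setup, both texts would be encoded without prefixes (or with the same prefix), since the two inputs are of the same kind.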
Query Expansion
Enriching the original query with additional terms or rephrased versions to improve recall. For example, expanding "ML" to also search for "machine learning." Query expansion can use keyword synonyms or a language model to generate alternative phrasings.
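A minimal sketch of keyword-based query expansion using a hand-built synonym map; the dictionary contents and function name are illustrative. An LLM-based variant would replace the lookup with generated paraphrases.

```python
# Illustrative abbreviation-to-phrase map; real systems use curated thesauri
# or a language model to produce alternative phrasings.
SYNONYMS = {
    "ml": ["machine learning"],
    "nn": ["neural network"],
    "ir": ["information retrieval"],
}

def expand_query(query: str) -> list[str]:
    """Return the original query plus variants with known abbreviations expanded."""
    variants = [query]
    for token in query.lower().split():
        for expansion in SYNONYMS.get(token, []):
            variants.append(query.lower().replace(token, expansion))
    return variants

print(expand_query("ML embeddings"))
# ['ML embeddings', 'machine learning embeddings']
```

Each variant can then be searched separately and the results merged, trading extra retrieval cost for improved recall.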