Vector
An ordered list of numbers. In AI and search, vectors are the format that machines use to store the meaning of text, images, or other data. Each number in the vector captures some aspect of the input, and the full list together forms a kind of fingerprint that can be compared against other vectors.
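The fingerprint comparison can be sketched with cosine similarity between toy vectors; the 4-dimensional vectors and their values here are illustrative stand-ins, since real embeddings typically have hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Compare two vectors: values near 1.0 mean similar direction (meaning),
    values near 0.0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for three inputs.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.85, 0.15, 0.05, 0.25]
car = [0.1, 0.9, 0.8, 0.0]

print(cosine_similarity(cat, kitten))  # close to 1.0: similar meaning
print(cosine_similarity(cat, car))     # much lower: different meaning
```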
Vector Index
A data structure optimized for efficient vector search. Instead of scanning every vector sequentially, an index organizes vectors so that similar ones can be found quickly. HNSW, IVF, and various tree-based structures are common types.
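The idea behind an IVF-style index can be sketched in a few lines: vectors are grouped into buckets around centroids, and a query probes only the nearest bucket instead of scanning everything. The centroids and vectors below are hypothetical; a real IVF index learns its centroids with k-means and usually probes several buckets:

```python
import math

def dist(a, b):
    # Euclidean distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical fixed centroids; a real index learns these from the data.
centroids = [[0.0, 0.0], [10.0, 10.0]]
buckets = {0: [], 1: []}

def add(vec):
    # Assign each stored vector to its nearest centroid's bucket.
    c = min(range(len(centroids)), key=lambda i: dist(vec, centroids[i]))
    buckets[c].append(vec)

def search(query):
    # Probe only the bucket whose centroid is closest to the query,
    # skipping every vector in the other buckets entirely.
    c = min(range(len(centroids)), key=lambda i: dist(query, centroids[i]))
    return min(buckets[c], key=lambda v: dist(query, v))

for v in [[0.5, 1.0], [1.0, 0.2], [9.0, 9.5], [11.0, 10.0]]:
    add(v)

print(search([0.8, 0.8]))  # found without touching the far bucket
```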
Vector Search
Finding the most similar vectors in a database to a given query vector. This is the mechanism behind semantic search: once content is embedded, retrieving relevant results means finding the nearest embeddings. Efficient vector search at scale requires specialized data structures and algorithms.
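At its simplest, vector search is the exhaustive scan that indexes are built to avoid. A minimal sketch, with hypothetical document embeddings and dot-product scoring:

```python
def top_k(query, database, k=2):
    """Brute-force vector search: score every stored vector against the
    query and return the ids of the k best matches."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    ranked = sorted(database.items(), key=lambda kv: dot(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical 3-dimensional embeddings keyed by document id.
database = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.2, 0.8, 0.1],
    "doc_c": [0.7, 0.3, 0.1],
}
print(top_k([1.0, 0.0, 0.0], database))  # doc_a scores highest
```

This scan is linear in the size of the database, which is why large deployments rely on the index structures described above.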
Vision-Language Model (VLM)
A model that accepts both images and text as input and generates text output conditioned on both modalities. VLMs typically combine a vision encoder, such as a ViT, with a language model, connected through a projection layer or cross-attention mechanism. This architecture enables image captioning, visual question answering, and multimodal embedding generation. For visual document retrieval, VLMs such as GPT-4V and LLaVA can parse materials where text and visual layout carry meaning together, such as infographics, charts, and tables, in ways that neither OCR pipelines nor conventional image models can.
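The projection-layer connection can be sketched as a single matrix multiply: patch features from the vision encoder are mapped into the language model's embedding space and prepended to the text sequence as extra tokens. All dimensions and weight values below are made-up stand-ins for trained parameters:

```python
def matmul(mat, vec):
    # Multiply a (rows x cols) matrix by a vector of length cols.
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]

# Hypothetical sizes: the vision encoder emits 4-d patch features,
# the language model expects 3-d token embeddings.
projection = [            # learned weights in a real model
    [0.1, 0.2, 0.0, 0.3],
    [0.0, 0.1, 0.4, 0.1],
    [0.2, 0.0, 0.1, 0.2],
]

patch_features = [[0.5, 0.1, 0.9, 0.3], [0.2, 0.8, 0.1, 0.4]]  # from the ViT
text_embeddings = [[0.3, 0.1, 0.5]]                            # from the token embedding table

# Project image patches into the language model's embedding space,
# then prepend them to the text sequence as extra "tokens".
image_tokens = [matmul(projection, p) for p in patch_features]
lm_input = image_tokens + text_embeddings
print(len(lm_input), len(lm_input[0]))  # 3 tokens, each 3-dimensional
```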
Visual Document Retrieval
A retrieval approach that represents documents as images rather than extracted text, capturing layout, typography, charts, and spatial relationships that carry meaning but are lost in standard text extraction. This matters for slides, infographics, scanned pages, and tables where visual structure is inseparable from content. Models like ColPali embed page images directly using vision-language models, enabling retrieval based on the full visual representation of a document rather than its text alone.
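ColPali-style scoring uses late interaction: each query token embedding is matched against its single best page-patch embedding, and the best-match similarities are summed (the MaxSim operation from ColBERT). A minimal sketch with made-up 2-d embeddings, where real models use hundreds of dimensions and hundreds of patches per page:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, page_patches):
    """Late-interaction scoring: for each query token, keep only its
    similarity to the best-matching page patch, then sum over tokens."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)

query = [[1.0, 0.0], [0.0, 1.0]]              # two query token embeddings
page_a = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]  # patch embeddings of page A
page_b = [[0.2, 0.2], [0.1, 0.3]]              # patch embeddings of page B

scores = {"page_a": maxsim_score(query, page_a),
          "page_b": maxsim_score(query, page_b)}
print(max(scores, key=scores.get))  # page_a matches both query tokens well
```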
Vocabulary
The complete set of tokens a model recognizes. A typical model has a vocabulary of 30,000 to 100,000 tokens. Words not in the vocabulary are broken into smaller subword pieces that are in it. Larger vocabularies represent more words as single tokens but require more model parameters.
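The fallback to subword pieces can be sketched with greedy longest-match tokenization over a toy vocabulary. Real tokenizers (BPE, WordPiece) differ in detail but share the idea that an out-of-vocabulary word decomposes into known pieces:

```python
def tokenize(word, vocab):
    """Greedy longest-match subword tokenization: repeatedly take the
    longest vocabulary entry that prefixes the remaining text."""
    tokens = []
    while word:
        for end in range(len(word), 0, -1):
            if word[:end] in vocab:
                tokens.append(word[:end])
                word = word[end:]
                break
        else:
            tokens.append("<unk>")  # not even one character matches
            word = word[1:]
    return tokens

# A toy vocabulary; real models have 30,000-100,000 entries.
vocab = {"un", "break", "able", "a", "b", "l", "e", "u", "n", "r", "k"}
print(tokenize("unbreakable", vocab))  # ['un', 'break', 'able']
```

"unbreakable" is not in the vocabulary, but it tokenizes cleanly into three pieces that are.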