Glossary

This glossary describes essential terms and concepts to help you understand Elasticsearch and its related technologies.

Vector

An ordered list of numbers. In AI and search, vectors are the format that machines use to store the meaning of text, images, or other data. Each number in the vector captures some aspect of the input, and the full list together forms a kind of fingerprint that can be compared against other vectors.
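To make "compared against other vectors" concrete, here is a minimal sketch of one common comparison, cosine similarity. The four-dimensional vectors and their names are invented for illustration; real embedding models typically produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, made up for this example.
cat = [0.9, 0.1, 0.8, 0.2]
kitten = [0.85, 0.15, 0.75, 0.3]
invoice = [0.1, 0.9, 0.05, 0.7]

sim_cat_kitten = cosine_similarity(cat, kitten)
sim_cat_invoice = cosine_similarity(cat, invoice)
```

With these made-up values, the "cat" and "kitten" vectors score much closer to 1 than "cat" and "invoice" do, which is the sense in which nearby vectors stand for similar meaning.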

Vector Index

A data structure optimized for efficient vector search. Instead of scanning every vector sequentially, an index organizes vectors so that similar ones can be found quickly. Common types include HNSW (Hierarchical Navigable Small World) graphs, IVF (inverted file) indexes, and various tree-based structures.
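The idea of organizing vectors so that a search skips most of them can be sketched with a toy IVF-style index. This is an illustration under simplifying assumptions, not how any particular engine implements it: the class name `TinyIVFIndex` is hypothetical, and the centroids are supplied by hand, whereas a real IVF index would learn them (for example with k-means).

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class TinyIVFIndex:
    """Toy inverted-file index: each vector is stored in the bucket of its nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = [[] for _ in centroids]

    def _nearest_centroids(self, vector, n):
        # Rank centroid positions by distance to the vector and keep the closest n.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: l2(vector, self.centroids[i]))
        return order[:n]

    def add(self, doc_id, vector):
        bucket = self._nearest_centroids(vector, 1)[0]
        self.buckets[bucket].append((doc_id, vector))

    def search(self, query, k=1, nprobe=1):
        # Only the nprobe closest buckets are scanned, not the whole collection.
        candidates = []
        for b in self._nearest_centroids(query, nprobe):
            candidates.extend(self.buckets[b])
        candidates.sort(key=lambda item: l2(query, item[1]))
        return [doc_id for doc_id, _ in candidates[:k]]

# Hand-picked centroids and vectors, made up for this example.
index = TinyIVFIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
index.add("a", [0.5, 0.2])
index.add("b", [1.0, 1.5])
index.add("c", [9.5, 10.2])
index.add("d", [11.0, 9.0])

nearest = index.search([9.0, 9.0], k=2, nprobe=1)
```

Raising `nprobe` scans more buckets, which trades speed for a lower chance of missing a close vector that landed in a neighboring bucket.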

Vision-Language Model (VLM)

A model that accepts both images and text as input and generates text output conditioned on both modalities. VLMs typically combine a vision encoder, such as a Vision Transformer (ViT), with a language model, connected through a projection layer or cross-attention mechanism. This architecture enables image captioning, visual question answering, and multimodal embedding generation. For visual document retrieval, VLMs such as GPT-4V and LLaVA parse complex materials where text and visual layout carry meaning together, including infographics, charts, and tables, in ways that neither OCR nor conventional image models can.

Visual Document Retrieval

A retrieval approach that represents documents as images rather than extracted text, capturing layout, typography, charts, and spatial relationships that carry meaning but are lost in standard text extraction. This matters for slides, infographics, scanned pages, and tables where visual structure is inseparable from content. Models like ColPali embed page images directly using vision-language models, enabling retrieval based on the full visual representation of a document rather than its text alone.

Vocabulary

The complete set of tokens a model recognizes. A typical model has a vocabulary of 30,000 to 100,000 tokens. Words not in the vocabulary are broken into smaller subword pieces that are in it. Larger vocabularies represent more words as single tokens but require more model parameters, because every token needs its own embedding.
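How an out-of-vocabulary word gets split can be sketched with greedy longest-match tokenization, the approach used by WordPiece-style tokenizers. The six-entry vocabulary below is invented for illustration, and the `##` prefix for word-internal pieces follows the WordPiece convention; other tokenizers such as BPE split words differently.

```python
def tokenize_word(word, vocab, unk="[UNK]"):
    """Split one word into the longest vocabulary pieces, scanning left to right."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it from the right.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a piece that continues a word
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matches at this position, so the whole word is unknown.
            return [unk]
        tokens.append(piece)
        start = end
    return tokens

# Hypothetical toy vocabulary; real vocabularies hold tens of thousands of tokens.
vocab = {"search", "un", "##able", "##search", "##ing", "##s"}

in_vocab = tokenize_word("search", vocab)
split = tokenize_word("unsearchable", vocab)
unknown = tokenize_word("xyz", vocab)
```

Here "search" stays a single token, "unsearchable" becomes the three pieces `un`, `##search`, and `##able`, and "xyz" falls back to the unknown token because none of its characters appear in this toy vocabulary.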
