Glossary

This glossary describes essential terms and concepts to help you understand Elasticsearch and its related technologies.

Image Embedding

A dense vector representation of an image produced by a vision model, typically a vision transformer or CNN. Modern image embeddings capture both low-level visual features, such as colors and textures, and high-level semantic content, such as objects and scenes. When produced by jointly trained multimodal models like CLIP, image and text embeddings share a common vector space, enabling similarity search across both modalities.

In-Batch Negatives

A training efficiency technique in which each example's positive is reused as a negative for every other example in the same batch. If a batch contains 64 query-passage pairs, each query treats the other 63 passages as negatives. This dramatically increases the number of comparisons without requiring additional data.
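A minimal sketch of the idea (toy random embeddings, not a real training step): scoring every query in a batch against every passage yields a similarity matrix whose diagonal holds the positives, and whose off-diagonal entries are the in-batch negatives.

```python
import numpy as np

# Toy batch of 4 query/passage embedding pairs (random stand-ins).
rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))
passages = rng.normal(size=(4, 8))

# Similarity matrix: entry [i, j] scores query i against passage j.
scores = queries @ passages.T

# For query i, scores[i, i] is its positive; the other 3 entries in
# row i serve as negatives "for free", with no extra data needed.
positives = np.diag(scores)
print(scores.shape)  # (4, 4): 4 positives plus 12 negative comparisons
```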

In-Domain

Data or evaluation scenarios that closely match the distribution a model was trained on. In-domain performance measures how well a model has learned its training task but is a weak test of generalization, as inputs in production rarely match training data precisely.

Indexing

Adding content to a search system so it can be found later. For semantic search, this means generating embeddings for all documents and storing them in a vector index. Indexing is done ahead of time so searches can be answered quickly.
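A sketch of the embed-ahead-of-time pattern. The `embed` function here is a made-up deterministic stand-in for a real embedding model call; the point is that documents are embedded once at indexing time, and only the query is embedded at search time.

```python
import hashlib
import numpy as np

def embed(text):
    # Hypothetical stand-in for an embedding model: a deterministic
    # pseudo-random unit vector derived from the text.
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).normal(size=16)
    return v / np.linalg.norm(v)

docs = ["how to reset a password", "pricing for the pro plan"]

# Indexing: embed all documents up front and store the vectors.
index = np.vstack([embed(d) for d in docs])

# Search time: embed only the query, then find the nearest stored
# vector by dot product (a fast lookup, since docs are pre-embedded).
query_vec = embed("forgot my password")
best = int(np.argmax(index @ query_vec))
```

A real system would store `index` in a vector database or an Elasticsearch dense-vector field rather than an in-memory array.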

Inference

Running a trained model on new inputs to produce outputs, as opposed to training the model. When you send text to an embedding API and get back a vector, the model is performing inference. Inference speed and cost are the main considerations for production deployment.

InfoNCE Loss

A contrastive loss function that frames the task as classification: given an anchor, identify the correct positive from a set of candidates (one positive and many negatives). It is the foundation of training approaches used in CLIP and many modern embedding models.
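The classification framing can be written out directly: score the anchor against all candidates, apply a softmax, and take the cross-entropy with the positive as the correct class. This is an illustrative NumPy sketch (the function name and temperature default are choices here, not a specific library's API):

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """InfoNCE as softmax classification: given the anchor, pick the
    positive out of [positive] + negatives."""
    candidates = np.vstack([positive[None, :], negatives])
    # Cosine similarity between the anchor and each candidate.
    sims = candidates @ anchor / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(anchor)
    )
    logits = sims / temperature
    # Cross-entropy with the positive at index 0.
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    return -log_probs[0]

rng = np.random.default_rng(1)
anchor = rng.normal(size=8)
positive = anchor + 0.1 * rng.normal(size=8)  # close to the anchor
negatives = rng.normal(size=(3, 8))           # unrelated vectors
loss = info_nce_loss(anchor, positive, negatives)
```

Minimizing this loss pulls the positive's similarity up and pushes the negatives' similarities down, which is exactly the behavior contrastively trained embedding models rely on.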

Input Length / Token Limit

The practical constraint on how much text you can send to a model in a single request. This is determined by the model's context window and may be further limited by API settings. Exceeding it typically results in truncation or an error.

Instruction Tuning

A training approach in which the model learns to follow natural-language instructions describing the desired task. For embedding models, this means the model can adjust its behavior based on a text prefix — generating different embeddings for the same text depending on whether the instruction says "retrieve a relevant passage" or "classify the topic."

Inverted Index

A data structure used in lexical search that maps each word to the list of documents containing it. When a query arrives, the inverted index quickly identifies all documents with the query terms. This is the backbone of traditional search engines and is used alongside vector indexes in hybrid search.
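In miniature, an inverted index is just a mapping from each word to the set of documents containing it, with query evaluation as a set intersection (a toy sketch with no tokenization or scoring):

```python
from collections import defaultdict

docs = {
    1: "elasticsearch powers lexical search",
    2: "vector search uses dense embeddings",
    3: "hybrid search combines lexical and vector retrieval",
}

# Build the inverted index: word -> set of doc IDs containing it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        inverted[word].add(doc_id)

# Query: documents containing ALL query terms (set intersection).
query = "lexical search"
hits = set.intersection(*(inverted[w] for w in query.split()))
print(sorted(hits))  # [1, 3]
```

Production engines add tokenization, stemming, and relevance scoring such as BM25 on top of this basic structure.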
