Image Embedding
A dense vector representation of an image produced by a vision model, typically a vision transformer or CNN. Modern image embeddings capture both low-level visual features, such as colors and textures, and high-level semantic content, such as objects and scenes. When produced by jointly trained multimodal models like CLIP, image and text embeddings share a common vector space, enabling similarity search across both modalities.
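A minimal sketch of the cross-modal search this enables, using random vectors as stand-ins for real encoder outputs (the 512-dimension size and the data are illustrative assumptions, not CLIP's actual outputs):

```python
import numpy as np

# Stand-ins for real model outputs: in practice these come from a CLIP-style
# image encoder and text encoder trained into one shared vector space.
rng = np.random.default_rng(0)
image_embeddings = rng.standard_normal((4, 512))  # 4 images, 512-dim vectors
text_embedding = rng.standard_normal(512)         # one text query, same space

# Normalize so dot products equal cosine similarity.
image_embeddings /= np.linalg.norm(image_embeddings, axis=1, keepdims=True)
text_embedding /= np.linalg.norm(text_embedding)

# Cross-modal search: rank images by similarity to the text query.
scores = image_embeddings @ text_embedding
best = int(np.argmax(scores))
print(f"best match: image {best} (score {scores[best]:.3f})")
```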
In-Batch Negatives
A training efficiency technique in which the positives from other examples in the same batch are reused as negatives. If a batch contains 64 query-passage pairs, each query treats the other 63 passages as negatives. This dramatically increases the number of comparisons without requiring additional data.
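A sketch of how this falls out of a single similarity matrix, with random vectors standing in for encoder outputs (batch size and dimension are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim = 64, 256
queries = rng.standard_normal((batch, dim))
passages = rng.standard_normal((batch, dim))

# Normalize so the score matrix holds cosine similarities.
queries /= np.linalg.norm(queries, axis=1, keepdims=True)
passages /= np.linalg.norm(passages, axis=1, keepdims=True)

# scores[i, j] = similarity of query i with passage j. Row i's correct
# answer is column i; the other 63 columns serve as its negatives.
scores = queries @ passages.T

# Softmax cross-entropy against the diagonal: 64 positives and 64 * 63
# negative comparisons from one matrix multiply, with no extra data.
log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
print(f"in-batch negatives loss: {loss:.3f}")
```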
In-Domain
Data or evaluation scenarios that closely match the distribution a model was trained on. In-domain performance measures how well a model has learned its training task but is a weak test of generalization, as inputs in production rarely match training data precisely.
Indexing
Adding content to a search system so it can be found later. For semantic search, this means generating embeddings for all documents and storing them in a vector index. Indexing is done ahead of time so searches can be answered quickly.
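A sketch of that offline/online split, assuming a hypothetical embed() function in place of a real model (with random vectors the returned match is arbitrary; a real encoder makes it meaningful):

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(texts):
    """Hypothetical embedding model standing in for a real encoder."""
    vecs = rng.standard_normal((len(texts), 384)).astype(np.float32)
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# Indexing: embed every document once, ahead of query time, and store
# the vectors (here a plain array; in production, a vector index).
documents = ["how to reset a password", "pricing for the pro plan",
             "exporting data to CSV"]
doc_vectors = embed(documents)

# Query time: embed only the query and score it against stored vectors.
query_vector = embed(["forgot my login"])[0]
scores = doc_vectors @ query_vector
print(documents[int(np.argmax(scores))])
```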
Inference
Running a trained model on new inputs to produce outputs, as opposed to training the model. When you send text to an embedding API and get back a vector, the model is performing inference. Inference speed and cost are the main considerations for production deployment.
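A sketch of that request/response shape, with a hypothetical endpoint, model name, and response field (real embedding APIs define their own URLs, auth, and payloads):

```python
import requests

# Hypothetical endpoint and payload; substitute your provider's actual API.
response = requests.post(
    "https://api.example.com/v1/embeddings",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"model": "example-embed-v1",
          "input": "How do I reset my password?"},
)
vector = response.json()["embedding"]  # assumed response field name
print(f"got a {len(vector)}-dimensional vector")
```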
InfoNCE Loss
A contrastive loss function that frames the task as classification: given an anchor, identify the correct positive from a set of candidates (one positive and many negatives). It is the foundation of training approaches used in CLIP and many modern embedding models.
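A sketch of the loss for a single anchor, computed directly from the definition (the 0.07 temperature, vector sizes, and random data are illustrative assumptions):

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.07):
    """-log softmax score of the positive among all candidates."""
    candidates = np.vstack([positive, negatives])  # positive is row 0
    # Cosine similarity between the anchor and every candidate.
    sims = candidates @ anchor / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(anchor))
    logits = sims / temperature
    logits -= logits.max()  # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(0)
anchor = rng.standard_normal(128)
positive = anchor + 0.1 * rng.standard_normal(128)  # close to the anchor
negatives = rng.standard_normal((16, 128))          # unrelated vectors
print(f"InfoNCE loss: {info_nce(anchor, positive, negatives):.3f}")
```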
Input Length / Token Limit
The practical constraint on how much text you can send to a model in a single request. This is determined by the model's context window and may be further limited by API settings. Exceeding it typically results in truncation or an error.
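A sketch of client-side truncation using the tiktoken library (the 512-token limit and the tokenizer choice are assumptions; use the limit and tokenizer documented for your model):

```python
import tiktoken

MAX_TOKENS = 512  # hypothetical limit; check your model's documentation
enc = tiktoken.get_encoding("cl100k_base")  # tokenizer choice is an assumption

def fit_to_limit(text: str) -> str:
    """Truncate deliberately rather than risk an API error."""
    tokens = enc.encode(text)
    if len(tokens) <= MAX_TOKENS:
        return text
    return enc.decode(tokens[:MAX_TOKENS])

long_text = "embedding models map text to vectors " * 200
print(len(enc.encode(fit_to_limit(long_text))))  # <= 512
```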
Instruction Tuning
A training approach in which the model learns to follow natural-language instructions describing the desired task. For embedding models, this means the model can adjust its behavior based on a text prefix — generating different embeddings for the same text depending on whether the instruction says "retrieve a relevant passage" or "classify the topic."
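A sketch of what such a prefix looks like in practice; the exact template here is an illustrative assumption, since each instructed model documents its own required format:

```python
def with_instruction(instruction: str, text: str) -> str:
    # Illustrative template; real models specify their own prefix format.
    return f"Instruct: {instruction}\nInput: {text}"

text = "The Fed raised interest rates by 25 basis points."

# The same text produces different embeddings depending on the instruction
# the model sees in front of it.
print(with_instruction("Given a query, retrieve a relevant passage", text))
print(with_instruction("Classify the topic of the given text", text))
```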
Inverted Index
A data structure used in lexical search that maps each word to the list of documents containing it. When a query arrives, the inverted index quickly identifies all documents with the query terms. This is the backbone of traditional search engines and is used alongside vector indexes in hybrid search.
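A minimal sketch of the structure in plain Python, with a whitespace tokenizer standing in for real text analysis (stemming, stopword removal, and relevance scoring are omitted):

```python
from collections import defaultdict

documents = {
    0: "the cat sat on the mat",
    1: "the dog chased the cat",
    2: "dogs and cats make good pets",
}

# Build: map each word to the set of document IDs containing it.
inverted_index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.split():
        inverted_index[word].add(doc_id)

def search(query: str) -> set:
    """AND query: intersect the posting lists of every query term."""
    postings = [inverted_index[word] for word in query.split()]
    return set.intersection(*postings) if postings else set()

print(search("the cat"))  # {0, 1}
```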