Glossary

This glossary describes essential terms and concepts to help you understand Elasticsearch and its related technologies.

MAP (Mean Average Precision)

A metric that evaluates ranking quality by computing precision at each position where a relevant result appears, then averaging across queries. MAP rewards systems that rank all relevant results highly, not just the first one.
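
The computation can be sketched in a few lines of plain Python. This is a minimal illustration with hypothetical helper names, not a reference implementation: each ranked list is a sequence of 0/1 relevance flags, ordered by rank.

```python
def average_precision(relevance):
    """Average precision for one ranked list.
    relevance: 0/1 flags, where index i is the result at rank i+1."""
    hits = 0
    precisions = []
    for i, rel in enumerate(relevance):
        if rel:
            hits += 1
            precisions.append(hits / (i + 1))  # precision at this hit
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(runs):
    """MAP: average the per-query average precision over all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)

# Relevant results at ranks 1 and 3: AP = (1/1 + 2/3) / 2
print(average_precision([1, 0, 1]))
```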

Matryoshka Representation Learning (MRL)

A training technique — named after Russian nesting dolls — where a model learns to encode the most important information into the first dimensions of the embedding. You can truncate embeddings to smaller sizes, say from 1024 to 256 dimensions, while retaining most of the semantic quality. It lets users trade accuracy for efficiency without retraining the model. Jina Embeddings support Matryoshka dimension truncation.
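
Truncation itself is trivial once a model has been trained this way: keep the leading dimensions and re-normalize so cosine similarity still behaves. A minimal sketch, assuming the embedding is a plain list of floats:

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` dimensions of a Matryoshka-trained
    embedding and re-normalize the result to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# A 4-d embedding truncated to its first 2 dimensions
print(truncate_embedding([3.0, 4.0, 7.0, 1.0], 2))
```

Note that this only preserves quality if the model was trained with the Matryoshka objective; truncating an ordinary embedding discards information arbitrarily.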

Mean Pooling

A pooling method that averages the output vectors of all tokens to produce the final embedding. Mean pooling tends to produce more robust embeddings than CLS pooling because it incorporates information from every token rather than relying on a single position.
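
A bare-bones sketch of the idea, using plain lists of floats (real implementations operate on tensors and use the attention mask to exclude padding tokens, as shown by the optional mask argument here):

```python
def mean_pool(token_vectors, attention_mask=None):
    """Average the token output vectors into one embedding.
    attention_mask: optional 0/1 flags; 0 marks padding to skip."""
    if attention_mask is None:
        attention_mask = [1] * len(token_vectors)
    dim = len(token_vectors[0])
    sums = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for j in range(dim):
                sums[j] += vec[j]
    return [s / count for s in sums]

# Two tokens, 2-d vectors: the embedding is their element-wise mean
print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))
```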

Model Card

Documentation that accompanies a published model, describing its intended use, training data, performance metrics, limitations, and ethical considerations. Model cards help users understand what a model is good at, where it may fall short, and how it should be used.

Model Parameters

The internal values — weights and biases — that a model learns during training. A model with 137 million parameters has 137 million adjustable values that together determine its behavior. More parameters generally mean more capacity to learn, but also more computational cost.

Model Serving

The operational layer that exposes a trained model to production traffic, handling request routing, load balancing, batching, hardware allocation, and scaling. Serving infrastructure must balance latency, throughput, and cost, often applying optimizations such as quantization or ONNX export to meet production requirements. Cloud-hosted inference APIs like Jina abstract this complexity, allowing users to call models without managing serving infrastructure directly.

Model Size

The number of parameters in a model, expressed in millions (M) or billions (B), used as a proxy for capacity and resource requirements. Practitioners typically choose across three tiers: small models around 100M parameters for low latency or edge deployment, medium models around 300M for balanced performance and cost, and large models above 1B for maximum quality. Quality gains diminish at larger sizes, and a smaller model fine-tuned on domain-specific data can outperform a larger general model for specific tasks.

Model Weights

The learnable numerical values (weights and biases) that a model acquires during training and that fully determine its behavior at inference time. A model with 137 million parameters has 137 million such values adjusted through backpropagation. More parameters increase representational capacity but also computational and memory costs, and do not guarantee better performance without sufficient training data.

MRR (Mean Reciprocal Rank)

A metric that measures how quickly the first relevant result appears. It is the average of 1 divided by the rank of the first correct result across multiple queries. MRR of 1.0 means the first relevant result is always at position 1.
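
Expressed in code (a minimal sketch using the same 0/1 relevance-flag convention as above; hypothetical helper name):

```python
def mean_reciprocal_rank(runs):
    """MRR: average of 1/rank of the first relevant result per query.
    A query with no relevant result contributes 0."""
    total = 0.0
    for relevance in runs:
        for i, rel in enumerate(relevance):
            if rel:
                total += 1 / (i + 1)  # reciprocal of the first hit's rank
                break
    return total / len(runs)

# First hits at rank 2 and rank 1: MRR = (1/2 + 1/1) / 2
print(mean_reciprocal_rank([[0, 1, 0], [1, 0]]))
```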

MTEB (Massive Text Embedding Benchmark)

The most widely used benchmark for evaluating text embedding models, covering 56 datasets across 8 task categories including retrieval, classification, clustering, and semantic similarity, with multilingual tracks spanning over 100 languages. A model's MTEB score is the standard metric for comparing embedding models. The main limitation is that aggregate scores can obscure weak performance on specific tasks or domains, so practitioners should evaluate task-specific subtracks rather than relying on overall rankings alone.

Multi-Head Attention

A design where the attention mechanism runs multiple times in parallel, each "head" focusing on different types of relationships. One head might attend to syntactic structure; another might capture semantic similarity. The outputs of all heads are combined into a single representation.
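
The head-splitting mechanics can be illustrated with a toy NumPy sketch. This is deliberately simplified: real multi-head attention applies learned query, key, value, and output projections per head, which are omitted here (identity projections) to show only how the representation is split, attended over per head, and recombined.

```python
import numpy as np

def toy_multi_head_attention(x, num_heads):
    """Toy self-attention over x of shape (seq_len, d_model),
    with d_model divisible by num_heads. Learned projections
    are omitted; each head sees a disjoint slice of dimensions."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        sub = x[:, h * d_head:(h + 1) * d_head]        # this head's slice
        scores = sub @ sub.T / np.sqrt(d_head)         # scaled dot-product
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)             # row-wise softmax
        heads.append(w @ sub)                          # attend within head
    return np.concatenate(heads, axis=-1)              # combine all heads

x = np.arange(12.0).reshape(3, 4)   # 3 tokens, 4-d model, 2 heads
print(toy_multi_head_attention(x, 2).shape)
```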

Multi-Task Learning

A training strategy where a single model learns to perform multiple tasks, such as retrieval, classification, and clustering, through shared parameters and representations. Shared learning encourages the model to capture a more generalizable structure than single-task training. The tradeoff is that multi-task models may underperform specialized models on any individual task, making them better suited for versatility than peak performance.

Multi-Vector Representation

Representing a single piece of content with multiple vectors rather than one. Each token or segment gets its own vector, allowing more fine-grained matching. ColBERT is the best-known multi-vector approach. Multi-vector methods offer more precision than single-vector embeddings but require more storage.
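
Scoring in this setting typically uses ColBERT-style late interaction: each query vector is matched against its best-matching document vector, and the per-vector maxima are summed. A minimal sketch over plain lists of floats (hypothetical helper name; real systems work on batched tensors):

```python
def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction relevance score: for each query vector,
    take the max dot product over all document vectors, then sum."""
    score = 0.0
    for q in query_vecs:
        best = max(sum(qi * di for qi, di in zip(q, d)) for d in doc_vecs)
        score += best
    return score

# Each query vector finds its best match among the document vectors
print(maxsim_score([[1.0, 0.0], [0.0, 1.0]],
                   [[0.0, 1.0], [1.0, 0.0]]))
```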

Multilingual Embedding

An embedding model trained to represent text across multiple languages, typically dozens to over 100, in a shared vector space. Training on aligned parallel corpora with contrastive objectives pulls translations close together, enabling cross-lingual retrieval: a query in English can retrieve relevant results in German or Japanese without translation. The tradeoff is that supporting more languages with a fixed parameter budget can dilute per-language performance compared to dedicated monolingual models.

Multimodal Embedding

An embedding that represents different modalities, such as text, images, and code, in a shared vector space. This is achieved by training models with contrastive objectives that align semantically similar content across modalities, as in CLIP for text and images. The result is cross-modal retrieval: a text query can retrieve relevant images, and an image can retrieve relevant text descriptions, because all modalities occupy the same geometric space and similarity is computed uniformly across them.

Ready to build a state-of-the-art search experience?

Sufficiently advanced search isn't achieved with one person's efforts alone. Elasticsearch is powered by data scientists, ML ops practitioners, engineers, and many others who are just as passionate about search as you are. Let's connect and work together to build the magical search experience that delivers the results you want.

Get started