Jina embeddings v3 now available on Gemini Enterprise Agent Platform Model Garden

The Jina search foundation model, jina-embeddings-v3, is now self-deployable on Gemini Enterprise Agent Platform Model Garden, with more models to follow. Run jina-embeddings-v3 on a single L4 GPU inside your own VPC.


Today we’re launching jina-embeddings-v3, the first Jina search foundation model to be available on Gemini Enterprise Agent Platform Model Garden as a self-deployable partner model. Self-deployment means the model runs on GPU instances inside your Google Cloud project and Virtual Private Cloud (VPC). No external API calls, no per-token metering, no rate limits.

With this integration, Elasticsearch users gain a new deployment option that keeps data inside their security perimeter, delivers predictable infrastructure costs, and runs natively on Google Cloud. At the same time, the broader Google Cloud ecosystem gains access to Jina's purpose-built, state-of-the-art search and retrieval models.

This is the first stage of a broader rollout. Together with the models coming next, the lineup will form a complete retrieval stack: Embed your data, embed queries, retrieve and rerank candidates, and extend search to images with multimodal embeddings, all on infrastructure you control. You can start today with jina-embeddings-v3, the model already powering production search pipelines across the Elasticsearch ecosystem via Elastic Inference Service (EIS).

| Model | Type | Parameters | Key capability | Status on Model Garden |
|---|---|---|---|---|
| `jina-embeddings-v3` | Text embedding | 572M | Proven multilingual workhorse, 8K context, 1024 dim output, truncatable to 32 | Available now |
| `jina-embeddings-v5-text-small` | Text embedding | 677M | State-of-the-art sub-1B multilingual, 32K context, 1024 dim output, truncatable to 32 | Coming soon |
| `jina-embeddings-v5-text-nano` | Text embedding | 239M | Best-in-class under 500M params, 8K context, 768 dim output, truncatable to 32 | Coming soon |
| `jina-reranker-v3` | Reranker | 600M | Listwise reranker, 131K context, up to 64 documents | Coming soon |
| `jina-clip-v2` | Multimodal embedding | 900M | Text + image in shared space, 89 languages, 8K text context, 512×512 images | Coming soon |

Every model runs on a single NVIDIA L4 (24 GB), the most cost-efficient GPU tier on Google Cloud. Most other embedding models on Google Cloud Model Garden require an A100 80 GB or H100, roughly three times the per-hour instance cost before you even start counting tokens.

No additional commercial license is required when deployed through Vertex AI.

Why Model Garden?

Why deploy through Model Garden instead of hitting an API? It comes down to three things: control, cost, and context.

Your data never leaves the house

The biggest draw for most developers is the self-deploy architecture. When you deploy a Jina model through Model Garden, the weights run on GPU instances inside your own Google Cloud project and your own VPC. This is a game-changer for anyone working in industries with data security concerns, like finance or healthcare. Because there are no external API calls, your sensitive data stays within your security perimeter.

Predictable costs at scale

Instead of paying every time you embed a sentence or rerank a document, you pay a flat hourly instance cost. And because every Jina model can run on a single NVIDIA L4, the most affordable GPU tier on Google Cloud, the barrier to entry is low. Whether you process a thousand requests or a billion, your infrastructure bill stays predictable. This is a setup that actually rewards you for growing your traffic rather than taxing you for it.
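To see what flat-rate pricing means in practice, here is a back-of-the-envelope break-even calculation. All prices in this sketch are illustrative assumptions, not quoted rates; check the Google Cloud pricing page and your API provider's rate card for current numbers.

```python
# Break-even sketch: flat L4 instance cost vs. per-token API pricing.
# Both prices below are ASSUMED placeholder values for illustration only.

L4_INSTANCE_PER_HOUR = 0.90      # assumed g2-standard-8 on-demand rate (USD)
API_PRICE_PER_MTOKEN = 0.02      # assumed embedding API price per 1M tokens (USD)

def flat_monthly_cost(hours: float = 730) -> float:
    """One always-on L4 instance for a month (~730 hours)."""
    return L4_INSTANCE_PER_HOUR * hours

def api_monthly_cost(tokens_per_month: float) -> float:
    """Metered API cost for the same month."""
    return tokens_per_month / 1_000_000 * API_PRICE_PER_MTOKEN

def break_even_tokens() -> float:
    """Monthly token volume at which self-deploy becomes the cheaper option."""
    return flat_monthly_cost() / API_PRICE_PER_MTOKEN * 1_000_000

if __name__ == "__main__":
    print(f"Break-even at ~{break_even_tokens() / 1e9:.1f}B tokens/month "
          "under these assumed prices")
```

Past the break-even volume, every additional token embedded is effectively free, which is what "rewards you for growing your traffic" means in concrete terms.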

Everything under one roof

If your data is already sitting in Elasticsearch on Google Cloud, BigQuery, or Cloud Storage, it makes sense to keep your inference engines nearby. By deploying through Model Garden, Jina search foundation models inherit all the enterprise features you are already using: identity and access management (IAM) for access control, unified billing on your existing Google Cloud invoice, and the ability to plug into Vertex AI Pipelines for machine learning operations (MLOps) workflows.

While the Jina AI Cloud API and Elastic Cloud offer the fastest path for bursty traffic or existing search workflows, Model Garden is ideal for enterprise applications requiring strict data security and predictable costs at scale. Elastic wants to meet you where you are.

Jina AI models

jina-embeddings-v3

Our proven multilingual embedding model with 572M parameters and 8K token context. Scores 65.5 on Massive Text Embedding Benchmark (MTEB) English. Supports five task-specific Low-Rank Adaptation (LoRA) adapters (retrieval query/passage, text-matching, classification, clustering) and Matryoshka truncation from 1024 to 64 dimensions. Already widely adopted across the Elasticsearch ecosystem via EIS.

We’re leading with v3 because many production systems already depend on it. If you’re migrating a v3-based pipeline to Google Cloud, you can now run the same model natively without changing your embedding dimensions or reindexing.
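As a sketch of what a migrated call site might look like, the snippet below assembles a prediction payload for a deployed endpoint. The field names (`inputs`, `task`, `dimensions`) are assumptions based on the model's documented features, not a confirmed schema; check the Model Garden model card for the exact request format your deployment expects.

```python
# Hypothetical payload builder for a jina-embeddings-v3 Vertex AI endpoint.
# Field names here are assumptions; verify against the Model Garden model card.

def build_embed_request(texts, task="retrieval.passage", dimensions=1024):
    """Assemble prediction instances for a Vertex AI online-prediction call."""
    return [
        {"inputs": text, "task": task, "dimensions": dimensions}
        for text in texts
    ]

instances = build_embed_request(
    ["Elasticsearch ingest pipeline doc", "another passage"],
    dimensions=1024,  # keep identical to your existing index mapping
)

# With the google-cloud-aiplatform SDK, the call itself would then be roughly:
#   endpoint = aiplatform.Endpoint("projects/.../locations/.../endpoints/...")
#   response = endpoint.predict(instances=instances)
```

Because the model and output dimensions match your existing v3 pipeline, the resulting vectors stay compatible with your current index, so no reindexing is needed.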

jina-embeddings-v5-text (small and nano)

Our fifth-generation text embedding models, released February 2026, achieve top-tier performance, competing with models many times their size.

v5-text-small (677M) scores 67.0 on the Multilingual MTEB (MMTEB) benchmark suite, which spans 131 tasks across nine task types, and 71.7 on the MTEB English benchmark. It’s the strongest sub-1B multilingual embedding model on the MTEB Leaderboard.

v5-text-nano (239M) scores 65.5 on MMTEB. No other model under 500M parameters reaches this level. At less than half the size of most comparable models, it’s the natural choice for edge and latency-sensitive deployments.

Both models support:

  • Four task-specific LoRA adapters: Retrieval, text-matching, classification, and clustering. Select the appropriate adapter via the `task` parameter at inference time.
  • Matryoshka dimension truncation: Reduce embedding dimensions from 1024 (or 768 for nano) down to 32. Quality loss is minimal at moderate truncation (for example, 256 dims). Halving dimensions roughly halves storage.
  • Binary quantization: Compress 1024-dim embeddings from 2KB to 128 bytes. The models are specifically trained so that this compression incurs minimal quality loss.
  • Multilingual: 119 languages (small) and 93 (nano).
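Both compression schemes above are simple enough to sketch in a few lines. This is a minimal pure-Python illustration of the mechanics (a real pipeline would use NumPy and the model's actual output vectors); the 2KB baseline in the bullet above corresponds to a 1024-dim vector stored at float16.

```python
import math

def truncate(embedding, dims):
    """Matryoshka truncation: keep the first `dims` components, renormalize."""
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def binarize(embedding):
    """Binary quantization: 1 bit per dimension, packed into bytes."""
    bits = 0
    for x in embedding:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits.to_bytes((len(embedding) + 7) // 8, "big")

vec = [math.sin(i) for i in range(1024)]   # stand-in for a real embedding
print(len(binarize(vec)))                  # 1024 bits -> 128 bytes
print(len(truncate(vec, 256)))             # 256 dims, 4x less storage
```

Truncation and binarization compose: a 1024-dim vector truncated to 256 dims and then binarized occupies just 32 bytes per document.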

jina-reranker-v3

A 0.6B parameter multilingual listwise reranker built on a last-but-not-late interaction architecture. The query and up to 64 candidate documents are packed into a single 131K-token context window, and the model performs cross-document comparison before scoring. Jina Reranker v3 achieves 61.94 nDCG@10 on BEIR while being 6× smaller than comparable models. This is fundamentally different from pointwise rerankers, which score each document in isolation, and it produces better results, especially for passage retrieval from single documents.

jina-clip-v2

A 0.9B multimodal, multilingual embedding model that maps text and images into a shared 1024-dimensional space. It supports:

  • 89 languages for text-image retrieval.
  • 512×512 image resolution.
  • 8K token text input.
  • Matryoshka truncation from 1024 to 64 dimensions for both modalities.

It’s highly competitive on image-to-text benchmarks, including multilingual tasks.

Getting started

Jina Embeddings v3 is live on Model Garden today. Here’s how to get it running.

You need a Google Cloud project with the Vertex AI API enabled and enough GPU quota for at least one g2-standard-8 instance (NVIDIA L4). If you’re new to Google Cloud, start with the setup guide.

The Model Garden page for Jina Embeddings v3 walks you through the full flow: Upload the model, create an endpoint, pick your machine type, and deploy. Open it in your own project, and follow the guided steps. A100 and H100 machines are also available where region and quota allow, but L4 is all you need to start.

From click to first embedding, the whole process takes a few minutes.

What comes next

Jina Embeddings v3 is the starting point. In the coming weeks, we’ll bring the rest of the Jina retrieval stack to Model Garden: v5 text embeddings (small and nano), jina-reranker-v3, and jina-clip-v2 for multimodal search. All will run on a single L4 GPU with the same self-deploy model.

