jina-clip-v2 brings text-to-image search across 89 languages to Elasticsearch, no GPU needed

Run multimodal search across 89 languages inside Elasticsearch with jina-clip-v2: one embedding space for text and images, with no separate model infrastructure to manage.

jina-clip-v2 (865M parameters) is now available on Elastic Inference Service (EIS): multilingual multimodal embeddings for text and images across 89 languages, running inside Elasticsearch with no separate model hosting or GPU infrastructure to manage.

Text queries retrieve images, screenshots retrieve documentation, and PDFs, charts, and infographics index into the same vector space. The model supports Matryoshka truncation, so you can drop from 1,024 to 512 or 256 dimensions when storage matters, with minimal quality loss.
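As a rough illustration of Matryoshka truncation, the sketch below keeps the leading components of a vector and re-normalizes the result so that cosine similarity still behaves as expected. The function name `truncate_embedding` and the toy 8-dimensional vector are illustrative assumptions. Re-normalizing after truncation is common practice for Matryoshka-style embeddings; the article does not say whether EIS applies it for you.

```python
import math

def truncate_embedding(vector, dims):
    """Keep the first `dims` components and L2-normalize what remains."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]

# Toy 8-dimensional stand-in for a 1,024-dimensional embedding (assumed values).
full = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
short = truncate_embedding(full, 4)
```

At 4 bytes per float32 component, going from 1,024 to 256 dimensions reduces raw vector storage from 4,096 to 1,024 bytes per vector, before any quantization.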

jina-clip-v2 is one of several Jina embedding models now available on EIS. For workloads that also span video and audio, jina-embeddings-v5-omni covers all four modalities in a single index, supporting nearly 100 languages with a 0.67B-parameter base that is small enough to run on conventional GPU servers. jina-clip-v2 remains the focused option for cross-modal retrieval between text and images.

How multimodal search works in jina-clip-v2

jina-clip-v2 is a dual-encoder model where separate text and image encoders produce embeddings in the same vector space. This allows text and images to be retrieved interchangeably. A query like “red sports car” can return matching images, an image can surface relevant product descriptions or documentation, and screenshots can map directly to tickets, dashboards, or logs. This isn’t a stitched pipeline of models. It’s a single, shared embedding space across modalities, combining a multilingual Jina-XLM-RoBERTa text encoder with an EVA02-L vision encoder.
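To make the shared-space idea concrete, here is a minimal sketch that ranks image vectors against a text query vector by cosine similarity. The three-dimensional vectors and file names are made up for illustration; real jina-clip-v2 embeddings have up to 1,024 dimensions and would come from the model, not be written by hand.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, items):
    """Return item names sorted by descending similarity to the query."""
    return sorted(items, key=lambda name: cosine(query_vec, items[name]), reverse=True)

# Hypothetical toy vectors standing in for model output.
text_query = [0.9, 0.1, 0.0]  # imagined embedding of "red sports car"
image_vectors = {
    "car.jpg": [0.8, 0.2, 0.1],
    "chart.png": [0.0, 0.1, 0.9],
    "dog.jpg": [0.1, 0.9, 0.2],
}
ranking = rank(text_query, image_vectors)
```

Because both encoders write into the same space, the same `rank` call works unchanged when the query vector comes from an image and the candidates come from text.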

Multilingual and document-aware by design

Unlike traditional CLIP models, which focus primarily on short English captions, jina-clip-v2 is trained on multilingual text-text and text-image pairs across 89 languages and on visually complex datasets at progressively higher resolutions.

EIS allows you to run managed models directly inside Elasticsearch. There’s no separate model hosting layer to provision, no GPU infrastructure to manage, and no external embedding service to maintain.

With jina-clip-v2 on EIS, you can:

  • Generate text and image embeddings where your data already lives.
  • Index multimodal vectors alongside structured and unstructured content.
  • Combine vector search with BM25 using hybrid retrieval.
  • Power multimodal retrieval augmented generation (RAG) pipelines grounded in images and documents.
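The list above mentions hybrid retrieval without naming a fusion method. One common choice, which Elasticsearch also offers as a retriever, is reciprocal rank fusion (RRF). The sketch below is a plain-Python illustration of the RRF formula, not Elasticsearch's implementation; the document IDs are made up, and `k=60` is a commonly used constant.

```python
def rrf(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores the sum of 1 / (k + rank) over the lists it appears in.
    """
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical result lists: one from BM25, one from vector search.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
fused = rrf([bm25_hits, vector_hits])
```

RRF only looks at rank positions, so it sidesteps the problem that BM25 scores and vector similarities live on different scales.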

How to run multimodal search with jina-clip-v2 on EIS

The jina-clip-v2 endpoint is preconfigured on Elastic Inference Service. To generate embeddings, call the inference endpoint from the Elasticsearch dev console:
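For a client outside the dev console, the shape of such a call can be sketched in Python. The sketch only builds the request and does not send it. The `_inference/<task_type>/<inference_id>` path and the `input` field follow the Elasticsearch inference API; the endpoint ID `.jina-clip-v2` and the `text_embedding` task type are assumptions for illustration, and the real values can be checked with `GET _inference` on your deployment.

```python
import json

def build_embedding_request(inference_id, texts, task_type="text_embedding"):
    """Return (method, path, body) for an inference call; nothing is sent."""
    path = f"/_inference/{task_type}/{inference_id}"
    body = {"input": texts}
    return "POST", path, json.dumps(body)

# ".jina-clip-v2" is a placeholder ID, not confirmed by the article.
method, path, body = build_embedding_request(".jina-clip-v2", ["red sports car"])
```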

This is the response:

Using jina-clip-v2 embeddings in a search query:
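One way such a query could look is a kNN search over a `dense_vector` field. The sketch below builds the search body as a Python dictionary. The `knn` options (`field`, `query_vector`, `k`, `num_candidates`) are standard Elasticsearch search parameters; the field name `image_embedding` and the short toy vector are hypothetical.

```python
def build_knn_query(field, query_vector, k=10, num_candidates=100):
    """Build a search body that runs approximate kNN on one vector field."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "knn": {
            "field": field,                # dense_vector field holding the embeddings
            "query_vector": query_vector,  # embedding of the query text or image
            "k": k,                        # number of nearest neighbors to return
            "num_candidates": num_candidates,  # candidates examined per shard
        }
    }

# Hypothetical field name and a toy vector standing in for a real embedding.
search_body = build_knn_query("image_embedding", [0.12, -0.03, 0.88], k=5)
```

The query vector would normally be the output of the embedding call, truncated to the same number of dimensions as the indexed vectors.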

Get endpoint config

Basic text request

Multimodal batch (text + image as separate vectors)

The example below shows how to send both a text and an image input as separate items, each producing its own embedding:
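A rough Python sketch of a two-item batch follows. The item shape, `{"text": ...}` and `{"image": ...}` with the image passed as a base64 data URI, mirrors the convention of Jina's own API and is an assumption here; the exact format EIS expects is not given in this article and should be checked against the Elasticsearch documentation.

```python
import base64

def image_to_data_uri(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"

def build_multimodal_batch(text, image_bytes):
    """Build a request body with one text item and one image item.

    ASSUMPTION: the per-item keys "text" and "image" are illustrative,
    not a confirmed EIS request schema.
    """
    return {
        "input": [
            {"text": text},
            {"image": image_to_data_uri(image_bytes)},
        ]
    }

# Four placeholder bytes stand in for a real image file's contents.
batch = build_multimodal_batch("red sports car", b"\x89PNG")
```

Each item in `input` is embedded on its own, so a batch of two items yields two vectors rather than one fused vector.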

Create custom endpoint with minimum dimensions
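A sketch of what such a request might contain, built in Python without sending it. `PUT _inference/<task_type>/<inference_id>` with `service` and `service_settings` is the general shape of the Elasticsearch create-inference API, and `elastic` is the service name for EIS. The `dimensions` setting, the `text_embedding` task type, and the endpoint name `my-jina-clip-256` are assumptions for illustration. The value 256 is simply the smallest size this article mentions, not a confirmed minimum for the model.

```python
import json

def build_create_endpoint_request(inference_id, dimensions, task_type="text_embedding"):
    """Return (method, path, body) for creating a custom inference endpoint."""
    path = f"/_inference/{task_type}/{inference_id}"
    body = {
        "service": "elastic",
        "service_settings": {
            "model_id": "jina-clip-v2",
            # ASSUMPTION: "dimensions" is an accepted setting for this model on EIS.
            "dimensions": dimensions,
        },
    }
    return "PUT", path, json.dumps(body)

# "my-jina-clip-256" is a hypothetical endpoint name.
method, path, body = build_create_endpoint_request("my-jina-clip-256", 256)
```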

Multimodal search in Elasticsearch, from text to images to RAG

By making jina-clip-v2 available on EIS, multimodal search becomes a first-class capability inside Elasticsearch.

Text and images can be indexed into the same vector space. Queries can retrieve across modalities and languages. Hybrid search can combine lexical precision with multimodal semantics. RAG systems can ground responses in charts, screenshots, and document layouts, not just plain text.

All Elastic Cloud trials have access to Elastic Inference Service. Try it now on Elastic Cloud Serverless or Elastic Cloud Hosted, or use EIS via Cloud Connect with your self-managed cluster.

