Semantic search, now multilingual by default

semantic_text now defaults to jina-embeddings-v5-text on Elastic Inference Service, enabling multilingual semantic search in Elasticsearch.

Get hands-on with Elasticsearch: Dive into our sample notebooks in the Elasticsearch Labs repo, start a free cloud trial, or try Elastic on your local machine now.

Today, we’re pleased to announce that semantic_text now defaults to the jina-embeddings-v5-text family on Elastic Inference Service (EIS), bringing built-in multilingual inference with no additional configuration required.

EIS provides managed, GPU-accelerated inference tightly integrated with Elasticsearch. With EIS, you don’t need to host, scale, or maintain infrastructure for embedding models.

Semantic search retrieves results based on meaning. Text is converted into vector embeddings so queries can match related concepts, even when the exact words differ.

The semantic_text field type simplifies this entire workflow, with automatic chunking, embedding generation at index time, and seamless querying via the semantic query, without building custom pipelines or managing separate model inference.

The jina-embeddings-v5-text model family just launched on EIS, giving developers powerful multilingual embeddings as part of the core semantic_text workflow. Your semantic search now works across languages out of the box: global datasets such as support articles, product descriptions, user reviews, and multilingual websites need no extra configuration.

This default opens up broad, globe-spanning semantic retrieval with no operational overhead.

jina-embeddings-v5-text

The jina-embeddings-v5-text models represent the latest generation of compact, high-performance multilingual embedding models on EIS.

  • State-of-the-art multilingual performance: Top scores on MMTEB benchmarks across hundreds of languages. jina-embeddings-v5-text-nano leads models under 500M parameters, and jina-embeddings-v5-text-small outperforms significantly larger alternatives.
  • Multiple task capabilities: Spanning retrieval, semantic matching, clustering, and classification.
  • Flexible choices to fit your use case: Two model sizes (small, nano) let you balance speed, cost, and quality.
  • Long-context support: Embed long texts efficiently, ideal for document collections with extended context.

Get started

1. Create index

Define a semantic_text field with no additional configuration. Embeddings will be generated automatically at index time using the default model. For production workloads, explicitly specify the model to ensure consistent behavior and results.
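A minimal mapping might look like the following. The index name `product-reviews` and field name `review` are illustrative; with no `inference_id` set, the field uses the default EIS model described above.

```json
PUT /product-reviews
{
  "mappings": {
    "properties": {
      "review": {
        "type": "semantic_text"
      }
    }
  }
}
```

To pin a specific model for production, set the `inference_id` parameter on the field to the inference endpoint of your choice.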

2. Index multilingual documents

Add product reviews in six different languages. Each document’s review field is automatically embedded at ingest time, with no separate pipeline or preprocessing needed.
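Indexing works like any other document write; the review texts below are illustrative, and the remaining languages follow the same pattern. Chunking and embedding happen automatically on ingest.

```json
POST /product-reviews/_doc
{ "review": "Great noise cancellation and battery life, perfect for flights." }

POST /product-reviews/_doc
{ "review": "Très confortable pour les longs trajets en avion." }

POST /product-reviews/_doc
{ "review": "長時間つけていても快適なヘッドホンです。" }
```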

3. Search across languages with a query in English
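Querying uses the semantic query against the semantic_text field; a sketch with the English query used in this example:

```json
GET /product-reviews/_search
{
  "query": {
    "semantic": {
      "field": "review",
      "query": "perfect for long flights"
    }
  }
}
```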

The results show all six reviews ranked by semantic relevance to the English query:

Notice that the French review ranks first, even above the English one. That's because "très confortable pour les longs trajets en avion" ("very comfortable for long trips by plane") is a closer semantic match to the query than the English review, which splits its focus across noise cancellation, battery life, and flights. This demonstrates the ability of jina-embeddings-v5-text-small to rank by meaning, not language.

4. Search across languages with a Japanese query
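The same query shape works regardless of the query language; the Japanese phrasing below is an illustrative rendering of "Ideal for long-haul flights":

```json
GET /product-reviews/_search
{
  "query": {
    "semantic": {
      "field": "review",
      "query": "長距離フライトに最適"
    }
  }
}
```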

The results show all six reviews ranked by semantic relevance to the Japanese query (“Ideal for long-haul flights”):

The ranking is nearly identical to the English query: French and English still lead because they're the most semantically relevant to "perfect for long flights," regardless of query language. The Japanese review didn't get artificially boosted just because the query was in Japanese. It ranks fourth because it discusses wearing comfort, not flights. Semantic relevance takes priority over language matching.

Note: For English-only use cases

If you prefer a sparse representation or would like to continue to use Elastic Learned Sparse EncodeR (ELSER) for English workloads, ELSER remains available and fully supported as an option for semantic_text.

You can explicitly choose ELSER by specifying inference_id: "elser" in your mappings when creating an index.
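As a sketch, using the `inference_id` value mentioned above (the exact endpoint ID available in your deployment may differ; check your configured inference endpoints):

```json
PUT /english-reviews
{
  "mappings": {
    "properties": {
      "review": {
        "type": "semantic_text",
        "inference_id": "elser"
      }
    }
  }
}
```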

Conclusion: Semantic search without borders

With semantic_text now defaulting to the jina-embeddings-v5-text family on Elastic Inference Service, multilingual semantic search becomes the standard developer experience in Elasticsearch. This means developers can build search, retrieval augmented generation (RAG), and AI applications that work across global datasets without stitching pipelines together.

Create a semantic_text field, index your data, and start searching. All Elastic Cloud trials have access to Elastic Inference Service. Try it now on Elastic Cloud Serverless or Elastic Cloud Hosted, or use EIS via Cloud Connect with your self-managed cluster.
