Semantic search, now multilingual by default

semantic_text now defaults to jina-embeddings-v5-text on Elastic Inference Service, enabling multilingual semantic search in Elasticsearch.

Get hands-on with Elasticsearch: Dive into our sample notebooks in the Elasticsearch Labs repo, start a free cloud trial, or try Elastic on your local machine now.

Today, we’re pleased to announce that semantic_text now defaults to the jina-embeddings-v5-text family on Elastic Inference Service (EIS), bringing built-in multilingual inference with no additional configuration required.

EIS provides managed, GPU-accelerated inference tightly integrated with Elasticsearch. With EIS, you don’t need to host, scale, or maintain infrastructure for embedding models.

Semantic search retrieves results based on meaning. Text is converted into vector embeddings so queries can match related concepts, even when the exact words differ.

The semantic_text field type simplifies this entire workflow, with automatic chunking, embedding generation at index time, and seamless querying via the semantic query, without building custom pipelines or managing separate model inference.

The jina-embeddings-v5-text model family just launched on EIS, giving developers powerful multilingual embeddings as part of the core semantic_text workflow. Your semantic search now works across languages out of the box: global datasets such as support articles, product descriptions, user reviews, and multilingual websites need no extra configuration.

This default opens up broad, globe-spanning semantic retrieval with no operational overhead.

jina-embeddings-v5-text

The jina-embeddings-v5-text models represent the latest generation of compact, high-performance multilingual embedding models on EIS.

  • State-of-the-art multilingual performance: Top scores on MMTEB benchmarks across hundreds of languages. jina-embeddings-v5-text-nano leads models under 500M parameters, and jina-embeddings-v5-text-small outperforms significantly larger alternatives.
  • Multiple task capabilities: Spanning retrieval, semantic matching, clustering, and classification.
  • Flexible choices to fit your use case: Two model sizes (small, nano) let you balance speed, cost, and quality.
  • Long-context support: Embed long texts efficiently, ideal for document collections with extended context.

Get started

1. Create index

Define a semantic_text field with no additional configuration. Embeddings will be generated automatically at index time using the default model. For production workloads, explicitly specify the model to ensure consistent behavior and results.
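A minimal mapping might look like the following. The index name `product-reviews` and field name `review` are illustrative; with no `inference_id` set, the field uses the default EIS model described above.

```json
PUT /product-reviews
{
  "mappings": {
    "properties": {
      "review": {
        "type": "semantic_text"
      }
    }
  }
}
```

To pin a specific model for production, set the `inference_id` parameter on the field to the inference endpoint of your choice.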

2. Index multilingual documents

Add product reviews in six different languages. Each document’s review field is automatically embedded at ingest time, with no separate pipeline or preprocessing needed.
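Indexing works like any other document write; the review texts below are illustrative, and the remaining languages follow the same pattern. Chunking and embedding happen automatically on ingest.

```json
POST /product-reviews/_doc
{ "review": "Great noise cancellation and battery life, perfect for flights." }

POST /product-reviews/_doc
{ "review": "Très confortable pour les longs trajets en avion." }

POST /product-reviews/_doc
{ "review": "長時間つけていても快適なヘッドホンです。" }
```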

3. Search across languages with a query in English
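Querying uses the semantic query against the semantic_text field; a sketch with the English query used in this example:

```json
GET /product-reviews/_search
{
  "query": {
    "semantic": {
      "field": "review",
      "query": "perfect for long flights"
    }
  }
}
```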

The results show all six reviews ranked by semantic relevance to the English query:

Notice that the French review ranks first, even above the English one. That's because "très confortable pour les longs trajets en avion" ("very comfortable for long trips by plane") is a closer semantic match to the query than the English review, which splits its focus across noise cancellation, battery life, and flights. This demonstrates the ability of jina-embeddings-v5-text-small to rank by meaning, not language.

4. Search across languages with a Japanese query
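The same query shape works regardless of the query language; the Japanese phrasing below is an illustrative rendering of "Ideal for long-haul flights":

```json
GET /product-reviews/_search
{
  "query": {
    "semantic": {
      "field": "review",
      "query": "長距離フライトに最適"
    }
  }
}
```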

The results show all six reviews ranked by semantic relevance to the Japanese query (“Ideal for long-haul flights”):

The ranking is nearly identical to the English query: French and English still lead because they're the most semantically relevant to "perfect for long flights," regardless of query language. The Japanese review didn't get artificially boosted just because the query was in Japanese. It ranks fourth because it discusses wearing comfort, not flights. Semantic relevance takes priority over language matching.

Note: For English-only use cases

If you prefer a sparse representation or would like to continue to use Elastic Learned Sparse EncodeR (ELSER) for English workloads, ELSER remains available and fully supported as an option for semantic_text.

You can explicitly choose ELSER by specifying inference_id: "elser" in your mappings when creating an index.
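As a sketch, using the `inference_id` value mentioned above (the exact endpoint ID available in your deployment may differ; check your configured inference endpoints):

```json
PUT /english-reviews
{
  "mappings": {
    "properties": {
      "review": {
        "type": "semantic_text",
        "inference_id": "elser"
      }
    }
  }
}
```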

Conclusion: Semantic search without borders

With semantic_text now defaulting to the jina-embeddings-v5-text family on Elastic Inference Service, multilingual semantic search becomes the standard developer experience in Elasticsearch. This means developers can build search, retrieval augmented generation (RAG), and AI applications that work across global datasets without stitching pipelines together.

Create a semantic_text field, index your data, and start searching. All Elastic Cloud trials have access to Elastic Inference Service. Try it now on Elastic Cloud Serverless or Elastic Cloud Hosted, or use EIS via Cloud Connect with your self-managed cluster.
