Vertex AI

Vertex AI offers a range of generative AI models through several APIs, which you can use to build applications for many use cases. These models, based on Google's research, can generate text, translate languages, write different kinds of creative content, and answer questions.

Gemini API

Google Gemini models are designed for multimodal applications: they accept prompts that combine, for example, text and images, and return a text response. Gemini also supports function calling, which lets developers describe functions to the model. In response to a prompt, the model returns the name of the function that best matches the request, along with the parameters to call it with. The model does not run the function itself; developers use its output to call the function in external APIs and services.
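Because the model only proposes a function name and arguments, the application has to perform the actual call. The plain-Python sketch below shows that dispatch step. It is a minimal illustration, assuming the model's function-call output has already been converted into a dictionary of the form `{"name": ..., "args": {...}}`; that shape, the `get_weather` function, and the `FUNCTIONS` registry are hypothetical stand-ins, not the Vertex AI SDK's actual types.

```python
from typing import Any, Callable, Dict


# Hypothetical local function that the developer described to the model.
def get_weather(city: str, unit: str = "celsius") -> Dict[str, Any]:
    # A real application would call an external weather API here;
    # this sketch returns a fixed placeholder value instead.
    return {"city": city, "temperature": 21, "unit": unit}


# Hypothetical registry mapping the function names declared to the model
# to the local Python callables that implement them.
FUNCTIONS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch(function_call: Dict[str, Any]) -> Any:
    """Run the local function named in a (simulated) model function-call response."""
    name = function_call["name"]
    if name not in FUNCTIONS:
        # Never execute a name the application did not declare.
        raise ValueError(f"Model requested an unknown function: {name}")
    # Pass the model-proposed arguments as keyword arguments.
    return FUNCTIONS[name](**function_call.get("args", {}))


# Simulated model output; the real SDK returns its own response objects.
simulated_call = {"name": "get_weather", "args": {"city": "Paris"}}
print(dispatch(simulated_call))
```

Looking functions up in an explicit registry, rather than calling whatever name the model returns, keeps the application in control of what can run. In a typical function-calling flow, the function's result is then sent back to the model so it can compose its final answer.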

Gemini 1.5 Pro: This model has a context window of up to 1 million tokens, which lets it handle long, complex prompts and generate comprehensive responses.

Gemini 1.0 Pro & Gemini 1.0 Pro Vision: Gemini 1.0 Pro is well suited to natural language tasks, multi-turn conversations, and code generation. Gemini 1.0 Pro Vision also accepts images, PDFs, and videos in prompts, which makes it suitable for multimodal applications.

Gemini 1.0 Ultra & Gemini 1.0 Ultra Vision: As Google's most capable multimodal models, these are optimized for complex tasks such as instruction following, code generation, and reasoning. They support multiple languages and are currently available only to a select group of customers.

Text embeddings

Embeddings for Text (textembedding-gecko) is the model that supports text embeddings. Text embedding is an NLP technique that converts text into numerical vectors that machine learning algorithms, especially large models, can process. These vectors are designed to capture the semantic meaning and context of the text they represent.
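A common way to use these vectors is to measure how close two texts are in meaning with cosine similarity: embeddings of semantically similar texts point in similar directions. The sketch below computes it in plain Python. The three-dimensional vectors are made-up toy values chosen for illustration, not real textembedding-gecko output, which the API returns with many more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of three short texts.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.95]

print(round(cosine_similarity(cat, kitten), 3))   # related texts: high score
print(round(cosine_similarity(cat, invoice), 3))  # unrelated texts: low score
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common default for comparing embeddings.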

Several versions of the embedding model are available. textembedding-gecko@003 is the latest stable version, with improved quality over earlier versions, and textembedding-gecko-multilingual@001 is optimized for a wide range of non-English languages.

Multimodal embeddings

The Embeddings for Multimodal (multimodalembedding) model generates embedding vectors of 128, 256, 512, or 1408 dimensions from the input you provide, which can include any combination of text, image, or video. You can then use these vectors for downstream tasks such as image classification or content moderation.

The text, image, and video embedding vectors are in the same semantic space with the same dimensionality. Therefore, these vectors can be used interchangeably for use cases like searching images by text, or searching video by image.
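Because text and image embeddings share one space, searching images by text reduces to ranking image vectors by their similarity to a text-query vector. The plain-Python sketch below does a brute-force ranking. It assumes the vectors were already obtained from the multimodal embedding model at the same dimensionality; the file names and four-dimensional vectors are hypothetical toy values, and `search` is an illustrative helper, not part of any Vertex AI API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(
    query: Sequence[float], index: Dict[str, Sequence[float]], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Rank indexed embeddings (e.g. images) by similarity to a query embedding (e.g. text)."""
    for key, vec in index.items():
        # Cross-modal comparison only makes sense at the same dimensionality.
        if len(vec) != len(query):
            raise ValueError(f"{key}: dimensionality {len(vec)} != {len(query)}")
    scored = [(key, cosine_similarity(query, vec)) for key, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # best match first
    return scored[:top_k]


# Hypothetical toy image embeddings keyed by file name.
image_index = {
    "beach.jpg": [0.1, 0.9, 0.2, 0.0],
    "mountain.jpg": [0.8, 0.1, 0.1, 0.3],
    "city.jpg": [0.2, 0.1, 0.9, 0.1],
}
# Hypothetical toy embedding of a text query such as "a sunny beach".
text_query = [0.15, 0.85, 0.25, 0.05]

for name, score in search(text_query, image_index):
    print(f"{name}: {score:.3f}")
```

The same function works for the reverse direction (an image query against a video or text index), since only the vectors are compared. A linear scan is fine for a handful of items; large collections usually call for an approximate nearest-neighbor index instead.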

Notebooks

Example Chatbot Application