Google's NEW Multimodal Model - Gemini Embedding 2

Weaviate vector database · Beginner · AI News & Updates · 2mo ago

Skills: Multimodal LLMs (90%), Vector Stores (80%)

Google just released Gemini Embedding 2, their first fully multimodal embedding model, now also available in Weaviate. The model maps text, images, videos, audio, and PDFs into a single unified embedding space. This means you can query with text and retrieve relevant videos, search with an image and find related documents, or use any other combination, all with the same model.

In this video, I've included a walkthrough of building a multimodal PDF RAG pipeline. We embed each PDF page as an image using Gemini Embedding 2, add it to Weaviate, then query with text to retrieve relevant PDF page images. These images are passed to Gemini Flash to generate answers using the document context.

The dataset has "needles" hidden in the documents, so when we ask "what's the secret flower?", the pipeline needs to use multimodal understanding of both text and images to answer correctly.

Check out the model release blog: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-embedding-2/
PDF RAG notebook: https://github.com/weaviate/recipes/blob/main/weaviate-features/model-providers/google/multimodal_pdf_rag_gemini.ipynb

00:00 - Intro
00:31 - Multimodal embedding models
01:31 - Google's Gemini Embedding 2
02:12 - PDF RAG architecture overview
03:00 - Building a multimodal PDF RAG pipeline
04:08 - Conclusion

CONNECT WITH US
- Visit http://weaviate.io/
- Star us on GitHub https://github.com/weaviate/weaviate
- Stay updated and subscribe to our newsletter: https://newsletter.weaviate.io/
- Try out Weaviate Cloud for free here: https://console.weaviate.cloud/

Got a question?
- Forum: https://forum.weaviate.io/

Connect with us on
- Twitter: https://twitter.com/weaviate_io
- LinkedIn: https://www.linkedin.com/company/weaviate-io/
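To make the retrieval step of the PDF RAG pipeline described above more concrete, here is a minimal sketch in plain Python. It is not the notebook's code and it calls neither Gemini Embedding 2 nor Weaviate. The three-dimensional vectors are made-up stand-ins for real embeddings, and `PageRecord`, `retrieve`, and `build_prompt` are hypothetical names chosen for illustration. It only shows the idea that page images and a text query share one vector space, so ranking by cosine similarity picks the pages to hand to the generator model.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PageRecord:
    """One PDF page stored with its (stand-in) image embedding."""
    doc: str
    page: int
    vector: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vector: List[float], index: List[PageRecord], k: int = 2) -> List[PageRecord]:
    """Return the k pages whose vectors are most similar to the query vector."""
    ranked = sorted(index, key=lambda r: cosine(query_vector, r.vector), reverse=True)
    return ranked[:k]


def build_prompt(question: str, pages: List[PageRecord]) -> str:
    """Compose the text part of the request; the page images would be attached alongside it."""
    refs = ", ".join(f"{p.doc} p.{p.page}" for p in pages)
    return f"Answer using only the attached pages ({refs}).\nQuestion: {question}"


# Assumption: toy vectors. In the real pipeline these come from the embedding
# model (page images at ingest time, the text query at search time).
index = [
    PageRecord("garden.pdf", 1, [0.9, 0.1, 0.0]),
    PageRecord("garden.pdf", 2, [0.1, 0.9, 0.1]),
    PageRecord("finance.pdf", 1, [0.0, 0.2, 0.9]),
]
query_vector = [0.8, 0.2, 0.1]  # stand-in for the embedded question

top = retrieve(query_vector, index, k=2)
prompt = build_prompt("What's the secret flower?", top)
print(prompt)
```

In the walkthrough itself, Weaviate handles storage and similarity search and Gemini Flash receives the retrieved page images. The linked notebook has the actual client calls.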




