Google's NEW Multimodal Model - Gemini Embedding 2
Google just released Gemini Embedding 2, its first fully multimodal embedding model, now also available in Weaviate.
The model maps text, images, videos, audio, and PDFs into a single unified embedding space. This means you can query with text and retrieve relevant videos, search with an image and find related documents, or combine modalities in any other way, all with the same model.
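To make the "single embedding space" idea concrete, here is a minimal sketch of cross-modal retrieval. Once every item, whatever its modality, is a vector of the same dimension, search reduces to ranking stored vectors by similarity to the query vector (cosine similarity is one common choice). The three-dimensional vectors and file names below are made-up placeholders, not real Gemini Embedding 2 outputs, and `Item`, `cosine_similarity`, and `search` are hypothetical helpers, not part of any library.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    """One stored object: its modality, an identifier, and its embedding."""
    modality: str
    name: str
    vector: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must come from one embedding space (same dimension)")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query: list[float], items: list[Item], k: int = 2) -> list[tuple[str, str, float]]:
    """Rank items of any modality by similarity to the query vector."""
    scored = [(it.modality, it.name, cosine_similarity(query, it.vector)) for it in items]
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real model outputs (hypothetical values).
items = [
    Item("video", "cat_playing.mp4", [0.9, 0.1, 0.0]),
    Item("image", "sunflower.jpg", [0.0, 0.2, 0.95]),
    Item("pdf", "pet_care_guide.pdf", [0.8, 0.3, 0.1]),
]
text_query = [1.0, 0.2, 0.0]  # pretend embedding of the text query "cats"

for modality, name, score in search(text_query, items):
    print(f"{modality:5s} {name:20s} {score:.3f}")
```

The text query never needs to know whether a hit is a video, an image, or a PDF; the ranking only compares vectors, which is what a shared embedding space buys you.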
In this video, I've included a walkthrough of building a multimodal PDF RAG pipeline.
We embed each PDF page as an image using Gemini Embedding 2, add it to Weaviate, then query with text to retrieve relevant PDF page images. These images are passed to Gemini Flash to generate answers using the document context. The dataset has "needles" hidden in the documents - so when we ask "what's the secret flower?", the pipeline needs to use multimodal understanding of both text and images to answer correctly.
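The four steps above (embed page images, store them, retrieve with a text query, generate from the retrieved images) can be sketched offline as follows. This is a minimal sketch under loud assumptions: `toy_embed_text`, `toy_embed_image`, `InMemoryPageStore`, and `fake_generate` are hypothetical stand-ins, not the Gemini or Weaviate APIs (the linked notebook uses the real clients). The toy embedder just counts keywords so that "images" and text land in the same vector space.

```python
import math
import re
from dataclasses import dataclass, field
from typing import Callable

Vector = list[float]

# Hypothetical toy vocabulary: a stand-in for a real learned embedding space.
VOCAB = ["flower", "invoice", "revenue"]


def toy_embed_text(text: str) -> Vector:
    """Keyword-count vector plus a constant bias dimension (never all zeros)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB] + [1.0]


def toy_embed_image(image: bytes) -> Vector:
    """Pretend image embedder that reads the 'page image' bytes as text.

    A real multimodal model embeds the pixels directly; this stub only
    guarantees that page images and text queries share one vector space.
    """
    return toy_embed_text(image.decode("utf-8"))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


@dataclass
class InMemoryPageStore:
    """Stand-in for a vector database collection; NOT the Weaviate client API."""
    pages: list[tuple[str, bytes, Vector]] = field(default_factory=list)

    def add(self, page_id: str, image: bytes, vector: Vector) -> None:
        self.pages.append((page_id, image, vector))

    def query(self, vector: Vector, limit: int) -> list[tuple[str, bytes]]:
        ranked = sorted(self.pages, key=lambda p: cosine(vector, p[2]), reverse=True)
        return [(page_id, image) for page_id, image, _ in ranked[:limit]]


def index_pages(pages: dict[str, bytes], embed_image: Callable[[bytes], Vector],
                store: InMemoryPageStore) -> None:
    """Steps 1-2: embed every PDF page as an image and add it to the store."""
    for page_id, image in pages.items():
        store.add(page_id, image, embed_image(image))


def answer(question: str, embed_text: Callable[[str], Vector],
           generate: Callable[[str, list[bytes]], str],
           store: InMemoryPageStore, limit: int = 1) -> tuple[str, list[str]]:
    """Steps 3-4: embed the text query, retrieve page images, pass them to a generator."""
    hits = store.query(embed_text(question), limit)
    page_images = [image for _, image in hits]
    return generate(question, page_images), [page_id for page_id, _ in hits]


def fake_generate(question: str, images: list[bytes]) -> str:
    """Placeholder for the vision-language model call (Gemini Flash in the video)."""
    return f"answer to {question!r} from {len(images)} page image(s)"


store = InMemoryPageStore()
index_pages(
    {
        "report_p1": b"quarterly revenue table",
        "report_p2": b"photo of a flower with a caption",
        "report_p3": b"invoice totals",
    },
    toy_embed_image,
    store,
)
reply, sources = answer("What's the secret flower?", toy_embed_text, fake_generate, store)
print(sources, reply)
```

Passing the embedders and the generator in as callables keeps the retrieval logic independent of any provider, so real model and database clients could be swapped in without changing `index_pages` or `answer`.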
Check out the model release blog: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-embedding-2/
PDF RAG notebook: https://github.com/weaviate/recipes/blob/main/weaviate-features/model-providers/google/multimodal_pdf_rag_gemini.ipynb
00:00 - Intro
00:31 - Multimodal embedding models
01:31 - Google's Gemini Embedding 2
02:12 - PDF RAG architecture overview
03:00 - Building a multimodal PDF RAG pipeline
04:08 - Conclusion
▬▬▬▬▬▬▬▬▬▬▬▬ CONNECT WITH US ▬▬▬▬▬▬▬▬▬▬▬▬
- Visit http://weaviate.io/
- Star us on GitHub https://github.com/weaviate/weaviate
- Stay updated and subscribe to our newsletter: https://newsletter.weaviate.io/
- Try out Weaviate Cloud for free here: https://console.weaviate.cloud/
Got a question?
- Forum: https://forum.weaviate.io/
Connect with us on
- Twitter: https://twitter.com/weaviate_io
- LinkedIn: https://www.linkedin.com/company/weaviate-io/