Google's NEW Multimodal Model - Gemini Embedding 2
Google just released Gemini Embedding 2, their first fully multimodal embedding model - now also available in Weaviate.
The model maps text, images, videos, audio, and PDFs into a single unified embedding space. This means you can query with text and retrieve relevant videos, search with an image and find related documents, or use any other combination - all with the same model.
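The idea of a unified embedding space can be illustrated with plain cosine similarity: once every item, whatever its modality, is a vector in the same space, one query vector can rank all of them. This is a minimal sketch with toy 4-dimensional vectors standing in for real model outputs (a real model would return much higher-dimensional embeddings).

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Toy vectors standing in for embeddings of items of different modalities.
text_query = [0.9, 0.1, 0.0, 0.2]
items = {
    "video_clip": [0.8, 0.2, 0.1, 0.3],  # embedding of a video
    "pdf_page":   [0.1, 0.9, 0.4, 0.0],  # embedding of a PDF page image
    "photo":      [0.2, 0.1, 0.9, 0.5],  # embedding of a photo
}

# Rank every item, regardless of modality, against the text query.
ranked = sorted(items, key=lambda k: cosine(text_query, items[k]), reverse=True)
print(ranked[0])  # -> video_clip: closest to this toy text query
```

The same ranking logic works for any query modality, since the query and the indexed items all live in one vector space.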
In this video, I've included a walkthrough of building a multimodal PDF RAG pipeline.
We embed each PDF page as an image using Gemini Embedding 2, a…
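The page-as-image retrieval flow described above can be sketched as follows. The `embed` function here is a deterministic stub standing in for a real multimodal embedding call (e.g. via the Gemini API or Weaviate's model integrations), so the indexing and retrieval logic itself is runnable; the page strings stand in for rendered page images.

```python
from math import sqrt

def embed(content: str) -> list[float]:
    # Stub embedder: hashes characters into a small unit-norm vector.
    # A real pipeline would call the multimodal model here instead.
    vec = [0.0] * 8
    for i, ch in enumerate(content):
        vec[i % 8] += ord(ch) / 1000.0
    norm = sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

# 1) "Render" each PDF page (labelled text standing in for page images)
#    and embed it into the shared space, building a small index.
pages = {
    "page_1": "introduction to multimodal embeddings",
    "page_2": "benchmark tables and evaluation numbers",
    "page_3": "architecture diagram of the rag pipeline",
}
index = {pid: embed(text) for pid, text in pages.items()}

# 2) Embed the user's query with the same model and return the nearest
#    page by cosine similarity (dot product, since vectors are unit-norm).
def search(query: str) -> str:
    q = embed(query)
    return max(index, key=lambda pid: sum(a * b for a, b in zip(q, index[pid])))
```

In the full pipeline, the retrieved page image is then passed to a multimodal LLM alongside the query to generate the answer; a vector database such as Weaviate replaces the in-memory `index` dict.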
Watch on YouTube →
Chapters (6)
- Intro (0:31)
- Multimodal embedding models (1:31)
- Google's Gemini Embedding 2 (2:12)
- PDF RAG architecture overview (3:00)
- Building a multimodal PDF RAG pipeline (4:08)
- Conclusion
DeepCamp AI