Pyversity with Thomas van Dongen - Weaviate Podcast #132!
Hey everyone! Thanks so much for watching the 132nd episode of the Weaviate Podcast with Thomas van Dongen, head of AI engineering at Springer Nature! Thomas is also the creator of Pyversity, a fast, lightweight library for diversifying retrieval results! Retrieval systems often return highly similar items. Pyversity efficiently re-ranks these results to encourage diversity, surfacing items that remain relevant but are less redundant. It implements several popular diversification strategies, such as MMR, MSD, DPP, and Cover, behind a clear, unified API.
Check out Pyversity! https://github.com/Pringle…
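To make the re-ranking idea concrete, here is a minimal pure-Python sketch of greedy MMR (Maximal Marginal Relevance), one of the strategies named above. It is an illustration only and not Pyversity's implementation or API: the function names `cosine` and `mmr`, the `diversity` parameter, and the toy data are all assumptions made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(embeddings, scores, k, diversity=0.5):
    """Greedy MMR sketch (hypothetical helper, not the Pyversity API).

    Returns the indices of up to k items, trading relevance against novelty.
    diversity=0.0 ranks purely by relevance; diversity=1.0 ignores relevance.
    """
    remaining = list(range(len(embeddings)))
    selected = []
    while remaining and len(selected) < k:

        def value(i):
            # Redundancy: similarity to the closest already-selected item.
            redundancy = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected),
                default=0.0,
            )
            return (1 - diversity) * scores[i] - diversity * redundancy

        best = max(remaining, key=value)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up): items 0 and 1 are near-duplicates, item 2 is different.
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
scores = [0.9, 0.85, 0.6]

print(mmr(embeddings, scores, k=2, diversity=0.0))  # [0, 1]
print(mmr(embeddings, scores, k=2, diversity=0.5))  # [0, 2]
```

With no diversity weight, the two near-duplicates come back together; with a weight of 0.5, the second slot goes to the less relevant but non-redundant item.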
Chapters (12)
Welcome Thomas!
0:30 An Introduction to Diversity in Vector Space
6:32 Diversification Strategies
15:42 Evaluating Diversity
21:36 Embedding Models for Diversity
27:25 LLMs for Diversity
33:20 The most representative set
36:50 Scientific Literature Mining
39:05 Thoughts on Chunking
42:35 Synthetic Data for Information Retrieval
46:00 Chatting with Scientific Papers
51:25 Future Directions and State of AI