Why OpenAI Embeddings Still Work Even After You Truncate Them

ExplainingAI · Advanced · 🔍 RAG & Vector Search · 2mo ago
Why do OpenAI embeddings still work even after you truncate a large number of dimensions? In this video, we explore Matryoshka Representation Learning (MRL), a training technique that allows embeddings to remain useful even when you use only a prefix of the full vector. This makes it possible to trade off accuracy, memory usage, and retrieval latency at inference time. We first look at how embeddings are normally trained using contrastive learning, and why standard embedding models do not guarantee that truncated vectors will work well. Then we see how Matryoshka Representation Learning modifies the training loss to make smaller prefixes of the embedding independently useful. Finally, we look at results from the original MRL paper and experiments with modern embedding models to understand how truncation affects retrieval performance. This idea is especially useful for systems that rely on vector search, semantic retrieval, and RAG (Retrieval Augmented Generation).

⏱️ Timestamps:
00:00 Truncated OpenAI Embeddings Still Work
00:46 The Idea Behind Matryoshka Embeddings
02:03 How Embeddings Are Normally Trained (Contrastive Learning)
03:28 How Matryoshka Representation Learning Changes the Loss
05:16 Why Matryoshka Embeddings Are Useful for Vector Search & RAG
06:21 Results from the MRL Paper

📖 Resources:
Matryoshka Representation Learning paper - https://arxiv.org/pdf/2205.13147
Weaviate blog post - https://weaviate.io/blog/openais-matryoshka-embeddings-in-weaviate
Weaviate podcast on Matryoshka Representation Learning (with one of the authors) - https://www.youtube.com/watch?v=-0m2dZJ6zos

🔔 Subscribe: https://tinyurl.com/exai-channel-link

📌 Keywords: #openai #embeddings #vectorsearch #retrievalaugmentedgeneration
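The training change the video describes can be sketched in a few lines: instead of one contrastive (InfoNCE) loss on the full vector, MRL averages the same loss over several nested prefix lengths, so each prefix is trained to be independently useful. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation; the prefix sizes in `dims`, the `temperature`, and the function names are illustrative.

```python
import numpy as np

def _logsumexp(x, axis):
    """Numerically stable log-sum-exp along an axis."""
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def mrl_contrastive_loss(queries, docs, dims=(64, 128, 256), temperature=0.05):
    """Matryoshka-style objective (sketch): average the InfoNCE loss over
    several nested prefix lengths, so each prefix of the embedding is
    trained to be useful on its own. `queries` and `docs` are
    (batch, full_dim) arrays where row i of `docs` is the positive
    example for row i of `queries`."""
    total = 0.0
    for d in dims:
        # Take only the first d dimensions of each embedding
        q = queries[:, :d]
        p = docs[:, :d]
        # L2-normalize the truncated prefixes before comparing them
        q = q / np.linalg.norm(q, axis=1, keepdims=True)
        p = p / np.linalg.norm(p, axis=1, keepdims=True)
        logits = (q @ p.T) / temperature           # (batch, batch) similarities
        log_probs = logits - _logsumexp(logits, axis=1)
        total += -np.mean(np.diag(log_probs))      # positives lie on the diagonal
    return total / len(dims)

def truncate_embedding(vec, dim):
    """Inference-time truncation: keep the first `dim` components and
    re-normalize. For an MRL-trained model this prefix remains a usable
    embedding; for a standard contrastively trained model it generally
    does not."""
    v = np.asarray(vec, dtype=np.float64)[:dim]
    return v / np.linalg.norm(v)
```

At inference time this is what makes the memory/latency trade-off possible: you store and search only the first `dim` components, and because every prefix was part of the training objective, retrieval quality degrades gracefully rather than collapsing.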
Watch on YouTube ↗

Related AI Lessons

Why StarRocks Is Better Than Elasticsearch for RAG and AI-Powered Vector Search Analytics
Learn why StarRocks outperforms Elasticsearch for RAG and AI-powered vector search analytics, and how to apply this knowledge to improve your data architecture
Medium · LLM
Production RAG: Shipping a RAG System Into an Enterprise Product
Learn how to ship a RAG system into an enterprise product, overcoming operational realities and challenges beyond the demo stage
Medium · RAG
HyDE: Search With the Answer You Wish You Had
Learn how HyDE improves search by using the answer you wish you had as a query, and why traditional question-based searches are limited
Medium · RAG
Hierarchical Indices: Find the Section First, Then Find the Sentence
Learn how hierarchical indices work by mimicking human search behavior in long documents, improving search efficiency
Medium · RAG

Chapters (6)

0:00 Truncated OpenAI Embeddings Still Work
0:46 The Idea Behind Matryoshka Embeddings
2:03 How Embeddings Are Normally Trained (Contrastive Learning)
3:28 How Matryoshka Representation Learning Changes the Loss
5:16 Why Matryoshka Embeddings Are Useful for Vector Search & RAG
6:21 Results from the MRL Paper