The Small Model Infrastructure Nobody Built (So We Did) — Filip Makraduli, Superlinked

AI Engineer · Intermediate · RAG & Vector Search · 1w ago
Most embedding infrastructure assumes you know exactly which model you want ahead of time. This talk starts where that assumption breaks. Filip Makraduli walks through the profiling mistakes, infrastructure gaps, and production constraints that led to building an embedding inference engine designed for dynamic model loading, hot-swapping, and memory-aware eviction instead of brittle one-model-per-container deployments. If you're working on small-model inference, embeddings, or GPU infrastructure, this is a practical look at what breaks in the real world and how to design around it.
Speaker info: https://www.linkedin.com/in/filipmakraduli/
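To make "dynamic model loading" and "memory-aware eviction" concrete, here is a minimal sketch of the general idea, not the engine presented in the talk. It assumes a least-recently-used policy and a fixed memory budget in megabytes; the class `ModelCache`, the `loader` and `size_of` callables, and the model names and sizes are all hypothetical and chosen only for illustration.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple


class ModelCache:
    """Hypothetical cache: loads models on demand and evicts the least
    recently used ones when a memory budget would be exceeded."""

    def __init__(
        self,
        budget_mb: int,
        loader: Callable[[str], object],   # assumed: returns a loaded model
        size_of: Callable[[str], int],     # assumed: model footprint in MB
    ) -> None:
        self.budget_mb = budget_mb
        self.loader = loader
        self.size_of = size_of
        # Maps model name -> (model, size_mb); order tracks recency of use.
        self.models: "OrderedDict[str, Tuple[object, int]]" = OrderedDict()
        self.used_mb = 0
        self.evicted: List[str] = []       # eviction log, kept for inspection

    def get(self, name: str) -> object:
        # Cache hit: mark the model as most recently used and return it.
        if name in self.models:
            self.models.move_to_end(name)
            return self.models[name][0]

        size_mb = self.size_of(name)
        if size_mb > self.budget_mb:
            raise MemoryError(f"{name} ({size_mb} MB) exceeds the budget")

        # Free space before loading, oldest models first.
        while self.used_mb + size_mb > self.budget_mb:
            old_name, (_, old_size) = self.models.popitem(last=False)
            self.used_mb -= old_size
            self.evicted.append(old_name)

        model = self.loader(name)
        self.models[name] = (model, size_mb)
        self.used_mb += size_mb
        return model
```

With a 1000 MB budget and three 400 MB models, requesting `a`, `b`, `a`, `c` in that order evicts `b`, because `a` was touched more recently. A real engine would also have to measure actual GPU memory and release it on eviction, which this sketch leaves out.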
