The Small Model Infrastructure Nobody Built (So We Did) — Filip Makraduli, Superlinked

Name: The Small Model Infrastructure Nobody Built (So We Did) — Filip Makraduli, Superlinked
Uploaded: 2026-05-05T17:00:06Z
Channel: AI Engineer

AI Engineer · Intermediate · 🔍 RAG & Vector Search · 1w ago

Skills: RAG Basics 80%, Systems Design Basics 60%

Most embedding infrastructure assumes you know exactly which model you want ahead of time. This talk starts where that assumption breaks. Filip Makraduli walks through the real profiling mistakes, infrastructure gaps, and production constraints that led to building an embedding inference engine designed for dynamic model loading, hot-swapping, and memory-aware eviction instead of brittle one-model-per-container deployments. If you're working on small-model inference, embeddings, or GPU infrastructure, this is a practical look at what breaks in the real world and how to design around it.

Speaker info: https://www.linkedin.com/in/filipmakraduli/
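To make "dynamic model loading with memory-aware eviction" concrete, here is a minimal Python sketch of the general idea: load models on demand and evict the least recently used ones when a memory budget would be exceeded. This is not the engine described in the talk. The `ModelCache` class, the `sizes_mb` table, and the `load` callback are all hypothetical names, and sizes are assumed to be known before loading.

```python
from collections import OrderedDict
from typing import Callable, Dict, List


class ModelCache:
    """Hypothetical on-demand model cache with LRU eviction under a memory budget."""

    def __init__(self, budget_mb: int, sizes_mb: Dict[str, int],
                 load: Callable[[str], object]) -> None:
        self.budget_mb = budget_mb
        self.sizes_mb = sizes_mb      # assumed: model sizes are known up front
        self.load = load              # assumed: stands in for real model loading
        self.models: "OrderedDict[str, object]" = OrderedDict()
        self.used_mb = 0

    def get(self, name: str) -> object:
        if name in self.models:
            self.models.move_to_end(name)  # mark as most recently used
            return self.models[name]
        size = self.sizes_mb[name]
        if size > self.budget_mb:
            raise MemoryError(f"{name} needs {size} MB, budget is {self.budget_mb} MB")
        # Evict least recently used models until the new one fits.
        while self.used_mb + size > self.budget_mb:
            evicted, _ = self.models.popitem(last=False)
            self.used_mb -= self.sizes_mb[evicted]
        self.models[name] = self.load(name)
        self.used_mb += size
        return self.models[name]

    def resident(self) -> List[str]:
        """Names of loaded models, least recently used first."""
        return list(self.models)


if __name__ == "__main__":
    cache = ModelCache(100, {"a": 60, "b": 50, "c": 40}, load=lambda n: f"model:{n}")
    for requested in ["a", "b", "c", "a"]:
        cache.get(requested)
        print(requested, cache.resident(), cache.used_mb)
```

A real GPU deployment would also have to free device memory on eviction and handle concurrent requests during a swap; the sketch ignores both to keep the eviction logic visible.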
