Scaling Meta's Multi-Agent Systems to a Billion Videos
Skills:
Multi-Agent Systems90%
Aditya Gautam (Generative AI, Meta) breaks down how Meta tackles two of the hardest problems in short-form video at the scale of hundreds of millions to a billion reads per day: modality misalignment and original content theft. He's spent years on Facebook Reels integrity and recommendations, and in this talk he gets practical about why a single giant LLM is the wrong tool for the job, and what to build instead.
This is the application-layer view of multi-agent systems. Not infra, not orchestration theory, just what actually works when your input is messy user-generated video and your budget is a fraction of a cent per inference.
What you'll learn:
- The two problems Meta is solving: Modality misalignment (text says one thing, video shows another, sometimes only for a few frames) and content theft in the copycat creator economy.
- Why a single big LLM falls over: Cost at 100M+ daily inferences, modality bias in VLMs, and the inability to bring in external context like the rest of the video corpus.
- The 3-agent architecture: Perceiver (signal acquisition, scene boundaries, VLM embeddings, OCR), Retriever (KNN over vector DBs, similarity matrices, creator-to-creator graphs), and Reasoner (chain-of-thought, confidence scores, can request more context).
- Why small specialized models win: 3B to 11B fine-tuned LLMs per agent beat a 200B generalist on both quality and cost, and let each agent ship on its own CI/CD like a microservice.
- The evaluation stack: Precision/recall/F1, isolated retrieval evaluation, reasoning quality via LLM-as-judge plus human-in-the-loop, hallucination rate, and full system efficiency logging at every hop.
- Four real optimizations that drop cost 10x: Spatial and temporal frame merging (don't send 300 similar cloud frames to a VLM), semantic hashing of viral content, dynamic routing that skips reasoning for obvious copies, and metadata pruning based on creator reputation and topic safety.
- The honest hard parts: Inference latency stacks a
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
More on: Multi-Agent Systems
View skill →Related AI Lessons
🎓
Tutor Explanation
DeepCamp AI