Fast Models Need Slow Developers — Sarah Chieng, Cerebras
Codex Spark, a model Cerebras built with OpenAI, generates code at 1,200 tokens per second. The Sonnet and Opus families run at 40 to 60 tokens per second. At that difference of 20x or more, a context window that used to take ten minutes to fill now takes about 30 seconds, and every habit built around slow generation starts producing technical debt at a scale nobody has dealt with before.
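The arithmetic behind those figures can be checked with a short sketch. The token budget below is an assumption, not a number from the talk: it is simply what ten minutes of generation at 60 tokens per second adds up to. The function and constant names are illustrative.

```python
def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Time needed to emit `tokens` at a steady generation rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


SLOW_TPS = 60      # upper end of the 40 to 60 range quoted above
FAST_TPS = 1_200   # Codex Spark rate quoted above

# Assumed budget: the tokens produced by ten minutes at the slow rate.
BUDGET_TOKENS = 10 * 60 * SLOW_TPS

slow_seconds = seconds_to_generate(BUDGET_TOKENS, SLOW_TPS)
fast_seconds = seconds_to_generate(BUDGET_TOKENS, FAST_TPS)
speedup = FAST_TPS / SLOW_TPS

print(f"budget:  {BUDGET_TOKENS} tokens")
print(f"slow:    {slow_seconds:.0f} s")
print(f"fast:    {fast_seconds:.0f} s")
print(f"speedup: {speedup:.0f}x")
```

Against the lower end of the range (40 tokens per second) the ratio is 30x rather than 20x, so 20x is the conservative reading of the comparison.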
Sarah Chieng from Cerebras covers what the playbook looks like in this regime. Validation and linting at every step are now instant, so there is no excuse not to run them continuously. Generating 75 component variations across five sub-agents and cherry-picking the best one becomes practical where it was not before. And when context burns in 30 seconds, a four-file external memory system (agents, plan, progress, verify) is what lets each new session pick up where the last one left off instead of starting from scratch.
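A minimal sketch of how a four-file external memory might be wired up, under stated assumptions: the description names only the four roles (agents, plan, progress, verify), so the file names, the Markdown format, and every helper function below are hypothetical and are not taken from the talk.

```python
import tempfile
from pathlib import Path

# Hypothetical file names; the source only names the four roles.
MEMORY_FILES = ("AGENTS.md", "PLAN.md", "PROGRESS.md", "VERIFY.md")


def init_memory(root: Path) -> None:
    """Create any missing memory file with a one-line header."""
    root.mkdir(parents=True, exist_ok=True)
    for name in MEMORY_FILES:
        path = root / name
        if not path.exists():
            path.write_text(f"# {name}\n", encoding="utf-8")


def record_progress(root: Path, note: str) -> None:
    """Append one finished step so the next session can see it."""
    with (root / "PROGRESS.md").open("a", encoding="utf-8") as handle:
        handle.write(f"- {note}\n")


def build_session_preamble(root: Path) -> str:
    """Concatenate the four files into text fed to a fresh session."""
    sections = [
        (root / name).read_text(encoding="utf-8").strip()
        for name in MEMORY_FILES
    ]
    return "\n\n".join(sections)


# Demo in a throwaway directory: one session records a step,
# and the next session's preamble already contains it.
with tempfile.TemporaryDirectory() as tmp:
    memory_root = Path(tmp) / "memory"
    init_memory(memory_root)
    record_progress(memory_root, "scaffolded login form component")
    demo_preamble = build_session_preamble(memory_root)
    print(demo_preamble)
```

The point of the design is that state lives on disk rather than in the context window, so a session that exhausts its context in 30 seconds loses nothing the next session needs.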
Speaker info:
- https://x.com/sarahchieng
- https://www.linkedin.com/in/sarah-chieng-888595139/