📰 Dev.to · Randy AP
3 articles
⚡ AI Lessons

Dev.to · Randy AP
1d ago
Running Mixtral 8x7B at 21+ TPS on Pure CPU via io_uring and Predictive Caching
The current consensus in AI infrastructure is unyielding: if you want to run frontier Mixture of...

Dev.to · Randy AP
5d ago
I streamed Mixtral 8x7B from NVMe on a $0.40/hour VM and got 3.32 tps, here's how
Most...

Dev.to · Randy AP
1w ago
I built a Rust inference engine that streams MoE expert weights from NVMe SSDs, no GPU required
Most people trying to run Mixtral or DeepSeek-V3 locally hit the same wall: they don't have 80GB of...
DeepCamp AI