📰 Dev.to · Dev Yadav

10 articles

The Demo Was One User. Then Batch Size Became Real.

Dev.to · Dev Yadav 2mo ago

The demo worked because the test was one user, one prompt, one response. Then real usage showed up,...

4-bit Quantization Does Not Make VRAM Problems Go Away

Dev.to · Dev Yadav 2mo ago

A lot of people hear "4-bit quantization" and mentally convert that into "this model should run anywhere"...

KV Cache Is Why Your Model Fit Until It Did Not

Dev.to · Dev Yadav 2mo ago

The model loaded. The first prompt worked. Then longer prompts or multiple users showed up, and...

7B Parameters Does Not Mean 8GB VRAM Is Enough

Dev.to · Dev Yadav 2mo ago

A lot of people see 7B and assume 8GB VRAM should be enough. Then they load the model, increase...

The Model Was Cheap. The Retries Became the Bill.

Dev.to · Dev Yadav 2mo ago

The hourly price did not look scary. What hurt was running the same job again, reloading the same...

The Tutorial Used Tiny Prompts. Your Real Prompts Did Not.

Dev.to · Dev Yadav 2mo ago

The tutorial looked smooth because the prompt was tiny. Then you used the real prompt your app...

The Demo Worked on a 7B Model. Production Traffic Changed the Math.

Dev.to · Dev Yadav 2mo ago

The demo looked fine on a small model with one user. Then real traffic showed up, latency got ugly,...

The Cheapest GPU Looked Smart. Then the Job Took All Night.

Dev.to · Dev Yadav 2mo ago

The hourly price looked great, so the cheapest GPU felt like the responsible choice. Then the run...

Your Model Loaded Fine. Then Context Length Broke the GPU Plan.

Dev.to · Dev Yadav 2mo ago

The model loaded. The notebook worked. Then you increased context length, batch size, or both, and...

Best GPU Rental for AI Training in India

Dev.to · Dev Yadav 3mo ago

Training LLMs? Fine-tuning models? Here's which GPU you actually need (and which ones are...