📰 Dev.to · Christopher Maher

8 articles

TurboQuant on a MacBook Pro: two findings the upstream discussion missed

Dev.to · Christopher Maher 1mo ago

Built TheTom's TurboQuant fork of llama.cpp for Metal, ran the bench overnight on M5 Max, and surfaced two findings the upstream community thread didn't have: t

62.2% on Aider Polyglot from a MacBook Pro. Then the other model we tried scored 4%. Here's what actually happened, with a working cost loop attached.

Dev.to · Christopher Maher 1mo ago

Qwen3.6-35B-A3B Q8 on a MacBook Pro M5 Max scored 62.2% on Aider Polyglot, beating Claude Sonnet 4 with 32k thinking. Then Devstral 2 scored 4% on the same harn

We ran Qwen3.6-27B on $800 of consumer GPUs, day one: llama.cpp vs vLLM

Dev.to · Christopher Maher 🧠 Large Language Models ⚡ AI Lesson 1mo ago

A Kubernetes-native bake-off on 2× RTX 5060 Ti, with reproducible manifests and a cost-per-token number neither cloud nor OSS FinOps tools will tell you.

I tested speculative decoding on my home GPU cluster. Here's why it didn't help.

Dev.to · Christopher Maher 2mo ago

I spent Saturday night testing n-gram speculative decoding on consumer GPUs. The claim: speculative...

Google Released Gemma 4 Yesterday. I Had It Fixing Real Bugs by Lunch.

Dev.to · Christopher Maher 2mo ago

Google released Gemma 4 yesterday. By the time I went to bed, I had it deployed on my home lab,...

I Tested TurboQuant KV Cache Compression on Consumer GPUs. Here's What Actually Happened.

Dev.to · Christopher Maher 2mo ago

I spent this weekend testing TurboQuant KV cache compression on my home lab Kubernetes cluster. The...

The $0 Problem: Why Every Tool Says Your On-Prem Inference is Free

Dev.to · Christopher Maher 2mo ago

If you run LLMs on your own hardware, every cost tracking tool in the ecosystem has the same answer...

llama.cpp on Kubernetes: The Guide I Wish Existed

Dev.to · Christopher Maher 2mo ago

It started at my kitchen table. I was spending an evening on my laptop, fascinated by how LLMs...