📰 Dev.to · plasmon

27 articles · Updated every 3 hours

99.8% of LLM Inference Power Isn't Spent on Computation
Dev.to · plasmon 1mo ago
When people debate LLM inference...
Q4 KV Cache Fit 32K Context into 8GB VRAM — Only Math Broke
Dev.to · plasmon 1mo ago
The biggest VRAM hog in LLM...
HBM4 Didn't Break the Memory Wall — It Just Moved It
Dev.to · plasmon 1mo ago
HBM bandwidth has doubled every...
Running Just One LLM on 8GB VRAM Is a Waste
Dev.to · plasmon 1mo ago
Light Just Cut KV Cache Memory Traffic to 1/16th
Dev.to · plasmon 1mo ago
The bottleneck in long-context LLM...
They Routed Power Through the Back of the Chip and 30% IR Drop Vanished
Dev.to · plasmon 2mo ago
Every...
Letting AI Control RAG Search Improved Accuracy by 79%
Dev.to · plasmon 2mo ago
Most RAG (Retrieval-Augmented...
If Memory Could Compute, Would We Still Need GPUs?
Dev.to · plasmon 2mo ago
The bottleneck for LLM inference isn't...
I Couldn't Build a Local LLM PC for $1,300 — Budget Tiers and the VRAM Cliffs Between Them
Dev.to · plasmon 2mo ago
8-Bit Quantization Destroyed 92% of Code Generation — The Culprit Wasn't Bit Count
Dev.to · plasmon 2mo ago
If you...
ML Hit 99% Accuracy on Yield Prediction — The Factory Floor Ignored It
Dev.to · plasmon 2mo ago
The pitch to bring...
3 Classifiers, 3 Answers: Why CoT Faithfulness Scores Are Meaningless
Dev.to · plasmon 2mo ago
LLM Chain-of-Thought...
Parameter Count Is the Worst Way to Pick a Model on 8GB VRAM
Dev.to · plasmon 2mo ago
I've been running local LLMs...
The Memory Bandwidth Gap Is 49x and Growing — Why Local LLMs Hit a Ceiling
Dev.to · plasmon 2mo ago
The Wall I Hit on an RTX 4060 Was a Bandwidth Wall: Running Qwen3.5-9B on an RTX 4060 8GB...
MoE Beat Dense 27B by 2.4x on 8GB VRAM — The 35B-A3B Benchmark Nobody Expected
Dev.to · plasmon 2mo ago
Start with the benchmarks: In a previous article, I compared three Qwen3.5 models on the...
I Designed a Memory System for Claude Code — 'Forgetting' Was the Hardest Part
Dev.to · plasmon 2mo ago
Everyone talks about making AI remember things. Handoff prompts. System instructions. Memory files....
80% of LLM 'Thinking' Is a Lie — What CoT Faithfulness Research Actually Shows
Dev.to · plasmon 2mo ago
When You're Reading CoT, the Model Is Thinking Something Else: Thinking models are...
I Let Claude Code Run My Tech Blog. A Fake Article Passed Every Quality Check.
Dev.to · plasmon 2mo ago
I've been letting Claude Code autonomously run a tech blog. Topic selection, article generation,...
Still Picking API vs Local LLM by Gut Feeling? A Framework With Real Benchmarks
Dev.to · plasmon 2mo ago
"Just use...