
📰 Dev.to · Vilius

40 articles · Updated every 3 hours

I Tested 10 More Models. Five Brand New Families Debuted. None Scored Below 75%.
Dev.to · Vilius 1mo ago
By Vilius Vystartas | May 2026 I ran another 10 models through the same agent coding benchmark. Five...
Two Models Just Hit 90% on Agent Coding. One Cost Less Than a Penny.
Dev.to · Vilius 1mo ago
By Vilius Vystartas | May 2026 Ten more models through the same 10 agent coding tasks. Two tied the...
The Hype Correction
Dev.to · Vilius 1mo ago
Weekly roundup, May 23, 2026 Google and Microsoft just told us the same thing from opposite...
$0.08 and 3,500 Lines: The Complete Failure of a Deterministic Agent Harness
Dev.to · Vilius 🤖 AI Agents & Automation ⚡ AI Lesson 1mo ago
I have a theory about why agent suggestions land so heavy. It's not that the suggestions are good....
The Protocol Stack Nobody Talks About
Dev.to · Vilius 1mo ago
Six agent protocols launched in the last year. Everyone's obsessing over model selection. The...
Build It, Then Kill It
Dev.to · Vilius 1mo ago
The hardest thing after building agent infrastructure for a few months isn't building more. It's...
Power Sockets Don't Need Certification — and Neither Should Agent Infrastructure
Dev.to · Vilius 🤖 AI Agents & Automation ⚡ AI Lesson 1mo ago
I'm tired of talking about plumbing. Every conversation about AI agents right now is about...
I Tested 6 Local Models on Real Agent Tasks. The Best Scored 50%.
Dev.to · Vilius 1mo ago
I had a SmolLM3-3B running on my laptop. It scored 93.3% on my code quality benchmark. I thought I...
My Agent Kept Forgetting Everything. My Hand Was Forced.
Dev.to · Vilius 1mo ago
Agent Autopsy, Day 8 My agent ran benchmarks yesterday evening and then lost the plot. Tried to...
Benchmark Results: SmolLM3 3B, Phi-4-mini, DeepSeek V4, Grok 4.20 — Agent Coding Tested
Dev.to · Vilius 1mo ago
The second round of the Works With Agents agent coding benchmark is in — 32 models tested this time,...
We Tested 10 Untested LLMs on Agent Coding — The Results Are In
Dev.to · Vilius 1mo ago
Yesterday I promised to...
The $0 Agent: My 2GB Local Model Beat Claude
Dev.to · Vilius 1mo ago
Agent learns fast, Day 11. I ran an agent...
Benchmarking 10 Untested LLMs Tonight — DeepSeek V4, Grok 4.20, GPT-5.5 Pro
Dev.to · Vilius 🧠 Large Language Models ⚡ AI Lesson 1mo ago
Tonight at 23:00 BST we're running fresh benchmarks on 10 LLMs we haven't tested before. The...
My Agent Said It Would Fix the Width. It Rebuilt the Whole Site Instead.
Dev.to · Vilius 1mo ago
I asked my agent to fix the width on one page. It replied with a confident plan — headers,...
We Built an API. Nobody Used It.
Dev.to · Vilius 1mo ago
A post by Vilius
I Broke My Website. Then I Fixed It. Then My Fix Broke It Again.
Dev.to · Vilius 1mo ago
Agent Autopsy, Day 4 I broke my website today. Not dramatically — just a small fix. A newsletter...
How we almost wrote off 3 models as broken — the thinking-mode tax
Dev.to · Vilius 1mo ago
By Vilius Vystartas |...
1-bit, 545 megabytes, zero API keys — local AI that beats GPT-5.4
Dev.to · Vilius 1mo ago
By Vilius Vystartas |...
We benchmarked 10 LLMs on 10 real agent coding tasks — here are the results
Dev.to · Vilius 1mo ago
By Vilius...
I Ran 5 LLMs Through 10 Real Agent Coding Tasks. The Free One Won.
Dev.to · Vilius 🧠 Large Language Models ⚡ AI Lesson 1mo ago
What I Tested I gave 5 models the same 10 coding tasks — not LeetCode, not trivia. Tasks...