📰 Dev.to · Vilius

38 articles · Updated every 3 hours

The Hype Correction
Dev.to · Vilius 2w ago
Weekly roundup, May 23, 2026. Google and Microsoft just told us the same thing from opposite...
$0.08 and 3,500 Lines: The Complete Failure of a Deterministic Agent Harness
Dev.to · Vilius 🤖 AI Agents & Automation ⚡ AI Lesson 3w ago
I have a theory about why agent suggestions land so heavy. It's not that the suggestions are good....
The Protocol Stack Nobody Talks About
Dev.to · Vilius 3w ago
Six agent protocols launched in the last year. Everyone's obsessing over model selection. The...
Build It, Then Kill It
Dev.to · Vilius 3w ago
The hardest thing after building agent infrastructure for a few months isn't building more. It's...
Power Sockets Don't Need Certification — and Neither Should Agent Infrastructure
Dev.to · Vilius 🤖 AI Agents & Automation ⚡ AI Lesson 3w ago
I'm tired of talking about plumbing. Every conversation about AI agents right now is about...
I Tested 6 Local Models on Real Agent Tasks. The Best Scored 50%.
Dev.to · Vilius 3w ago
I had a SmolLM3-3B running on my laptop. It scored 93.3% on my code quality benchmark. I thought I...
My Agent Kept Forgetting Everything. My Hand Was Forced.
Dev.to · Vilius 4w ago
Agent Autopsy, Day 8. My agent ran benchmarks yesterday evening and then lost the plot. Tried to...
Benchmark Results: SmolLM3 3B, Phi-4-mini, DeepSeek V4, Grok 4.20 — Agent Coding Tested
Dev.to · Vilius 1mo ago
The second round of the Works With Agents agent coding benchmark is in — 32 models tested this time,...
We Tested 10 Untested LLMs on Agent Coding — The Results Are In
Dev.to · Vilius 1mo ago
Yesterday I promised to...
The $0 Agent: My 2GB Local Model Beat Claude
Dev.to · Vilius 1mo ago
Agent learns fast — Day 11. I ran an agent...
Benchmarking 10 Untested LLMs Tonight — DeepSeek V4, Grok 4.20, GPT-5.5 Pro
Dev.to · Vilius 🧠 Large Language Models ⚡ AI Lesson 1mo ago
Tonight at 23:00 BST we're running fresh benchmarks on 10 LLMs we haven't tested before. The...
My Agent Said It Would Fix the Width. It Rebuilt the Whole Site Instead.
Dev.to · Vilius 1mo ago
I asked my agent to fix the width on one page. It replied with a confident plan — headers,...
We Built an API. Nobody Used It.
Dev.to · Vilius 1mo ago
A post by Vilius
I Broke My Website. Then I Fixed It. Then My Fix Broke It Again.
Dev.to · Vilius 1mo ago
Agent Autopsy, Day 4. I broke my website today. Not dramatically — just a small fix. A newsletter...
How we almost wrote off 3 models as broken — the thinking-mode tax
Dev.to · Vilius 1mo ago
By Vilius Vystartas |...
1-bit, 545 megabytes, zero API keys — local AI that beats GPT-5.4
Dev.to · Vilius 1mo ago
By Vilius Vystartas |...
We benchmarked 10 LLMs on 10 real agent coding tasks — here are the results
Dev.to · Vilius 1mo ago
By Vilius...
I Ran 5 LLMs Through 10 Real Agent Coding Tasks. The Free One Won.
Dev.to · Vilius 🧠 Large Language Models ⚡ AI Lesson 1mo ago
What I Tested: I gave 5 models the same 10 coding tasks — not LeetCode, not trivia. Tasks...
AI Agents Are Finding Bugs in Your Tools. Here's How to Get Notified First.
Dev.to · Vilius 1mo ago
The Shift Nobody's Talking About: Developers are deploying autonomous AI agents that scan...
How to Give Your AI Agent a Shared Memory — in 3 Lines
Dev.to · Vilius 1mo ago
The Problem: My agent spent 45 minutes debugging a Python install flag. It found the fix —...