📰 Dev.to · Vilius

38 articles · Updated every 3 hours

We Asked 10 LLMs to Write Efficient Code. Only 4 Got Better.

Dev.to · Vilius 🧠 Large Language Models ⚡ AI Lesson 2w ago

By Vilius Vystartas | May 2026
Every LLM can write code that works. The question is: can they write...

10 Models Tested: From 81.6% to 10%. The Free Tier is a Full-On Gamble.

Dev.to · Vilius 💻 AI-Assisted Coding ⚡ AI Lesson 2w ago

By Vilius Vystartas | May 2026
I tested another 10 models across the same 10 agent coding tasks....

Dev.to · Vilius 2w ago

I Tested 10 More Models. Five Brand New Families Debuted. None Scored Below 75%.

By Vilius Vystartas | May 2026
I ran another 10 models through the same agent coding benchmark. Five...

Two Models Just Hit 90% on Agent Coding. One Cost Less Than a Penny.

Dev.to · Vilius 2w ago

By Vilius Vystartas | May 2026
Ten more models through the same 10 agent coding tasks. Two tied the...

The Hype Correction

Dev.to · Vilius 2w ago

Weekly roundup, May 23, 2026: Google and Microsoft just told us the same thing from opposite...

$0.08 and 3,500 Lines: The Complete Failure of a Deterministic Agent Harness

Dev.to · Vilius 🤖 AI Agents & Automation ⚡ AI Lesson 3w ago

I have a theory about why agent suggestions land so heavy. It's not that the suggestions are good....

The Protocol Stack Nobody Talks About

Dev.to · Vilius 3w ago

Six agent protocols launched in the last year. Everyone's obsessing over model selection. The...

Build It, Then Kill It

Dev.to · Vilius 3w ago

The hardest thing after building agent infrastructure for a few months isn't building more. It's...

Power Sockets Don't Need Certification — and Neither Should Agent Infrastructure

Dev.to · Vilius 🤖 AI Agents & Automation ⚡ AI Lesson 3w ago

I'm tired of talking about plumbing. Every conversation about AI agents right now is about...

I Tested 6 Local Models on Real Agent Tasks. The Best Scored 50%.

Dev.to · Vilius 3w ago

I had a SmolLM3-3B running on my laptop. It scored 93.3% on my code quality benchmark. I thought I...

My Agent Kept Forgetting Everything. My Hand Was Forced.

Dev.to · Vilius 4w ago

Agent Autopsy, Day 8: My agent ran benchmarks yesterday evening and then lost the plot. Tried to...

Benchmark Results: SmolLM3 3B, Phi-4-mini, DeepSeek V4, Grok 4.20 — Agent Coding Tested

Dev.to · Vilius 1mo ago

The second round of the Works With Agents agent coding benchmark is in — 32 models tested this time,...

We Tested 10 Untested LLMs on Agent Coding — The Results Are In

Dev.to · Vilius 1mo ago

Yesterday I promised to...

The $0 Agent: My 2GB Local Model Beat Claude

Dev.to · Vilius 1mo ago

Agent learns fast, Day 11: I ran an agent...

Benchmarking 10 Untested LLMs Tonight — DeepSeek V4, Grok 4.20, GPT-5.5 Pro

Dev.to · Vilius 🧠 Large Language Models ⚡ AI Lesson 1mo ago

Tonight at 23:00 BST we're running fresh benchmarks on 10 LLMs we haven't tested before. The...

My Agent Said It Would Fix the Width. It Rebuilt the Whole Site Instead.

Dev.to · Vilius 1mo ago

I asked my agent to fix the width on one page. It replied with a confident plan — headers,...

We Built an API. Nobody Used It.

Dev.to · Vilius 1mo ago

A post by Vilius

I Broke My Website. Then I Fixed It. Then My Fix Broke It Again.

Dev.to · Vilius 1mo ago

Agent Autopsy, Day 4: I broke my website today. Not dramatically — just a small fix. A newsletter...

How we almost wrote off 3 models as broken — the thinking-mode tax

Dev.to · Vilius 1mo ago

By Vilius Vystartas |...

1-bit, 545 megabytes, zero API keys — local AI that beats GPT-5.4

Dev.to · Vilius 1mo ago

By Vilius Vystartas |...

We benchmarked 10 LLMs on 10 real agent coding tasks — here are the results

Dev.to · Vilius 1mo ago

By Vilius...

I Ran 5 LLMs Through 10 Real Agent Coding Tasks. The Free One Won.

Dev.to · Vilius 🧠 Large Language Models ⚡ AI Lesson 1mo ago

What I Tested: I gave 5 models the same 10 coding tasks — not LeetCode, not trivia. Tasks...

AI Agents Are Finding Bugs in Your Tools. Here's How to Get Notified First.

Dev.to · Vilius 1mo ago

The Shift Nobody's Talking About: Developers are deploying autonomous AI agents that scan...

How to Give Your AI Agent a Shared Memory — in 3 Lines

Dev.to · Vilius 1mo ago

The Problem: My agent spent 45 minutes debugging a Python install flag. It found the fix —...