📰 Dev.to · Vilius

What actually changed in two weeks

Dev.to · Vilius 3h ago

I built a large feature. That's not what this is about. What changed is the baseline — the...

The End of the US Cloud Monopoly: AI Balkanization Is Here to Stay

Dev.to · Vilius 1w ago

By Vilius Vystartas | June 2026 The single, globally unified internet is gone. What's replacing it...

We Asked 10 LLMs to Write Efficient Code. Only 4 Got Better.

Dev.to · Vilius 🧠 Large Language Models ⚡ AI Lesson 4w ago

By Vilius Vystartas | May 2026 Every LLM can write code that works. The question is: can they write...

10 Models Tested: From 81.6% to 10%. The Free Tier is a Full-On Gamble.

Dev.to · Vilius 💻 AI-Assisted Coding ⚡ AI Lesson 4w ago

By Vilius Vystartas | May 2026 I tested another 10 models across the same 10 agent coding tasks....

Dev.to · Vilius 1mo ago

I Tested 10 More Models. Five Brand New Families Debuted. None Scored Below 75%.

By Vilius Vystartas | May 2026 I ran another 10 models through the same agent coding benchmark. Five...

Two Models Just Hit 90% on Agent Coding. One Cost Less Than a Penny.

Dev.to · Vilius 1mo ago

By Vilius Vystartas | May 2026 Ten more models through the same 10 agent coding tasks. Two tied the...

The Hype Correction

Dev.to · Vilius 1mo ago

Weekly roundup, May 23, 2026 Google and Microsoft just told us the same thing from opposite...

$0.08 and 3,500 Lines: The Complete Failure of a Deterministic Agent Harness

Dev.to · Vilius 🤖 AI Agents & Automation ⚡ AI Lesson 1mo ago

I have a theory about why agent suggestions land so heavy. It's not that the suggestions are good....

The Protocol Stack Nobody Talks About

Dev.to · Vilius 1mo ago

Six agent protocols launched in the last year. Everyone's obsessing over model selection. The...

Build It, Then Kill It

Dev.to · Vilius 1mo ago

The hardest thing after building agent infrastructure for a few months isn't building more. It's...

Power Sockets Don't Need Certification — and Neither Should Agent Infrastructure

Dev.to · Vilius 🤖 AI Agents & Automation ⚡ AI Lesson 1mo ago

I'm tired of talking about plumbing. Every conversation about AI agents right now is about...

I Tested 6 Local Models on Real Agent Tasks. The Best Scored 50%.

Dev.to · Vilius 1mo ago

I had a SmolLM3-3B running on my laptop. It scored 93.3% on my code quality benchmark. I thought I...

My Agent Kept Forgetting Everything. My Hand Was Forced.

Dev.to · Vilius 1mo ago

Agent Autopsy, Day 8 My agent ran benchmarks yesterday evening and then lost the plot. Tried to...

Benchmark Results: SmolLM3 3B, Phi-4-mini, DeepSeek V4, Grok 4.20 — Agent Coding Tested

Dev.to · Vilius 1mo ago

The second round of the Works With Agents agent coding benchmark is in — 32 models tested this time,...

We Tested 10 Untested LLMs on Agent Coding — The Results Are In

Dev.to · Vilius 1mo ago

Yesterday I promised to...

The $0 Agent: My 2GB Local Model Beat Claude

Dev.to · Vilius 1mo ago

Agent learns fast, Day 11. I ran an agent...

Benchmarking 10 Untested LLMs Tonight — DeepSeek V4, Grok 4.20, GPT-5.5 Pro

Dev.to · Vilius 🧠 Large Language Models ⚡ AI Lesson 1mo ago

Tonight at 23:00 BST we're running fresh benchmarks on 10 LLMs we haven't tested before. The...

My Agent Said It Would Fix the Width. It Rebuilt the Whole Site Instead.

Dev.to · Vilius 1mo ago

I asked my agent to fix the width on one page. It replied with a confident plan — headers,...

We Built an API. Nobody Used It.

Dev.to · Vilius 1mo ago

A post by Vilius

I Broke My Website. Then I Fixed It. Then My Fix Broke It Again.

Dev.to · Vilius 1mo ago

Agent Autopsy, Day 4 I broke my website today. Not dramatically — just a small fix. A newsletter...

How we almost wrote off 3 models as broken — the thinking-mode tax

Dev.to · Vilius 1mo ago

By Vilius Vystartas |...

1-bit, 545 megabytes, zero API keys — local AI that beats GPT-5.4

Dev.to · Vilius 1mo ago

By Vilius Vystartas |...

We benchmarked 10 LLMs on 10 real agent coding tasks — here are the results

Dev.to · Vilius 1mo ago

By Vilius...

I Ran 5 LLMs Through 10 Real Agent Coding Tasks. The Free One Won.

Dev.to · Vilius 🧠 Large Language Models ⚡ AI Lesson 1mo ago

What I Tested I gave 5 models the same 10 coding tasks — not LeetCode, not trivia. Tasks...