📰 Dev.to · João André Gomes Marques
Articles from Dev.to · João André Gomes Marques · 47 articles · Updated every 3 hours

Dev.to · João André Gomes Marques
1d ago
Replay what your AI agent did, step by step
If you're running AI agents in production, you probably have some form of audit trail already. Maybe...

Dev.to · João André Gomes Marques
2d ago
One-click compliance bundles for AI agent audits
An auditor walks in and asks for evidence that your AI agents are governed. You have signing data...

Dev.to · João André Gomes Marques
2d ago
Test AI agent governance without touching production
You built an AI agent pipeline. It works. Users depend on it. Now someone asks you to add governance...

Dev.to · João André Gomes Marques
3d ago
Layer 1 is identity, Layer 2 is attestation
AI agents are getting identity systems. DIDs, Ed25519 signatures, certificate-based auth - the...

Dev.to · João André Gomes Marques
5d ago
SDK v0.2.9: Output Verification, Attestations, Preflight and Budgets
v0.2.9 is out on PyPI. Four new things, all driven by what people asked for after shipping agents to...

Dev.to · João André Gomes Marques
1w ago
Scan MCP tool definitions for prompt injection before your agent calls them
MCP servers expose tools to AI agents. But those tool definitions can contain prompt injection,...

Dev.to · João André Gomes Marques
1w ago
Three tiers of enforcement for AI agents - strong, bounded, detectable
Most AI agent frameworks give you zero enforcement. Your agent can call any tool, take any action,...

Dev.to · João André Gomes Marques
1w ago
asqav-mcp is now on Docker Hub
asqav-mcp is now on Docker Hub. The MCP server that gives AI agents governance capabilities - policy...

Dev.to · João André Gomes Marques
1w ago
Asqav vs Microsoft Agent Governance Toolkit - what is the difference
Microsoft released the Agent Governance Toolkit (AGT) on April 2, 2026. I built Asqav, an open source...

Dev.to · João André Gomes Marques
1w ago
Why the E8 lattice is the perfect quantizer for KV caches
Most quantizers are chosen for convenience. E8 was chosen because the math demanded it — and then it...

Dev.to · João André Gomes Marques
1w ago
Running 1M-token context on a single GPU (the math)
Most people dismiss million-token context windows as a hardware problem. It is not. It is a math...

Dev.to · João André Gomes Marques
1w ago
NexusQuant is now on PyPI, HuggingFace, and 9 awesome lists
This week we shipped everything. Here is the full list. What went out the door PyPI...

Dev.to · João André Gomes Marques
1w ago
Why attention-aware eviction beats random eviction (with data)
At high eviction rates, choosing which tokens to drop matters enormously. Here is what the numbers...

Dev.to · João André Gomes Marques
1w ago
One line of Python to extend your LLM's context window 10x
Your LLM is running out of memory at 128K tokens. Here is the fix. from nexusquant import...

Dev.to · João André Gomes Marques
1w ago
The 12 approaches I tested before finding one that works
I keep seeing ML papers that only show the final method. No dead ends, no "we tried X and it was a...

Dev.to · João André Gomes Marques
1w ago
NexusQuant: memory compression for LLMs, a practical guide
In this guide we will explore the...

Dev.to · João André Gomes Marques
1w ago
How to compress your LLM's KV cache 33x without training
If you have ever tried to run a...

Dev.to · João André Gomes Marques
1w ago
KV cache memory calculator: how much does your LLM actually use?
Before you can compress something, you need to know how big it is. Most engineers know the KV cache...

Dev.to · João André Gomes Marques
1w ago
How to benchmark NexusQuant on your own model
Running benchmarks on someone else's hardware tells you very little. This guide shows you how to...

Dev.to · João André Gomes Marques
1w ago
What I Learned Testing 12 Compression Approaches That Failed
The most useful research I've...

Dev.to · João André Gomes Marques
1w ago
The Math Behind E8 Lattice Quantization (with Code)
Standard scalar quantization — what...

Dev.to · João André Gomes Marques
1w ago
How Much GPU Memory Does NexusQuant Actually Save?
KV cache compression numbers like "10x"...

Dev.to · João André Gomes Marques
1w ago
How to deploy NexusQuant in production (and what's missing)
This post is a practical deployment guide. Install, configuration, how to pick the right eviction...

Dev.to · João André Gomes Marques
1w ago
NexusQuant vs KVTC vs TurboQuant vs CommVQ — honest comparison
There are now enough KV cache compression papers that "we beat the competition" is meaningless...