📰 Dev.to · João André Gomes Marques

Articles from Dev.to · João André Gomes Marques · 47 articles · Updated every 3 hours

Replay what your AI agent did, step by step

Dev.to · João André Gomes Marques 23h ago

If you're running AI agents in production, you probably have some form of audit trail already. Maybe...

One-click compliance bundles for AI agent audits

Dev.to · João André Gomes Marques 1d ago

An auditor walks in and asks for evidence that your AI agents are governed. You have signing data...

Test AI agent governance without touching production

Dev.to · João André Gomes Marques 🤖 AI Agents & Automation ⚡ AI Lesson 1d ago

You built an AI agent pipeline. It works. Users depend on it. Now someone asks you to add governance...

Layer 1 is identity, Layer 2 is attestation

Dev.to · João André Gomes Marques 2d ago

AI agents are getting identity systems. DIDs, Ed25519 signatures, certificate-based auth - the...

SDK v0.2.9: Output Verification, Attestations, Preflight and Budgets

Dev.to · João André Gomes Marques 4d ago

v0.2.9 is out on PyPI. Four new things, all driven by what people asked for after shipping agents to...

Scan MCP tool definitions for prompt injection before your agent calls them

Dev.to · João André Gomes Marques 6d ago

MCP servers expose tools to AI agents. But those tool definitions can contain prompt injection,...

Three tiers of enforcement for AI agents - strong, bounded, detectable

Dev.to · João André Gomes Marques 1w ago

Most AI agent frameworks give you zero enforcement. Your agent can call any tool, take any action,...

asqav-mcp is now on Docker Hub

Dev.to · João André Gomes Marques 1w ago

asqav-mcp is now on Docker Hub. The MCP server that gives AI agents governance capabilities - policy...

Asqav vs Microsoft Agent Governance Toolkit - what is the difference

Dev.to · João André Gomes Marques 1w ago

Microsoft released the Agent Governance Toolkit (AGT) on April 2, 2026. I built Asqav, an open source...

Why the E8 lattice is the perfect quantizer for KV caches

Dev.to · João André Gomes Marques 1w ago

Most quantizers are chosen for convenience. E8 was chosen because the math demanded it — and then it...

Running 1M-token context on a single GPU (the math)

Dev.to · João André Gomes Marques 1w ago

Most people dismiss million-token context windows as a hardware problem. It is not. It is a math...

NexusQuant is now on PyPI, HuggingFace, and 9 awesome lists

Dev.to · João André Gomes Marques 1w ago

This week we shipped everything. Here is the full list. What went out the door: PyPI...

Why attention-aware eviction beats random eviction (with data)

Dev.to · João André Gomes Marques 1w ago

At high eviction rates, choosing which tokens to drop matters enormously. Here is what the numbers...

One line of Python to extend your LLM's context window 10x

Dev.to · João André Gomes Marques 1w ago

Your LLM is running out of memory at 128K tokens. Here is the fix. from nexusquant import...

The 12 approaches I tested before finding one that works

Dev.to · João André Gomes Marques 1w ago

I keep seeing ML papers that only show the final method. No dead ends, no "we tried X and it was a...

NexusQuant: a practical guide to memory compression for LLMs

Dev.to · João André Gomes Marques 1w ago

In this guide we will explore the...

How to compress your LLM's KV cache 33x without training

Dev.to · João André Gomes Marques 1w ago

If you've ever tried to run a...

KV cache memory calculator: how much does your LLM actually use?

Dev.to · João André Gomes Marques 1w ago

Before you can compress something, you need to know how big it is. Most engineers know the KV cache...

How to benchmark NexusQuant on your own model

Dev.to · João André Gomes Marques 1w ago

Running benchmarks on someone else's hardware tells you very little. This guide shows you how to...

What I Learned Testing 12 Compression Approaches That Failed

Dev.to · João André Gomes Marques 1w ago

The most useful research I've...

The Math Behind E8 Lattice Quantization (with Code)

Dev.to · João André Gomes Marques 1w ago

Standard scalar quantization — what...

How Much GPU Memory Does NexusQuant Actually Save?

Dev.to · João André Gomes Marques 1w ago

KV cache compression numbers like "10x"...

How to deploy NexusQuant in production (and what's missing)

Dev.to · João André Gomes Marques 1w ago

This post is a practical deployment guide. Install, configuration, how to pick the right eviction...

NexusQuant vs KVTC vs TurboQuant vs CommVQ — honest comparison

Dev.to · João André Gomes Marques 1w ago

There are now enough KV cache compression papers that "we beat the competition" is meaningless...