📰 Dev.to · Ingero Team

13 articles

From Kernel Scheduler to Python Source Line: Tracing a GPU Stall End to End

Dev.to · Ingero Team 1d ago

TL;DR A GPU that reports 97% utilization can still be the slowest part of a training step,...

AllReduce Stalls Are Network Stalls. Most Tools See Neither.

Dev.to · Ingero Team 3d ago

A slow AllReduce on rank 5 lines up against TCP retransmits on rank 5’s NIC, four ms before the...

What GitHub Uses eBPF For (and the Layer They Have Not Ported Yet)

Dev.to · Ingero Team 5d ago

Three eBPF patterns hyperscalers run in production today, mapped to the equivalent patterns on the...

GPU Observability for Workloads That Cannot Phone Home

Dev.to · Ingero Team 1w ago

For an air-gapped GPU host, the trace is only useful if collection, storage, and query all happen...

One Kernel, Zero Sidecars: Tracing AI Workloads Without an Agent on Every Host

Dev.to · Ingero Team 1w ago

Per-host overhead multiplied across N hosts, vs. one kernel-level instrumentation per host. The...

Same eBPF, Different Vendor: Tracing libhip Calls on AMD ROCm

Dev.to · Ingero Team 2w ago

libhip.so is to ROCm what libcudart.so is to CUDA: the user-mode runtime API the framework calls...

What Inference-Platform Benchmark Posts Leave Out

Dev.to · Ingero Team 2w ago

DCGM stops at host-level GPU counters. Kernel-side eBPF adds the per-rank, per-tenant signals...

MCP Shows What the Agent Did. eBPF Shows Why the GPU Stalled.

Dev.to · Ingero Team ☁️ DevOps & Cloud ⚡ AI Lesson 2w ago

MCP exposes the agent’s tool calls. eBPF exposes the kernel events that explain why those tool...

MCP Tools Are New API Surfaces. eBPF Sees What They Actually Touch.

Dev.to · Ingero Team 3w ago

An MCP tool call is a tiny line of agent code that fans out to syscalls, library calls, and kernel...

A Cluster Stall Looks Healthy on Every Host. The Cause Is in the Pattern Across Hosts.

Dev.to · Ingero Team 3w ago

Eight ranks on two hosts. Every per-host metric reads healthy. Rank 5 enters the barrier 290 ms...

26 Seconds to Find a Straggler: Fleet v0.10 End-to-End on A100 and GH200

Dev.to · Ingero Team 1mo ago

TL;DR Ingero Fleet v0.10 FOSS is live. We validated the full pipeline end-to-end on two...

124x Slower: What PyTorch DataLoader Actually Does at the Kernel Level

Dev.to · Ingero Team 1mo ago

TL;DR: PyTorch's DataLoader can be 50-124x slower than direct tensor indexing for in-memory GPU...

Tracing a 13x PyTorch Slowdown to a Hidden NumPy Synchronization

Dev.to · Ingero Team 2mo ago

TL;DR: A .cpu().numpy() call buried inside a forward pass was forcing a full CPU-GPU synchronization...