📰 Dev.to · Pavan Madduri

21 articles

Serving 3 LLMs on 1 GPU - Multi-Model Inference with Docker on OKE

Dev.to · Pavan Madduri 4d ago

I had three small models I wanted to serve: Phi-3-mini for general chat, CodeLlama-7B for code...

Monitoring GPU Inference Containers on OKE with OpenTelemetry - What Prometheus Misses

Dev.to · Pavan Madduri 4d ago

I had Prometheus + DCGM Exporter running on my OKE cluster. It gave me GPU utilization, memory usage,...

Docker Build Cloud Cut My CI Build Times by 75%, Here's How I Wired It to OCIR

Dev.to · Pavan Madduri ☁️ DevOps & Cloud ⚡ AI Lesson 5d ago

My GPU inference image was 8GB. Building it in GitHub Actions took 14 minutes on the free runner....

How I Run GPU Workloads for 70% Less on OKE Using Preemptible Instances

Dev.to · Pavan Madduri 6d ago

I was spending ~$3,300/month on three A10 GPU instances for a mix of staging inference, batch...

Running Ollama on OCI Container Instances - Private LLM API in 5 Minutes, No Kubernetes

Dev.to · Pavan Madduri 1w ago

A colleague asked me to set up a private LLM endpoint their team could use for code review...

Zero-Downtime Crossplane v1 to v2 Migration: Adopt-in-Place at Production Scale

Dev.to · Pavan Madduri ☁️ DevOps & Cloud ⚡ AI Lesson 1w ago

How we moved a production EKS fleet from Crossplane v1 claims/composites to v2 namespaced XRs with no resource recreation, no node rotation, and no downtime — p

docker init to OCIR to OKE: From Empty Folder to Production in 15 Minutes

Dev.to · Pavan Madduri ☁️ DevOps & Cloud ⚡ AI Lesson 1w ago

I timed myself. Starting from an empty directory with a Go application idea, how fast could I get to...

I Stopped Paying for Idle GPUs - Scale-to-Zero AI Inference on OKE with KEDA

Dev.to · Pavan Madduri ⚡ AI Lesson 2w ago

A single A10 GPU on OCI costs $1.52/hr. Running 24/7, that's $1,094/month. For a production inference...

Deploying vLLM on OKE with NVIDIA A10 GPUs: The 20-Minute Setup Nobody Talks About

Dev.to · Pavan Madduri 🧠 Large Language Models ⚡ AI Lesson 2w ago

Last month I needed to stand up a Llama 3 inference endpoint for an internal tool. The requirements...

Stop Downloading 8GB Models on Every Pod Restart - Use OCI Object Storage as a Model Cache

Dev.to · Pavan Madduri 🏭 MLOps & LLMOps ⚡ AI Lesson 3w ago

The first time I deployed vLLM on OKE, the pod took 12 minutes to become ready. The model download...

Docker Model Runner Replaced My Entire Local AI Setup

Dev.to · Pavan Madduri 1mo ago

I used to have a ridiculous local AI setup. Ollama running as a service. A separate Python venv for...

Every GPU Container Bug I've Hit on OKE (and How I Fixed Them)

Dev.to · Pavan Madduri 1mo ago

Running GPU containers on Kubernetes is one of those things that works perfectly in tutorials and...

From Docker Compose on My Laptop to OKE in Production — Same App, Zero Rewrites

Dev.to · Pavan Madduri ☁️ DevOps & Cloud ⚡ AI Lesson 1mo ago

I have a rule: if I can't run the full stack on my laptop with docker compose up, the architecture is...

I Cut My Container Image Costs 60% by Building Multi-Arch Docker Images on OCI ARM

Dev.to · Pavan Madduri 1mo ago

I was running all my containers on AMD64 shapes because that's what I'd always done. x86, Intel/AMD,...

The Zero-Trust Docker Pipeline: Securing GPU/AI Container Images from Build to Production

Dev.to · Pavan Madduri 🔐 Cybersecurity ⚡ AI Lesson 1mo ago

GPU container images are the softest target in your infrastructure. A typical vLLM image is 15GB with...

GPU-Aware Autoscaling for Docker Containers: From NVML to Production

Dev.to · Pavan Madduri ☁️ DevOps & Cloud ⚡ AI Lesson 1mo ago

Every GPU inference container has the same problem: Kubernetes HPA can't see the GPU. You scale on...

I Replaced a $3/hr GPU Dev Workflow with Docker Model Runner. Here's How

Dev.to · Pavan Madduri ⚡ AI Lesson 1mo ago

Last month I was debugging a prompt template for a vLLM inference service. The change was two lines —...

Docker + OKE: Running GPU Inference Containers on Oracle Cloud

Dev.to · Pavan Madduri ☁️ DevOps & Cloud ⚡ AI Lesson 1mo ago

I wanted to deploy an LLM inference API without spending $1,200/month on AWS GPU instances. OCI...

Running Docker Containers on OCI Without Kubernetes

Dev.to · Pavan Madduri ☁️ DevOps & Cloud ⚡ AI Lesson 1mo ago

I needed to run a container in the cloud. Not a microservices platform. Not a service mesh. Just one...

Deploying a Production-Ready K3s Cluster on OCI Always Free ARM Instances

Dev.to · Pavan Madduri ⚡ AI Lesson 3mo ago

How I turned...

Why Your Kubernetes Cluster Breaks 18 Minutes After a Successful Deployment

Dev.to · Pavan Madduri ⚡ AI Lesson 4mo ago

You merge the Pull Request. The CI/CD pipeline flashes green. ArgoCD reports that your application is...