How do I prevent llama.cpp from offloading to swap?

📰 Reddit r/LocalLLaMA

I've tried to prevent this with llama.cpp flags, but the problem persists: whenever I get close to my 96GB of RAM, llama-server / llama.cpp offloads the KV cache to swap. This usually happens when I'm at 91-92GB of RAM used and still have 4GB to spare. Is there a way to make llama.cpp hold off longer, so it only offloads when I'm at, say, 95GB of RAM?

Specs: M2 Max 96GB, Qwen 3.5 122b q4, latest llama.cpp version, llama-serv

Published 11 Jun 2026