Executing a plan under context constraints

📰 Reddit r/LocalLLaMA

I'm running Qwen 3.6 35B-A3B via the Pi harness on a Framework 13 with 32 GB of unified RAM, using llama.cpp with a 64k context window. I worked with the model to plan a refactor, and by the time we got to executing the plan, I was at around 66% context window usage. That isn't alarming on its own, but planning can occasionally consume a very large share of the context window, especially when I'm vague and the model has to make many tool calls to work out what I mean.
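For a sense of scale, this small sketch converts a usage percentage into token counts. It assumes "64k" means 65,536 tokens (for example, a llama.cpp server started with `-c 65536`). If the window was set to 64,000, pass that value instead. The function `context_budget` is hypothetical and not part of llama.cpp or Pi.

```python
# Hypothetical helper: estimate tokens used and remaining in a context window.
# Assumption: "64k" means 65,536 tokens (e.g. llama.cpp launched with -c 65536).

def context_budget(n_ctx: int, used_fraction: float) -> tuple[int, int]:
    """Return (tokens_used, tokens_remaining) for a usage fraction in [0, 1]."""
    if not 0.0 <= used_fraction <= 1.0:
        raise ValueError("used_fraction must be between 0 and 1")
    used = round(n_ctx * used_fraction)  # tokens already consumed
    return used, n_ctx - used            # remainder available for execution


if __name__ == "__main__":
    used, remaining = context_budget(65_536, 0.66)
    print(f"used ~{used} tokens, ~{remaining} left for execution")
```

At 66% of 65,536 tokens, roughly 43k tokens are already spent on planning, which leaves about 22k tokens for carrying out the refactor.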

Published 11 Jun 2026