On-Device Summaries: Surviving Transcript Pressure

📰 Hackernoon

In a refinement flow, a reused LanguageModelSession does not give you free memory. Old prompts and generated summaries stay in the transcript, and that transcript eats into the next request’s context budget. Measure the transcript before calling respond(to:), catch exceededContextWindowSize(_:) when the limit is actually hit, and recover by reseeding a new session using only state the app owns: the source note, the latest refinement intent, and the selected instructions.
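The measure, catch, and reseed loop can be sketched as follows. This is a minimal illustration, not the article's code: `SummarySession` and `SessionError` are hypothetical stand-ins for `LanguageModelSession` and its `exceededContextWindowSize(_:)` error so the sketch runs without the FoundationModels framework, `respond(to:)` is synchronous here for brevity, and the four-characters-per-token estimate and the `tokenBudget` value are assumptions rather than real tokenizer figures.

```swift
// Hypothetical stand-ins: in a real app these roles are played by
// LanguageModelSession and GenerationError.exceededContextWindowSize(_:).
enum SessionError: Error { case exceededContextWindow }

protocol SummarySession: AnyObject {
    var transcriptText: String { get }   // everything the session has accumulated
    func respond(to prompt: String) throws -> String
}

// State the app owns, so a session can always be rebuilt from it.
struct RefinementState {
    var sourceNote: String
    var latestIntent: String
    var instructions: String

    var seedPrompt: String {
        "Note:\n\(sourceNote)\n\nRefinement: \(latestIntent)"
    }
}

// Rough estimate only: ~4 characters per token is an assumption.
func estimatedTokens(_ text: String) -> Int { (text.count + 3) / 4 }

final class Summarizer {
    private let makeSession: (String) -> SummarySession
    private var session: SummarySession
    private let tokenBudget: Int
    private(set) var reseedCount = 0

    init(state: RefinementState,
         tokenBudget: Int,
         makeSession: @escaping (String) -> SummarySession) {
        self.makeSession = makeSession
        self.session = makeSession(state.instructions)
        self.tokenBudget = tokenBudget
    }

    func refine(_ state: RefinementState) throws -> String {
        let prompt = state.seedPrompt
        // Measure first: the transcript so far plus the new prompt must fit.
        if estimatedTokens(session.transcriptText) + estimatedTokens(prompt) > tokenBudget {
            reseed(with: state)
        }
        do {
            return try session.respond(to: prompt)
        } catch SessionError.exceededContextWindow {
            // The estimate was too optimistic: retry once on a fresh session.
            reseed(with: state)
            return try session.respond(to: prompt)
        }
    }

    // Discard the old transcript; only the selected instructions carry over,
    // and the prompt is rebuilt from the source note and latest intent.
    private func reseed(with state: RefinementState) {
        session = makeSession(state.instructions)
        reseedCount += 1
    }
}
```

The design choice worth noting is that nothing is copied out of the old transcript: earlier summaries are dropped, and the fresh session sees only the instructions plus a prompt built from the note and the current intent. The retry happens at most once, so a note that cannot fit even in an empty session still surfaces the error to the caller.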

Published 3 Jun 2026