Why do we use Flash Attention?

📰 Medium · LLM

I was about to read up on Songlin Yang’s paper on DeltaNet, but hit a snag in understanding how they optimized DeltaNet. They… Continue reading on KAIRI »

Published 15 Apr 2026