Why do we use Flash Attention?
📰 Medium · LLM
I was about to read up on Songlin Yang’s paper on DeltaNet, but hit a snag in understanding how they optimized DeltaNet. They… Continue reading on KAIRI »