GraphKV: KV cache optimization based on graph embedding models
📰 Reddit r/LocalLLaMA
I've been working on a project inspired by TurboQuant. It isn't perfect, but it's pretty good for a project I started today. Please check it out.

GraphKV test results. Columns: Profile, Cache bytes, Compression, Quality. First profile: Tiny GPT-2, actual next-token forward.
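The test profile reports cache bytes and a compression figure. As a rough illustration of how those two quantities usually relate, here is a minimal sketch that computes the size of a standard per-layer, per-head KV cache and the compression ratio against a smaller representation. The model shape, sequence length, and 4-bit storage size are hypothetical placeholders, not GraphKV's measured numbers, and the code does not implement GraphKV's graph-embedding method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVConfig:
    """Hypothetical model shape, used only to illustrate KV cache accounting."""
    n_layers: int
    n_kv_heads: int
    head_dim: int


def kv_cache_bytes(cfg: KVConfig, seq_len: int, bytes_per_elem: float) -> float:
    """Bytes needed to store keys and values for every layer, head, and position.

    The leading factor of 2 accounts for storing both K and V.
    """
    return 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * seq_len * bytes_per_elem


def compression_ratio(baseline_bytes: float, compressed_bytes: float) -> float:
    """How many times smaller the compressed cache is than the baseline."""
    if compressed_bytes <= 0:
        raise ValueError("compressed_bytes must be positive")
    return baseline_bytes / compressed_bytes


if __name__ == "__main__":
    # Hypothetical "tiny GPT-2"-like shape; these values are NOT from the GraphKV post.
    cfg = KVConfig(n_layers=2, n_kv_heads=2, head_dim=32)
    fp16 = kv_cache_bytes(cfg, seq_len=128, bytes_per_elem=2)
    # Hypothetical 4-bit storage (0.5 bytes per element), ignoring scales and other metadata.
    four_bit = kv_cache_bytes(cfg, seq_len=128, bytes_per_elem=0.5)
    print(f"fp16 cache: {fp16:.0f} bytes")
    print(f"4-bit cache: {four_bit:.0f} bytes")
    print(f"compression: {compression_ratio(fp16, four_bit):.1f}x")
```

A real compressed cache also stores metadata (quantization scales, codebooks, or, in a graph-based scheme, whatever structure the method keeps), so the honest "Cache bytes" figure should include that overhead before dividing.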
DeepCamp AI