NVFP4 on llama.cpp?

📰 Reddit r/LocalLLaMA

Hey everyone, even though I check the subreddit daily, some things are a bit hard for me to grasp because of the speed at which progress is made (really impressive!). I tried doing research using deepseek v4, but it left me even more puzzled. Recently I saw NVFP4 support being merged into llama.cpp. Since I have dual RTX 5060 Tis, I would love to make use of it, but I didn't fully grasp how. I also saw someone releasing NVFP4 quants of Gemma4 QAT, see

Published 7 Jun 2026
