Share your speculative settings for llama.cpp and Gemma4
📰 Reddit r/LocalLLaMA
I have totally missed the boat on speculative decoding. Today, while generating some code for the frontend again, I found myself staring at some quite monotonous JavaScript. I decided to give llama.cpp's speculative decoding settings a go and was pleasantly surprised to see a 15-30% speedup in generation for this exact use case. The code was an arcade game on canvas (lots of simple for loops and if statements for boundary checks and simple game
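For anyone who wants to try the same thing, here is a rough sketch of a llama-server invocation with a draft model attached. The model filenames are placeholders, and the specific draft parameters are just a starting point, not tuned values; check `llama-server --help` on your build, since flag names have shifted between releases.

```shell
# Main (target) model plus a small draft model for speculative decoding.
# Filenames below are hypothetical examples, not real releases.
llama-server \
  -m  ./models/big-model-q4_k_m.gguf \
  -md ./models/small-draft-model-q4_k_m.gguf \
  --draft-max 16 \      # max tokens the draft model proposes per step
  --draft-min 1 \       # min draft tokens before verification
  -ngl 99 \             # offload target model layers to GPU
  -ngld 99              # offload draft model layers to GPU
```

The intuition: repetitive, boilerplate-heavy output (like simple canvas game loops) is easy for a small draft model to predict, so more of its proposed tokens get accepted by the big model and the speedup is larger than on "harder" prose.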