Accelerating decode-heavy LLM inference with speculative decoding on AWS Trainium and vLLM
📰 AWS Machine Learning
In this post, you will learn how speculative decoding works and why it helps reduce cost per generated token on AWS Trainium2.
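The sketch below shows the core idea of speculative decoding. It is a toy, greedy-acceptance variant with made-up integer "models". It is not the post's code, vLLM's implementation, or any AWS Neuron API, and the function name `speculative_decode` is hypothetical.

A small draft model proposes `k` tokens. The larger target model then checks those positions. The longest prefix that matches the target's own greedy choices is kept, and the target's token is used at the first mismatch. If every draft token matches, the target's pass also supplies one bonus token.

In a real system, verification is a single batched forward pass over all `k` positions. In memory-bound, decode-heavy serving, that pass costs roughly as much as generating one token. Each round can therefore commit up to `k + 1` tokens for about one target step, and that is where the cost-per-token savings come from.

```python
from typing import Callable, List, Tuple

# Toy stand-in for a model: given the token prefix, return the greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding (illustrative sketch, k >= 1).

    Returns the final token list and the number of verification rounds,
    i.e. how many sequential target-model passes were needed.
    """
    tokens = list(prompt)
    generated, rounds = 0, 0
    while generated < max_new_tokens:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(tokens)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)

        # 2) The target verifies every proposed position. A real engine does this in
        #    one batched forward pass; here it is a loop for readability.
        rounds += 1
        accepted, ctx = [], list(tokens)
        for t in proposal:
            expected = target(ctx)
            if expected != t:
                accepted.append(expected)  # target's token replaces the first mismatch
                break
            accepted.append(t)
            ctx.append(t)
        else:
            # All k draft tokens matched: the same target pass yields one bonus token.
            accepted.append(target(ctx))

        # 3) Commit accepted tokens without exceeding the generation budget.
        accepted = accepted[: max_new_tokens - generated]
        tokens.extend(accepted)
        generated += len(accepted)
    return tokens, rounds


def greedy_decode(model: NextToken, prompt: List[int], n: int) -> List[int]:
    """Baseline: one target call per generated token."""
    toks = list(prompt)
    for _ in range(n):
        toks.append(model(toks))
    return toks
```

Because every committed token equals the target's own greedy choice, the output is identical to plain greedy decoding with the target model. Only the number of sequential target passes changes. For example, when the draft always agrees with the target, generating 12 tokens with `k=4` takes 3 verification rounds instead of 12 target steps.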