Consistency LLM: converting LLMs to parallel decoders accelerates inference 3.5x
📰 Hacker News · zhisbug · 461 points · 98 comments
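The headline technique is Jacobi (fixed-point) parallel decoding: instead of emitting one token per forward pass, the model guesses an entire n-token block and refines every position simultaneously until the block stops changing, and Consistency LLMs are fine-tuned so this iteration converges in far fewer steps. Below is a minimal sketch of the decoding loop under stated assumptions: it uses a deterministic toy stand-in for the language model, and all names (`greedy_next_tokens`, `jacobi_decode_block`) are illustrative, not taken from the CLLM codebase.

```python
# Minimal sketch of Jacobi (fixed-point) parallel decoding, the scheme
# Consistency LLMs are trained to converge on quickly. The "model" here
# is a toy deterministic function, not a real LLM.

import random

VOCAB = list(range(100))

def greedy_next_tokens(prefix, block):
    """Stand-in for a causal LM: for each position i in the block, return
    the greedy next token given prefix + block[:i]. A real model computes
    all of these in ONE forward pass, which is where the speedup comes from."""
    out = []
    ctx = list(prefix)
    for i in range(len(block)):
        # Toy deterministic "model": next token is a hash of the context.
        out.append(hash(tuple(ctx)) % len(VOCAB))
        ctx.append(block[i])  # position i+1 is conditioned on the current guess
    return out

def jacobi_decode_block(prefix, n, max_iters=50):
    """Treat the n-token block as a fixed point: start from a random guess
    and refine all n positions in parallel until the guess stops changing.
    The fixed point equals ordinary greedy autoregressive decoding."""
    guess = [random.choice(VOCAB) for _ in range(n)]
    for it in range(1, max_iters + 1):
        new = greedy_next_tokens(prefix, guess)
        if new == guess:  # converged: identical to the AR result
            return guess, it
        guess = new
    return guess, max_iters

if __name__ == "__main__":
    prefix = [1, 2, 3]
    block, iters = jacobi_decode_block(prefix, n=8)
    # Autoregressive baseline for comparison: decode one token at a time.
    ar = []
    for _ in range(8):
        ar.append(greedy_next_tokens(prefix + ar, [0])[0])
    print(f"converged in {iters} refinement passes (vs 8 AR steps); match={block == ar}")
```

Because the fixed point of the iteration is exactly the greedy autoregressive output, the speedup never changes what the model generates; it only trades sequential decoding steps for (fewer, cheaper-in-wall-clock) parallel refinement passes, and the consistency fine-tuning exists to shrink the number of passes needed.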