Consistency LLM: converting LLMs to parallel decoders accelerates inference 3.5x

📰 Hacker News · zhisbug

98 comments, 461 points on Hacker News.
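For context: the linked work (CLLMs) fine-tunes a model so that Jacobi decoding, a parallel fixed-point iteration over a whole block of tokens, converges in far fewer steps than sequential greedy decoding. Below is a minimal, self-contained sketch of that iteration; `toy_next_token` is a hypothetical stand-in for a real LLM forward pass, which in practice would predict the next token at every position in one batched call.

```python
def toy_next_token(tokens):
    """Stand-in for an LLM: the greedy next token as a deterministic
    function of the prefix (here, a rolling sum mod a tiny vocabulary)."""
    return sum(tokens) % 7

def jacobi_decode(prefix, n_tokens, max_iters=50):
    """Decode a block of n_tokens tokens via Jacobi iteration.

    Start from an arbitrary draft for the whole block, then repeatedly
    replace every position with the model's prediction given the current
    (possibly wrong) left context. The fixed point of this map equals the
    greedy autoregressive output, so correctness is preserved; the speedup
    comes from multiple positions stabilizing per iteration.
    """
    guess = [0] * n_tokens                      # arbitrary initial draft
    for it in range(1, max_iters + 1):
        seq = prefix + guess
        # One "parallel forward pass": position i is predicted from
        # seq[:len(prefix) + i], all taken from the previous iterate.
        new = [toy_next_token(seq[:len(prefix) + i]) for i in range(n_tokens)]
        if new == guess:                        # fixed point reached
            return guess, it
        guess = new
    return guess, max_iters

prefix = [1, 2, 3]
block, iters = jacobi_decode(prefix, n_tokens=8)
print(f"decoded {block} in {iters} parallel iterations (vs 8 sequential steps)")
```

On an off-the-shelf LLM this iteration converges only slightly faster than token-by-token decoding; my understanding of the linked paper is that the "consistency" fine-tuning trains the model to map partially wrong drafts to the fixed point in few iterations, which is where the reported ~3.5x speedup comes from.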

Published 8 May 2024