When Perplexity Lies: Generation-Focused Distillation of Hybrid Sequence Models
📰 ArXiv cs.AI
arXiv:2603.26556v1 Announce Type: cross
Abstract: Converting a pretrained Transformer into a more efficient hybrid model through distillation offers a promising approach to reducing inference costs. However, achieving high-quality generation in distilled models requires careful joint design of both the student architecture and the distillation process. Many prior distillation works evaluate downstream multiple-choice benchmarks by ranking candidate answers with log-likelihood rather than requiring the model to generate an answer. […]
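The ranking protocol the abstract critiques can be made concrete: with a causal language model, each multiple-choice candidate is scored by the summed log-probability of its tokens conditioned on the question, and the highest-scoring candidate wins without any decoding taking place. Below is a minimal sketch assuming a HuggingFace `transformers` causal LM; the model name, the example prompt, and the `candidate_loglikelihood` helper are illustrative assumptions, not artifacts of the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint; any causal LM works the same way here.
MODEL_NAME = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def candidate_loglikelihood(prompt: str, candidate: str) -> float:
    """Summed log-probability of `candidate`'s tokens given `prompt`.

    Assumes the tokenization of `prompt` is a prefix of the tokenization
    of `prompt + candidate` (true for typical space-prefixed candidates).
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + candidate, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts token i+1 of the input.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    start = prompt_ids.shape[1]
    targets = full_ids[0, start:]                      # candidate tokens only
    scored = log_probs[start - 1 : start - 1 + targets.shape[0]]
    return scored.gather(1, targets.unsqueeze(1)).sum().item()

prompt = "Q: What is the capital of France?\nA:"
candidates = [" Paris", " London", " Berlin"]
# Log-likelihood ranking: argmax over candidate scores, no generation involved.
best = max(candidates, key=lambda c: candidate_loglikelihood(prompt, c))
print(best)
```

A generation-focused evaluation of the kind the title alludes to would instead decode from the model (e.g. `model.generate(...)`) and check whether the produced text contains the correct answer, which exercises the student's sampling behavior rather than only its relative likelihoods.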