When Perplexity Lies: Generation-Focused Distillation of Hybrid Sequence Models
📰 ArXiv cs.AI
arXiv:2603.26556v1 Announce Type: cross
Abstract: Converting a pretrained Transformer into a more efficient hybrid model through distillation offers a promising approach to reducing inference costs. However, achieving high-quality generation in distilled models requires careful joint design of both the student architecture and the distillation process. Many prior distillation works evaluate downstream multiple-choice benchmarks by ranking candidate answers with log-likelihood rather than requiring the model to generate an answer. […]
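The ranking protocol the abstract critiques can be made concrete: with a causal language model, each multiple-choice candidate is scored by the summed log-probability of its tokens conditioned on the question, and the highest-scoring candidate wins without any decoding taking place. Below is a minimal sketch assuming a HuggingFace `transformers` causal LM; the model name, the example prompt, and the `candidate_loglikelihood` helper are illustrative assumptions, not artifacts of the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint; any causal LM works the same way here.
MODEL_NAME = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def candidate_loglikelihood(prompt: str, candidate: str) -> float:
    """Summed log-probability of `candidate`'s tokens given `prompt`.

    Assumes the tokenization of `prompt` is a prefix of the tokenization
    of `prompt + candidate` (true for typical space-prefixed candidates).
    """
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + candidate, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of the logits predicts token i+1 of the input.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    start = prompt_ids.shape[1]
    targets = full_ids[0, start:]                      # candidate tokens only
    scored = log_probs[start - 1 : start - 1 + targets.shape[0]]
    return scored.gather(1, targets.unsqueeze(1)).sum().item()

prompt = "Q: What is the capital of France?\nA:"
candidates = [" Paris", " London", " Berlin"]
# Log-likelihood ranking: argmax over candidate scores, no generation involved.
best = max(candidates, key=lambda c: candidate_loglikelihood(prompt, c))
print(best)
```

A generation-focused evaluation of the kind the title alludes to would instead decode from the model (e.g. `model.generate(...)`) and check whether the produced text contains the correct answer, which exercises the student's sampling behavior rather than only its relative likelihoods.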