Block-Based Double Decoders

📰 ArXiv cs.AI

arXiv:2605.18807v1 Announce Type: cross Abstract: Encoder-decoder models offer substantial inference-time savings over decoder-only models, but their pretraining objectives suffer from sparse supervision and dynamic sequence lengths, keeping them out of practice at scale. We propose block-based double decoders, a novel transformer architecture that utilizes doubly-causal block-based attention masks to train with full loss supervision and static sequence packing, combining decoder-only training e

Published 20 May 2026

Full Article

Title: Block-Based Double Decoders

Abstract:
arXiv:2605.18807v1 Announce Type: cross Abstract: Encoder-decoder models offer substantial inference-time savings over decoder-only models, but their pretraining objectives suffer from sparse supervision and dynamic sequence lengths, keeping them out of practice at scale. We propose block-based double decoders, a novel transformer architecture that utilizes doubly-causal block-based attention masks to train with full loss supervision and static sequence packing, combining decoder-only training e
Read full paper → ← Back to Reads