Disposition Distillation at Small Scale: A Three-Arc Negative Result
arXiv:2604.11867v1 Announce Type: cross Abstract: We set out to train behavioral dispositions (self-verification, uncertainty acknowledgment, feedback integration) into small language models (0.6B to 2.3B effective parameters) through a four-stage all-MIT distillation pipeline, with follow-on experiments on inference-time attention-head interventions and a frozen-base confidence-gated sidecar. An internal draft reported +33.9-point MCAS and +15.3-point HumanEval gains on a Qwen3-0.6B student; a
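As a rough illustration of the "frozen-base confidence-gated sidecar" idea named in the abstract, the following is a minimal sketch, not the paper's implementation: a small trainable module whose correction to the frozen base model's hidden state is scaled by a learned confidence gate. Module names, sizes, and the exact gating formulation are assumptions.

```python
# Minimal sketch (assumptions, not the paper's method) of a frozen-base,
# confidence-gated sidecar: only the sidecar is trained; the base stays frozen.
import torch
import torch.nn as nn

class ConfidenceGatedSidecar(nn.Module):
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        # Small bottleneck network that proposes a correction to the hidden state.
        self.sidecar = nn.Sequential(
            nn.Linear(hidden_size, bottleneck),
            nn.GELU(),
            nn.Linear(bottleneck, hidden_size),
        )
        # Scalar confidence gate in [0, 1]; near 0 leaves the frozen base's
        # activations effectively untouched.
        self.gate = nn.Sequential(nn.Linear(hidden_size, 1), nn.Sigmoid())

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_size) activations from the frozen base.
        confidence = self.gate(hidden)               # (batch, seq_len, 1)
        return hidden + confidence * self.sidecar(hidden)

# Usage: apply the sidecar to base-model activations; base parameters are never updated.
base_hidden = torch.randn(2, 16, 1024)               # stand-in for frozen-base activations
sidecar = ConfidenceGatedSidecar(hidden_size=1024)
patched = sidecar(base_hidden)
```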