Document-tuning for robust alignment to animals

📰 ArXiv cs.AI

arXiv:2604.13076v1
Announce Type: cross

Abstract: We investigate the robustness of value alignment via finetuning with synthetic documents, using animal compassion as a value that is both important in its own right and orthogonal to existing alignment efforts. To evaluate compassionate reasoning, we develop and publicly release the Animal Harm Benchmark (AHB), a 26-question evaluation spanning 13 ethical dimensions, publicly available as a dataset and Inspect evaluation. On the AHB, training wit

Published 16 Apr 2026