Hedging and Non-Affirmation: Quantifying LLM Alignment on Questions of Human Rights

📰 ArXiv cs.AI

arXiv:2502.19463v2. Abstract: Hedging and non-affirmation are behaviors exhibited by large language models (LLMs) that limit the clear endorsement of specific statements. While these behaviors are desirable in subjective contexts, they are undesirable in the context of human rights, which apply unambiguously to all groups. We present a systematic framework to measure these behaviors in unconstrained LLM responses regarding various identity groups. We evaluate six lar

Published 8 Apr 2026
