Fairness Evaluation and Inference Level Mitigation in LLMs

📰 ArXiv cs.AI

arXiv:2510.18914v3 Announce Type: replace-cross Abstract: Large language models often display undesirable behaviors embedded in their internal representations that undermine fairness, cause inconsistency drift, amplify harmful content, and propagate unwanted patterns during extended conversations. Although training-time and data-centric methods attempt to reduce these effects, they are computationally expensive, irreversible once deployed, and slow to adapt to new conversat

Published 8 Apr 2026