What Models Know, How Well They Know It: Knowledge-Weighted Fine-Tuning for Learning When to Say "I Don't Know"

📰 arXiv cs.AI

Knowledge-weighted fine-tuning helps large language models learn when to say 'I don't know' by estimating instance-level knowledge scores and scaling the training signal accordingly.

Advanced · Published 8 Apr 2026
Action Steps
  1. Estimate instance-level knowledge scores via multi-sampled inference
  2. Scale the learning signal according to the model's existing knowledge
  3. Fine-tune the model using knowledge-weighted learning signals
  4. Evaluate the model's performance on out-of-distribution data to test its ability to say 'I don't know'
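The steps above can be sketched in a minimal form. This is not the paper's implementation; the scoring rule (fraction of sampled answers matching a reference), the abstention threshold, and all function names are illustrative assumptions.

```python
def knowledge_score(sampled_answers, reference):
    """Step 1 (sketch): estimate an instance-level knowledge score as the
    fraction of multi-sampled answers matching the reference answer.
    The exact-match rule is an assumption, not the paper's metric."""
    if not sampled_answers:
        return 0.0
    hits = sum(1 for a in sampled_answers
               if a.strip().lower() == reference.strip().lower())
    return hits / len(sampled_answers)

def training_target(score, answer, threshold=0.5):
    """Steps 2-3 (sketch): when the score suggests the model likely knows
    the answer, keep it as the target; otherwise supervise toward an
    abstention. The 0.5 threshold is a hypothetical choice."""
    return answer if score >= threshold else "I don't know"

def loss_weight(score):
    """Scale the learning signal by existing knowledge (linear sketch)."""
    return score
```

For example, if 3 of 4 sampled answers match the reference, the score is 0.75 and the original answer is kept as the target; a score of 0.2 would instead supervise toward 'I don't know'.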
Who Needs to Know This

AI engineers and researchers benefit from this approach because it improves the reliability of large language models, while product managers can use it to build more transparent and trustworthy AI-powered products.

Key Insight

💡 Estimating instance-level knowledge scores can help large language models learn when to say 'I don't know'

Share This
🤖 New approach to fine-tuning LLMs: knowledge-weighted learning to reduce hallucinations!