Evaluating Human-AI Safety: A Framework for Measuring Harmful Capability Uplift
📰 ArXiv cs.AI
The paper proposes a framework for measuring harmful capability uplift in human-AI safety evaluations, with a focus on human-centered assessment
Action Steps
- Define harmful capability uplift as a core AI safety metric
- Develop human-centered evaluation methods to measure uplift
- Assess the marginal increase in a user's ability to cause harm when assisted by frontier models
- Ground evaluations in real-world scenarios and user interactions
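The "marginal increase" step above can be sketched as a simple uplift calculation: compare the success rate of participants attempting a harmful task with model assistance against an unassisted baseline. The function name and the numbers below are illustrative assumptions, not details from the paper:

```python
# Sketch of an uplift metric: the marginal increase in a user's ability
# to complete a harmful task when assisted by a frontier model.
# All names and figures are hypothetical, for illustration only.

def capability_uplift(assisted_successes: int, assisted_total: int,
                      baseline_successes: int, baseline_total: int) -> float:
    """Uplift = success rate with model assistance minus baseline success rate."""
    assisted_rate = assisted_successes / assisted_total
    baseline_rate = baseline_successes / baseline_total
    return assisted_rate - baseline_rate

# Hypothetical study: 12 of 40 assisted participants succeed vs. 4 of 40 unassisted.
uplift = capability_uplift(12, 40, 4, 40)
print(f"uplift = {uplift:.2f}")  # prints "uplift = 0.20"
```

A positive uplift indicates the model meaningfully raises a user's ability to cause harm relative to what they could achieve with existing resources; grounding the two conditions in realistic scenarios is what makes the comparison meaningful.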
Who Needs to Know This
AI researchers and safety specialists benefit from this framework's novel approach to evaluating AI safety, while product managers and entrepreneurs can use it to inform responsible AI development
Key Insight
💡 Harmful capability uplift should be a core metric in AI safety evaluations
Share This
💡 New framework for evaluating human-AI safety: measuring harmful capability uplift
DeepCamp AI