SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors

📰 ArXiv cs.AI

arXiv:2510.17516v4 — Abstract: Large language model (LLM) simulations of human behavior have the potential to revolutionize the social and behavioral sciences, if and only if they faithfully reflect real human behaviors. Current evaluations of simulation fidelity are fragmented, based on bespoke tasks and metrics, creating a patchwork of incomparable results. To address this, we introduce SimBench, the first large-scale, standardized benchmark for a robust, reproducible…

Published 14 Apr 2026