FactoryBench: Evaluating Industrial Machine Understanding

📰 ArXiv cs.AI

arXiv:2605.07675v1 Abstract: We introduce FactoryBench, a benchmark for evaluating time-series models and LLMs on machine understanding over industrial robotic telemetry. Q&A pairs are organized along four causal levels (state, intervention, counterfactual, decision) that instantiate Pearl's ladder of causation, and they span five answer formats: four structured formats are scored deterministically, and free-form answers are scored by an LLM-as-judge voting protocol. We propose a scala

Published 11 May 2026
