FactoryBench: Evaluating Industrial Machine Understanding
📰 ArXiv cs.AI
arXiv:2605.07675v1
Abstract: We introduce FactoryBench, a benchmark for evaluating time-series models and LLMs on machine understanding over industrial robotic telemetry. Q&A pairs are organized along four causal levels (state, intervention, counterfactual, decision) instantiating Pearl's ladder of causation, and span five answer formats: four structured formats are scored deterministically and free-form answers are scored by an LLM-as-judge voting protocol. We propose a scala