Is Large Language Model Performance on Reasoning Tasks Impacted by Different Ways Questions Are Asked?
📰 ArXiv cs.AI
arXiv:2507.15707v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have been evaluated using diverse question types, e.g., multiple-choice, true/false, and short/long answers. This study addresses a previously unexplored question: how do different question types affect LLM accuracy on reasoning tasks? We investigate the performance of five LLMs on three question types using quantitative and deductive reasoning tasks. The performance metrics include accuracy in the r…