CresOWLve: Benchmarking Creative Problem-Solving Over Real-World Knowledge
📰 arXiv cs.AI
CresOWLve benchmarks creative problem-solving in LLMs using real-world knowledge
Action Steps
- Identify the limitations of existing benchmarks for evaluating LLMs' creative problem-solving
- Develop a benchmark that evaluates creative problem-solving over real-world knowledge
- Use CresOWLve to assess the performance of LLMs in combining logical reasoning, lateral thinking, analogy-making, and commonsense knowledge
- Apply the insights from CresOWLve to improve the creative problem-solving capabilities of LLMs
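The assessment step above could be approached with a simple evaluation harness. The sketch below is purely illustrative: CresOWLve's actual task format and scoring method are not described here, so the `Task` structure, the concept-coverage metric, and all names are hypothetical assumptions, not the benchmark's real API.

```python
# Hypothetical sketch of a creative-problem-solving evaluation loop.
# All names (Task, score_response, evaluate) are illustrative; CresOWLve's
# real task schema and metrics may differ entirely.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Task:
    prompt: str
    required_concepts: frozenset  # concepts a creative solution should combine


def score_response(task: Task, response: str) -> float:
    """Toy metric: fraction of required concepts the response mentions."""
    text = response.lower()
    hits = sum(1 for concept in task.required_concepts if concept in text)
    return hits / len(task.required_concepts)


def evaluate(model: Callable[[str], str], tasks: Iterable[Task]) -> float:
    """Average concept-coverage score; `model` is any prompt -> str callable."""
    tasks = list(tasks)
    return sum(score_response(t, model(t.prompt)) for t in tasks) / len(tasks)


if __name__ == "__main__":
    tasks = [
        Task(
            prompt="How can a paperclip open a locked diary?",
            required_concepts=frozenset({"lock", "bend", "lever"}),
        )
    ]
    dummy_model = lambda p: "Bend the paperclip into a lever to pick the lock."
    print(round(evaluate(dummy_model, tasks), 2))  # → 1.0
```

A real harness would replace the keyword metric with the benchmark's own scoring (e.g. human or model-based judging of lateral thinking and analogy quality), but the loop structure of prompting a model per task and aggregating scores would stay the same.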
Who Needs to Know This
AI researchers and engineers can use this benchmark to evaluate the creative problem-solving capabilities of LLMs, while product managers can use it to gauge how well LLMs might perform in real-world applications
Key Insight
💡 CresOWLve provides a comprehensive evaluation of LLMs' creative problem-solving abilities, going beyond traditional benchmarks
Share This
🤖 CresOWLve: A new benchmark for creative problem-solving in LLMs using real-world knowledge
DeepCamp AI