📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 7,014 articles · Updated every 3 hours

arXiv:2604.08712v1 Announce Type: new Abstract: The generation of planning domains from natural language descriptions remains an open problem even with the adve

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 2w ago

Artifacts as Memory Beyond the Agent Boundary

arXiv:2604.08756v1 Announce Type: new Abstract: The situated view of cognition holds that intelligent behavior depends not only on internal memory, but on an ag

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2w ago

Hidden in Plain Sight: Visual-to-Symbolic Analytical Solution Inference from Field Visualizations

arXiv:2604.08863v1 Announce Type: new Abstract: Recovering analytical solutions of physical fields from visual observations is a fundamental yet underexplored c

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 2w ago

SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks

arXiv:2604.08865v1 Announce Type: new Abstract: Proximal Policy Optimization (PPO) is central to aligning Large Language Models (LLMs) in reasoning tasks with v

ArXiv cs.AI 📄 Paper 2w ago

StaRPO: Stability-Augmented Reinforcement Policy Optimization

arXiv:2604.08905v1 Announce Type: new Abstract: Reinforcement learning (RL) is effective in enhancing the accuracy of large language models in complex reasoning

ArXiv cs.AI 📄 Paper 2w ago

Enhancing LLM Problem Solving via Tutor-Student Multi-Agent Interaction

arXiv:2604.08931v1 Announce Type: new Abstract: Human cognitive development is shaped not only by individual effort but by structured social interaction, where

ArXiv cs.AI 📄 Paper 2w ago

PilotBench: A Benchmark for General Aviation Agents with Safety Constraints

arXiv:2604.08987v1 Announce Type: new Abstract: As Large Language Models (LLMs) advance toward embodied AI agents operating in physical environments, a fundamen

ArXiv cs.AI 📄 Paper 2w ago

SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment

arXiv:2604.08988v1 Announce Type: new Abstract: Current LLM-based agents demonstrate strong performance in episodic task execution but remain constrained by sta

ArXiv cs.AI 📄 Paper 2w ago

Hypergraph Neural Networks Accelerate MUS Enumeration

arXiv:2604.09001v1 Announce Type: new Abstract: Enumerating Minimal Unsatisfiable Subsets (MUSes) is a fundamental task in constraint satisfaction problems (CSP

ArXiv cs.AI 📄 Paper 2w ago

Advantage-Guided Diffusion for Model-Based Reinforcement Learning

arXiv:2604.09035v1 Announce Type: new Abstract: Model-based reinforcement learning (MBRL) with autoregressive world models suffers from compounding errors, wher

ArXiv cs.AI 📄 Paper 2w ago

Overhang Tower: Resource-Rational Adaptation in Sequential Physical Planning

arXiv:2604.09072v1 Announce Type: new Abstract: Humans effortlessly navigate the physical world by predicting how objects behave under gravity and contact force

ArXiv cs.AI 📄 Paper 2w ago

Camera Artist: A Multi-Agent Framework for Cinematic Language Storytelling Video Generation

arXiv:2604.09195v1 Announce Type: new Abstract: We propose Camera Artist, a multi-agent framework that models a real-world filmmaking workflow to generate narra

ArXiv cs.AI 📄 Paper 2w ago

DRBENCHER: Can Your Agent Identify the Entity, Retrieve Its Properties and Do the Math?

arXiv:2604.09251v1 Announce Type: new Abstract: Deep research agents increasingly interleave web browsing with multi-step computation, yet existing benchmarks e

ArXiv cs.AI 📄 Paper 2w ago

SAGE: A Service Agent Graph-guided Evaluation Benchmark

arXiv:2604.09285v1 Announce Type: new Abstract: The development of Large Language Models (LLMs) has catalyzed automation in customer service, yet benchmarking t

ArXiv cs.AI 📄 Paper 2w ago

Constraint-Aware Corrective Memory for Language-Based Drug Discovery Agents

arXiv:2604.09308v1 Announce Type: new Abstract: Large language models are making autonomous drug discovery agents increasingly feasible, but reliable success in

ArXiv cs.AI 📄 Paper 2w ago

Mind the Gap Between Spatial Reasoning and Acting! Step-by-Step Evaluation of Agents With Spatial-Gym

arXiv:2604.09338v1 Announce Type: new Abstract: Spatial reasoning is central to navigation and robotics, yet measuring model capabilities on these tasks remains

ArXiv cs.AI 📄 Paper 2w ago

HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

arXiv:2604.09408v1 Announce Type: new Abstract: Frontier coding agents solve complex tasks when given complete context but collapse when specifications are inco

ArXiv cs.AI 📄 Paper 2w ago

Do We Really Need to Approach the Entire Pareto Front in Many-Objective Bayesian Optimisation?

arXiv:2604.09417v1 Announce Type: new Abstract: Many-objective optimisation, a subset of multi-objective optimisation, involves optimisation problems with more

ArXiv cs.AI 📄 Paper 2w ago

E3-TIR: Enhanced Experience Exploitation for Tool-Integrated Reasoning

arXiv:2604.09455v1 Announce Type: new Abstract: While Large Language Models (LLMs) have demonstrated significant potential in Tool-Integrated Reasoning (TIR), e

ArXiv cs.AI 📄 Paper 2w ago

Process Reward Agents for Steering Knowledge-Intensive Reasoning

arXiv:2604.09482v1 Announce Type: new Abstract: Reasoning in knowledge-intensive domains remains challenging as intermediate steps are often not locally verifia

ArXiv cs.AI 📄 Paper 2w ago

Strategic Algorithmic Monoculture: Experimental Evidence from Coordination Games

arXiv:2604.09502v1 Announce Type: new Abstract: AI agents increasingly operate in multi-agent environments where outcomes depend on coordination. We distinguish

ArXiv cs.AI 📄 Paper 2w ago

On Divergence Measures for Training GFlowNets

arXiv:2410.09355v2 Announce Type: cross Abstract: Generative Flow Networks (GFlowNets) are amortized inference models designed to sample from unnormalized distr

ArXiv cs.AI 📄 Paper 2w ago

Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

arXiv:2604.08362v1 Announce Type: cross Abstract: The emergence of Large Language Models (LLMs) has illuminated the potential for a general-purpose user simulat

ArXiv cs.AI 📄 Paper 2w ago

VerifAI: A Verifiable Open-Source Search Engine for Biomedical Question Answering

arXiv:2604.08549v1 Announce Type: cross Abstract: We introduce VerifAI, an open-source expert system for biomedical question answering that integrates retrieval