📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 7,014 articles · Updated every 3 hours

ArXiv cs.AI 📄 Paper 2w ago
StaRPO: Stability-Augmented Reinforcement Policy Optimization
arXiv:2604.08905v1 Announce Type: new Abstract: Reinforcement learning (RL) is effective in enhancing the accuracy of large language models in complex reasoning
ArXiv cs.AI 📄 Paper 2w ago
Enhancing LLM Problem Solving via Tutor-Student Multi-Agent Interaction
arXiv:2604.08931v1 Announce Type: new Abstract: Human cognitive development is shaped not only by individual effort but by structured social interaction, where
ArXiv cs.AI 📄 Paper 2w ago
PilotBench: A Benchmark for General Aviation Agents with Safety Constraints
arXiv:2604.08987v1 Announce Type: new Abstract: As Large Language Models (LLMs) advance toward embodied AI agents operating in physical environments, a fundamen
ArXiv cs.AI 📄 Paper 2w ago
SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment
arXiv:2604.08988v1 Announce Type: new Abstract: Current LLM-based agents demonstrate strong performance in episodic task execution but remain constrained by sta
ArXiv cs.AI 📄 Paper 2w ago
Hypergraph Neural Networks Accelerate MUS Enumeration
arXiv:2604.09001v1 Announce Type: new Abstract: Enumerating Minimal Unsatisfiable Subsets (MUSes) is a fundamental task in constraint satisfaction problems (CSP
ArXiv cs.AI 📄 Paper 2w ago
Advantage-Guided Diffusion for Model-Based Reinforcement Learning
arXiv:2604.09035v1 Announce Type: new Abstract: Model-based reinforcement learning (MBRL) with autoregressive world models suffers from compounding errors, wher
ArXiv cs.AI 📄 Paper 2w ago
Overhang Tower: Resource-Rational Adaptation in Sequential Physical Planning
arXiv:2604.09072v1 Announce Type: new Abstract: Humans effortlessly navigate the physical world by predicting how objects behave under gravity and contact force
ArXiv cs.AI 📄 Paper 2w ago
Camera Artist: A Multi-Agent Framework for Cinematic Language Storytelling Video Generation
arXiv:2604.09195v1 Announce Type: new Abstract: We propose Camera Artist, a multi-agent framework that models a real-world filmmaking workflow to generate narra
ArXiv cs.AI 📄 Paper 2w ago
DRBENCHER: Can Your Agent Identify the Entity, Retrieve Its Properties and Do the Math?
arXiv:2604.09251v1 Announce Type: new Abstract: Deep research agents increasingly interleave web browsing with multi-step computation, yet existing benchmarks e
ArXiv cs.AI 📄 Paper 2w ago
SAGE: A Service Agent Graph-guided Evaluation Benchmark
arXiv:2604.09285v1 Announce Type: new Abstract: The development of Large Language Models (LLMs) has catalyzed automation in customer service, yet benchmarking t
ArXiv cs.AI 📄 Paper 2w ago
Constraint-Aware Corrective Memory for Language-Based Drug Discovery Agents
arXiv:2604.09308v1 Announce Type: new Abstract: Large language models are making autonomous drug discovery agents increasingly feasible, but reliable success in
ArXiv cs.AI 📄 Paper 2w ago
Mind the Gap Between Spatial Reasoning and Acting! Step-by-Step Evaluation of Agents With Spatial-Gym
arXiv:2604.09338v1 Announce Type: new Abstract: Spatial reasoning is central to navigation and robotics, yet measuring model capabilities on these tasks remains
ArXiv cs.AI 📄 Paper 2w ago
HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?
arXiv:2604.09408v1 Announce Type: new Abstract: Frontier coding agents solve complex tasks when given complete context but collapse when specifications are inco
ArXiv cs.AI 📄 Paper 2w ago
Do We Really Need to Approach the Entire Pareto Front in Many-Objective Bayesian Optimisation?
arXiv:2604.09417v1 Announce Type: new Abstract: Many-objective optimisation, a subset of multi-objective optimisation, involves optimisation problems with more
ArXiv cs.AI 📄 Paper 2w ago
E3-TIR: Enhanced Experience Exploitation for Tool-Integrated Reasoning
arXiv:2604.09455v1 Announce Type: new Abstract: While Large Language Models (LLMs) have demonstrated significant potential in Tool-Integrated Reasoning (TIR), e
ArXiv cs.AI 📄 Paper 2w ago
Process Reward Agents for Steering Knowledge-Intensive Reasoning
arXiv:2604.09482v1 Announce Type: new Abstract: Reasoning in knowledge-intensive domains remains challenging as intermediate steps are often not locally verifia
ArXiv cs.AI 📄 Paper 2w ago
Strategic Algorithmic Monoculture: Experimental Evidence from Coordination Games
arXiv:2604.09502v1 Announce Type: new Abstract: AI agents increasingly operate in multi-agent environments where outcomes depend on coordination. We distinguish
ArXiv cs.AI 📄 Paper 2w ago
On Divergence Measures for Training GFlowNets
arXiv:2410.09355v2 Announce Type: cross Abstract: Generative Flow Networks (GFlowNets) are amortized inference models designed to sample from unnormalized distr
ArXiv cs.AI 📄 Paper 2w ago
Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces
arXiv:2604.08362v1 Announce Type: cross Abstract: The emergence of Large Language Models (LLMs) has illuminated the potential for a general-purpose user simulat
ArXiv cs.AI 📄 Paper 2w ago
VerifAI: A Verifiable Open-Source Search Engine for Biomedical Question Answering
arXiv:2604.08549v1 Announce Type: cross Abstract: We introduce VerifAI, an open-source expert system for biomedical question answering that integrates retrieval