📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 5,060 articles · Updated every 3 hours


Overhang Tower: Resource-Rational Adaptation in Sequential Physical Planning

arXiv:2604.09072v1 Announce Type: new Abstract: Humans effortlessly navigate the physical world by predicting how objects behave under gravity and contact force…

ArXiv cs.AI 📄 Paper 5d ago

Camera Artist: A Multi-Agent Framework for Cinematic Language Storytelling Video Generation

arXiv:2604.09195v1 Announce Type: new Abstract: We propose Camera Artist, a multi-agent framework that models a real-world filmmaking workflow to generate narra…

ArXiv cs.AI 📄 Paper 5d ago

DRBENCHER: Can Your Agent Identify the Entity, Retrieve Its Properties and Do the Math?

arXiv:2604.09251v1 Announce Type: new Abstract: Deep research agents increasingly interleave web browsing with multi-step computation, yet existing benchmarks e…

ArXiv cs.AI 📄 Paper 5d ago

SAGE: A Service Agent Graph-guided Evaluation Benchmark

arXiv:2604.09285v1 Announce Type: new Abstract: The development of Large Language Models (LLMs) has catalyzed automation in customer service, yet benchmarking t…

ArXiv cs.AI 📄 Paper 5d ago

Constraint-Aware Corrective Memory for Language-Based Drug Discovery Agents

arXiv:2604.09308v1 Announce Type: new Abstract: Large language models are making autonomous drug discovery agents increasingly feasible, but reliable success in…

ArXiv cs.AI 📄 Paper 5d ago

Mind the Gap Between Spatial Reasoning and Acting! Step-by-Step Evaluation of Agents With Spatial-Gym

arXiv:2604.09338v1 Announce Type: new Abstract: Spatial reasoning is central to navigation and robotics, yet measuring model capabilities on these tasks remains…

ArXiv cs.AI 📄 Paper 5d ago

HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

arXiv:2604.09408v1 Announce Type: new Abstract: Frontier coding agents solve complex tasks when given complete context but collapse when specifications are inco…

ArXiv cs.AI 📄 Paper 5d ago

Do We Really Need to Approach the Entire Pareto Front in Many-Objective Bayesian Optimisation?

arXiv:2604.09417v1 Announce Type: new Abstract: Many-objective optimisation, a subset of multi-objective optimisation, involves optimisation problems with more…

ArXiv cs.AI 📄 Paper 5d ago

E3-TIR: Enhanced Experience Exploitation for Tool-Integrated Reasoning

arXiv:2604.09455v1 Announce Type: new Abstract: While Large Language Models (LLMs) have demonstrated significant potential in Tool-Integrated Reasoning (TIR), e…

ArXiv cs.AI 📄 Paper 5d ago

Process Reward Agents for Steering Knowledge-Intensive Reasoning

arXiv:2604.09482v1 Announce Type: new Abstract: Reasoning in knowledge-intensive domains remains challenging as intermediate steps are often not locally verifia…

ArXiv cs.AI 📄 Paper 5d ago

Strategic Algorithmic Monoculture: Experimental Evidence from Coordination Games

arXiv:2604.09502v1 Announce Type: new Abstract: AI agents increasingly operate in multi-agent environments where outcomes depend on coordination. We distinguish…

ArXiv cs.AI 📄 Paper 5d ago

On Divergence Measures for Training GFlowNets

arXiv:2410.09355v2 Announce Type: cross Abstract: Generative Flow Networks (GFlowNets) are amortized inference models designed to sample from unnormalized distr…

ArXiv cs.AI 📄 Paper 5d ago

Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

arXiv:2604.08362v1 Announce Type: cross Abstract: The emergence of Large Language Models (LLMs) has illuminated the potential for a general-purpose user simulat…

ArXiv cs.AI 📄 Paper 5d ago

VerifAI: A Verifiable Open-Source Search Engine for Biomedical Question Answering

arXiv:2604.08549v1 Announce Type: cross Abstract: We introduce VerifAI, an open-source expert system for biomedical question answering that integrates retrieval…

ArXiv cs.AI 📄 Paper 5d ago

Unbiased Rectification for Sequential Recommender Systems Under Fake Orders

arXiv:2604.08550v1 Announce Type: cross Abstract: Fake orders pose increasing threats to sequential recommender systems by misleading recommendation results thr…

ArXiv cs.AI 📄 Paper 5d ago

Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent

arXiv:2604.08552v1 Announce Type: cross Abstract: Scientific metadata are often incomplete and noncompliant with community standards, limiting dataset findabili…

ArXiv cs.AI 📄 Paper 5d ago

GNN-as-Judge: Unleashing the Power of LLMs for Graph Learning with GNN Feedback

arXiv:2604.08553v1 Announce Type: cross Abstract: Large Language Models (LLMs) have shown strong performance on text-attributed graphs (TAGs) due to their super…

ArXiv cs.AI 📄 Paper 5d ago

Drift and selection in LLM text ecosystems

arXiv:2604.08554v1 Announce Type: cross Abstract: The public text record -- the material from which both people and AI systems now learn -- is increasingly shap…

ArXiv cs.AI 📄 Paper 5d ago

EMA Is Not All You Need: Mapping the Boundary Between Structure and Content in Recurrent Context

arXiv:2604.08556v1 Announce Type: cross Abstract: What exactly do efficient sequence models gain over simple temporal averaging? We use exponential moving avera…

ArXiv cs.AI 📄 Paper 5d ago

Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models

arXiv:2604.08557v1 Announce Type: cross Abstract: Diffusion-based language models (dLLMs) generate text by iteratively denoising masked token sequences. We show…

ArXiv cs.AI 📄 Paper 5d ago

WAND: Windowed Attention and Knowledge Distillation for Efficient Autoregressive Text-to-Speech Models

arXiv:2604.08558v1 Announce Type: cross Abstract: Recent decoder-only autoregressive text-to-speech (AR-TTS) models produce high-fidelity speech, but their memo…

ArXiv cs.AI 📄 Paper 5d ago

Medical Reasoning with Large Language Models: A Survey and MR-Bench

arXiv:2604.08559v1 Announce Type: cross Abstract: Large language models (LLMs) have achieved strong performance on medical exam-style tasks, motivating growing…

ArXiv cs.AI 📄 Paper 5d ago

Uncertainty Estimation for the Open-Set Text Classification systems

arXiv:2604.08560v1 Announce Type: cross Abstract: Accurate uncertainty estimation is essential for building robust and trustworthy recognition systems. In this…

ArXiv cs.AI 📄 Paper 5d ago

Neural networks for Text-to-Speech evaluation

arXiv:2604.08562v1 Announce Type: cross Abstract: Ensuring that Text-to-Speech (TTS) systems deliver human-perceived quality at scale is a central challenge for…