📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 6,872 articles · Updated every 3 hours

arXiv:2604.10506v1 Announce Type: new Abstract: Vision-Language Models (VLMs) have made significant strides in static image understanding but continue to face c

ArXiv cs.AI 📄 Paper 1w ago

Beyond Compliance: A Resistance-Informed Motivation Reasoning Framework for Challenging Psychological Client Simulation

arXiv:2604.10507v1 Announce Type: new Abstract: Psychological client simulators have emerged as a scalable solution for training and evaluating counselor traine

Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation

arXiv:2604.10511v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for causal and counterfactual reasoning, yet their reliabilit

Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis

arXiv:2604.10513v1 Announce Type: new Abstract: AI agent development relies heavily on natural language prompting to define agents' tasks, knowledge, and goals.

From Perception to Planning: Evolving Ego-Centric Task-Oriented Spatiotemporal Reasoning via Curriculum Learning

arXiv:2604.10517v1 Announce Type: new Abstract: Modern vision-language models achieve strong performance in static perception, but remain limited in the complex

Agent^2 RL-Bench: Can LLM Agents Engineer Agentic RL Post-Training?

arXiv:2604.10547v1 Announce Type: new Abstract: We introduce Agent^2 RL-Bench, a benchmark for evaluating agentic RL post-training -- whether LLM agents can aut

Failure Ontology: A Lifelong Learning Framework for Blind Spot Detection and Resilience Design

arXiv:2604.10549v1 Announce Type: new Abstract: Personalized learning systems are almost universally designed around a single objective: help people acquire kno

Working Paper: Towards Schema-based Learning from a Category-Theoretic Perspective

arXiv:2604.10589v1 Announce Type: new Abstract: We introduce a hierarchical categorical framework for Schema-Based Learning (SBL) structured across four interco

Enhancing Cross-Problem Vehicle Routing via Federated Learning

arXiv:2604.10652v1 Announce Type: new Abstract: Vehicle routing problems (VRPs) constitute a core optimization challenge in modern logistics and supply chain ma

Governed Reasoning for Institutional AI

arXiv:2604.10658v1 Announce Type: new Abstract: Institutional decisions -- regulatory compliance, clinical triage, prior authorization appeal -- require a diffe

Preference-Agile Multi-Objective Optimization for Real-time Vehicle Dispatching

arXiv:2604.10664v1 Announce Type: new Abstract: Multi-objective optimization (MOO) has been widely studied in literature because of its versatility in human-cen

Principles Do Not Apply Themselves: A Hermeneutic Perspective on AI Alignment

arXiv:2604.10673v1 Announce Type: new Abstract: AI alignment is often framed as the task of ensuring that an AI system follows a set of stated principles or hum

FedRio: Personalized Federated Social Bot Detection via Cooperative Reinforced Contrastive Adversarial Distillation

arXiv:2604.10678v1 Announce Type: new Abstract: Social bot detection is critical to the stability and security of online social platforms. However, current stat

Do LLMs Build Spatial World Models? Evidence from Grid-World Maze Tasks

arXiv:2604.10690v1 Announce Type: new Abstract: Foundation models have shown remarkable performance across diverse tasks, yet their ability to construct interna

FACT-E: Causality-Inspired Evaluation for Trustworthy Chain-of-Thought Reasoning

arXiv:2604.10693v1 Announce Type: new Abstract: Chain-of-Thought (CoT) prompting has improved LLM reasoning, but models often generate explanations that appear

Camyla: Scaling Autonomous Research in Medical Image Segmentation

arXiv:2604.10696v1 Announce Type: new Abstract: We present Camyla, a system for fully autonomous research within the scientific domain of medical image segmenta

SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?

arXiv:2604.10718v1 Announce Type: new Abstract: Accelerating scientific discovery requires the identification of which experiments would yield the best outcomes

Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation

arXiv:2604.10720v1 Announce Type: new Abstract: Artificial models that simulate how learners act and respond within educational systems are a promising tool for

When More Thinking Hurts: Overthinking in LLM Test-Time Compute Scaling

arXiv:2604.10739v1 Announce Type: new Abstract: Scaling test-time compute through extended chains of thought has become a dominant paradigm for improving large

Learning Preference-Based Objectives from Clinical Narratives for Sequential Treatment Decision-Making

arXiv:2604.10783v1 Announce Type: new Abstract: Designing reward functions remains a central challenge in reinforcement learning (RL) for healthcare, where outc

TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-training

arXiv:2604.10784v1 Announce Type: new Abstract: Recent advances in unified multimodal models (UMMs) have led to a proliferation of architectures capable of unde

CheeseBench: Evaluating Large Language Models on Rodent Behavioral Neuroscience Paradigms

arXiv:2604.10825v1 Announce Type: new Abstract: We introduce CheeseBench, a benchmark that evaluates large language models (LLMs) on nine classical behavioral n

Your Model Diversity, Not Method, Determines Reasoning Strategy

arXiv:2604.10827v1 Announce Type: new Abstract: Compute scaling for LLM reasoning requires allocating budget between exploring solution approaches ($breadth$) a

A Benchmark for Gap and Overlap Analysis as a Test of KG Task Readiness

arXiv:2604.10853v1 Announce Type: new Abstract: Task-oriented evaluation of knowledge graph (KG) quality increasingly asks whether an ontology-based representat