📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 4,742 articles · Updated every 3 hours

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

arXiv:2604.13016v1 Announce Type: cross Abstract: On-policy distillation (OPD) has become a core technique in the post-training of large language models, yet it

ArXiv cs.AI 📄 Paper 1d ago

Representation geometry shapes task performance in vision-language modeling for CT enterography

arXiv:2604.13021v1 Announce Type: cross Abstract: Computed tomography (CT) enterography is a primary imaging modality for assessing inflammatory bowel disease (

ArXiv cs.AI 📄 Paper 1d ago

Visual Preference Optimization with Rubric Rewards

arXiv:2604.13029v1 Announce Type: cross Abstract: The effectiveness of Direct Preference Optimization (DPO) depends on preference data that reflect the quality

ArXiv cs.AI 📄 Paper 1d ago

SmellNet: A Large-scale Dataset for Real-world Smell Recognition

arXiv:2506.00239v5 Announce Type: replace Abstract: The ability of AI to sense and identify various substances based on their smell alone can have profound impa

ArXiv cs.AI 📄 Paper 1d ago

Fragile Preferences: A Deep Dive Into Order Effects in Large Language Models

arXiv:2506.14092v3 Announce Type: replace Abstract: Large language models (LLMs) are increasingly deployed in decision-support systems for high-stakes domains s

ArXiv cs.AI 📄 Paper 1d ago

League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models

arXiv:2507.22359v4 Announce Type: replace Abstract: Although large language models (LLMs) have shown exceptional capabilities across a wide range of tasks, reli

ArXiv cs.AI 📄 Paper 1d ago

Synthetic POMDPs to Challenge Memory-Augmented RL: Memory Demand Structure Modeling

arXiv:2508.04282v3 Announce Type: replace Abstract: Recent benchmarks for memory-augmented reinforcement learning (RL) have introduced partially observable Mark

ArXiv cs.AI 📄 Paper 1d ago

Mantis: A Foundation Model for Mechanistic Disease Forecasting

arXiv:2508.12260v5 Announce Type: replace Abstract: Infectious disease forecasting in novel outbreaks or low-resource settings is hampered by the need for large

ArXiv cs.AI 📄 Paper 1d ago

Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training

arXiv:2509.25758v2 Announce Type: replace Abstract: The remarkable capabilities of modern large reasoning models are largely unlocked through post-training tech

ArXiv cs.AI 📄 Paper 1d ago

ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack

arXiv:2509.25843v2 Announce Type: replace Abstract: Large language models (LLMs), despite being safety-aligned, exhibit brittle refusal behaviors that can be ci

ArXiv cs.AI 📄 Paper 1d ago

The Stackelberg Speaker: Optimizing Persuasive Communication in Social Deduction Games

arXiv:2510.09087v2 Announce Type: replace Abstract: Large language model (LLM) agents have shown remarkable progress in social deduction games (SDGs). However,

ArXiv cs.AI 📄 Paper 1d ago

Mixed-Density Diffuser: Efficient Planning with Non-Uniform Temporal Resolution

arXiv:2510.23026v5 Announce Type: replace Abstract: Recent studies demonstrate that diffusion planners benefit from sparse-step planning over single-step planni

ArXiv cs.AI 📄 Paper 1d ago

JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

arXiv:2510.23538v2 Announce Type: replace Abstract: The scope of neural code intelligence is rapidly expanding beyond text-based source code to encompass the ri

ArXiv cs.AI 📄 Paper 1d ago

Does RLVR Extend Reasoning Boundaries? Investigating Capability Expansion in Vision-Language Models

arXiv:2511.00710v4 Announce Type: replace Abstract: Recent studies posit that Reinforcement Learning with Verifiable Rewards (RLVR) primarily amplifies behavior

ArXiv cs.AI 📄 Paper 1d ago

DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning

arXiv:2511.02627v2 Announce Type: replace Abstract: We introduce DecompSR, decomposed spatial reasoning, a large benchmark dataset (over 5m datapoints) and gene

ArXiv cs.AI 📄 Paper 1d ago

Dataset Safety in Autonomous Driving: Requirements, Risks, and Assurance

arXiv:2511.08439v2 Announce Type: replace Abstract: Dataset integrity is fundamental to the safety and reliability of AI systems, especially in autonomous drivi

ArXiv cs.AI 📄 Paper 1d ago

Learning the Value of Value Learning

arXiv:2511.17714v5 Announce Type: replace Abstract: Standard decision frameworks address uncertainty about facts but assume fixed options and values. We extend

ArXiv cs.AI 📄 Paper 1d ago

A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents

arXiv:2512.20798v4 Announce Type: replace Abstract: As autonomous AI agents are deployed in high-stakes environments, ensuring their safety has become a paramou

ArXiv cs.AI 📄 Paper 1d ago

No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning

arXiv:2601.06794v2 Announce Type: replace Abstract: Critique-guided reinforcement learning (RL) has emerged as a powerful paradigm for training LLM agents by au

ArXiv cs.AI 📄 Paper 1d ago

PrivacyReasoner: Can LLM Emulate a Human-like Privacy Mind?

arXiv:2601.09152v2 Announce Type: replace Abstract: Prior work on LLM-based privacy focuses on norm judgment over synthetic vignettes, rather than how people th

ArXiv cs.AI 📄 Paper 1d ago

LatentRefusal: Latent-Signal Refusal for Unanswerable Text-to-SQL Queries

arXiv:2601.10398v3 Announce Type: replace Abstract: In LLM-based text-to-SQL systems, unanswerable and underspecified user queries may generate not only incorre

ArXiv cs.AI 📄 Paper 1d ago

WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

arXiv:2603.05044v2 Announce Type: replace Abstract: Current paradigms for training GUI agents are fundamentally limited by a reliance on either unsafe, non-repr

ArXiv cs.AI 📄 Paper 1d ago

WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces

arXiv:2603.05295v3 Announce Type: replace Abstract: We introduce WebChain, the largest open-source dataset of human-annotated trajectories on real-world website

ArXiv cs.AI 📄 Paper 1d ago

A Survey of Multimodal Mathematical Reasoning: From Perception, Alignment to Reasoning

arXiv:2603.08291v3 Announce Type: replace Abstract: Multimodal Mathematical Reasoning (MMR) has recently attracted increasing attention for its capability to so