📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 3,204 articles · Updated every 3 hours

arXiv:2604.05681v1 Announce Type: new Abstract: We introduce LudoBench, a benchmark for evaluating LLM strategic reasoning in Ludo, a stochastic multi-agent boa

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

QA-MoE: Towards a Continuous Reliability Spectrum with Quality-Aware Mixture of Experts for Robust Multimodal Sentiment Analysis

arXiv:2604.05704v1 Announce Type: new Abstract: Multimodal Sentiment Analysis (MSA) aims to infer human sentiment from textual, acoustic, and visual signals. In

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

Can Large Language Models Reinvent Foundational Algorithms?

arXiv:2604.05716v1 Announce Type: new Abstract: LLMs have shown strong potential to advance scientific discovery. Whether they possess the capacity for foundati

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

Emergent social transmission of model-based representations without inference

arXiv:2604.05777v1 Announce Type: new Abstract: How do people acquire rich, flexible knowledge about their environment from others despite limited cognitive cap

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents

arXiv:2604.05808v1 Announce Type: new Abstract: Large language model (LLM) agents have demonstrated strong capabilities in complex interactive decision-making t

ArXiv cs.AI 🛡️ AI Safety & Ethics 📄 Paper ⚡ AI Lesson 6d ago

Reciprocal Trust and Distrust in Artificial Intelligence Systems: The Hard Problem of Regulation

arXiv:2604.05826v1 Announce Type: new Abstract: Policy makers, scientists, and the public are increasingly confronted with thorny questions about the regulation

ArXiv cs.AI 💻 AI-Assisted Coding 📄 Paper ⚡ AI Lesson 6d ago

Vision-Guided Iterative Refinement for Frontend Code Generation

arXiv:2604.05839v1 Announce Type: new Abstract: Code generation with large language models often relies on multi-stage human-in-the-loop refinement, which is ef

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

Deep Researcher Agent: An Autonomous Framework for 24/7 Deep Learning Experimentation with Zero-Cost Monitoring

arXiv:2604.05854v1 Announce Type: new Abstract: We present Deep Researcher Agent, an open-source framework that enables large language model (LLM) agen

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

When Do We Need LLMs? A Diagnostic for Language-Driven Bandits

arXiv:2604.05859v1 Announce Type: new Abstract: We study Contextual Multi-Armed Bandits (CMABs) for non-episodic sequential decision making problems where the c

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

JTON: A Token-Efficient JSON Superset with Zen Grid Tabular Encoding for Large Language Models

arXiv:2604.05865v1 Announce Type: new Abstract: When LLMs process structured data, the serialization format directly affects cost and context utilization. Stand

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

Joint Knowledge Base Completion and Question Answering by Combining Large Language Models and Small Language Models

arXiv:2604.05875v1 Announce Type: new Abstract: Knowledge Bases (KBs) play a key role in various applications. As two representative KB-related tasks, knowledge

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

HybridKV: Hybrid KV Cache Compression for Efficient Multimodal Large Language Model Inference

arXiv:2604.05887v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have advanced unified reasoning over text, images, and videos, but thei

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

Context-Value-Action Architecture for Value-Driven Large Language Model Agents

arXiv:2604.05939v1 Announce Type: new Abstract: Large Language Models (LLMs) have shown promise in simulating human behavior, yet existing agents often exhibit

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning

arXiv:2604.05943v1 Announce Type: new Abstract: Recent advances in multi-agent reinforcement learning (MARL) have demonstrated success in numerous challenging d

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

Towards Trustworthy Report Generation: A Deep Research Agent with Progressive Confidence Estimation and Calibration

arXiv:2604.05952v1 Announce Type: new Abstract: As agent-based systems continue to evolve, deep research agents are capable of automatically generating research

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

Beyond Compromise: Pareto-Lenient Consensus for Efficient Multi-Preference LLM Alignment

arXiv:2604.05965v1 Announce Type: new Abstract: Transcending the single-preference paradigm, aligning LLMs with diverse human values is pivotal for robust deplo

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 6d ago

Flowr -- Scaling Up Retail Supply Chain Operations Through Agentic AI in Large Scale Supermarket Chains

arXiv:2604.05987v1 Announce Type: new Abstract: Retail supply chain operations in supermarket chains involve continuous, high-volume manual workflows spanning d

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

Epistemic Blinding: An Inference-Time Protocol for Auditing Prior Contamination in LLM-Assisted Analysis

arXiv:2604.06013v1 Announce Type: new Abstract: This paper presents epistemic blinding in the context of an agentic system that uses large language models to re

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

How LLMs Follow Instructions: Skillful Coordination, Not a Universal Mechanism

arXiv:2604.06015v1 Announce Type: new Abstract: Instruction tuning is commonly assumed to endow language models with a domain-general ability to follow instruct

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 6d ago

Artificial Intelligence and the Structure of Mathematics

arXiv:2604.06107v1 Announce Type: new Abstract: Recent progress in artificial intelligence (AI) is unlocking transformative capabilities for mathematics. There

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 6d ago

ACE-Bench: Agent Configurable Evaluation with Scalable Horizons and Controllable Difficulty under Lightweight Environments

arXiv:2604.06111v1 Announce Type: new Abstract: Existing Agent benchmarks suffer from two critical limitations: high environment interaction overhead (up to 41\

ArXiv cs.AI 🤖 AI Agents & Automation 📄 Paper ⚡ AI Lesson 6d ago

Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents

arXiv:2604.06132v1 Announce Type: new Abstract: Large language models are increasingly deployed as autonomous agents executing multi-step workflows in real-worl

ArXiv cs.AI 📄 Paper ⚡ AI Lesson 6d ago

Contextuality as an External Bookkeeping Cost under Fixed Shared-State Semantics

arXiv:2601.20167v2 Announce Type: cross Abstract: Contextuality is a central feature distinguishing quantum from classical probability theories, but its operati

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 6d ago

Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems

arXiv:2604.04936v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems critically depend on effective document chunking strategies to ba