📰 ArXiv cs.AI
Articles from ArXiv cs.AI · 1,258 articles · Updated every 3 hours
ArXiv cs.AI
📄 Paper
⚡ AI Lesson
5d ago
Efficient Benchmarking of AI Agents
arXiv:2603.23749v1 Announce Type: new Abstract: Evaluating AI agents on comprehensive benchmarks is expensive because each evaluation requires interactive rollo
Learning-guided Prioritized Planning for Lifelong Multi-Agent Path Finding in Warehouse Automation
arXiv:2603.23838v1 Announce Type: new Abstract: Lifelong Multi-Agent Path Finding (MAPF) is critical for modern warehouse automation, which requires multiple ro
VehicleMemBench: An Executable Benchmark for Multi-User Long-Term Memory in In-Vehicle Agents
arXiv:2603.23840v1 Announce Type: new Abstract: With the growing demand for intelligent in-vehicle experiences, vehicle-based agents are evolving from simple as
SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems
arXiv:2603.23853v1 Announce Type: new Abstract: Combining multiple Vision-Language Models (VLMs) can enhance multimodal reasoning and robustness, but aggregatin
When AI output tips to bad but nobody notices: Legal implications of AI's mistakes
arXiv:2603.23857v1 Announce Type: new Abstract: The adoption of generative AI across commercial and legal professions offers dramatic efficiency gains -- yet fo
The DeepXube Software Package for Solving Pathfinding Problems with Learned Heuristic Functions and Search
arXiv:2603.23873v1 Announce Type: new Abstract: DeepXube is a free and open-source Python package and command-line tool that seeks to automate the solution of p
DUPLEX: Agentic Dual-System Planning via LLM-Driven Information Extraction
arXiv:2603.23909v1 Announce Type: new Abstract: While Large Language Models (LLMs) provide semantic flexibility for robotic task planning, their susceptibility
AnalogAgent: Self-Improving Analog Circuit Design Automation with LLM Agents
arXiv:2603.23910v1 Announce Type: new Abstract: Recent advances in large language models (LLMs) suggest strong potential for automating analog circuit design. Y
From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments
arXiv:2603.23964v1 Announce Type: new Abstract: The remarkable progress of reinforcement learning (RL) is intrinsically tied to the environments used to train a
Language-Grounded Multi-Agent Planning for Personalized and Fair Participatory Urban Sensing
arXiv:2603.24014v1 Announce Type: new Abstract: Participatory urban sensing leverages human mobility for large-scale urban data collection, yet existing methods
ELITE: Experiential Learning and Intent-Aware Transfer for Self-improving Embodied Agents
arXiv:2603.24018v1 Announce Type: new Abstract: Vision-language models (VLMs) have shown remarkable general capabilities, yet embodied agents built on them fail
Enhanced Mycelium of Thought (EMoT): A Bio-Inspired Hierarchical Reasoning Architecture with Strategic Dormancy and Mnemonic Encoding
arXiv:2603.24065v1 Announce Type: new Abstract: Current prompting paradigms for large language models (LLMs), including Chain-of-Thought (CoT) and Tree-of-Thoug
Bridging the Evaluation Gap: Standardized Benchmarks for Multi-Objective Search
arXiv:2603.24084v1 Announce Type: new Abstract: Empirical evaluation in multi-objective search (MOS) has historically suffered from fragmentation, relying on he
AI-Supervisor: Autonomous AI Research Supervision via a Persistent Research World Model
arXiv:2603.24402v2 Announce Type: new Abstract: Existing automated research systems operate as stateless, linear pipelines -- generating outputs without maintai
Multi-Agent Reasoning with Consistency Verification Improves Uncertainty Calibration in Medical MCQA
arXiv:2603.24481v1 Announce Type: new Abstract: Miscalibrated confidence scores are a practical obstacle to deploying AI in clinical settings. A model that is a
From Liar Paradox to Incongruent Sets: A Normal Form for Self-Reference
arXiv:2603.24527v1 Announce Type: new Abstract: We introduce incongruent normal form (INF), a structural representation for self-referential semantic sentences.
Completeness of Unbounded Best-First Minimax and Descent Minimax
arXiv:2603.24572v1 Announce Type: new Abstract: In this article, we focus on search algorithms for two-player perfect information games, whose objective is to d
The Stochastic Gap: A Markovian Framework for Pre-Deployment Reliability and Oversight-Cost Auditing in Agentic Artificial Intelligence
arXiv:2603.24582v1 Announce Type: new Abstract: Agentic artificial intelligence (AI) in organizations is a sequential decision problem constrained by reliabilit
Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct
arXiv:2410.02064v3 Announce Type: cross Abstract: It has been reported that LLMs can recognize their own writing. As this has potential implications for AI safe
Mitigating Many-Shot Jailbreaking
arXiv:2504.09604v3 Announce Type: cross Abstract: Many-shot jailbreaking (MSJ) is an adversarial technique that exploits the long context windows of modern LLMs
Evidence for Limited Metacognition in LLMs
arXiv:2509.21545v2 Announce Type: cross Abstract: The possibility of LLM self-awareness and even sentience is gaining increasing public attention and has major
Leveraging Computerized Adaptive Testing for Cost-effective Evaluation of Large Language Models in Medical Benchmarking
arXiv:2603.23506v1 Announce Type: cross Abstract: The rapid proliferation of large language models (LLMs) in healthcare creates an urgent need for scalable and
Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes
arXiv:2603.23507v1 Announce Type: cross Abstract: While Masked Diffusion Language Models (MDLMs) relying on token masking and unmasking have shown promise in la
Internal Safety Collapse in Frontier Large Language Models
arXiv:2603.23509v1 Announce Type: cross Abstract: This work identifies a critical failure mode in frontier large language models (LLMs), which we term Internal