📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 1,754 articles · Updated every 3 hours

ArXiv cs.AI 📄 Paper ⚡ AI Lesson 4d ago

The DMA Streaming Framework: Kernel-Level Buffer Orchestration for High-Performance AI Data Paths

arXiv:2603.10030v2 Announce Type: replace-cross Abstract: AI transport libraries move bytes efficiently, but they commonly assume that buffers are already corre

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 4d ago

Evaluation format, not model capability, drives triage failure in the assessment of consumer health AI

arXiv:2603.11413v3 Announce Type: replace-cross Abstract: Ramaswamy et al. reported in Nature Medicine that ChatGPT Health under-triages 51.6% of emergencies, c

ArXiv cs.AI 📄 Paper ⚡ AI Lesson 4d ago

Theory of Dynamic Adaptive Coordination

arXiv:2603.11560v2 Announce Type: replace-cross Abstract: This paper develops a dynamical theory of adaptive coordination governed by persistent environmental m

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 4d ago

UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Optimization

arXiv:2603.11583v2 Announce Type: replace-cross Abstract: The success of a Large Language Model (LLM) task depends heavily on its prompt. Most use-cases specify

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 4d ago

SemBench: A Universal Semantic Framework for LLM Evaluation

arXiv:2603.11687v2 Announce Type: replace-cross Abstract: Recent progress in Natural Language Processing (NLP) has been driven by the emergence of Large Languag

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 4d ago

Seeking Physics in Diffusion Noise

arXiv:2603.14294v2 Announce Type: replace-cross Abstract: Do video diffusion models encode signals predictive of physical plausibility? We probe intermediate de

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 4d ago

360° Image Perception with MLLMs: A Comprehensive Benchmark and a Training-Free Method

arXiv:2603.16179v2 Announce Type: replace-cross Abstract: Multimodal Large Language Models (MLLMs) have shown impressive abilities in understanding and reasonin

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 4d ago

When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making

arXiv:2603.16673v2 Announce Type: replace-cross Abstract: Embodied robotic systems increasingly rely on large language model (LLM)-based agents to support high-

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 4d ago

P^2O: Joint Policy and Prompt Optimization

arXiv:2603.21877v2 Announce Type: replace-cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 5d ago

PLDR-LLMs Reason At Self-Organized Criticality

arXiv:2603.23539v1 Announce Type: new Abstract: We show that PLDR-LLMs pretrained at self-organized criticality exhibit reasoning at inference time. The charact

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 5d ago

Environment Maps: Structured Environmental Representations for Long-Horizon Agents

arXiv:2603.23610v2 Announce Type: new Abstract: Although large language models (LLMs) have advanced rapidly, robust automation of complex software workflows rem

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 5d ago

Evaluating a Multi-Agent Voice-Enabled Smart Speaker for Care Homes: A Safety-Focused Framework

arXiv:2603.23625v1 Announce Type: new Abstract: Artificial intelligence (AI) is increasingly being explored in health and social care to reduce administrative w

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 5d ago

Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

arXiv:2603.23638v1 Announce Type: new Abstract: Large language models (LLMs) have enabled agentic systems that can reason, plan, and act across complex tasks, b

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 5d ago

GTO Wizard Benchmark

arXiv:2603.23660v1 Announce Type: new Abstract: We introduce GTO Wizard Benchmark, a public API and standardized evaluation framework for benchmarking algorithm

ArXiv cs.AI 📄 Paper ⚡ AI Lesson 5d ago

Grounding Vision and Language to 3D Masks for Long-Horizon Box Rearrangement

arXiv:2603.23676v1 Announce Type: new Abstract: We study long-horizon planning in 3D environments from under-specified natural-language goals using only visual

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 5d ago

LLMs Do Not Grade Essays Like Humans

arXiv:2603.23714v1 Announce Type: new Abstract: Large language models have recently been proposed as tools for automated essay scoring, but their agreement with

ArXiv cs.AI 📄 Paper ⚡ AI Lesson 5d ago

Efficient Benchmarking of AI Agents

arXiv:2603.23749v1 Announce Type: new Abstract: Evaluating AI agents on comprehensive benchmarks is expensive because each evaluation requires interactive rollo

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 5d ago

Learning-guided Prioritized Planning for Lifelong Multi-Agent Path Finding in Warehouse Automation

arXiv:2603.23838v1 Announce Type: new Abstract: Lifelong Multi-Agent Path Finding (MAPF) is critical for modern warehouse automation, which requires multiple ro

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 5d ago

VehicleMemBench: An Executable Benchmark for Multi-User Long-Term Memory in In-Vehicle Agents

arXiv:2603.23840v1 Announce Type: new Abstract: With the growing demand for intelligent in-vehicle experiences, vehicle-based agents are evolving from simple as

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 5d ago

SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems

arXiv:2603.23853v1 Announce Type: new Abstract: Combining multiple Vision-Language Models (VLMs) can enhance multimodal reasoning and robustness, but aggregatin

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 5d ago

When AI output tips to bad but nobody notices: Legal implications of AI's mistakes

arXiv:2603.23857v1 Announce Type: new Abstract: The adoption of generative AI across commercial and legal professions offers dramatic efficiency gains -- yet fo

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 5d ago

The DeepXube Software Package for Solving Pathfinding Problems with Learned Heuristic Functions and Search

arXiv:2603.23873v1 Announce Type: new Abstract: DeepXube is a free and open-source Python package and command-line tool that seeks to automate the solution of p

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 5d ago

DUPLEX: Agentic Dual-System Planning via LLM-Driven Information Extraction

arXiv:2603.23909v1 Announce Type: new Abstract: While Large Language Models (LLMs) provide semantic flexibility for robotic task planning, their susceptibility

ArXiv cs.AI 🧠 Large Language Models 📄 Paper ⚡ AI Lesson 5d ago

AnalogAgent: Self-Improving Analog Circuit Design Automation with LLM Agents

arXiv:2603.23910v1 Announce Type: new Abstract: Recent advances in large language models (LLMs) suggest strong potential for automating analog circuit design. Y