📰 ArXiv cs.AI

Articles from ArXiv cs.AI · 3,539 articles · Updated every 3 hours

Listener-Rewarded Thinking in VLMs for Image Preferences

arXiv:2506.22832v3 Announce Type: replace-cross Abstract: Training robust and generalizable reward models for human visual preferences is essential for aligning

ArXiv cs.AI 📄 Paper 22h ago

Provable Post-Training Quantization: Theoretical Analysis of OPTQ and Qronos

arXiv:2508.04853v2 Announce Type: replace-cross Abstract: Post-training quantization (PTQ) has become a crucial tool for reducing the memory and compute costs o

ArXiv cs.AI 📄 Paper 22h ago

VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding

arXiv:2508.06869v4 Announce Type: replace-cross Abstract: Multimodal large language models (MLLMs) demonstrate exceptional performance in vision-language tasks,

ArXiv cs.AI 📄 Paper 22h ago

Mitigating Domain Drift in Multi Species Segmentation with DINOv2: A Cross-Domain Evaluation in Herbicide Research Trials

arXiv:2508.07514v4 Announce Type: replace-cross Abstract: Reliable plant species and damage segmentation for herbicide field research trials requires models tha

ArXiv cs.AI 📄 Paper 22h ago

Investigating Multimodal Large Language Models to Support Usability Evaluation

arXiv:2508.16165v2 Announce Type: replace-cross Abstract: Usability evaluation is an essential method to support the design of effective and intuitive user inte

ArXiv cs.AI 📄 Paper 22h ago

AR-KAN: Autoregressive-Weight-Enhanced Kolmogorov-Arnold Network for Time Series Forecasting

arXiv:2509.02967v3 Announce Type: replace-cross Abstract: Traditional neural networks struggle to capture the spectral structure of complex signals. Fourier neu

ArXiv cs.AI 📄 Paper 22h ago

STCast: Adaptive Boundary Alignment for Global and Regional Weather Forecasting

arXiv:2509.25210v3 Announce Type: replace-cross Abstract: To gain finer regional forecasts, many works have explored the regional integration from the global at

ArXiv cs.AI 📄 Paper 22h ago

On-the-Fly Adaptation to Quantization: Configuration-Aware LoRA for Efficient Fine-Tuning of Quantized LLMs

arXiv:2509.25214v3 Announce Type: replace-cross Abstract: As increasingly large pre-trained models are released, deploying them on edge devices for privacy-pres

ArXiv cs.AI 📄 Paper 22h ago

Adaptive Planning for Multi-Attribute Controllable Summarization with Monte Carlo Tree Search

arXiv:2509.26435v2 Announce Type: replace-cross Abstract: Controllable summarization moves beyond generic outputs toward human-aligned summaries guided by speci

ArXiv cs.AI 📄 Paper 22h ago

Traj2Action: A Co-Denoising Framework for Trajectory-Guided Human-to-Robot Skill Transfer

arXiv:2510.00491v3 Announce Type: replace-cross Abstract: Learning diverse manipulation skills for real-world robots is severely bottlenecked by the reliance on

ArXiv cs.AI 📄 Paper 22h ago

Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing

arXiv:2510.03548v3 Announce Type: replace-cross Abstract: AI-based talking-head videoconferencing systems reduce bandwidth by sending a compact pose-expression

ArXiv cs.AI 📄 Paper 22h ago

Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

arXiv:2510.06499v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) have achieved remarkable success through imitation learning on vast text

ArXiv cs.AI 📄 Paper 22h ago

Dejavu: Towards Experience Feedback Learning for Embodied Intelligence

arXiv:2510.10181v3 Announce Type: replace-cross Abstract: Embodied agents face a fundamental limitation: once deployed in real-world environments, they cannot e

ArXiv cs.AI 📄 Paper 22h ago

RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation

arXiv:2510.17640v3 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) models have demonstrated remarkable performance on complex tasks through

ArXiv cs.AI 📄 Paper 22h ago

LLM4Delay: Flight Delay Prediction via Cross-Modality Adaptation of Large Language Models and Aircraft Trajectory Representation

arXiv:2510.23636v3 Announce Type: replace-cross Abstract: Flight delay prediction has become a key focus in air traffic management (ATM), as delays reflect inef

ArXiv cs.AI 📄 Paper 22h ago

How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison

arXiv:2510.26899v5 Announce Type: replace-cross Abstract: The launch of Grokipedia, an AI-generated encyclopedia developed by Elon Musk's xAI, was presented as

ArXiv cs.AI 📄 Paper 22h ago

EGMOF: Efficient Generation of Metal-Organic Frameworks Using a Hybrid Diffusion-Transformer Architecture

arXiv:2511.03122v2 Announce Type: replace-cross Abstract: Designing materials with targeted properties remains challenging due to the vastness of chemical space

ArXiv cs.AI 📄 Paper 22h ago

Evolutionary Optimization Trumps Adam Optimization on Embedding Space Exploration

arXiv:2511.03913v2 Announce Type: replace-cross Abstract: Deep diffusion models have revolutionized image generation by producing high-quality outputs. However,

ArXiv cs.AI 📄 Paper 22h ago

Structured Uncertainty guided Clarification for LLM Agents

arXiv:2511.08798v2 Announce Type: replace-cross Abstract: LLM agents with tool-calling capabilities often fail when user instructions are ambiguous or incomplet

ArXiv cs.AI 📄 Paper 22h ago

Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary

arXiv:2511.22963v2 Announce Type: replace-cross Abstract: Enabling humanoid robots to follow free-form language commands is critical for seamless human-robot in

ArXiv cs.AI 📄 Paper 22h ago

Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding

arXiv:2511.23071v2 Announce Type: replace-cross Abstract: Reading scene text, that is, text appearing in images, has numerous application areas, including assis

ArXiv cs.AI 📄 Paper 22h ago

See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

arXiv:2512.02231v2 Announce Type: replace-cross Abstract: Multimodal large language models (MLLMs) are expected to jointly interpret vision, audio, and language

ArXiv cs.AI 📄 Paper 22h ago

From Navigation to Refinement: Revealing the Two-Stage Nature of Flow-based Diffusion Models through Oracle Velocity

arXiv:2512.02826v3 Announce Type: replace-cross Abstract: Flow-based diffusion models have emerged as a leading paradigm for training generative models across i

ArXiv cs.AI 📄 Paper 22h ago

Out-of-the-box: Black-box Causal Attacks on Object Detectors

arXiv:2512.03730v2 Announce Type: replace-cross Abstract: Adversarial perturbations are a useful way to expose vulnerabilities in object detectors. Existing per