📰 ArXiv cs.AI

arXiv:2604.05070v1 Announce Type: new Abstract: Simulation is essential for autonomous driving, yet current frameworks often model vehicles as rigid assets and

ArXiv cs.AI 👁️ Computer Vision 📄 Paper ⚡ AI Lesson 2mo ago

CRFT: Consistent-Recurrent Feature Flow Transformer for Cross-Modal Image Registration

arXiv:2604.05689v1 Announce Type: cross Abstract: We present Consistent-Recurrent Feature Flow Transformer (CRFT), a unified coarse-to-fine framework based on f

A reconfigurable smart camera implementation for jet flames characterization based on an optimized segmentation model

arXiv:2604.03267v1 Announce Type: cross Abstract: In this work we present a novel framework for fire safety management in industrial settings through the implem

InCaRPose: In-Cabin Relative Camera Pose Estimation Model and Dataset

arXiv:2604.03814v1 Announce Type: cross Abstract: Camera extrinsic calibration is a fundamental task in computer vision. However, precise relative pose estimati

HOIGS: Human-Object Interaction Gaussian Splatting

arXiv:2604.04016v1 Announce Type: cross Abstract: Reconstructing dynamic scenes with complex human-object interactions is a fundamental challenge in computer vi

Pickalo: Leveraging 6D Pose Estimation for Low-Cost Industrial Bin Picking

arXiv:2604.04690v1 Announce Type: cross Abstract: Bin picking in real industrial environments remains challenging due to severe clutter, occlusions, and the hig

ContextDrag: Precise Drag-Based Image Editing via Context-Preserving Token Injection and Position-Aligned Attention

arXiv:2512.08477v2 Announce Type: replace-cross Abstract: Drag-based image editing enables intuitive visual manipulation through point-based drag operations. Ex

PaveBench: A Versatile Benchmark for Pavement Distress Perception and Interactive Vision-Language Analysis

arXiv:2604.02804v1 Announce Type: cross Abstract: Pavement condition assessment is essential for road safety and maintenance. Existing research has made signifi

NavCrafter: Exploring 3D Scenes from a Single Image

arXiv:2604.02828v1 Announce Type: cross Abstract: Creating flexible 3D scenes from a single image is vital when direct 3D data acquisition is costly or impracti

DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass

arXiv:2512.13122v2 Announce Type: replace-cross Abstract: Current methods for dense 3D point tracking in dynamic scenes typically rely on pairwise processing, r

Cross-Camera Distracted Driver Classification through Feature Disentanglement and Contrastive Learning

arXiv:2411.13181v3 Announce Type: replace-cross Abstract: The classification of distracted drivers is pivotal for ensuring safe driving. Previous studies demons

From Skeletons to Semantics: Design and Deployment of a Hybrid Edge-Based Action Detection System for Public Safety

arXiv:2603.29777v1 Announce Type: cross Abstract: Public spaces such as transport hubs, city centres, and event venues require timely and reliable detection of

End-to-End Image Compression with Segmentation Guided Dual Coding for Wind Turbines

arXiv:2603.29927v1 Announce Type: cross Abstract: Transferring large volumes of high-resolution images during wind turbine inspections introduces a bottleneck i

Streaming 4D Visual Geometry Transformer

arXiv:2507.11539v2 Announce Type: replace-cross Abstract: Perceiving and reconstructing 3D geometry from videos is a fundamental yet challenging computer vision

An End-to-end Flight Control Network for High-speed UAV Obstacle Avoidance based on Event-Depth Fusion

arXiv:2603.27181v1 Announce Type: cross Abstract: Achieving safe, high-speed autonomous flight in complex environments with static, dynamic, or mixed obstacles

Guided Lensless Polarization Imaging

arXiv:2603.27357v1 Announce Type: cross Abstract: Polarization imaging captures the polarization state of light, revealing information invisible to the human ey

Dynamic LIBRAS Gesture Recognition via CNN over Spatiotemporal Matrix Representation

arXiv:2603.25863v1 Announce Type: cross Abstract: This paper proposes a method for dynamic hand gesture recognition based on the composition of two models: the

DenseSwinV2: Channel Attentive Dual Branch CNN Transformer Learning for Cassava Leaf Disease Classification

arXiv:2603.25935v1 Announce Type: cross Abstract: This work presents a new Hybrid Dense SwinV2, a two-branch framework that jointly leverages densely connected

Collision-Aware Vision-Language Learning for End-to-End Driving with Multimodal Infraction Datasets

arXiv:2603.25946v1 Announce Type: cross Abstract: High infraction rates remain the primary bottleneck for end-to-end (E2E) autonomous driving, as evidenced by t

VLAgeBench: Benchmarking Large Vision-Language Models for Zero-Shot Human Age Estimation

arXiv:2603.26015v1 Announce Type: cross Abstract: Human age estimation from facial images represents a challenging computer vision task with significant applica

R-PGA: Robust Physical Adversarial Camouflage Generation via Relightable 3D Gaussian Splatting

arXiv:2603.26067v1 Announce Type: cross Abstract: Physical adversarial camouflage poses a severe security threat to autonomous driving systems by mapping advers

An Object Web Seminar: A Retrospective on a Technical Dialogue Still Reverberating

arXiv:2603.26203v1 Announce Type: cross Abstract: Technology change happens quickly such that new trends tend to crowd out the focus on what was new just yester

GeoGuide: Hierarchical Geometric Guidance for Open-Vocabulary 3D Semantic Segmentation

arXiv:2603.26260v1 Announce Type: cross Abstract: Open-vocabulary 3D semantic segmentation aims to segment arbitrary categories beyond the training set. Existin

Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones

arXiv:2603.26551v1 Announce Type: cross Abstract: Vision backbone networks play a central role in modern computer vision. Enhancing their efficiency directly be