Computer Vision
Object detection, segmentation, YOLO, CLIP, and vision-language models
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Towards single-shot coherent imaging via overlap-free ptychography
arXiv:2602.21361v2 Announce Type: replace-cross Abstract: Ptychographic imaging at synchrotron and XFEL sources requires dense overlapping scans, limiting throu
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics
arXiv:2603.14375v2 Announce Type: replace-cross Abstract: While recent generative video models have achieved remarkable visual realism and are being explored as

Forbes Innovation
👁️ Computer Vision
⚡ AI Lesson
3mo ago
Google Confirms High-Risk Update For 3.5 Billion Chrome Users
Nearly all 3.5 billion Chrome browser users will soon see a ‘high-risk’ security update from Google. Here’s what you need to know.

Forbes Innovation
👁️ Computer Vision
⚡ AI Lesson
3mo ago
Uh Oh—New ‘Hack Yourself’ Apple Mac Attack Can Steal Your Passwords
A newly discovered attack sandbags Apple users into hacking themselves. Here’s what all Mac users need to know.
Dev.to AI
👁️ Computer Vision
⚡ AI Lesson
3mo ago
$58.3B in Synthetic Fraud Warns Investigators: "I Eyeballed It" Won't Hold Up Much Longer
The $58 Billion Synthetic Identity Crisis For developers building computer vision pipelines, biometric authentication, or OSINT tools, the latest fraud projecti

Hackernoon
👁️ Computer Vision
⚡ AI Lesson
3mo ago
Building Ultra-Lightweight Image Classifiers with TinyVision (Part 1)
This article explores how small image classification models can get while remaining effective. Using handcrafted feature pipelines and compact CNN architectures

Hackernoon
👁️ Computer Vision
⚡ AI Lesson
3mo ago
When Verified Source Lies
I deployed a staking vault on Sepolia and got it verified on Etherscan with a green checkmark. The source code contains a storage write that does not exist in t
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Is Geometry Enough? An Evaluation of Landmark-Based Gaze Estimation
arXiv:2603.24724v1 Announce Type: cross Abstract: Appearance-based gaze estimation frequently relies on deep Convolutional Neural Networks (CNNs). These models
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
MoireMix: A Formula-Based Data Augmentation for Improving Image Classification Robustness
arXiv:2603.25109v1 Announce Type: cross Abstract: Data augmentation is a key technique for improving the robustness of image classification models. However, man
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Knowledge-Guided Adversarial Training for Infrared Object Detection via Thermal Radiation Modeling
arXiv:2603.25170v1 Announce Type: cross Abstract: In complex environments, infrared object detection exhibits broad applicability and stability across diverse s
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Image Rotation Angle Estimation: Comparing Circular-Aware Methods
arXiv:2603.25351v1 Announce Type: cross Abstract: Automatic image rotation estimation is a key preprocessing step in many vision pipelines. This task is challen
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Challenges in Hyperspectral Imaging for Autonomous Driving: The HSI-Drive Case
arXiv:2603.25510v1 Announce Type: cross Abstract: The use of hyperspectral imaging (HSI) in autonomous driving (AD), while promising, faces many challenges rela
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild
arXiv:2603.25524v1 Announce Type: cross Abstract: Long-term behavioral monitoring of individual animals is crucial for studying behavioral changes that occur ov
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Just Zoom In: Cross-View Geo-Localization via Autoregressive Zooming
arXiv:2603.25686v1 Announce Type: cross Abstract: Cross-view geo-localization (CVGL) estimates a camera's location by matching a street-view image to geo-refere
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
PixelSmile: Toward Fine-Grained Facial Expression Editing
arXiv:2603.25728v1 Announce Type: cross Abstract: Fine-grained facial expression editing has long been limited by intrinsic semantic overlap. To address this, w
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Generative deep learning for foundational video translation in ultrasound
arXiv:2511.03255v2 Announce Type: replace-cross Abstract: Deep learning (DL) has the potential to revolutionize image acquisition and interpretation across medi
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
IDESplat: Iterative Depth Probability Estimation for Generalizable 3D Gaussian Splatting
arXiv:2601.03824v3 Announce Type: replace-cross Abstract: Generalizable 3D Gaussian Splatting aims to directly predict Gaussian parameters using a feed-forward
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Towards Exploratory and Focused Manipulation with Bimanual Active Perception: A New Problem, Benchmark and Strategy
arXiv:2602.01939v3 Announce Type: replace-cross Abstract: Recently, active vision has reemerged as an important concept for manipulation, since visual occlusion
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Monocular Normal Estimation via Shading Sequence Estimation
arXiv:2602.09929v5 Announce Type: replace-cross Abstract: Monocular normal estimation aims to estimate the normal map from a single RGB image of an object under
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
From Scale to Speed: Adaptive Test-Time Scaling for Image Editing
arXiv:2603.00141v3 Announce Type: replace-cross Abstract: Image Chain-of-Thought (Image-CoT) is a test-time scaling paradigm that improves image generation by e

Microsoft Research
👁️ Computer Vision
⚡ AI Lesson
3mo ago
AsgardBench: A benchmark for visually grounded interactive planning
Imagine a robot tasked with cleaning a kitchen. It needs to observe its environment, decide what to do, and adjust when things don’t go as expected, for example
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Estimating Individual Tree Height and Species from UAV Imagery
arXiv:2603.23669v1 Announce Type: cross Abstract: Accurate estimation of forest biomass, a major carbon sink, relies heavily on tree-level traits such as height
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Revealing Multi-View Hallucination in Large Vision-Language Models
arXiv:2603.23934v1 Announce Type: cross Abstract: Large vision-language models (LVLMs) are increasingly being applied to multi-view image inputs captured from d
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
High-Fidelity Face Content Recovery via Tamper-Resilient Versatile Watermarking
arXiv:2603.23940v1 Announce Type: cross Abstract: The proliferation of AIGC-driven face manipulation and deepfakes poses severe threats to media provenance, int
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Language-Guided Structure-Aware Network for Camouflaged Object Detection
arXiv:2603.24355v1 Announce Type: cross Abstract: Camouflaged Object Detection (COD) aims to segment objects that are highly integrated with the background in t
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
SEGAR: Selective Enhancement for Generative Augmented Reality
arXiv:2603.24541v1 Announce Type: cross Abstract: Generative world models offer a compelling foundation for augmented-reality (AR) applications: by predicting f
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
LensWalk: Agentic Video Understanding by Planning How You See in Videos
arXiv:2603.24558v1 Announce Type: cross Abstract: The dense, temporal nature of video presents a profound challenge for automated analysis. Despite the use of p
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models
arXiv:2603.24575v1 Announce Type: cross Abstract: Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
DIDLM: A SLAM Dataset for Difficult Scenarios Featuring Infrared, Depth Cameras, LIDAR, 4D Radar, and Others under Adverse Weather, Low Light Conditions, and Rough Roads
arXiv:2404.09622v3 Announce Type: replace-cross Abstract: Adverse weather conditions, low-light environments, and bumpy road surfaces pose significant challenge
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration
arXiv:2505.18047v3 Announce Type: replace-cross Abstract: The use of latent diffusion models (LDMs) such as Stable Diffusion has significantly improved the perc
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Wideband RF Radiance Field Modeling Using Frequency-embedded 3D Gaussian Splatting
arXiv:2505.20714v3 Announce Type: replace-cross Abstract: Indoor environments typically contain diverse RF signals distributed across multiple frequency bands,
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Pixel-level Scene Understanding in One Token: Visual States Need What-is-Where Composition
arXiv:2603.13904v2 Announce Type: replace-cross Abstract: For robotic agents operating in dynamic environments, learning visual state representations from strea
The Verge
👁️ Computer Vision
⚡ AI Lesson
3mo ago
Intel and LG Display may have beaten Apple and Qualcomm with the best laptop battery life ever
One of the coolest laptops we saw at CES in January was the new Dell XPS 16, with a unique 1-120Hz variable refresh rate display that can sip power when you don
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
PhySe-RPO: Physics and Semantics Guided Relative Policy Optimization for Diffusion-Based Surgical Smoke Removal
arXiv:2603.22844v1 Announce Type: new Abstract: Surgical smoke severely degrades intraoperative video quality, obscuring anatomical structures and limiting surg
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Color When It Counts: Grayscale-Guided Online Triggering for Always-On Streaming Video Sensing
arXiv:2603.22466v1 Announce Type: cross Abstract: Always-on sensing is essential for next-generation edge/wearable AI systems, yet continuous high-fidelity RGB
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
UAV-DETR: DETR for Anti-Drone Target Detection
arXiv:2603.22841v1 Announce Type: cross Abstract: Drone detection is pivotal in numerous security and counter-UAV applications. However, existing deep learning-
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
UniQueR: Unified Query-based Feedforward 3D Reconstruction
arXiv:2603.22851v1 Announce Type: cross Abstract: We present UniQueR, a unified query-based feedforward framework for efficient and accurate 3D reconstruction f
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
YOLOv10 with Kolmogorov-Arnold networks and vision-language foundation models for interpretable object detection and trustworthy multimodal AI in computer vision perception
arXiv:2603.23037v1 Announce Type: cross Abstract: The interpretable object detection capabilities of a novel Kolmogorov-Arnold network framework are examined he
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
PRISM: Video Dataset Condensation with Progressive Refinement and Insertion for Sparse Motion
arXiv:2505.22564v2 Announce Type: replace-cross Abstract: Video dataset condensation aims to reduce the immense computational cost of video processing. However,
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
MS-DGCNN++: Multi-Scale Dynamic Graph Convolution with Scale-Dependent Normalization for Robust LiDAR Tree Species Classification
arXiv:2507.12602v2 Announce Type: replace-cross Abstract: Graph-based deep learning on LiDAR point clouds encodes geometry through edge features, yet standard i
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench
arXiv:2510.26865v2 Announce Type: replace-cross Abstract: Reading measurement instruments is effortless for humans and requires relatively little domain experti
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Pedestrian Crossing Intention Prediction Using Multimodal Fusion Network
arXiv:2511.20008v2 Announce Type: replace-cross Abstract: Pedestrian crossing intention prediction is essential for the deployment of autonomous vehicles (AVs)
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
LPNSR: Prior-Enhanced Diffusion Image Super-Resolution via LR-Guided Noise Prediction
arXiv:2603.21045v2 Announce Type: replace-cross Abstract: Diffusion-based image super-resolution (SR), which aims to reconstruct high-resolution (HR) images fro
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
Beyond Matching to Tiles: Bridging Unaligned Aerial and Satellite Views for Vision-Only UAV Navigation
arXiv:2603.22153v2 Announce Type: replace-cross Abstract: Recent advances in cross-view geo-localization (CVGL) methods have shown strong potential for supporti
TechCrunch AI
👁️ Computer Vision
⚡ AI Lesson
3mo ago
Arm is releasing the first in-house chip in its 35-year history
Arm is producing its own CPU for the first time. It developed the CPU with Meta, which is also the chip's first customer.
Dev.to AI
👁️ Computer Vision
⚡ AI Lesson
3mo ago
TinyVision: Building Ultra-Lightweight Models for Image Tasks (Part 1)
How Small Can Image Classifiers Get? My Experiments with Ultra-Lightweight Models The repo is at https://github.com/SaptakBhoumik/TinyVision . If you find it in

Forbes Innovation
👁️ Computer Vision
⚡ AI Lesson
3mo ago
Ugreen Reveals Its New Generation Maxidok Thunderbolt 5 Docks
Ugreen's new range of Thunderbolt 5 Maxidok docking stations for Mac and PC users can leverage the 120Gbps data transfer speeds available with the latest standa
ArXiv cs.AI
👁️ Computer Vision
📄 Paper
⚡ AI Lesson
3mo ago
LoD-Loc v3: Generalized Aerial Localization in Dense Cities using Instance Silhouette Alignment
arXiv:2603.19609v1 Announce Type: cross Abstract: We present LoD-Loc v3, a novel method for generalized aerial visual localization in dense urban environments.
DeepCamp AI