Perception Language Models (PLMs) by Meta – A Fully Open SOTA VLM
In this video, we dive into Perception Language Models (PLMs), introduced in a recent paper from Meta titled "PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding".
While most vision-language models (VLMs) today are either closed or trained via distillation from black-box models, PLMs are fully open and are trained without distilling from proprietary systems.
They achieve impressive performance, even setting new state-of-the-art results on image and video benchmarks that require detailed visual understanding.
🔗 Written Review - soon :)
🔗 Paper: https://arxiv.org/abs/2504.13180
🔗 Models & Code: https://github.com/facebookresearch/perception_models
___________________
🔔 Subscribe for more AI paper reviews!
📩 Join the newsletter → https://aipapersacademy.com/newsletter/
Patreon - https://www.patreon.com/aipapersacademy
The video was edited using VideoScribe - https://tidd.ly/44TZEiX
___________________
Chapters:
0:00 Introduction
1:25 PLM Architecture
3:40 PLM Training & Data
7:30 Results