Perception Language Models (PLMs) by Meta – A Fully Open SOTA VLM
In this video, we dive into Perception Language Models (PLMs), introduced in a recent paper from Meta titled PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding.
While most vision-language models (VLMs) today are either closed or trained via distillation from black-box models, PLMs are fully open-source and trained from scratch, without relying on proprietary systems.
They achieve impressive performance, even setting new state-of-the-art results on image and video benchmarks that require detailed visual understanding.
🔗 Written Review – coming soon :)
🔗 Paper: https://arxiv…
Watch on YouTube ↗
Chapters (4)
- Introduction (1:25)
- PLM Architecture (3:40)
- PLM Training & Data (7:30)
- Results
DeepCamp AI