How AI Taught Itself to See [DINOv3]
How can we train a general-purpose vision model to perceive our visual world?
This video dives into the fascinating idea of self-supervised learning. We will discuss the basic concepts of transfer learning, contrastive language-image pretraining (CLIP), and self-supervised learning methods, including masked autoencoder, contrastive methods like SimCLR, and self-distillation methods like DINOv1, v2, and v3. I hope you enjoy the video!
00:00 Introduction
00:33 Why do features matter?
01:11 Learning features using classification
02:14 Learning features using language (CLIP)
04:09 Learning fea…
Watch on YouTube ↗
(saves to browser)
Chapters (9)
Introduction
0:33
Why do features matter?
1:11
Learning features using classification
2:14
Learning features using language (CLIP)
4:09
Learning features using pretask (Self-supervised learning)
5:20
Learning features using contrast (SimCLR)
6:36
Learning features using self-distillation (DINOv1)
12:18
DINOv2
13:54
DINOv3
DeepCamp AI