Yann LeCun - Self-Supervised Learning: The Dark Matter of Intelligence (FAIR Blog Post Explained)

Yannic Kilcher · Beginner · 📄 Research Papers Explained · 5y ago


#selfsupervisedlearning #yannlecun #facebookai

Deep learning systems can achieve remarkable, even super-human performance through supervised learning on large, labeled datasets. However, there are two problems. First, collecting ever more labeled data is expensive in both time and money. Second, these deep neural networks perform well on their own task but cannot easily generalize to other, related tasks, or they need large amounts of data to do so. In this blog post, Yann LeCun and Ishan Misra of Facebook AI Research (FAIR) describe the current state of Self-Supervised Learning (SSL) and argue that it is the next step in the development of AI that uses fewer labels and can transfer knowledge faster than current systems. As a promising direction, they suggest building non-contrastive latent-variable predictive models, like VAEs, but ones that also provide high-quality latent representations for downstream tasks.

OUTLINE:
0:00 - Intro & Overview
1:15 - Supervised Learning, Self-Supervised Learning, and Common Sense
7:35 - Predicting Hidden Parts from Observed Parts
17:50 - Self-Supervised Learning for Language vs Vision
26:50 - Energy-Based Models
30:15 - Joint-Embedding Models
35:45 - Contrastive Methods
43:45 - Latent-Variable Predictive Models and GANs
55:00 - Summary & Conclusion

Paper (Blog Post): https://ai.facebook.com/blog/self-supervised-learning-the-dark-matter-of-intelligence
My Video on BYOL: https://www.youtube.com/watch?v=YPfUiOMYOEE

ERRATA:
- The difference between loss and energy: Energy is for inference, loss is for training.
- The R(z) term is a regularizer that restricts the capacity of the latent variable. I think I said both of those things, but never together.
- The way I explain why BERT is contrastive is wrong. I haven't figured out why just yet, though :)

Video approved by Antonio.

Abstract: We believe that self-supervised learning (SSL) is one of the most promising ways to build such background knowledge and approximate a form o




