Why are Transformers replacing CNNs?

Julia Turc · Beginner · Computer Vision · 3mo ago
Why does a Transformer classify this cat as a cat… while a ResNet calls it a macaw? In this video we break down one of the biggest shifts in computer vision: why Transformers are replacing Convolutional Neural Networks (CNNs), even though CNNs were designed for images and Transformers for language. We compare convolution with self-attention, explore CNNs' inductive biases (locality, translation invariance, hierarchical features), and see why self-attention is strictly more expressive than convolution. You'll also learn how attention can exactly implement convolutional kernels using relative positional encodings.
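To make the contrast concrete, here is a minimal NumPy sketch of the two operations side by side on one token sequence. All names, dimensions, and random weights are illustrative, not taken from the video: the point is only that convolution mixes a fixed local window, while self-attention computes a learned weighting over every position.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4                      # sequence length, feature dimension
X = rng.standard_normal((T, d))  # one token sequence (or one image row)

# --- Convolution: each output mixes only a fixed local window (size 3) ---
k = rng.standard_normal((3, d, d))           # kernel: 3 taps, each d -> d
Xp = np.pad(X, ((1, 1), (0, 0)))             # zero-pad so output length == T
conv_out = np.stack([sum(Xp[t + i] @ k[i] for i in range(3)) for t in range(T)])

# --- Self-attention: each output is a learned weighting over ALL positions ---
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(d)                # (T, T) pairwise scores
A = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row softmax
attn_out = A @ V

print(conv_out.shape, attn_out.shape)        # both (T, d): same interface
```

Note that every entry of `A` is strictly positive, so each output position sees the whole input; the convolution's "receptive field" is hard-coded to three neighbors.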
Watch on YouTube ↗

Chapters (8)

0:00 Intro
1:30 The convolution operation
3:34 Convolutional Neural Networks (CNNs)
5:51 The inductive bias in CNNs
7:22 Self-attention
10:39 Self-attention can implement convolutions
14:17 Computational power & multi-modality
16:03 ChatGPT can be funny
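The chapter at 10:39 covers the key expressiveness result: self-attention can implement any convolution. A minimal NumPy sketch of one such construction (hedged: this uses hand-built, hard one-hot attention per kernel tap to stand in for what relative positional encodings learn; all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 6, 4
X = rng.standard_normal((T, d))
k = rng.standard_normal((3, d, d))           # a size-3 convolutional kernel

# Reference: ordinary zero-padded convolution.
Xp = np.pad(X, ((1, 1), (0, 0)))
conv = np.stack([sum(Xp[t + i] @ k[i] for i in range(3)) for t in range(T)])

# Attention version: one head per kernel tap. Each head's attention matrix
# is a hard one-hot at a fixed relative offset, and its value projection is
# the corresponding kernel tap. Summing the heads reproduces the convolution.
attn = np.zeros_like(conv)
for i in range(3):
    A = np.zeros((T, T))
    for t in range(T):
        s = t + i - 1                        # source position at offset i-1
        if 0 <= s < T:
            A[t, s] = 1.0                    # attend only to that position
    attn += A @ (X @ k[i])                   # head output = attention @ values

print(np.allclose(conv, attn))               # the two outputs match exactly
```

The design choice mirrors the video's claim: with enough heads and position-dependent scores, attention strictly contains convolution as a special case.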