RevNet: Backpropagation without Storing Activations
Key Takeaways
The RevNet architecture is presented, which allows for backpropagation without storing intermediate activations by restructuring convolutional blocks, trading off computation for memory.
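For reference, the restructured block can be sketched in equations. The notation below follows the linked paper rather than the video: x_1 and x_2 are the two channel-wise halves of the input, and F and G are the residual functions.

```latex
\begin{aligned}
y_1 &= x_1 + F(x_2), & y_2 &= x_2 + G(y_1) && \text{(forward)} \\
x_2 &= y_2 - G(y_1), & x_1 &= y_1 - F(x_2) && \text{(inverse)}
\end{aligned}
```

Because the inverse only needs y_1, y_2, F, and G, a block's input can be recomputed from its output instead of being stored.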
Full Transcript
This video will explain the reversible ResNet architecture, the RevNet. The key idea here is to do backpropagation without storing the intermediate activations. So the headline is that the RevNet is a way to restructure convolutional blocks such that the activations don't need to be stored. Typically, as you do a forward pass through the network, you store all those intermediate activations to use in backpropagation. With the RevNet restructuring of the convolutional block, you're trading extra computation for lower memory use.

So here's the idea of the reversible block. You split the input X into two inputs, x1 and x2, along the channel dimension, and the output Y is divided into y1 and y2 in the same way, each constructed from x1 and x2. This diagram shows what the forward pass looks like: x1 and x2 are split along the channel dimension and then used to compute y1 and y2, such that if you have Y, you can just split it along the channel axis and use that to derive x1 and x2, and thus X. In this sense, if you have Y, you split it up into y1 and y2 and then use these intermediate functions to get back x1 and x2. This way you don't need to store the intermediate activations. You just need to store the activations from the very last layer, and then you can use this propagation rule to derive the activations throughout the network.

So overall, this is the RevNet backpropagation algorithm. The key difference here is that you have to do another forward pass. Usually, if you needed N computation for the forward pass and 2N for the backprop, now you're going to need 3N for the backprop, because you have to do one more forward-like computation in each of these blocks to derive the input from the activations. That is 4N in total instead of 3N, so overall this is 33% more computation than the original block would be.

Here's a comparison with some other techniques that they mention. I didn't really get into the details of them, but this is state of the art in the space complexity of storing activations: you only need a constant amount of memory, for the activations in the very last layer of the network.

These are some of the architectures that they tested. They have the ResNet-32 compared to this modification, the RevNet-38, and then some other slight modifications, and they test on the CIFAR-10, CIFAR-100, and ImageNet datasets. The impressive result that they present is that it's only slightly worse than the ResNet on CIFAR-10 and CIFAR-100, even when you restructure it in this way. The reason that it's slightly worse and not exactly the same is that there's some numerical error when you're approximating the activations in this way, but I didn't really get into the details of that.

So this is the ResNet block, and in this case it's F that we're replacing with this reversible block. You can also imagine a channel-wise concatenation. In the ResNet blocks, usually the next features are X + F(X), so you're adding the features at each location. We could also imagine concatenating, just stacking the previous layer's features along the channel dimension. If you did that, it's really straightforward to see how you could reverse it as well: if you have these activations Y, split into y1 and y2, where y2 is the previous layer's activations stacked onto the back of y1, then you can just take y2 and pass it through the F function to get y1. So this is just an introduction to how this reversibility can work to let you go back through the network, with different ways of structuring your networks.

So the big takeaway is that there is 33% more computation required, but you save yourself a ton of memory. This is really useful for larger batch sizes, larger networks, and just the size of your intermediate feature maps and whatnot. A frequent problem that I encounter is running out of GPU memory, and this is a really great method for avoiding that. Thanks for watching this video from Henry AI Labs. Please subscribe for more videos on deep learning.
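The split-and-reconstruct idea described in the transcript can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes the additive form y1 = x1 + F(x2), y2 = x2 + G(y1) from the linked paper, and `f_residual` and `g_residual` are hypothetical element-wise stand-ins for what would be small convolutional stacks in a real network. A flat list of floats plays the role of the channel dimension.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Residual = Callable[[Vector], Vector]


def f_residual(v: Vector) -> Vector:
    # Hypothetical stand-in for the residual function F.
    return [math.tanh(0.5 * a + 0.1) for a in v]


def g_residual(v: Vector) -> Vector:
    # Hypothetical stand-in for the residual function G.
    return [0.3 * math.sin(a) for a in v]


def split_channels(x: Vector) -> Tuple[Vector, Vector]:
    # Split the "channel" axis into two equal halves.
    if len(x) % 2 != 0:
        raise ValueError("channel count must be even")
    half = len(x) // 2
    return x[:half], x[half:]


def reversible_forward(x: Vector, f: Residual, g: Residual) -> Vector:
    x1, x2 = split_channels(x)
    y1 = [a + b for a, b in zip(x1, f(x2))]  # y1 = x1 + F(x2)
    y2 = [a + b for a, b in zip(x2, g(y1))]  # y2 = x2 + G(y1)
    return y1 + y2  # list concatenation along the channel axis


def reversible_inverse(y: Vector, f: Residual, g: Residual) -> Vector:
    y1, y2 = split_channels(y)
    x2 = [a - b for a, b in zip(y2, g(y1))]  # x2 = y2 - G(y1)
    x1 = [a - b for a, b in zip(y1, f(x2))]  # x1 = y1 - F(x2)
    return x1 + x2


x = [0.5, -1.2, 2.0, 0.3]
y = reversible_forward(x, f_residual, g_residual)
x_rec = reversible_inverse(y, f_residual, g_residual)

# Largest gap between the original input and its reconstruction.
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
print("max reconstruction error:", max_err)
```

The input is recovered from the output alone, so it never has to be kept in memory. Any remaining gap comes from floating-point round-off, which is the kind of numerical error the transcript mentions, and the inverse calls F and G once more each, which is where the extra computation comes from.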
Original Description
This video presents the Reversible ResNet! This technique saves enormous amounts of memory at the trade-off of more computation!
Thanks for watching, please Subscribe!
Paper Link: https://arxiv.org/pdf/1707.04585.pdf
Playlist
Uploads from Connor Shorten · Connor Shorten · 41 of 60