RevNet: Backpropagation without Storing Activations

Connor Shorten · Advanced ·📄 Research Papers Explained ·6y ago

Key Takeaways

The RevNet architecture restructures convolutional blocks so that backpropagation does not need stored intermediate activations, trading extra computation for lower memory use.

Full Transcript

[Music] This video will explain the reversible ResNet architecture, the RevNet. The key idea here is to do backpropagation without storing the intermediate activations. So the headline is that the RevNet is an algorithm to restructure convolutional blocks such that the activations don't need to be stored. Typically, as you do a forward pass through the network, you store all those intermediate activations to use in backpropagation. With the RevNet, by restructuring the convolutional block, you're trading extra computation for lower memory use.

So here's the idea of the reversible block. You split the input X into two inputs, x1 and x2, along the channel dimension, and y1 and y2 are also divided in this way, and they're constructed from x1 and x2. This diagram shows what the forward pass would look like: x1 and x2 are split along the channel dimension, and then they're used to compute y1 and y2 such that if you have Y, you can just split it along the channel axis into y1 and y2 and then use these intermediate functions to get back x1 and x2, and thus X. This way you don't need to store the intermediate activations. You just need to store the activations from the very last layer, and then you can use this propagation rule to derive the activations throughout the network.

So overall this is the RevNet backpropagation algorithm. The key difference here is that you have to do another forward pass. Usually, if you needed N computation for the forward pass and 2N for the backprop, now you're going to need 3N for the backprop, because you have to do one more forward-like computation in each of these blocks to derive the input from the activations. So overall this is 33% more computation than the original block would be.

So here's a comparison with other techniques. There are some other techniques that they mention; I didn't really get into the details of them, but this is the state of the art in the spatial complexity of storing activations: you only need a constant amount of memory for the activations, in the very last layer of the network.

These are some of the architectures that they tested. They have the ResNet-32 compared to this modification, the RevNet-38, and then, you know, other slight modifications, and they test on the CIFAR-10, CIFAR-100, and ImageNet datasets. The big impressive result that they present is that it's only slightly worse than the ResNet on CIFAR-10 and CIFAR-100, even when you restructure it in this way. The reason that it's slightly worse, and not exactly the same, is that there's some numerical error when you're approximating it in this way, but I didn't really completely get into the details of that.

So this is the ResNet block, and in this case it's F that we're replacing with this reversible block. You can also imagine a channel-wise concatenation. In the ResNet blocks, usually the next features are X plus F(X), so you're adding the features at each location. We could also imagine concatenating, kind of just stacking the previous layer's features along the channel dimension. If you did that, it's really straightforward to see how you could reverse it as well: if you have these activations Y, and they're split into y1 and y2, and y2 is the previous layer's activations stacked onto the back of y1, then you can easily just take y2 and pass it through the F function to get y1. So this is just sort of an introduction to how this reversibility can work, to allow you to go back through the network, with different ways of structuring your networks.

So the big takeaway is that 33% more computation is required, but you save yourself a ton of memory. This is really useful for larger batch sizes, larger networks, and larger intermediate feature maps and whatnot. A frequent problem that I encounter is running out of GPU memory, and this is a really great method for avoiding that. So thanks for watching this video from Henry AI Labs. Please subscribe for more videos on deep learning.
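For reference, the reversible block and the cost figures from the video can be written out explicitly. The equations below use the additive coupling form from the linked paper (arXiv:1707.04585). The names F and G for the two residual functions come from the paper rather than the spoken transcript, and the cost lines only restate the N, 2N, and 3N figures quoted in the video.

```latex
% Forward pass of one reversible block (x split into x_1, x_2 along channels)
\begin{aligned}
y_1 &= x_1 + F(x_2), \\
y_2 &= x_2 + G(y_1).
\end{aligned}

% Inverse: recover the inputs from the outputs, so x need not be stored
\begin{aligned}
x_2 &= y_2 - G(y_1), \\
x_1 &= y_1 - F(x_2).
\end{aligned}

% Cost accounting, with N the cost of one forward pass
\begin{aligned}
\text{standard:}   &\quad N + 2N = 3N, \\
\text{reversible:} &\quad N + 3N = 4N, \\
\text{overhead:}   &\quad \frac{4N}{3N} = \frac{4}{3} \approx 1.33 .
\end{aligned}
```

The inverse has to be evaluated in the order shown: x_2 is recovered first because computing x_1 needs F(x_2).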

Original Description

This video presents the Reversible ResNet! This technique saves enormous amounts of memory at the trade-off of more computation! Thanks for watching, please Subscribe! Paper Link: https://arxiv.org/pdf/1707.04585.pdf

The RevNet architecture allows for backpropagation without storing intermediate activations, trading off computation for memory, and is useful for larger batch sizes and networks. This technique can help avoid running out of GPU memory. The key idea is to restructure convolutional blocks such that activations don't need to be stored.

Key Takeaways
  1. Split the input into two parts along the channel dimension
  2. Compute the output using the reversible block
  3. Derive the input from the output using the propagation rule
  4. Apply this technique to convolutional neural networks
  5. Test the performance of the RevNet architecture on different datasets
💡 The RevNet architecture can save a significant amount of memory by not storing intermediate activations, but requires 33% more computation.
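Steps 1 to 3 above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the additive coupling follows the linked paper, while the helper names (`split_channels`, `reversible_forward`, `reversible_inverse`) and the small `tanh` layers standing in for the convolutional functions F and G are assumptions made for the example.

```python
import numpy as np

def split_channels(x):
    """Split an array of shape (batch, channels) into two halves along axis 1."""
    return np.split(x, 2, axis=1)

def reversible_forward(x, f, g):
    """Compute y from x: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    x1, x2 = split_channels(x)
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return np.concatenate([y1, y2], axis=1)

def reversible_inverse(y, f, g):
    """Recover x from y: x2 = y2 - g(y1), then x1 = y1 - f(x2)."""
    y1, y2 = split_channels(y)
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return np.concatenate([x1, x2], axis=1)

# Hypothetical stand-ins for the convolutional sub-networks F and G.
rng = np.random.default_rng(0)
w_f = rng.normal(size=(4, 4))
w_g = rng.normal(size=(4, 4))
f = lambda h: np.tanh(h @ w_f)
g = lambda h: np.tanh(h @ w_g)

x = rng.normal(size=(2, 8))           # batch of 2 examples, 8 channels
y = reversible_forward(x, f, g)       # forward pass; x is not kept around
x_rec = reversible_inverse(y, f, g)   # input rebuilt from the output alone

# The reconstruction matches the input up to floating-point error.
print(np.allclose(x, x_rec))
```

F and G do not need to be invertible themselves, since the inverse only ever evaluates them in the forward direction. The reconstruction is exact only up to floating-point rounding.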
