RevNet: Backpropagation without Storing Activations

Connor Shorten · Advanced · 📄 Research Papers Explained · 6y ago


Key Takeaways

The video presents the RevNet architecture, which restructures convolutional blocks so that backpropagation can run without storing intermediate activations, trading extra computation for lower memory use.

Full Transcript

This video will explain the reversible ResNet architecture, the RevNet. The key idea here is to do backpropagation without storing the intermediate activations. So the headline is that the RevNet is an algorithm to restructure convolutional blocks such that the activations don't need to be stored. Typically, as you do a forward pass through the network, you would store all those intermediate activations to use in backpropagation. With the RevNet restructuring of the convolutional block, you're trading more computation for less memory.

So here's the idea of the reversible block. You split the input X into two inputs, x1 and x2, along the channel dimension, and then y1 and y2 are also divided in this way, and they're constructed from x1 and x2. This diagram shows what the forward pass would look like: x1 and x2 are split along the channel dimension, and then they're used to compute y1 and y2, such that if you have Y you can just split it along the channel axis and use that to derive x1 and x2, and thus X. In this sense, if you have Y, you split it up into y1 and y2, and then you use these intermediate functions to get back x1 and x2. This way you don't need to store the intermediate activations. You just need to store the activations from the very last layer, and then you can use this propagation rule to derive the activations throughout the network.

So overall this is the RevNet backpropagation algorithm, and the key difference is that you have to do another forward pass. Usually, if you needed N computation for the forward pass and 2N for the backprop, now you're going to need 3N for the backprop, because you have to do one more forward-like computation in each of these blocks to derive the input from the activations. So overall that's 4N instead of 3N, which is about 33% more computation than the original block would need.

Here's a comparison with some other techniques that they mention. I didn't really get into the details of them, but this is the state of the art in the spatial complexity of storing activations: you only need a constant amount of memory, for the activations in the very last layer of the network.

These are some of the architectures that they tested. They have the ResNet-32 compared to this modification, the RevNet-38, and then some other slight modifications, and they test on the CIFAR-10, CIFAR-100, and ImageNet datasets. The big, impressive result that they present is that it's only slightly worse than the ResNet on CIFAR-10 and CIFAR-100, even when you restructure it in this way. The reason that it's slightly worse, and not exactly the same, is that there's some numerical error when you're approximating it in this way, but I didn't really get into the details of that.

So this is the ResNet block, and in this case it's F that we're replacing with this reversible block. You can also imagine a channel-wise concatenation. In the ResNet blocks, usually the next features are X plus F(X), so you're adding the features at each location. We could also imagine concatenating, just stacking the previous layer's features along the channel dimension. If you did that, it's really straightforward to see how you could reverse it as well: if you have these activations Y, and they're split into y1 and y2, and y2 is the previous layer's activations stacked onto the back of y1, then you can just take y2 and pass it through the F function to get y1. So this is just sort of an introduction to how this reversibility can work, letting you go back through the network, in different ways of structuring your networks.

So the big takeaway is that there's 33% more computation required, but you save yourself a ton of memory. This is really useful for larger batch sizes, larger networks, and larger intermediate feature maps and whatnot. A frequent problem that I encounter is running out of GPU memory, and this is a really great method for avoiding that. Thanks for watching this video from Henry AI Labs. Please subscribe for more videos on deep learning.
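The reversible block described in the transcript can be sketched in a few lines. The transcript does not spell out the update equations; the additive coupling used here (y1 = x1 + F(x2), y2 = x2 + G(y1)) is the form given in the RevNet paper. Everything else is an illustrative assumption: `F` and `G` are toy stand-ins for the residual functions, and plain Python lists stand in for channel tensors.

```python
import math

def F(v):
    # Toy stand-in for a residual function (a conv stack in a real network).
    return [math.tanh(2.0 * a) for a in v]

def G(v):
    # Second toy residual function; it does not need to be invertible itself.
    return [0.5 * a * a for a in v]

def split_channels(x):
    # Split the "channel" list into two halves, x1 and x2.
    half = len(x) // 2
    return x[:half], x[half:]

def rev_forward(x):
    x1, x2 = split_channels(x)
    y1 = [a + b for a, b in zip(x1, F(x2))]  # y1 = x1 + F(x2)
    y2 = [a + b for a, b in zip(x2, G(y1))]  # y2 = x2 + G(y1)
    return y1 + y2  # concatenate the halves back along the channel axis

def rev_inverse(y):
    y1, y2 = split_channels(y)
    x2 = [a - b for a, b in zip(y2, G(y1))]  # x2 = y2 - G(y1)
    x1 = [a - b for a, b in zip(y1, F(x2))]  # x1 = y1 - F(x2)
    return x1 + x2

x = [0.3, -1.2, 0.8, 2.0]
y = rev_forward(x)
x_rec = rev_inverse(y)
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
print(max_err)  # tiny, but possibly nonzero because of floating-point rounding
```

Note that the inverse only ever evaluates `F` and `G` in the forward direction, which is why the residual functions can be arbitrary. The small rounding error in the reconstruction is one plausible reading of the "numerical error" the transcript mentions.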

Original Description

This video presents the Reversible ResNet! This technique saves enormous amounts of memory at the trade-off of more computation! Thanks for watching, please Subscribe! Paper Link: https://arxiv.org/pdf/1707.04585.pdf


The RevNet architecture allows for backpropagation without storing intermediate activations, trading off computation for memory, and is useful for larger batch sizes and networks. This technique can help avoid running out of GPU memory. The key idea is to restructure convolutional blocks such that activations don't need to be stored.

Key Takeaways

Split the input into two parts along the channel dimension
Compute the output using the reversible block
Derive the input from the output using the propagation rule
Apply this technique to convolutional neural networks
Test the performance of the RevNet architecture on different datasets
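The steps above, applied across several blocks, can be sketched as follows. This is a minimal illustration, assuming the additive coupling from the RevNet paper; `make_block`, `forward`, and `inverse` are hypothetical helper names, the residual functions are toy choices, and no gradients are computed. The `reference` list exists only so the demo can check the reconstruction. A memory-saving implementation would keep just the final output.

```python
import math

def make_block(scale):
    # Each hypothetical block gets its own toy residual functions F and G.
    def F(v):
        return [math.tanh(scale * a) for a in v]
    def G(v):
        return [math.sin(scale * a) for a in v]
    return F, G

def forward(block, x1, x2):
    F, G = block
    y1 = [a + b for a, b in zip(x1, F(x2))]
    y2 = [a + b for a, b in zip(x2, G(y1))]
    return y1, y2

def inverse(block, y1, y2):
    F, G = block
    x2 = [a - b for a, b in zip(y2, G(y1))]
    x1 = [a - b for a, b in zip(y1, F(x2))]
    return x1, x2

blocks = [make_block(s) for s in (0.5, 1.0, 1.5, 2.0)]

# Forward pass over the stack, starting from an input already split in two.
x1, x2 = [0.1, -0.4], [0.7, 0.2]
reference = [(x1, x2)]  # kept only to verify the reconstruction below
for block in blocks:
    x1, x2 = forward(block, x1, x2)
    reference.append((x1, x2))

# Backward walk: start from the final output and recompute each block's input.
# A real implementation would compute that block's gradients at each step.
y1, y2 = reference[-1]
recovered = [(y1, y2)]
for block in reversed(blocks):
    y1, y2 = inverse(block, y1, y2)
    recovered.append((y1, y2))
recovered.reverse()

worst = max(
    abs(a - b)
    for (r1, r2), (c1, c2) in zip(reference, recovered)
    for a, b in zip(r1 + r2, c1 + c2)
)
print(worst)  # largest reconstruction error over all layers
```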

💡 The RevNet architecture can save a significant amount of memory by not storing intermediate activations, but requires 33% more computation.
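The 33% figure follows from the rough cost accounting used in the transcript. As an approximation, let $N$ be the cost of one forward pass and assume the backward pass costs about $2N$:

```latex
\begin{align*}
C_{\text{standard}} &= \underbrace{N}_{\text{forward}} + \underbrace{2N}_{\text{backward}} = 3N \\
C_{\text{RevNet}} &= \underbrace{N}_{\text{forward}} + \underbrace{N}_{\text{recompute inputs}} + \underbrace{2N}_{\text{backward}} = 4N \\
\frac{C_{\text{RevNet}}}{C_{\text{standard}}} &= \frac{4N}{3N} \approx 1.33
\end{align*}
```

So the extra cost is roughly one additional forward pass per training step, under the assumption that the backward pass is about twice as expensive as the forward pass.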
