Deep Compression

Connor Shorten · Advanced ·📄 Research Papers Explained ·7y ago

Key Takeaways

The Deep Compression technique reduces the file size of deep neural networks through a pipeline of Pruning, Quantization, and Huffman Encoding, as demonstrated on the AlexNet and VGG-16 models.

Full Transcript

[Music] This video will explain Deep Compression. Deep Compression is a technique to dramatically reduce the file size of deep neural networks. The results of this paper reduce AlexNet by 35 times, from 240 megabytes to 6.9. They also reduce VGG-16 by a factor of 49, from 552 megabytes to 11.3. This is really interesting for mobile developers especially, because if you have an app which is over a hundred megabytes, the user will get this extra notification and maybe won't download your app. In addition, having these large files in your apps results in higher energy consumption and a larger footprint when people are looking to clear some space on their phones.

So the way that Deep Compression works is a three-stage compression pipeline of pruning, quantization, and Huffman coding.

Pruning works by reducing the number of connections, or weights, by 9 to 13 times in this paper. The way that they do this is that they go through the weights in the neural network and remove the weights that are below a certain threshold. So for example, if the neural network weight is zero, or maybe 0.1 or negative 0.1, it would be completely masked out, and they would no longer use it in the matrix multiplications that compute the neural network output. After they prune the weights, they store them in a compressed sparse row format in order to save space, and what this does is store the index difference between the weights that haven't been masked out.

After pruning, they quantize the weights, and this is the most interesting idea to me. Quantization is this idea of weight sharing, and it can really cut down the storage requirement. What they do is use a k-means algorithm to cluster the weights into similar groups. So for example here, all the weights that are labeled green map to 1, the weights labeled orange map to 0, and so on. So now they can represent all the weights in the network with, say, three bits, because with three bits you can represent eight weights, two to the three, and they set every weight in the network to be one of these quantized weights from the codebook.

Then they train the network, and they still take the partial derivative with respect to each weight, even though many of them share the same value. But then they aggregate the partial derivatives of all the weights that map to the same codebook entry, and they use that aggregate to effectively train the codebook. So the initial centroids are derived after the full training, and then they're fine-tuned after the weights are quantized.

In quantizing AlexNet, they use eight bits for the convolutional layers and five bits for the fully connected layers. Using five bits in the quantization means that you have thirty-two different values for the weights that can be used in the neural network, and amazingly this doesn't cause any loss of accuracy in this experiment. So again, quantization reduces the number of bits in the fully connected layers from 32 bits, meaning that there are two to the 32 different values the weights can take on, to five, meaning 32 values they can take on. This equation right here shows the compression rate when using quantization. This plot shows how the weights tend to be distributed, and it's pretty interesting to see that they fall into this bimodal structure, where there are basically two Gaussians next to each other in the distribution of the weights in the deep neural network.

Once they have quantized the weights, they turn to Huffman encoding. Huffman encoding is a technique used in data compression to take advantage of a biased distribution of values. So for example, in this bimodal distribution most of the values lie at the peaks, so what you might do is use something like 0 or 01 to encode these really frequently occurring values, and then use longer bit strings to encode the rare values. This is a classical technique in data compression that really reduces file size in things like JPEG or PNG.

So these are the results across different networks that they compress using the three-stage pipeline of pruning, quantization, and Huffman coding. They're all pretty amazing compression rates, especially when you look at the number of parameters and the size of the file. These are the results on the AlexNet model. The first column shows the percentage of the weights that are pruned, then the weight bits after pruning and quantization, and then it shows how pruning, quantization, and Huffman coding interact to really achieve a massive compression rate.

Thanks for watching this video on Deep Compression. The paper link is provided in the description. Please subscribe to Henry AI Labs for more deep learning videos. [Music]
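
The equation the transcript points to is shown in the video but not reproduced on this page. Assuming it is the compression-rate formula from the linked paper, it can be written as follows, where n is the number of weights, b is the number of bits per original weight, and k is the number of clusters (the symbol names are chosen here for illustration):

```latex
r = \frac{n\,b}{n \log_2 k + k\,b}
```

The numerator is the storage for n full-precision weights. The denominator is the storage for n cluster indices of \log_2 k bits each, plus a codebook of k values at b bits each. As a small worked case, with n = 16 weights, b = 32 bits, and k = 4 clusters, this gives r = 512 / (32 + 128) = 3.2.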

Original Description

This video will explain Deep Compression in Deep Neural Networks! This technique reduces the size of VGG-16 from 552 MB to 11.3 MB through a pipeline of Pruning, Quantization, and Huffman Encoding! Thanks for watching, Please Subscribe! Paper Link: https://arxiv.org/abs/1510.00149

This video explains the Deep Compression technique, which reduces the file size of deep neural networks through a pipeline of Pruning, Quantization, and Huffman Encoding. The technique is demonstrated on AlexNet and VGG-16 models, achieving significant compression rates without loss of accuracy.
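
The pruning and quantization stages summarized above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the threshold of 0.2, the evenly spaced centroid initialization, and the toy weights and gradients are all hypothetical choices made for this example.

```python
def prune(weights, threshold):
    """Mask out (set to zero) every weight whose magnitude is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def relative_indices(pruned):
    """Store each surviving weight with the index gap to the previous survivor.

    The first gap is measured from position 0.
    """
    entries, last = [], 0
    for i, w in enumerate(pruned):
        if w != 0.0:
            entries.append((i - last, w))
            last = i
    return entries


def kmeans_1d(values, k, iters=20):
    """Cluster scalar weights into k groups (k >= 2); return (centroids, assignments)."""
    lo, hi = min(values), max(values)
    # Assumption: start with centroids spread evenly over the weight range.
    centroids = [lo + (hi - lo) * j / (k - 1) for j in range(k)]

    def nearest(v):
        return min(range(k), key=lambda j: abs(v - centroids[j]))

    for _ in range(iters):
        assign = [nearest(v) for v in values]
        for j in range(k):
            members = [v for v, a in zip(values, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = sum(members) / len(members)
    return centroids, [nearest(v) for v in values]


def codebook_step(centroids, assign, grads, lr):
    """Sum the per-weight gradients within each cluster, then update that centroid."""
    summed = [0.0] * len(centroids)
    for a, g in zip(assign, grads):
        summed[a] += g
    return [c - lr * s for c, s in zip(centroids, summed)]


# Toy usage with made-up numbers.
weights = [0.9, 0.05, -0.1, -1.2, 0.0, 0.4, 1.1, -0.95]
pruned = prune(weights, threshold=0.2)
sparse = relative_indices(pruned)           # (gap, value) pairs for survivors
survivors = [w for _, w in sparse]
codebook, assign = kmeans_1d(survivors, k=2)
quantized = [codebook[a] for a in assign]   # every weight becomes a codebook value
grads = [0.1] * len(survivors)              # placeholder per-weight gradients
codebook = codebook_step(codebook, assign, grads, lr=0.01)
```

With k clusters, each surviving weight only needs an index of log2(k) bits plus one shared codebook, which is where the storage saving from weight sharing comes from.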

Key Takeaways
  1. Prune neural network weights to reduce connections
  2. Quantize weights using k-means algorithm
  3. Apply Huffman encoding to compressed weights
  4. Evaluate compression rates and accuracy
💡 The Deep Compression technique can achieve significant compression rates without loss of accuracy by leveraging the structure of neural network weights.
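
Steps 3 and 4 above can be illustrated with a small Huffman coder applied to quantized cluster indices. This is a hedged sketch using only the Python standard library; `huffman_code`, `encoded_bits`, and the toy index sequence are hypothetical names and data chosen for this example, not taken from the paper.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each distinct symbol to its Huffman bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries are (count, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def encoded_bits(symbols, code):
    """Total number of bits needed to encode the sequence with the given code."""
    return sum(len(code[s]) for s in symbols)


# Toy usage: 16 cluster indices with a skewed distribution.
indices = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
code = huffman_code(indices)
huffman_total = encoded_bits(indices, code)
fixed_total = 2 * len(indices)  # 2 bits per index for 4 clusters
print(huffman_total, fixed_total)
```

On this toy sequence the most frequent index gets a 1-bit code and the two rarest get 3-bit codes, so the sequence takes 28 bits instead of the 32 bits a fixed 2-bit encoding would need. The more skewed the distribution of indices, the larger this saving becomes.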
