deeplearning.ai's Heroes of Deep Learning: Ruslan Salakhutdinov

DeepLearningAI · Beginner · 📐 ML Fundamentals · 8y ago


Key Takeaways

The video discusses Ruslan Salakhutdinov's contributions to deep learning, including his work on restricted Boltzmann machines and deep autoencoders, and explores topics such as generative models, unsupervised versus supervised learning, and deep reinforcement learning. The conversation also touches on not being afraid to tackle hard, non-convex optimization problems, on understanding how deep learning systems operate by implementing them yourself, and on exciting research frontiers: unsupervised learning, deep reinforcement learning, reasoning and natural language understanding, and learning from few examples.

Full Transcript

Welcome, Russ. I'm really glad you could join us here today.

Thank you, thank you, Andrew.

So today you're the director of research at Apple, and you're also a faculty member and professor at Carnegie Mellon University. I'd love to hear a bit about your personal story. How did you end up doing the deep learning work that you do?

Yeah, to some extent I actually started in deep learning by luck. I did my master's degree at Toronto, and then I took a year off. I was actually working in the financial sector, which is a bit surprising, and at that time I wasn't quite sure whether I wanted to go for my PhD or not. And then something surprising happened. I was going to work one morning and I bumped into Geoff Hinton, and Geoff told me, "Hey, I have this terrific idea. Come to my office, I'll show you." So we basically walked over together, and he started telling me about these Boltzmann machines and contrastive divergence and some of the tricks. At that time I didn't quite understand what he was talking about, but it was very exciting and it really excited me. Then, basically within three months, I started my PhD with Geoff. So that was kind of the beginning, because that was back in 2005-2006, and this is when some of the original deep learning algorithms, using restricted Boltzmann machines and unsupervised pre-training, were popping up. So that's how I started. That one particular morning when I bumped into Geoff completely changed my future career.

And in fact you were a co-author on one of the very early papers on restricted Boltzmann machines that really helped with this resurgence of neural networks and deep learning. Tell me a bit more about what it was like working on that seminal work.

Yeah, this was actually a really exciting year. It was my first year as a PhD student, and Geoff and I were trying to explore these ideas of using restricted Boltzmann machines and pre-training tricks to train multiple layers. Specifically, we were trying to focus on autoencoders: how do we effectively do a nonlinear extension of PCA? It was very exciting because we got these systems to work on MNIST digits. But then the next step for us was to see whether we could extend these models to deal with faces. I remember we had this Olivetti faces dataset. And then we started looking at whether we could do compression for documents. So we started looking at all these different kinds of data: real-valued, count, binary. Throughout that year, as a first-year PhD student, it was a big learning experience for me, but within six or seven months we were able to get really interesting and really good results. We were able to train these very deep autoencoders, something you couldn't do at that time using traditional optimization techniques. It turned out to be a really exciting paper for us. That was a super exciting year, because it was a lot of learning for me, but at the same time the results turned out to be really impressive for what we were trying to do.

In the early days of this resurgence of deep learning, a lot of the activity was centered on restricted Boltzmann machines, and then deep Boltzmann machines. There's still a lot of exciting research being done on them, including some in your group, but what's happening with Boltzmann machines?

Yeah, that's a very good question. In the early days, the way we were using restricted Boltzmann machines was, you can imagine, training a stack of these restricted Boltzmann machines, which would allow you to effectively learn one layer at a time. And there's good theory behind it: when you add a particular layer, it improves the variational bound, and so forth, under certain conditions. So there was a theoretical justification, and these models were working well in terms of being able to pre-train these systems. And then around 2009-2010, once compute started showing up, GPUs, a lot of us started realizing that directly optimizing these deep neural networks was giving similar or even better results.

So just standard backprop, without the pre-training or the restricted Boltzmann machine?

That's right. And that happened over three or four years. It was exciting to the whole community, because people thought, wow, you can actually train these deep models using these pre-training mechanisms. And then, with more compute, people started realizing that you can basically just do standard backpropagation, something we couldn't do back in 2005 or 2004, because it would have taken us months to do it on CPUs. So that was a big change.

The other thing is that I think we haven't really figured out what to do with Boltzmann machines and deep Boltzmann machines. I believe they're very powerful models, because you can think of them as generative models: they try to model complex distributions in the data. But when we look at the learning algorithms, right now they require Markov chain Monte Carlo, variational learning, and such, which are not as scalable as the backpropagation algorithm. So we have yet to figure out more efficient ways of training these models. The use of convolution is also something that's fairly difficult to integrate into these models. I remember some of your work on using probabilistic max pooling to build these generative models of different objects, and using these ideas of convolution was also very exciting. But at the same time, it's still extremely hard to train these models, so we still have to figure that out.

On the other side, with some of the recent work on variational autoencoders, for example, which could be viewed as directed versions of Boltzmann machines, we have figured out ways of training these models. There was work by Max Welling and Diederik Kingma on using reparameterization tricks, and now we can use the backpropagation algorithm within a stochastic system, which is driving a lot of progress right now. But we haven't quite figured out how to do that in the case of Boltzmann machines.

So that's a very interesting perspective I actually wasn't aware of: that in an earlier era, when computers were slower, the RBM pre-training was really important, and it was only faster computation that drove the switch to standard backprop, in terms of the evolution of the community's thinking in deep learning. Another topic I know you've spent a lot of time thinking about is generative, unsupervised versus supervised approaches. Could you share a bit about how your thinking on that has evolved over time?

Yeah, I feel like it's a very important topic, particularly if we think about unsupervised, semi-supervised, or generative models, because to some extent a lot of the successes we've seen recently are due to supervised learning. Back in the early days, unsupervised learning was primarily viewed as unsupervised pre-training, because we didn't know how to train these multi-layer systems. And even today, if you're working in a setting where you have lots and lots of unlabeled data and a small fraction of labeled examples, these unsupervised pre-training models, building these generative models, can help with the supervised task. When I started doing my PhD, the belief for a lot of us in the community was all about generative models and trying to learn these stacks of Boltzmann machines, because that was the only way for us to train these systems. Today there is a lot of work on generative modeling: generative adversarial networks, variational autoencoders, and energy models, which my lab is working on right now as well. I think it's very exciting research, but perhaps we haven't quite figured it out yet. Again, for many of you who are thinking about getting into the deep learning field, this is one area where I think we'll make a lot of progress, hopefully in the near future.

So, unsupervised learning?

Unsupervised learning, right. Or maybe you can think of it as unsupervised or semi-supervised learning, where I give you some hints or some examples of what different things mean, and then throw lots and lots of unlabeled data at you.

So that's a very important insight: in an earlier era of deep learning, when computers were just slower, the restricted Boltzmann machine and deep Boltzmann machine were needed for initializing the neural network weights, but as computers got faster, straight backprop started to work much better. One of the topics I know you've spent a lot of time thinking about is supervised learning versus generative models and unsupervised learning approaches. Tell me a bit about how your thinking on that debate has evolved over time.

I think we all believe that we should be able to make progress there. It's just that, with all the work on Boltzmann machines and variational autoencoders, yes, you can think of a lot of these models as generative models, but we haven't quite figured out how to really make them work and how to make use of large amounts of data. Even in the IT sector, I see companies that have lots and lots of data, lots of unlabeled data, and there's a lot of effort going into annotation, because that's the only way for us to make progress right now. It seems like we should be able to make use of unlabeled data, because there's just an abundance of it, and we haven't quite figured out how to do that yet.

So you mentioned that for people wanting to enter deep learning research, unsupervised learning is an exciting area. Today there are a lot of people wanting to enter deep learning, either research or applied work. For this global community, in either research or applied work, what advice would you have?

Yes. I think one of the key pieces of advice I should give people entering the field is to just try different things, and not be afraid to try new things or to innovate. I can give you one example. When I was a graduate student, we were looking at neural nets, and these are highly non-convex systems that are hard to optimize. I remember talking to my friends in the optimization community, and the feedback was always, "Well, there is no way you can solve these problems, because they are non-convex. We don't understand the optimization. How could you ever even do that, compared to doing convex optimization?" And it was surprising, because in our lab we never really cared that much about those specific problems. We were just thinking about how we can optimize and whether we can get interesting results, and that effectively was driving the community. We were not scared, maybe to some extent because we were actually lacking the theory behind optimization. But I would encourage people to just try, and not be afraid to tackle hard problems.

Yeah, and I remember you once said: don't just learn to code in high-level deep learning frameworks, but actually understand them.

Yes, that's right. One of the things I try to do when I teach a deep learning class is that, for one of the homeworks, I ask people to actually code the backpropagation algorithm for convolutional neural networks. It's painful, but at the same time, if you do it once, you really understand how these systems operate, how they work, and how you can efficiently implement them on a GPU. I think it's important that when you go into research or industry, you have a really good understanding of what these systems are doing.

Since you have both academic experience as a professor and corporate experience, I'm curious: if someone wants to enter deep learning, what are the pros and cons of doing a PhD versus joining a company?

Yeah, I think that's actually a very good question. In my particular lab I have a mix of students: some want to take an academic route, and some want to take an industry route. And it's becoming very challenging, because you can do amazing research in industry and you can also do amazing research in academia. But in terms of pros and cons: in academia, I feel like you have more freedom to work on long-term problems, or if you think about some crazy problem, you can work on it. So you have a little bit more freedom. At the same time, the research you're doing in industry is also very exciting, because in many cases your research can impact millions of users if you develop a core AI technology. And obviously, within industry you have many more resources in terms of compute, and you're able to do really amazing things. So there are pluses and minuses; it really depends on what you want to do. Right now it's a very interesting environment, where academics move to industry, and folks from industry move to academia, though not as much. So these are very exciting times.

It sounds like academic machine learning is great and corporate machine learning is great, and the most important thing is just to jump in, right? Either one, just jump in.

It really depends on your own preferences, because you can do amazing research in either place.

So you've mentioned unsupervised learning as one exciting frontier for research. Are there other areas that you consider exciting frontiers for research?

Yeah, absolutely. What I see in the community right now, particularly the deep learning community, is a few trends. One area that I think is really exciting is deep reinforcement learning, because we were able to figure out how to train agents in virtual worlds. In just the last couple of years you see a lot of progress: how can we scale these systems, how can we develop new algorithms, how can we get agents to communicate with each other. I think that area, and generally settings where you're interacting with the environment, is super exciting. Another area that I think is really exciting is reasoning and natural language understanding. Can we build dialogue-based systems? Can we build systems that can reason, that can read text and answer questions intelligently? I think this is something a lot of research is focusing on right now. And then there's another sub-area: being able to learn from fewer examples. People typically think of it as one-shot learning or transfer learning, a setting where you learn something about the world, and then I throw a new task at you and you can solve it very quickly, much like humans do, without requiring lots and lots of labeled examples. This is something a lot of us in the community are trying to figure out: how we can do that, and how we can come closer to human-like learning abilities.

Thank you, Russ, for sharing all the comments and insights. It was especially interesting hearing the stories of your early days.

Thanks, Andrew. Yeah, thanks for having me.
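Ruslan describes the early recipe of training a stack of restricted Boltzmann machines one layer at a time with contrastive divergence. The sketch below illustrates that idea in plain Python under stated assumptions: binary units, CD-1 (a single Gibbs step, using reconstruction probabilities rather than samples for the visible layer, a common simplification), a tiny made-up dataset, and hyperparameters chosen only for illustration. The names `RBM` and `pretrain_stack` are hypothetical; this is not the code from the original papers.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Binary restricted Boltzmann machine trained with CD-1 (illustrative sketch)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Small random weights break the symmetry between hidden units.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W_ij)
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        # p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij * h_j)
        return [sigmoid(self.b[i] + sum(self.W[i][j] * h[j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_update(self, v0, lr):
        """One contrastive-divergence (CD-1) step; returns squared reconstruction error."""
        ph0 = self.hidden_probs(v0)                                # positive phase
        h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]  # sample hidden states
        v1 = self.visible_probs(h0)                                # reconstruction
        ph1 = self.hidden_probs(v1)                                # negative phase
        # Approximate gradient: <v h> under the data minus <v h> under the reconstruction.
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (ph0[j] - ph1[j])
        return sum((x - r) ** 2 for x, r in zip(v0, v1))

    def train(self, data, epochs=100, lr=0.1):
        err = 0.0
        for _ in range(epochs):
            err = sum(self.cd1_update(v, lr) for v in data)
        return err  # reconstruction error of the final epoch


def pretrain_stack(data, layer_sizes, epochs=100, lr=0.1):
    """Greedy layer-wise pre-training: each RBM models the hidden
    probabilities produced by the RBM below it."""
    rbms, inputs = [], data
    for depth, n_hidden in enumerate(layer_sizes):
        rbm = RBM(len(inputs[0]), n_hidden, seed=depth)
        rbm.train(inputs, epochs, lr)
        rbms.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]  # features for the next layer
    return rbms


# Toy data (made up for illustration): two complementary binary patterns.
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 5
stack = pretrain_stack(data, layer_sizes=[4, 2])
```

Each layer's hidden probabilities become the next layer's input. In the recipe described in the interview, the learned weights would then initialize a deep network that is fine-tuned with backpropagation, a step omitted from this sketch.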



Key Takeaways

Train a stack of restricted Boltzmann machines to learn one layer at a time
Use pre-training tricks to initialize neural network weights
Apply backpropagation to train deep neural networks
Use variational autoencoders, which can be viewed as directed versions of Boltzmann machines and can be trained with backpropagation via the reparameterization trick
Code the backpropagation algorithm for convolutional neural networks
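The reparameterization trick credited to Kingma and Welling in the interview writes a Gaussian sample as z = μ + σ·ε with ε ~ N(0, 1), so that gradients can flow through μ and σ by the ordinary chain rule. As a minimal, hypothetical illustration (a toy objective, not a full variational autoencoder), the function below estimates the gradient of E[z²] this way. Analytically E[z²] = μ² + σ², so the estimates should approach 2μ and 2σ. The function name and sample count are assumptions made for this sketch.

```python
import random


def reparam_gradients(mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo estimate of d/dmu and d/dsigma of E_{z ~ N(mu, sigma^2)}[z ** 2].

    The randomness lives in eps ~ N(0, 1); z = mu + sigma * eps is a
    deterministic function of (mu, sigma), so we can differentiate through it.
    """
    rng = random.Random(seed)
    g_mu = g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps      # reparameterized sample
        df_dz = 2.0 * z           # derivative of f(z) = z ** 2
        g_mu += df_dz * 1.0       # dz/dmu = 1
        g_sigma += df_dz * eps    # dz/dsigma = eps
    return g_mu / n_samples, g_sigma / n_samples


g_mu, g_sigma = reparam_gradients(mu=1.5, sigma=0.5)
# Analytic values for comparison: d/dmu = 2 * mu = 3.0, d/dsigma = 2 * sigma = 1.0
```

In a VAE the same substitution is applied to the encoder's outputs, which is what lets backpropagation run through the sampling step.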

💡 Generative models can help with supervised learning by pre-training on unlabeled data, and understanding how deep learning systems operate is important for research or industry.
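In the spirit of Ruslan's homework advice, the sketch below implements backpropagation by hand for the core operation of a convolutional layer and checks it numerically. It assumes a single-channel 1-D "valid" convolution (implemented as cross-correlation, as most deep learning libraries do), a toy loss L = 0.5·Σy², and central-difference gradient checking. All function names are hypothetical, and this is not the actual assignment.

```python
def conv1d_forward(x, w):
    """'Valid' 1-D cross-correlation: y[i] = sum_k w[k] * x[i + k]."""
    n_out = len(x) - len(w) + 1
    return [sum(w[k] * x[i + k] for k in range(len(w))) for i in range(n_out)]


def conv1d_backward(x, w, dy):
    """Given dL/dy, return (dL/dx, dL/dw) by applying the chain rule to the forward pass."""
    dx = [0.0] * len(x)
    dw = [0.0] * len(w)
    for i, g in enumerate(dy):
        for k in range(len(w)):
            dw[k] += g * x[i + k]  # y[i] depends on w[k] with coefficient x[i + k]
            dx[i + k] += g * w[k]  # y[i] depends on x[i + k] with coefficient w[k]
    return dx, dw


def loss(x, w):
    """Toy scalar loss L = 0.5 * sum(y ** 2), so dL/dy = y."""
    return 0.5 * sum(v * v for v in conv1d_forward(x, w))


def numerical_grad(f, params, h=1e-5):
    """Central-difference estimate of df/dparams, used to check the analytic gradient."""
    grads = []
    for j in range(len(params)):
        plus, minus = list(params), list(params)
        plus[j] += h
        minus[j] -= h
        grads.append((f(plus) - f(minus)) / (2 * h))
    return grads


x = [0.5, -1.0, 2.0, 0.3, -0.7]
w = [0.2, -0.4, 0.1]
dx, dw = conv1d_backward(x, w, conv1d_forward(x, w))  # dL/dy = y for this loss
num_dx = numerical_grad(lambda xs: loss(xs, w), x)
num_dw = numerical_grad(lambda ws: loss(x, ws), w)
max_err = max(abs(a - b) for a, b in zip(dx + dw, num_dx + num_dw))
```

The same pattern (derive the backward pass from the forward indices, then compare against finite differences) carries over to 2-D, multi-channel convolutions, with more index bookkeeping.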
