ML Interpretability: feature visualization, adversarial examples, interpretability for language models
In this video, I will introduce machine learning interpretability, a vast topic that aims to understand the inner mechanisms by which machine learning models make their predictions, in order to debug them and make them more transparent and trustworthy.
I will start by reviewing deep learning and the back-propagation algorithm, which are necessary for understanding adversarial example generation and feature visualization in computer vision classification models. In the second part, I will show how we can leverage the knowledge built in the first part of the video and apply it to …
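To make the connection concrete, here is a minimal sketch (not taken from the video) of how the input gradients computed by back-propagation can be turned into an adversarial example, in the spirit of the fast gradient sign method; the choice of a PyTorch ResNet, the function name, and the epsilon value are illustrative assumptions.

```python
# Minimal FGSM-style sketch (illustrative, not from the video): back-propagation
# gives the gradient of the loss with respect to the *input* image, and stepping
# along its sign produces an adversarial example.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()  # any pretrained classifier works

def fgsm_example(image: torch.Tensor, label: torch.Tensor, epsilon: float = 0.01) -> torch.Tensor:
    """Return a perturbed copy of `image` (shape 1x3xHxW) that raises the loss for `label`."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()                                    # d(loss)/d(input) via back-propagation
    return (image + epsilon * image.grad.sign()).detach()
```

Feature visualization relies on the same machinery, except that the input is optimized to maximize a chosen neuron's activation rather than the classification loss.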