Vision Transformer (ViT) - Image is worth 16x16 words | Paper Explained

Deep Learning Revision · Beginner ·📄 Research Papers Explained ·2y ago
In this video, we discuss the paper “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale”, which introduced the Vision Transformer (ViT) architecture. We start with the motivation for this remarkable architecture, continue with early work applying transformers to computer vision, dive into the details of the ViT architecture, unpack how ViT is trained and fine-tuned, and highlight significant developments from recent follow-up papers.

Enjoy the video? Show your support with a Like, and don't forget to Subscribe for more discussions. Feedback, questions, and ideas are always welcome in the comment section below!

Slides: https://docs.google.com/presentation/d/1IcXGiKPoEHDVgC7tlNLFlBIkUunq_H46Arv3kUYHh6g/edit?usp=sharing

Personal links:
- Twitter: https://twitter.com/Jeande_d
- LinkedIn: https://www.linkedin.com/in/nyandwi/
- GitHub: https://github.com/Nyandwi
- Deep Learning Revision Newsletter: https://deeprevision.substack.com
- Personal website: https://nyandwi.com
- Complete Machine Learning Package: https://nyandwi.com/machine_learning_complete/

Some links from the video:
- ViT paper: https://arxiv.org/abs/2010.11929
- Big vision repo: https://github.com/google-research/big_vision
- ViT Pytorch: https://github.com/lucidrains/vit-pytorch
- Yann LeCun Tweet on ViT vs CNNs: https://twitter.com/ylecun/status/1481198016266739715

#deeplearning #ai #computervision #transformers
