PATCH EMBEDDING | Vision Transformers explained
About this lesson
I will cover the Vision Transformer (ViT) in three parts. The first part, which is this video, focuses on patch embedding in the Vision Transformer. I will explain everything that happens inside the patch embedding in ViT in detail, and I will also go over what an implementation of patch embedding for the Vision Transformer in PyTorch would look like.

The second part, which goes through attention, can be found here:
Attention in Vision Transformer (Part Two) - https://www.youtube.com/watch?v=zT_el_cjiJw

The third part, which builds the entire transformer and shows how to visualize attention maps and positional embeddings, can be found below:
Implementing Vision Transformer (Part Three) - https://www.youtube.com/watch?v=G6_IA5vKXRI

*Timestamps*:
00:00 Intro
00:56 Need for Patch Embedding in Vision Transformer
01:30 Converting Image into Sequence of Patches
01:59 Patch Embedding Projection
02:45 Positional Information for Patches
03:40 CLS Token
04:10 Patch Embedding Responsibilities
04:40 Patch Embedding Module Implementation
08:02 Outro

*Paper Link* - https://tinyurl.com/exai-vit-paper
Implementation will be pushed here after all three videos are out - https://tinyurl.com/exai-vit-code
*Subscribe* - https://tinyurl.com/exai-channel-link
Background Track - Fruits of Life by Jimena Contreras
Email - explainingai.official@gmail.com
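
As a rough illustration of the steps listed in the timestamps (converting the image into a sequence of patches, projecting them, adding the CLS token and adding positional information), a patch embedding module in PyTorch might look like the sketch below. This is not the code from the video. The class name `PatchEmbedding`, its parameter names and the default sizes (224x224 image, 16x16 patches, 768-dimensional embeddings) are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    """Hypothetical ViT-style patch embedding (illustrative sketch only)."""

    def __init__(self, image_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        assert image_size % patch_size == 0, "image size must be divisible by patch size"
        self.patch_size = patch_size
        self.num_patches = (image_size // patch_size) ** 2
        patch_dim = in_channels * patch_size * patch_size

        # Linear projection of each flattened patch to the embedding dimension.
        self.proj = nn.Linear(patch_dim, embed_dim)
        # Learnable CLS token, prepended to every sequence of patch embeddings.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # Learnable positional embedding: one vector per patch plus one for CLS.
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))
        nn.init.trunc_normal_(self.pos_embed, std=0.02)

    def forward(self, x):
        b, c, h, w = x.shape
        p = self.patch_size
        # (B, C, H, W) -> (B, C, H/p, p, W/p, p): split both spatial axes into patches.
        x = x.reshape(b, c, h // p, p, w // p, p)
        # -> (B, H/p, W/p, C, p, p) -> (B, N, C*p*p): one flattened vector per patch.
        x = x.permute(0, 2, 4, 1, 3, 5).reshape(b, -1, c * p * p)
        x = self.proj(x)                        # (B, N, embed_dim)
        cls = self.cls_token.expand(b, -1, -1)  # (B, 1, embed_dim)
        x = torch.cat([cls, x], dim=1)          # (B, N + 1, embed_dim)
        return x + self.pos_embed               # add positional information
```

With the assumed defaults, a batch of shape `(2, 3, 224, 224)` yields 196 patches per image, so the output has shape `(2, 197, 768)` once the CLS token is prepended. A strided `nn.Conv2d` with kernel size and stride equal to the patch size is a common alternative to the explicit reshape followed by `nn.Linear`.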