Variants of ViT: DeiT and T2T-ViT

Machine Learning Studio · Advanced · 🧠 Large Language Models · 2y ago
As you may recall from our previous video on ViT, the original ViT requires very large training datasets such as JFT-300M. When trained on a mid-sized dataset like ImageNet-1k, ViT underperforms comparable CNNs. In this video, we cover two ViT variants: DeiT (Data-efficient Image Transformers) and Tokens-to-Token ViT (T2T-ViT). Both redesign the vision transformer so that it can be trained on ImageNet alone while still outperforming CNNs: DeiT adds a distillation token that learns from a CNN teacher, and T2T-ViT progressively aggregates neighboring tokens to model local structure.
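To make DeiT's core idea concrete, here is a minimal NumPy sketch of its hard-distillation objective: the class token's output is supervised by the ground-truth label, while a separate distillation token's output is supervised by the CNN teacher's predicted (hard) label, with the two losses averaged. The function and variable names here are illustrative, not from any library.

```python
import numpy as np

def cross_entropy(logits, label):
    # Numerically stable log-softmax cross-entropy for a single example.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def deit_hard_distillation_loss(cls_logits, dist_logits, teacher_logits, label):
    """Sketch of DeiT's hard-distillation loss.

    cls_logits:     student logits from the class token
    dist_logits:    student logits from the distillation token
    teacher_logits: logits from the CNN teacher (e.g., a RegNet)
    label:          ground-truth class index
    """
    # The distillation token matches the teacher's hard prediction.
    teacher_label = int(np.argmax(teacher_logits))
    return 0.5 * cross_entropy(cls_logits, label) \
         + 0.5 * cross_entropy(dist_logits, teacher_label)
```

For example, a student whose logits agree with both the label and the teacher incurs a lower loss than one that contradicts them; at inference time, DeiT averages the predictions of the two tokens.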