Variants of ViT: DeiT and T2T-ViT

Machine Learning Studio · Advanced · 🧠 Large Language Models · 2y ago
As you may recall from our previous video on ViT, the original ViT requires very large training datasets such as JFT-300M. When trained on a mid-sized dataset like ImageNet-1k, ViT underperforms comparable CNNs. In this video, we cover two ViT variants: DeiT (Data-efficient Image Transformers) and Tokens-to-Token ViT (T2T-ViT). Both redesign the vision transformer so that it can be trained on ImageNet alone while still outperforming CNNs: DeiT adds a distillation token that learns from a CNN teacher, and T2T-ViT progressively aggregates neighboring tokens to model local structure.
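To make DeiT's core idea concrete, here is a minimal NumPy sketch of its hard-distillation objective: the class token's output is supervised by the ground-truth label, while a separate distillation token's output is supervised by the CNN teacher's predicted (hard) label, with the two losses averaged. The function and variable names here are illustrative, not from any library.

```python
import numpy as np

def cross_entropy(logits, label):
    # Numerically stable log-softmax cross-entropy for a single example.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def deit_hard_distillation_loss(cls_logits, dist_logits, teacher_logits, label):
    """Sketch of DeiT's hard-distillation loss.

    cls_logits:     student logits from the class token
    dist_logits:    student logits from the distillation token
    teacher_logits: logits from the CNN teacher (e.g., a RegNet)
    label:          ground-truth class index
    """
    # The distillation token matches the teacher's hard prediction.
    teacher_label = int(np.argmax(teacher_logits))
    return 0.5 * cross_entropy(cls_logits, label) \
         + 0.5 * cross_entropy(dist_logits, teacher_label)
```

For example, a student whose logits agree with both the label and the teacher incurs a lower loss than one that contradicts them; at inference time, DeiT averages the predictions of the two tokens.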