DETR Explained | End-to-End Object Detection with Transformers | DETR Tutorial Part 1

ExplainingAI · Beginner ·👁️ Computer Vision ·1y ago
This tutorial video covers DETR, end-to-end object detection with transformers. DETR recasts object detection as a direct set prediction problem: no anchors, no need for NMS, just transformers. In this video, Part I of a two-part series, we go deep into DETR by Facebook AI and look at how it replaces traditional object detection pipelines with a transformer-based architecture for end-to-end object detection. The video covers the DETR model, a breakdown of its architecture, how it removes the need for NMS and anchor boxes, and the Hungarian matching and loss used to train it. The goal of this DETR tutorial is to break down everything that's relevant in the paper, explain the DETR model architecture, and by the end give clarity on how transformers are used for end-to-end object detection in DETR. In the next part, we will use the architecture and loss covered in this Part I video to implement DETR and train it on the VOC dataset.

⏱️ Timestamps:
00:00 DETR: End-to-end object detection with transformers
00:51 High Level Overview of DETR Architecture
13:10 Backbone of Detection Transformer
14:35 DETR Transformer Encoder
19:07 DETR Transformer Decoder
26:00 Hungarian matching for DETR Object Detection
38:04 Matching Strategy and Cost for DETR explained
42:57 DETR (Detection Transformer) Loss Explained
45:27 Auxiliary Loss in DETR
46:58 DETR Video's Part I and Part II Outline

📖 Resources:
DETR Paper - https://tinyurl.com/exai-detr-paper
Hungarian Matching Notes - https://econweb.ucsd.edu/~jsobel/172aw02/notes8.pdf
Vision Transformer Videos
Patch Embedding Video - https://www.youtube.com/watch?v=lBicvB4iyYU
Attention Video - https://www.youtube.com/watch?v=zT_el_cjiJw
Transformer Module Implementation Video - https://www.youtube.com/watch?v=G6_IA5vKXRI
Cross Attention Segment from Stable Diffusion Video - https://www.youtube.com/watch?v=hEJjg7VUA8g&t=2096s
Generalized IOU Segment from YOLOv4 Video - https://youtu.be/b148nt9P8J
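
To make the set prediction idea concrete, here is a minimal sketch of the one-to-one matching between predictions and ground-truth objects that the description refers to. It is an illustration under stated assumptions, not the video's or the paper's implementation: the function names, the toy boxes, and the box weight of 5.0 are made up for this example, the cost uses only a class term and an L1 box term (the generalized IoU term is left out), and a brute-force search over permutations stands in for the Hungarian algorithm, which finds the same minimum-cost assignment far more efficiently.

```python
from itertools import permutations

def l1_distance(box_a, box_b):
    # Sum of absolute coordinate differences between two (cx, cy, w, h) boxes.
    return sum(abs(a - b) for a, b in zip(box_a, box_b))

def match_cost(pred, target, box_weight=5.0):
    # Assumed simplified cost: negative probability of the target class
    # plus a weighted L1 box distance (no generalized IoU term here).
    class_probs, pred_box = pred
    label, target_box = target
    return -class_probs[label] + box_weight * l1_distance(pred_box, target_box)

def match(preds, targets):
    # Brute-force stand-in for Hungarian matching: try every way of giving
    # each target a distinct prediction and keep the cheapest assignment.
    # assignment[j] is the index of the prediction matched to target j.
    best_cost, best_assignment = float("inf"), None
    for candidate in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(preds[i], targets[j]) for j, i in enumerate(candidate))
        if cost < best_cost:
            best_cost, best_assignment = cost, candidate
    return best_cost, best_assignment

# Three toy predictions: (class probabilities, box). Two toy targets: (label, box).
preds = [
    ([0.1, 0.9], (0.50, 0.50, 0.20, 0.20)),
    ([0.8, 0.2], (0.20, 0.30, 0.10, 0.10)),
    ([0.5, 0.5], (0.90, 0.90, 0.05, 0.05)),
]
targets = [
    (0, (0.20, 0.30, 0.10, 0.10)),
    (1, (0.50, 0.50, 0.20, 0.20)),
]

cost, assignment = match(preds, targets)
print(assignment)  # (1, 0)
```

Each target ends up with exactly one prediction, so duplicate detections are penalized during training rather than filtered afterwards with NMS. The leftover prediction (index 2 here) would be trained toward the "no object" class.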
