What is YOLO algorithm? | Deep Learning Tutorial 31 (Tensorflow, Keras & Python)

codebasics · Beginner · Computer Vision · 5y ago


Key Takeaways

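The video explains the theory behind YOLO (You Only Look Once), an object detection algorithm, as part of a deep learning series built around TensorFlow, Keras, and Python. It covers how YOLO improved on earlier detectors such as sliding-window detection and the R-CNN family, and how it performs object localization and detection using a grid of per-cell output vectors, non-max suppression, and anchor boxes.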

Full Transcript

YOLO is a state-of-the-art object detection algorithm, and it is so fast that it has become almost a standard way of detecting objects in the field of computer vision. Previously people were using sliding window object detection; then faster versions were invented, such as R-CNN, Fast R-CNN, and Faster R-CNN. But in 2015 YOLO was invented, which outperformed all the previous object detection algorithms, and that's what we are going to discuss today. We will go over the theory of how exactly YOLO works, and in a future video we will also do coding. So this video is just about the theory behind how YOLO works, and we will try to see why it is faster. The full form of YOLO is "you only look once".

Let's say you're working on an image classification problem where you want to decide if the image is of a dog or a person. In this case the output of the neural network is pretty simple: you will say dog is equal to 1, person is equal to 0. But when you talk about object localization, you're not only telling which class this is, you're also telling the bounding box, or the position of the object within the image. So here, in addition to dog is equal to 1 and person is equal to 0, you are also telling about the bounding box.

Now how exactly do you do that? In terms of neural network output, you can have a vector like this, where pc is the probability that an object of any class is present. So here, if there is a dog or a person, this number will be 1; if there is no dog and no person, this number will be 0. Then the bounding box: bx, by is the coordinate of the center, which is indicated by the yellow circle here, and the next two values are the width and height of this red box. c1 is class one, that is for dog, so here it will be 1; c2 is for person, and it will be 0.

If you have a different image like this, where there is a person (this is my picture from my high school), pc, the probability of any class, is 1 because there is some object; these are the bounding box coordinates; c1 is 0 because it's not a dog, and c2 is 1 because it's a person; and
when you have no object in the image, pc will be 0 and the rest of the values don't matter.

So now you can train a neural network to classify the object as well as predict the bounding box. I am just showing three images here, but you can have, let's say, ten thousand such images, and for each of these images, since it's a supervised learning problem, you need to give the bounding boxes. A neural network only understands numbers, so you have to convert the bounding boxes into this kind of vector. You will have a vector of size 7 for each corresponding image: the image is X_train and y_train will be a vector of size 7. You can have 10,000 such images, and you can train a neural network in such a way that if you input a new image, it will give you that particular vector. This vector is telling you that this is a dog, because c1 is set to 1, and it is also telling you the bounding box. So it's essentially giving you the answer for your object detection, or object localization rather.

This only works for a single object. If you have multiple objects, what do you do? Here there is a person and a dog in the same image. In an image there could be n objects: there could be two dogs and three people, or five dogs and one person. You don't know how many objects there are in the picture, so it's hard to determine the dimension of your neural network output. If you have one object, it's pretty fixed, but if you have n objects and you don't know n, then determining the size of the output of the neural network is hard. You can say the upper limit is 10: let's say there will be only 10 objects, and you can have 10 times 7, which is a vector of size 70. But what if there are 11 objects? So that doesn't work, and you have to do something else.

All right, so let's say you have this image, and there are two bounding boxes that this image has. What the YOLO algorithm
will do is divide this image into this kind of grid cells. I'm using a four by four grid here; it could be three by three, it could be 19 by 19. There's no fixed rule that it has to be four by four. For each of the grid cells, for example this grid cell, you can come up with that vector that we saw previously, which is pc, the bounding box, c1, and c2. There are no objects here, so pc will be 0 and the rest of the values don't matter.

But for this particular grid cell, which I have highlighted here, the dog is there in the picture. When the dog spans multiple grid cells, you try to find the center of that dog, and the dog belongs to the grid cell that contains that center. So I'm in this particular cell here, and when I look at the coordinates, you can think of this point as (0, 0) and this point as (1, 1). Now you can create this vector where pc is 1, which means you have some object, then c1 and c2: c1 is for dog, so it is 1; c2 is for person, so it is 0. There is the person's head here, but the person's center is here, so the person object belongs to this other cell. Then 0.05: this particular distance is 0.05, and this is 0.3, because this whole thing is 1. Your bounding rectangle can go out of your grid cell; that is fine, and that's why these values are more than one: so 1.3 and 1,
oh sorry, 2 and 1.3. So 2 is this width and 1.3 is this height.

Now talking about this particular grid cell: the person's center is here, so we can say the person is in this grid cell, and therefore c2 is 1, and c1 is 0 because there is no dog. These are the bounding box values: 0.32 is this much, 0.02 is this particular height, and the width is 3 because the width of the rectangle, this yellow line, is almost 3 times the width of this grid cell. If you compare them, this is three times this; that's why I have 3 here. For all the remaining cells the vector will be this: pc will be 0 and the remaining values will be don't-care.

So now you have a four by four by seven volume. Why? Because you have four by four grid cells, 16 cells in total, and each cell is a vector of size seven. That's why I'm saying four by four by seven. If you take this top left cell and expand it in the z direction, that will be this vector of size 7. I hope you're getting the idea; if you don't, please pause the video and just think about what I just said.

So now you have the image and the bounding rectangles, and you can form your training data set. Your training data set will have many such images; I am showing only three as an example, but let's say you have 10,000 such images. Each image will have bounding rectangles, and based on those rectangles you will first form this kind of grid, 4x4 or 3x3 or 19x19 (it varies; it doesn't have to be four by four), and you will come up with the y, or target, vector. For each cell there will be one vector, so there will be 16 such vectors per training sample, or per training image. Using this you can train your neural network, and after you have trained it, it can do prediction. When you give it this type of image, it can produce 16 such vectors (why 16? because this is a 4 by 4 grid), which will basically tell you the bounding
rectangle for each of these objects. So this is the YOLO algorithm. It is called "you only look once" because we are not repeating anything: we have 16 cells, but it's not like we are inputting the image 16 times and doing 16 iterations. In one forward pass you can make all your predictions; that is why it is called you only look once.

Now this is the basic algorithm; we need some tweaks, because there could be a few issues with this approach. The first issue is that the algorithm might detect multiple bounding rectangles for a given object. So how do you tackle that? Let's say for a person it detected these two yellow rectangles and this one white rectangle, and we know by visual observation that the white one is the most accurate. The algorithm will also output the probability, pc: it will say this one is 0.9 matching with person, and the other rectangles have lower probability. So maybe we can look at all the probabilities for the person class and take the max, right? Well, we cannot do this. If you just take the max and there is another person, what happens to that one? As a neural network, as a computer, you don't know where that person is, so you can't take the max; you have to use a different approach.

So we use this concept of IoU. IoU is intersection over union: you take the rectangle with 0.9, this is that white rectangle, and then for that same class, which is person, you take all the other rectangles and try to find the overlapping area, and to measure the overlap you use IoU. Here in this case, this is that yellow box and this is the white box; the area indicated in orange is the intersection area and the area indicated in purple is the union area. You divide the intersection by the union, and the more the rectangles overlap, the larger this value will be. So let's say if the
value is more than 0.6 or 0.7, we can say these rectangles are overlapping. If they are completely overlapping the value will be 1; if they are not overlapping at all the value will be 0. So now we find that these two yellow boxes overlap with the white one, because their IoU is, let's say, greater than 0.65, and then you discard those rectangles. So I discarded all the rectangles which had IoU greater than 0.65 and kept the rectangle which has the max class probability.

I do this for the person object, then I do the same thing for the dog object. For the dog I find that 0.81 is the max probability, and I look at all the other dog rectangles in this image. Again, there could be two more dogs here, and there will be rectangles for those also, so you try to find overlap. If there is a dog over here, you will not find overlap, so you will not discard that particular rectangle. But this rectangle you find to be overlapping, and since 0.81 is the max and 0.7 is less, you discard it, and you get the final bounding boxes. This technique is called non-max suppression. So after the neural network has detected all the objects, you apply non-max suppression and you get these unique bounding boxes.

Another issue is: what if a single cell contains the centers of two objects? In this case the dog and the person both have their centers in the middle grid cell. We use this vector to represent the grid cell, but this vector can represent only one class, so how do you represent two classes? Well, I have this value for the dog and I have this value for the person, so instead of having a seven-dimensional vector, how about we have a vector of size 14, where you're just concatenating these two vectors? This is said to have two anchor boxes: this is one anchor box and this is the second anchor box. You can actually have more than two anchor boxes: let's say there are three objects which have the same center, then you can have three anchor boxes; you
can have five anchor boxes. But if your grid cells are small enough, then in real life it's hard to have many objects belonging to one grid cell. So now a CNN with two anchor boxes will look something like this: the only change is that now you have a vector of size 14. If you want to have three anchor boxes, you'll have a vector of size 21, 7 times 3, and that will give you your final output.

So that was all about the you only look once, or YOLO, algorithm. It's a very fast algorithm: even on a video clip at, let's say, 40 frames per second, it can detect objects really fast, and it is the most modern way of detecting objects. So if you are in the computer vision field and you want to do object detection, you have to use YOLO, because it is very fast and accurate. In the next video we will be looking at some code: we will do real object detection in images and in video using the YOLO framework. I hope you're liking this series so far; if you do, give it a thumbs up and share it with your friends. Thank you.
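The grid encoding described in the transcript (one vector of pc, bounding box, c1, c2 per cell, giving a 4 by 4 by 7 target) can be sketched in plain Python. This is a minimal illustration, not the code from the video: the label format, the function name `encode_targets`, the names `bw` and `bh` for the box width and height, and the sample coordinates are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed label format: (class_index, x_center, y_center, width, height),
# all normalized to [0, 1] relative to the whole image.
Label = Tuple[int, float, float, float, float]


def encode_targets(labels: List[Label], grid: int = 4,
                   num_classes: int = 2) -> List[List[List[float]]]:
    """Build a grid x grid x (5 + num_classes) target, one vector per cell."""
    depth = 5 + num_classes
    # Every cell starts as "no object": pc = 0, remaining values are don't-care.
    target = [[[0.0] * depth for _ in range(grid)] for _ in range(grid)]
    for cls, xc, yc, w, h in labels:
        # The cell that contains the object's center owns the object.
        col = min(int(xc * grid), grid - 1)
        row = min(int(yc * grid), grid - 1)
        vec = [0.0] * depth
        vec[0] = 1.0              # pc: an object is present
        vec[1] = xc * grid - col  # bx: center offset inside the cell, 0..1
        vec[2] = yc * grid - row  # by
        vec[3] = w * grid         # bw: width in cell units, may exceed 1
        vec[4] = h * grid         # bh: height in cell units, may exceed 1
        vec[5 + cls] = 1.0        # one-hot class: index 0 = dog, 1 = person
        # A second object centered in the same cell would overwrite the first;
        # that limitation is what anchor boxes address.
        target[row][col] = vec
    return target


# Hypothetical image with a dog (class 0) and a person (class 1).
labels = [(0, 0.30, 0.60, 0.50, 0.325), (1, 0.60, 0.30, 0.75, 0.40)]
target = encode_targets(labels)
print(target[2][1])  # the cell that owns the dog
print(target[1][2])  # the cell that owns the person
```

With these sample labels the dog's width and height come out at about 2 and 1.3 cell widths, which mirrors the point in the transcript that box sizes can be larger than 1 because a rectangle may extend beyond its grid cell.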

Original Description

YOLO (You only look once) is a state of the art object detection algorithm that has become the main method of detecting objects in the field of computer vision. Previously people used techniques such as sliding window object detection, R CNN, Fast R CNN and Faster R CNN. But after its invention in 2015, YOLO has become an industry standard for object detection due to its speed and accuracy. In this video we will understand the theory behind how exactly the YOLO algorithm works. In the next video we will write code to detect objects using the YOLO framework.

🔖 Hashtags 🔖
#yoloalgorithm #yolodeeplearning #yoloobjectdetection #yolopython #yoloobjectdetection #yoloopencv

Do you want to learn technology from me? Check https://codebasics.io/ for my affordable video courses.

Deep learning playlist: https://www.youtube.com/playlist?list=PLeo1K3hjS3uu7CxAacxVndI4bE_o3BDtO
Machine learning playlist: https://www.youtube.com/playlist?list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw

🌎 My Website For Video Courses: https://codebasics.io/?utm_source=description&utm_medium=yt&utm_campaign=description&utm_id=description

Need help building software or data analytics and AI solutions? My company https://www.atliq.com/ can help. Click on the Contact button on that website.

#️⃣ Social Media #️⃣
🔗 Discord: https://discord.gg/r42Kbuk
📸 Dhaval's Personal Instagram: https://www.instagram.com/dhavalsays/
📸 Instagram: https://www.instagram.com/codebasicshub/
🔊 Facebook: https://www.facebook.com/codebasicshub
📱 Twitter: https://twitter.com/codebasicshub
📝 Linkedin: https://www.linkedin.com/company/codebasics/

❗❗ DISCLAIMER: All opinions expressed in this video are of my own and not that of my employers'.



The YOLO algorithm is a fast and accurate object detection technique that processes an image in a single forward pass, which makes it suitable for real-time applications. It uses a CNN that predicts one output vector per grid cell, extended with anchor boxes so that a cell can hold more than one object. Understanding how YOLO works is the groundwork for implementing it in object detection tasks.
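The anchor-box idea from the transcript (concatenating one 7-value vector per anchor box, so 14 values for two boxes and 21 for three) can be sketched as follows. The helper names `output_shape` and `split_anchors` and the sample numbers are hypothetical. The transcript treats anchor boxes simply as extra slots in the cell vector, and this sketch follows that simplified view; it does not show how an object is assigned to a particular anchor.

```python
from typing import List, Tuple

VALUES_PER_BOX = 7  # pc, bx, by, bw, bh, c1, c2, as in the transcript


def output_shape(grid: int, num_anchors: int) -> Tuple[int, int, int]:
    """Shape of the network output for a grid x grid layout with anchors."""
    return (grid, grid, num_anchors * VALUES_PER_BOX)


def split_anchors(cell_vector: List[float],
                  num_anchors: int) -> List[List[float]]:
    """Split one cell's concatenated vector back into per-anchor vectors."""
    expected = num_anchors * VALUES_PER_BOX
    if len(cell_vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(cell_vector)}")
    return [cell_vector[i * VALUES_PER_BOX:(i + 1) * VALUES_PER_BOX]
            for i in range(num_anchors)]


# Hypothetical cell whose center region contains both a dog and a person.
dog = [1.0, 0.5, 0.6, 2.0, 1.3, 1.0, 0.0]
person = [1.0, 0.5, 0.4, 1.2, 3.0, 0.0, 1.0]
cell = dog + person  # size 14: two anchor boxes concatenated

print(output_shape(4, 2))      # (4, 4, 14)
print(output_shape(4, 3))      # (4, 4, 21)
print(split_anchors(cell, 2))  # [dog vector, person vector]
```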

Key Takeaways

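Divide the image into a grid of cells (for example 4x4 or 19x19); an object belongs to the cell that contains its center
Encode each cell as a vector holding the object-presence probability (pc), the bounding box center and size, and one value per class
Apply non-max suppression, using IoU, to discard overlapping rectangles for the same object
Use anchor boxes to represent multiple objects whose centers fall in the same grid cell
Implement YOLO using TensorFlow, Keras, and Python (covered in the next video of the series)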
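The IoU and non-max suppression steps listed above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions: boxes are given as `(x1, y1, x2, y2)` corner coordinates (a format chosen for the example, not taken from the video), the 0.65 threshold is the one mentioned in the transcript, and the function would be run once per class (once for person boxes, once for dog boxes).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union: 0 for no overlap, 1 for identical boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], scores: List[float],
                        threshold: float = 0.65) -> List[int]:
    """Return indices of the boxes to keep for a single class."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= threshold]
    return keep


# Hypothetical person detections: three overlapping boxes on one person
# and a separate box on a second person elsewhere in the image.
person_boxes = [(10, 10, 110, 210), (15, 12, 112, 205),
                (12, 20, 105, 215), (300, 20, 380, 200)]
person_scores = [0.9, 0.7, 0.6, 0.8]
kept = non_max_suppression(person_boxes, person_scores)
print(kept)  # [0, 3]
```

The second person's box survives because it does not overlap the 0.9 box at all, which is the reason the transcript gives for using overlap instead of keeping only the single highest-probability rectangle per class.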

💡 YOLO's ability to process an image only once makes it a fast and efficient object detection algorithm, suitable for real-time applications.
