Region Proposals - Explained!

CodeEmporium · Advanced ·🔢 Mathematical Foundations ·10mo ago

Key Takeaways

The video explains region proposals in object detection: what they are, why they replaced sliding windows (computational efficiency and the hierarchical nature of images), and how to generate them with selective search, with OpenCV built-ins mentioned as one way to speed things up.

Full Transcript

Greetings, fellow learners. In this video we are going to talk about region proposals: the what, the why, and the how. So what are region proposals? They are essentially candidate bounding boxes where an object of interest may be present. So each one of these boxes is a region proposal. Now, why region proposals? They were introduced in object detection around the 2010s for the following reasons. One is computational efficiency compared to traditional sliding window methods. And two is that they exploit the hierarchical nature of images for better performance than traditional methods. We're going to discuss both of these in a little more detail, starting with computational efficiency.
Back in the '90s and early 2000s, object detection was done with sliding window approaches, and it works something like this. You take an image and you have a fixed-size window, indicated by this red box. For this red box we extract certain features. In the '90s that might have been wavelets; in the early 2000s it was the histogram of oriented gradients; these are just different feature extraction techniques that produce a little vector. This could be, for example, 50 floating-point numbers that represent this square over here. Then we pass this into a trained SVM classifier. This is going to be a binary classifier. In this case, let's say it just tries to recognize a tiger, and it will output a probability: the probability that this window contains a tiger. Then we slide this window by a pixel or two and repeat the process of feature extraction, maybe feature selection in the middle over here, and then SVM classification. And we keep doing this, sliding the window pixel by pixel throughout the image, until the entire image is covered.
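To get a feel for how many windows this sliding procedure visits, here is a rough counting sketch. The window size, stride, and the rescaled copies of the image are illustrative assumptions, not values from the video, and `count_sliding_windows` is a hypothetical helper name.

```python
def count_sliding_windows(width, height, window=64, stride=2,
                          scales=(1.0, 0.75, 0.5)):
    """Count the windows a sliding-window detector would have to classify.

    All parameters are illustrative assumptions: a square window of
    `window` pixels moved `stride` pixels at a time over each rescaled
    copy of the image.
    """
    total = 0
    for scale in scales:
        w, h = int(width * scale), int(height * scale)
        if w < window or h < window:
            continue  # the window no longer fits at this scale
        cols = (w - window) // stride + 1
        rows = (h - window) // stride + 1
        total += cols * rows
    return total


small = count_sliding_windows(640, 480)
large = count_sliding_windows(1500, 1200)
print(small, large)
```

Under these assumptions the count grows with the number of pixels, whereas a region proposal method classifies a fixed budget of boxes (2,000 in the video) regardless of image size.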
And then we do this for the image at different scales too, to make sure that we're picking up objects of different sizes, some closer to the camera and some farther away. After that, we have hundreds of thousands of these bounding boxes. What we do is iterate over the bounding boxes in descending order of their SVM prediction and remove the bounding boxes that have a very high overlap with a higher-scoring box. So you end up with, ideally, one bounding box per object, and that's how object detection was traditionally done.
Now, an issue with this approach is that we have this very expensive feature extraction and SVM classification step. It is done for many windows on many scales, so potentially hundreds of thousands of times, and it can become quite expensive. What we can do to mitigate this is use region proposals on different scales instead. At a very high level, we have an image and we create just 2,000 bounding boxes in total. Each of these 2,000 bounding boxes is a region proposal for which we extract features, so we'll have 2,000 of these feature vectors. We then pipe them into an SVM classifier to detect whether there's a tiger, or any object of interest, in that window. So this entire sequence of steps is executed a fixed number of times, regardless of the image size, and hence it scales quite efficiently. This is especially true as we get into the 2010s, where we have much higher-quality images in datasets. We have a larger number of pixels, so the sliding window approach just becomes more and more inefficient and expensive.
Another reason why we use region proposals is better performance. Consider this image over here. It's actually a pretty straightforward image for detecting objects, so it's easy to detect a cat. But this image here is not so straightforward.
If we want to detect what is a table, the bounding box would probably be around this entire table, even with all the contents on it. But on this table there is also a bowl, which we would draw a bounding box around like this, and then a salad, which would be like this, within which we have spoons. So it's very clear from this image that, in general, images are intrinsically hierarchical, and region proposals can exploit this fact and hence yield better performance.
Now, how we actually do this is using one of many techniques, called selective search. Selective search is a region proposal technique that exploits this hierarchical nature of images, and it combines the strengths of exhaustive search with image segmentation to come up with region proposals efficiently. Exhaustive search, by the way, is like a sliding window approach, and image segmentation is the grouping of related or similar pixels together.
Now let's take a look at the algorithm for selective search for creating region proposals. The input is an image. The output is 2,000 bounding-box region proposals. For step one, we generate initial pixel regions R with Felzenszwalb segmentation, setting K to 50. Felzenszwalb segmentation is a graph-based segmentation that is typically done as a pre-processing step to object detection. This K determines how easy it is to combine regions of these segments together: a larger K means that there will be larger segments in the final segmentation. To illustrate what this actually looks like, this demo is essentially going to call the segmentation algorithm, where we pass in this image, and let's say this K value is 50. So with an input image, you can create all of these segments.
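To make the role of K concrete, here is a minimal sketch of graph-based segmentation in the spirit of Felzenszwalb's method. It is a simplification under stated assumptions: it works on a tiny grayscale grid given as a list of lists, uses 4-connectivity, and skips the smoothing and minimum-size clean-up that full implementations usually include. The function names are hypothetical and this is not the code from the video.

```python
def felzenszwalb_gray(image, k):
    """Simplified graph-based segmentation of a 2D list of gray values.

    Returns a 2D list of segment labels (each label is a root pixel index).
    """
    h, w = len(image), len(image[0])
    parent = list(range(h * w))
    size = [1] * (h * w)
    internal = [0.0] * (h * w)  # largest edge weight inside each segment

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # 4-connected grid graph; edge weight = absolute intensity difference.
    edges = []
    for y in range(h):
        for x in range(w):
            p = y * w + x
            if x + 1 < w:
                edges.append((abs(image[y][x] - image[y][x + 1]), p, p + 1))
            if y + 1 < h:
                edges.append((abs(image[y][x] - image[y + 1][x]), p, p + w))
    edges.sort()

    for weight, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Merge only if the edge is no heavier than either segment's internal
        # difference plus a size-dependent slack k / |segment|.
        if weight <= min(internal[ra] + k / size[ra],
                         internal[rb] + k / size[rb]):
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = weight  # edges arrive sorted, so this is the max
    return [[find(y * w + x) for x in range(w)] for y in range(h)]


def count_segments(labels):
    return len({label for row in labels for label in row})


# Toy image: two flat halves separated by an intensity step of 100.
toy = [[10] * 4 + [110] * 4 for _ in range(4)]
fine = count_segments(felzenszwalb_gray(toy, k=50))      # 2 segments
coarse = count_segments(felzenszwalb_gray(toy, k=5000))  # 1 segment
```

With the small K the intensity step survives as a boundary between two segments; with the large K the slack term is big enough that the two halves merge, which matches the behavior described above where a larger K gives larger segments.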
Now if you increase this K value, you see that the size of these segments has increased; each of these little colored patches is a segment. If you increase K to 150, they increase further still. You increase K to 300, and these segments increase further still. So the idea is that we can create these segments quite quickly for any input image, and we start with this first one, with K equal to 50. For more information on this, I've coded this out entirely, and you can refer to a previous video or the GitHub link in the description below.
Once we generate this initial segmentation, we initialize a similarity set S to be an empty set. Then we determine the similarity between every pair of adjacent regions Ri and Rj and add it to S. By similarity, we just give a score to how similar two adjacent regions are in color or in texture. What I mean specifically is that for, say, this red region over here and all its neighboring regions, we compute the similarity scores and add them to our set. So you can imagine there are thousands of these similarity scores, especially for very small Ks, where the segments are very fine.
Next is step four, where we iterate over the region similarities in this set: we take the two regions with the highest similarity score and merge them together. If they are Ri and Rj, they merge to form a new region Rt. Because it's a new region, we remove all of the old similarity scores attached to the old regions, and then we calculate the new similarity scores for region Rt with its surrounding regions. So S now has slightly fewer similarities in it, and we keep doing this until S eventually becomes an empty set.
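The merging loop just described can be sketched as follows. This is a toy version under loud assumptions: each region is summarized by a single mean intensity, a pixel count, and a bounding box `(x0, y0, x1, y1)`, region ids are integers, adjacency is symmetric, and `similarity` is a stand-in color score rather than the color and texture measures a full implementation would use. All names are hypothetical.

```python
def similarity(a, b):
    """Toy color similarity in [0, 1]; a stand-in for color/texture scores."""
    return 1.0 - abs(a["color"] - b["color"]) / 255.0


def merge(a, b):
    """Combine two regions: size-weighted mean color and the enclosing box."""
    size = a["size"] + b["size"]
    color = (a["color"] * a["size"] + b["color"] * b["size"]) / size
    (ax0, ay0, ax1, ay1), (bx0, by0, bx1, by1) = a["box"], b["box"]
    box = (min(ax0, bx0), min(ay0, by0), max(ax1, bx1), max(ay1, by1))
    return {"color": color, "size": size, "box": box}


def hierarchical_grouping(regions, neighbors):
    """Greedily merge the most similar adjacent regions until S is empty.

    regions:   {region_id: {"color", "size", "box"}}
    neighbors: {region_id: set of adjacent region_ids}
    Returns the box of every region that ever existed in the hierarchy.
    """
    regions = dict(regions)
    neighbors = {r: set(n) for r, n in neighbors.items()}
    boxes = [r["box"] for r in regions.values()]

    # Similarity set S over adjacent pairs.
    S = {}
    for i, adj in neighbors.items():
        for j in adj:
            if i < j:
                S[(i, j)] = similarity(regions[i], regions[j])

    next_id = max(regions) + 1
    while S:
        i, j = max(S, key=S.get)  # most similar adjacent pair
        t, next_id = next_id, next_id + 1
        regions[t] = merge(regions[i], regions[j])
        boxes.append(regions[t]["box"])

        # Drop every similarity that involves the two old regions.
        S = {p: s for p, s in S.items() if i not in p and j not in p}

        # Rewire adjacency to the new region and score its neighbors.
        adj_t = (neighbors.pop(i) | neighbors.pop(j)) - {i, j}
        neighbors[t] = adj_t
        for n in adj_t:
            neighbors[n] -= {i, j}
            neighbors[n].add(t)
            S[(n, t)] = similarity(regions[n], regions[t])
        del regions[i], regions[j]
    return boxes


# Three regions in a row: two dark neighbors and one bright region.
toy_regions = {
    0: {"color": 10, "size": 8, "box": (0, 0, 2, 4)},
    1: {"color": 20, "size": 8, "box": (2, 0, 4, 4)},
    2: {"color": 200, "size": 16, "box": (4, 0, 8, 4)},
}
toy_neighbors = {0: {1}, 1: {0, 2}, 2: {1}}
toy_boxes = hierarchical_grouping(toy_regions, toy_neighbors)
```

In this toy run the two dark regions merge first, then the result merges with the bright region, so the returned list holds the three initial boxes plus one box per merge.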
This will merge all segments until they are one region. To visualize that, you can see right here that, let's say, this is iteration one: the initial Felzenszwalb segmentation that we ran. After many iterations of this grouping and merging of regions, you can see how in subsequent iterations the segments become bigger, until eventually all of the segments are contained in one region. So we now have a hierarchy of segments that we just created.
Then what we can do is repeat it for different values of K. So this is K equal to 100. We now have an initialization that looks a little different from the previous case, and we again create a hierarchy by merging these segments together until we get a single segment over here. We do the same for K equal to 150, where we start like this and then continuously merge these segments. And then we also do it for K equal to 300, where we start like this and continuously merge until we have a single segment that looks like this. So now we have four hierarchies altogether, and this is exactly what is in step five, where we repeat it for K equal to 50, 100, 150, and 300.
Next, we draw bounding boxes around each region of each iteration and each value of K. So there are tens or hundreds of thousands of these bounding boxes, as each segment now has a little bounding box. Then we sort these bounding boxes such that the largest bounding boxes come first, and we iteratively remove bounding boxes that overlap with these largest bounding boxes. Here we keep only the largest 2,000 of these bounding boxes at the end, to have 2,000 region proposals.
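The sort-and-filter step above can be sketched like this. The overlap measure (intersection over union) and the 0.8 threshold are assumptions for illustration; the video only says that overlapping boxes are removed. Boxes are `(x0, y0, x1, y1)` tuples and the function names are hypothetical.

```python
def area(box):
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


def select_proposals(boxes, max_proposals=2000, iou_threshold=0.8):
    """Visit boxes largest first and drop near-duplicates of kept boxes."""
    kept = []
    # Ties in area are broken by the coordinates so the order is stable.
    ordered = sorted(set(boxes), key=lambda b: (area(b), b), reverse=True)
    for box in ordered:
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
            if len(kept) == max_proposals:
                break
    return kept


candidates = [(0, 0, 10, 10), (0, 0, 10, 9), (20, 20, 25, 25), (0, 0, 10, 10)]
proposals = select_proposals(candidates)
# (0, 0, 10, 9) overlaps the larger (0, 0, 10, 10) with IoU 0.9 and is dropped.
```

Lowering `iou_threshold` removes more boxes, and `max_proposals` caps the output at the fixed budget that makes the later classification step independent of image size.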
Effectively, what that looks like is this image over here, where each of these red boxes is a region proposal, and that's all we have: 2,000 of these region proposals, which can be used for further processing in object detection.
Now, one thing that's important to note here is that, intuitively, you can imagine there are a lot of these segmentations that we're running, and it seems like a very long process. Honestly, it still is kind of a long process. This is a 1500 × 1200 image, and it took 15 minutes to run. But we can actually speed this up, and many speedups have been proposed over time. One is to downscale the input image. The second is to prune the bounding boxes early by ensuring that they fall within some sane range of aspect ratios. We don't want bounding boxes to be very tall, single-pixel slivers, because that doesn't really make sense. So we are just culling them altogether, so that we don't need to create these bounding boxes, sort them, and process them. Next is to reduce the number of segmentation hierarchies by reducing the number of Ks. So instead of taking 50, 150, 300, and so on, you can take fewer Ks. We can also use some built-in OpenCV libraries that might have simpler region-based similarity calculations. When we're calculating similarities to merge different regions, it might be better to use more quickly computable heuristics, which is what many of these OpenCV built-ins do. So I hope all of this, and how we create these bounding boxes, makes sense.
Quiz time. Have you been paying attention? Let's quiz you to find out. Why use region proposals versus the sliding window approach for object detection? A. Region proposals reduce the number of candidate regions. B.
Region proposals focus computation on likely object areas, making detection faster. C. Region proposals discard unlikely background windows before classification. Or D. Sliding windows are more efficient because they evaluate fewer regions than proposals. Note that multiple options may be correct, and I will give you a few seconds to answer this question. The correct options are A, B, and C. Did you get them right? Comment your reasoning down below and let's have a discussion. And at this point, if you think I deserve it, please consider giving this video a like, because it will help me out a lot. That's going to do it for quiz time.
Before we go, let's generate a summary. In this video we looked at what region proposals are, and we defined them as candidate bounding boxes where an object of interest may be present. Then we discussed why we use region proposals, especially in the 2010s: because of computational efficiency as well as improved performance. Specifically, the algorithm that we looked at exploits the hierarchical nature of images. Then we took a look at selective search, which is an algorithm for creating these region proposals. We also saw some fun images showing the results of the final process, along with some caveats and how we can make this much faster. And that's all that we have for today. I'm going to link some resources, like other videos that talk about Felzenszwalb segmentation and some other topics, down in the description below, so please do check that out. I'm also going to release the code for this on GitHub, in the description once more. And like I mentioned before, if you think I deserve it, please consider giving this video a like, and I will see you in the next one. Bye-bye.
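One of the speedups mentioned in the transcript, pruning boxes that fall outside a sane range of aspect ratios, can be sketched as below. The limits (a maximum ratio of 5 and a minimum side of 2 pixels) are illustrative assumptions, and the function names are hypothetical.

```python
def has_sane_shape(box, max_aspect_ratio=5.0, min_side=2):
    """Reject slivers: boxes that are too thin or too elongated."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    if w < min_side or h < min_side:
        return False
    return max(w / h, h / w) <= max_aspect_ratio


def prune_boxes(boxes, **limits):
    """Keep only the boxes whose shape passes has_sane_shape."""
    return [b for b in boxes if has_sane_shape(b, **limits)]


raw = [(0, 0, 1, 100), (0, 0, 40, 30), (0, 0, 100, 10)]
pruned = prune_boxes(raw)
# Only (0, 0, 40, 30) survives: the first box is a one-pixel sliver and
# the third has a 10:1 aspect ratio.
```

Applying a filter like this before sorting and overlap removal means fewer boxes reach those more expensive steps.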

Original Description

Let's understand region proposals: what it is, why we do it and how we can achieve this with some code.
ABOUT ME
⭕ Subscribe: https://www.youtube.com/c/CodeEmporium?sub_confirmation=1
📚 Medium Blog: https://medium.com/@dataemporium
💻 Github: https://github.com/ajhalthor
👔 LinkedIn: https://www.linkedin.com/in/ajay-halthor-477974bb/
RESOURCES
[1 📚] Slides used in the video: https://link.excalidraw.com/p/readonly/ckRLqQFeGtBgULeCEFHQ
[2 📚] Main paper for selective search: https://www.researchgate.net/publication/262270555_Selective_Search_for_Object_Recognition
[3 📚] Code for the video: https://github.com/ajhalthor/computer-vision-101/blob/main/selective_search.ipynb
[4 📚] My video on graph based image segmentation used for pre-processing: https://youtu.be/sSx5Qujq0Fs?si=AEOh9wvE-rGxvoat
PLAYLISTS FROM MY CHANNEL
⭕ Reinforcement Learning: https://youtube.com/playlist?list=PLTl9hO2Oobd9kS--NgVz0EPNyEmygV1Ha&si=AuThDZJwG19cgTA8
⭕ Natural Language Processing: https://youtube.com/playlist?list=PLTl9hO2Oobd_bzXUpzKMKA3liq2kj6LfE&si=LsVy8RDPu8jeO-cc
⭕ Transformers from Scratch: https://youtube.com/playlist?list=PLTl9hO2Oobd_bzXUpzKMKA3liq2kj6LfE
⭕ ChatGPT Playlist: https://youtube.com/playlist?list=PLTl9hO2Oobd9coYT6XsTraTBo4pL1j4HJ
⭕ Convolutional Neural Networks: https://youtube.com/playlist?list=PLTl9hO2Oobd9U0XHz62Lw6EgIMkQpfz74
⭕ The Math You Should Know: https://youtube.com/playlist?list=PLTl9hO2Oobd-_5sGLnbgE8Poer1Xjzz4h
⭕ Probability Theory for Machine Learning: https://youtube.com/playlist?list=PLTl9hO2Oobd9bPcq0fj91Jgk_-h1H_W3V
⭕ Coding Machine Learning: https://youtube.com/playlist?list=PLTl9hO2Oobd82vcsOnvCNzxrZOlrz3RiD
MATH COURSES (7 day free trial)
📕 Mathematics for Machine Learning: https://imp.i384100.net/MathML
📕 Calculus: https://imp.i384100.net/Calculus
📕 Statistics for Data Science: https://imp.i384100.net/AdvancedStatistics
📕 Bayesian Statistics: https://imp.i384100.net/BayesianStatistics
📕 Linear Algebra: https://imp.i384100.net/


The video teaches the concept of region proposals in object detection, their benefits over sliding windows, and how to generate them with selective search, focusing on computational efficiency and the hierarchical nature of images. Region proposals reduce the number of candidate regions, focus computation on likely object areas, and discard unlikely background windows before classification. By understanding region proposals, viewers can build more efficient object detection pipelines.

Key Takeaways
  1. Create region proposals using the selective search algorithm
  2. Extract features from each region proposal
  3. Pipe the features into an SVM classifier for object detection
  4. Merge similar adjacent segments to form a hierarchy of regions
  5. Draw bounding boxes around each region and remove overlapping boxes
💡 Region proposals significantly reduce computation time by focusing on likely object areas and discarding unlikely background windows before classification, making object detection faster and more efficient.
