Intro to Computer Vision: Fireside

Roboflow · Beginner ·👁️ Computer Vision ·5y ago

Skills: CV Basics 90% · Modern CV Models 80% · Generative CV 70%

Key Takeaways

The video introduces the basics of computer vision, covering image classification, object detection, and segmentation. It also discusses the importance of data collection, annotation, and model training in computer vision problems, using tools like Roboflow, CVAT, and VoTT.

Full Transcript

Hey everyone, this is Jacob from Roboflow. I'm here today with Matt from Roboflow to talk about an intro to computer vision. So Matt, tell us a little bit about yourself, and then we can dive into the topic at hand.

Yeah, well, thank you for having me on. This is my first time doing a fireside chat with you, and I'm excited to do it; I've watched a couple of them before. I'm Roboflow's new hire. I'm our growth manager, so I'm working on a lot of different problems throughout Roboflow and getting to work with you, Jacob, and the rest of the team. It's been a busy six weeks so far, but I'm very excited to be here.

Awesome, awesome. So Matt put together a really awesome post recently about an intro to computer vision, which I think a lot of you would be interested in hearing about. We're here today to take it apart and have a discussion about computer vision from a high level. So let's go ahead and dive right in. Matt, at a high level, what is computer vision?

Yes, so I like to think about computer vision as getting a computer to see and understand the way you and I as humans see and understand things. Our goal is just to teach a computer how to do something. For example, I have my coffee mug here. It is the early morning for me here in Baltimore, and by early morning I mean 11:30 a.m., but I drink coffee at all hours. Our goal is to be able to teach a computer to recognize that this is a coffee mug, and ideally at some point you can teach computers to do all sorts of things with it. For example, if there was a robot, and that robot needed to pick up this coffee mug and wanted to take a drink from it, that robot or that computer would need to be able to recognize: this is the coffee mug, where is the handle, how do you move the arm (again, if it's a robot, it might have a bionic arm or something) to grasp the handle,
pick it up, and then move it to your face. These are all things that we as humans do, but we don't really think about them beyond when we're babies and children, because it just becomes so natural. But for computers, we have to teach them to do all of these things. So computer vision, the way that I think about it, is the way for computers to be able to see and understand things the way that you and I do as humans, even though you and I do it really implicitly and just kind of naturally at this point.

Yeah, definitely. So how would a computer even start at being able to identify the things that it's being shown?

Yeah, that's a good question, and I think one of the things that's helpful is to talk a little bit about my background. Before coming to Roboflow, I was a data scientist and an educator. When we think about data science, there are a lot of different things that we often think about, but the goal is for us to be able to give data to a computer and for that computer to pick up patterns. Now, this is a very, very oversimplified version of what a data scientist does, and the human is involved with it. It's not just the computer picking up on patterns; there's a lot of human creativity that has to be involved in data science. But at the end of the day, when we think about the types of models that we want to build or the types of things we want computers to be able to do, we as humans take data or information, we give it to the computer, and the computer uses that to learn or to get a better understanding of the world around it. Computers can actually do the same thing with images, and I think we've got a visual that will hopefully help to make clear how computer vision is very much like that general idea of data science.

What you see on your screen here are actually three versions of the same picture of Abraham Lincoln. So all the way on
the left-hand side, you've got this very, very pixelated picture of Abraham Lincoln. You might be able to make out the boundary of his head and where his eyes would be, and you may be able to tell that he's got the big black beard that you generally think of when you think of Abraham Lincoln. So you see that image on the left. If you move over to the middle image, it's that exact same image, but you'll notice there are numbers in each of those pixels. That number represents how light or how dark that pixel is. For example, the darkest pixels around the beard area have very low values, like 0 and 1 and 2. Conversely, the lightest pixels, like around his forehead, are in the 200s; you see 237, 239, 251. So those numbers represent how light or how dark a pixel is. If you move to the image all the way on the right, you see the same exact thing, but now we've just dropped the color itself. You don't see any of the grayscale; all you're seeing is the numbers, but those are the exact same numbers. All three of these things are representations of the same image of Abraham Lincoln.

Now, the computer can understand those pixel values on the right-hand side. That's something that we feed into a computer to get it to learn and understand patterns: where do the dark pixels, or those low numbers, tend to cluster? Where do the lighter pixels, the higher numbers, tend to cluster or appear in these images? Computers can pick up patterns in this way. Just like if you were to give a spreadsheet of data, or a pandas DataFrame, or an R data frame of information to the computer, computers will be able to learn how to see and understand what they are looking at here, in a similar way to what you would do if you wanted to get a computer to give you a line of best fit for some data, or to build a decision tree or random forest on your data. So this is sort of how,
if we pull the veil back a little bit, how computers start to understand how to see in the way that you and I as humans also get to see.

Yeah, definitely, that makes a lot of sense. I think this one is particularly fascinating: there's only something like 10 by 10 pixels, and we can still already pick up that this is Abe Lincoln. And this is all the data that you would need to send into the computer to get it to start to learn some of those basic concepts that we just take for granted.

Yeah, and luckily, in this case I told you that it was Abraham Lincoln, so otherwise it might be harder to pick up on that. But when it comes to a computer learning about this, let's say that we wanted to have a presidential detector or a presidential classifier, and we wanted the computer to be able to understand: am I looking at a picture of Abraham Lincoln, or Franklin Delano Roosevelt, or George Washington? Three different American presidents throughout history. What we might do is take a bunch of images, whether it's five or whether it's 500, of Abraham Lincoln, of George Washington, and of Franklin Delano Roosevelt, or FDR, and pass all of those into the computer. It sees, on the right-hand side, that array of pixels that represents the colors shown in the image, and eventually the computer will be able to pick up on those patterns. Again, we oftentimes think of Abe Lincoln with a beard. However, to my knowledge, George Washington did not have a beard, or was not known for having a beard, and frankly I could not tell you whether FDR had a beard or didn't have a beard. But the computer, by looking at a bunch of different pictures, or these pixel values, of Abe Lincoln versus George Washington
versus FDR, may ultimately get to understand that when it sees a bunch of those low pixel values in the bottom part of the image, that might mean there's a beard there, which is indicative of Abraham Lincoln. In any case, it's really exciting to see how computers can start to learn about images, because it's frankly really interesting. And we can go far beyond presidential detectors or presidential classifiers and instead solve a bunch of different problems with computer vision. So this is just one of many, many examples.

Certainly, certainly. So, thinking about different examples, from what I've gathered there are a few archetypes of computer vision problems that the different things you can be trying to do will fall into. I was wondering if you could take us through some of those and break down the different branches of computer vision, if you will.

Yeah, so the first thing that I want to say is that computer vision is a very, very fast-growing field. There are so many different organizations doing research in computer vision, and like I said, there are many different applications. So the types of computer vision problems that we talk about here won't be exhaustive of all the types of problems that one can solve. Right now it's the end of 2020, but who knows what will happen in 2021. I know that you've written more than a few blog posts this year about which models are the state-of-the-art models, and how it's like every two months there's a new model that tends to outperform old models. It's really exciting to be part of that and to see how quickly that stuff is changing. So I just wanted to add that caveat up front, that with what we're going to go through here in a minute,
this represents what the landscape of computer vision looks like today, in December of 2020, but when we think about January of 2021 or December of 2021 and beyond, it may look very different.

So with all of that being said, there are four very common types of computer vision problems. The first one, which you can see on the left-hand side of the screen, is called classification. In that image there's a cat. Let's say that you're a child and your parents are trying to teach you what a cat looks like and what a dog looks like. Maybe there are a bunch of different pictures of cats and dogs; more realistically, maybe your parents would read you a book where they point out what the cat looks like and what the dog looks like, and over time you see enough images to where you understand: this is a cat, or this is a dog. The same thing happens with computer vision. What we could do is take a stack of 100 images, maybe 50 cats and 50 dogs, and feed those into the computer, and the computer starts to understand what differentiates a cat from a dog. What are the features of a cat? Perhaps sharper ears, or more pronounced whiskers, or maybe longer and more flexible tails, whereas dogs, also four-legged creatures, have different features as well, and the computer ultimately gets to pick up on that. That would be considered a classification problem, where every image is put into one of a specific number of buckets; in this case, one of two buckets, one for cat and one for dog. You can think of it as categorizing flash cards: each one goes into one pile or another pile. The presidential classifier that I mentioned would be an example of a classification problem, because there you've got images of Abraham Lincoln, George Washington, and FDR, and in each image you
only have one of those presidents.

A closely related problem is called classification and localization. It's got the same sort of approach, where you try to classify which bucket an image falls in, but what you'll notice about the image here is there's a square around that cat, and that's what's called localization. Not only are you trying to predict in which bucket an image falls, but you're also trying to localize, or identify, where in that image the cat falls or the dog falls. There are a lot of examples of this, and so before moving on to talking about object detection and semantic segmentation, a real-world example of this is a wildfire detector. Actually, somebody's been using Roboflow to tackle that. Oh yeah, sorry, I guess we'll move ahead a little bit here.

Yeah, so this would be an example of a classification and localization problem. This is an open source dataset, so you're able to play around with it. The user here wants to be able to identify when smoke happens and where that smoke happens, so that a drone could be dispatched to try to dump water on the source of that smoke to prevent wildfires from happening. So this is an example, like I said, of a classification and localization problem. Not only are we understanding whether there is smoke or not in this wildfire detector, or this smoke detector, but we're also trying to localize where that smoke is. You can see the light green box, or the yellow box, around the wildfire smoke, specifically saying yes, this is smoke. One other thing that I want to call out, for people who may or may not be familiar with this: you'll notice that there's a percentage there as well, like smoke 98, smoke 94.
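
Editor's sketch: the box and the percentage on that smoke image can be thought of as one small record per detected region, with a label, a confidence, and box coordinates. The Python below is only an illustration under stated assumptions. The `Detection` class, the `keep_confident` helper, the pixel coordinates, the third low-confidence entry, and the 0.5 cutoff are all hypothetical and are not taken from the video or from any Roboflow API; only the label "smoke" and the 98 and 94 percent values come from the discussion.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One predicted region in an image (hypothetical structure)."""
    label: str         # predicted class, e.g. "smoke"
    confidence: float  # model confidence, between 0 and 1
    box: tuple         # (x_min, y_min, x_max, y_max) in pixels


def keep_confident(detections, threshold=0.5):
    """Return only the detections whose confidence meets the threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up predictions for illustration; coordinates are invented.
detections = [
    Detection("smoke", 0.98, (120, 40, 260, 150)),
    Detection("smoke", 0.94, (300, 60, 380, 130)),
    Detection("smoke", 0.12, (10, 200, 50, 240)),  # e.g. a low cloud
]

confident = keep_confident(detections, threshold=0.5)
for d in confident:
    print(f"{d.label} {d.confidence:.0%} at {d.box}")
```

Where to put the cutoff is a judgment call: a lower threshold catches more real smoke but also more fog and clouds, and a higher one does the reverse.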
You might think of this as the confidence, or the probability, that it actually is seeing smoke there. Computer vision is not perfect, just like human vision is not perfect, and so whenever we make predictions we're often able to say, hey, we're pretty sure that this is smoke, but there's a possibility that it's something else. Maybe it's really foggy out, or there's a low cloud, or something else going on that makes it look like smoke. But as we see here, there's a pretty high chance that what it's looking at is smoke, and we as humans look at that and probably would believe that it's smoke too.

Definitely, definitely. This use case really shows just how useful this can be. I mean, I couldn't imagine: if every hilltop in California had one of these, all kinds of fires could be prevented with this kind of surveillance.

Right, definitely a lot of potential in applying computer vision in this and many other contexts.

Definitely, definitely. But yeah, to carry the conversation forward, I'm just curious: now that we've seen all the cool things that computer vision can do, how would one get started on actually starting a project, and what might it look like to get going in computer vision?

Yeah, so for people who come from a data science background, the steps to tackling a computer vision problem are not going to look that different from the steps of tackling a general data science problem. In this case, the data are going to be the images that we work with. But if you're less familiar with a data science workflow, for example if you're a web developer, or you're a product manager, or you're somebody who just doesn't have as much experience in data science or data analysis, there are a series of steps that you can take to be able to solve a computer vision
problem. This is something that we've put together at Roboflow, and you'll notice that there are seven steps to this process. Moving from left to right, you have collect, label, organize, process, train, deploy, display. I write about all of these in the article that's posted on the Roboflow blog, but the idea here is that you start on the left-hand side with collecting data. For example, with that wildfire detector, you need to have images of both wildfires and non-wildfires. Then, moving into the labeling stage, somebody would have to label the images. When you saw the wildfire, for example, there was a big green box around the smoke. That box is what we call a bounding box, and in order for the computer to be able to put a bounding box around that smoke, the computer has to be told what is smoke, what isn't smoke, and what other bounding boxes look like. So you as the user will actually have to annotate, or label, your images by drawing a bounding box around objects. As you see here, there are a couple of integrations: you can use a tool called CVAT, or you can use Microsoft VoTT. We actually just last week publicly launched Roboflow Annotate, where if you upload images to Roboflow you can annotate those images directly in Roboflow as well. So you can put the bounding boxes around images. Everybody can do it, it's free, and honestly I find it kind of fun, so I would encourage you to upload a couple of images and try that out.

Once you've got that done, you need to move into organizing and processing your images. Oftentimes when you're tackling a real computer vision problem you're probably working with a team, so maybe you and a couple of other teammates are collecting different images and labeling them. You need some place to
organize all of that together, and then you need to process those images. Let's say, for example, that I have an iPhone and, Jacob, you have an Android phone, and you and I are both taking images. If we go back to the cat and dog classifier, maybe I take a bunch of pictures of my dog. I don't know if you have a dog or a cat, but maybe you take pictures of a pet. Well, our images are probably going to be different sizes, because an iPhone camera and an Android camera are probably not going to be the same, and if somebody else has a digital camera or whatever, there are all sorts of reasons why images may come in different shapes and sizes. That's not good when it comes to solving a computer vision problem; it actually presents a number of challenges. So you need to format your images and pre-process them to make sure that they're all the same size. You can also do something really cool called image augmentation, but I want to pause before getting into that, because I think there's a lot of stuff that can go into that. So I just want to see: are there any questions that you have about the first, I'll call it three and a half, steps that we've talked about for this process so far?

Yeah, I have a few, but maybe we keep going along, and then I'll ask you for some deeper dives on individual sections once we've seen the whole thing all the way through to the touchdown line.

Sure. So once you've pre-processed your images, you can also do something called augmenting your images. One of the biggest challenges in all of data science, or in all of statistics: if you've ever taken a statistics class, it's kind of drilled into your head how important it is for you to have as large a sample size as possible. Now, there are a couple of cases in which you don't want to do that, or you have to be very careful about how you do that, but
all else held equal, the more data you have, the better. And that probably makes sense. If we think back to that presidential classifier, where we're looking at Abe Lincoln, George Washington, and FDR, I think that if a computer has five images of each versus 500 images of each, the computer is probably going to have an easier time generating accurate predictions when it's working with 500 images in each bucket, because five pictures of each might not be enough for the computer to learn. It's the same sort of thing with a child who's learning the difference between a dog and a cat: the more times a child sees a cat and sees a dog, the easier it is for that child to understand this is a cat versus this is a dog.

So image augmentation is this really cool technique where, when it comes to image data, we can make small tweaks, or small perturbations, to the data that will allow you to increase your sample size. For example, what we've got here is the Roboflow mascot, Lenny the raccoon. If we look at the image all the way on the left, you have Lenny, and there's been a bounding box drawn around Lenny. There are a number of changes that we can make to this image. Obviously this is a picture of Lenny the raccoon, and it should be treated as such. What we can do, though: you've got that left-hand image, and moving over to the right a little bit, you'll see two copies of that same image. I'll actually start with the one on the bottom. Notice that the one on the bottom is the exact same picture of Lenny, except it's a little bit lighter and it's been cropped a little bit; you'll notice that his paws are actually at the very bottom of the image. If you look at the image directly above that, you're going to see a similar image of Lenny, but again, there are some visual effects that have changed. You'll also
notice that that picture has been reflected, so it's still Lenny, it's still a raccoon, but you're seeing a slightly different perspective of Lenny. You can move on to the right and see the same sort of thing. We apply a little bit of shear to the image, or a little bit of rotation; we blur the image; we might rotate it 90 degrees; we might add a bit of noise to it. All of these are the same picture of Lenny. So let's say that we took one picture of Lenny: we've now used image augmentations to go from one image of Lenny to six images of Lenny.

Now, I do want to add a caveat here. This isn't some magical solution. This is not something where we can just say, hey, we had one picture of Lenny, now we have six pictures of Lenny, and so our sample size effectively is six. These are all very related, or correlated, images. It would be better for us if we had six pictures of different raccoons as opposed to one picture of the same raccoon tweaked six different ways, but we're working with what we've got. So you can use image augmentation to increase the sample size in your data, and what this does is get your computer to understand a little bit more about, in this case, how you identify a raccoon in the woods. By changing the lighting a bit, or changing the angle, or adding a little bit of noise, the computer can start to understand specifically what it is that makes a raccoon a raccoon and how we can better detect that. So I think image augmentation is one of the coolest things, and it's fairly unique, in my experience, to image and video computer vision problems.

Yeah, definitely. It's definitely a super powerful way to expand your training set and really show the computer as many different versions of things as you can. That's really amazing. So, moving along on the
full process, let's step back for a second. We've gathered a bunch of images. We've annotated them with labels, so we have things that we can teach the computer. Now we've got this dataset, we've made it a lot bigger and more diverse with augmentations, and we've used pre-processing to get it into the right format. So where are we headed?

Yeah, so if we step totally back, remember our goal is to be able to solve a computer vision problem: we want to teach our computer to see and understand things the way that you and I do. We've kind of done all of the input stuff; now we're at the building phase. This is where we train a model. It depends on whether you come from a data science or a more technical background or not, but I'm going to flash this image up. It's going to look really scary, but we'll talk about it. So we want to be able to build a model. Now, a model is just a simplification of reality, and you and I build models all the time, every day. Back before the COVID times, when people generally would actually commute into work, we all had mental models about how long it takes us to commute. For example, I live in Baltimore, and my fiancé would go to work every day. He works in Washington, DC; that is a 90-minute commute. Now, every single day his commute was not exactly 90 minutes. In fact, it was probably never exactly 90 minutes to get into the office. But you've got this mental model: it takes 10 minutes to get to the train, it takes 50 minutes on the train, then you go to the station, you get on the metro, you walk to the office. You've got all of those steps, and it's roughly 90 minutes. It's not a perfect model, but it helps us to understand what the things are that we need to do. For example, my fiancé can use that to figure out when he needs to be in the office, what the last train is that he can take to get into the office on time, when he needs to set his alarm, and things like that. The model doesn't have to be
perfect, but that is all a model is. What we're seeing here is a much, much more sophisticated model. When it comes to developing computer vision models, there are a number of choices that you can make. By and large, the most impressive types of models are things called convolutional neural networks. If you're familiar with convolutional neural networks or have had experience with them, that's great. I'm not going to walk through each of these components, because frankly, when it comes to an intro to computer vision, that's not really worth our time here. But the goal is for us to be able to take an image and turn it into a prediction. If you look in the top left corner, where it says backbone and there's C1 through C5, at the very bottom of C1 you see the outline of an image. That is one image. You can think of that as an image of Lenny the raccoon, or an image of Abraham Lincoln, or an image of a cat or a dog. It's going to go into this really complex-looking system of pipes that moves from the left all the way to the right, and once you go through all of these numeric transformations, and frankly just all of this math, at the end the computer is going to be able to make a prediction about what this image is.

How we train a model is we give our computer a lot of training data. Say we wanted to build a Lenny the raccoon object detector. We didn't talk about object detection earlier: it's where an image may have multiple objects, and the computer should be able to pick up Lenny, or pick up any objects in the image that you want it to pick up. So you might give your computer ten, or a hundred, or a thousand, or ten thousand, or a million images of Lenny with bounding boxes drawn around Lenny, and what is going to happen is we're going to pass those
images into this network, and they're going to move all the way from left to right, and then at the end it's going to check and see: was I correct, or was I not? It's almost like flash cards. You look at a flash card, you make a guess as to what the answer is, you flip the card over, you see whether your answer was right or not, and then you use that to adjust your thinking, and you do that over and over again. That's the way this convolutional neural network is going to work. Images are going to be passed in, the computer is going to make a guess, and it's going to look at the true answer and say, oh, I was really close, or I was not very close. If it was really close, that means your computer is doing a good job. If it's not very close, the computer says, we've got to adjust a lot of things, so let's adjust, and now let's move the next images through the network and see how it goes, again and again. That's the way that machine learning models in general work. Convolutional neural networks are especially complex models, but they tend to do pretty well with computer vision. And I know that you, Jacob, do a lot of work with the different types of models that are out there, like EfficientDet and Scaled-YOLOv4, tiny, YOLOv5, and all of that. So I kind of want to toss it to you, because this really is your bread and butter.

Yeah, this is really a personal passion for me, but I agree with Matt that some of the details are outside of the scope of this intro to computer vision. The important thing is how Matt outlined it: there's a bunch of images coming through, there's a network that's being trained, and then at the end you're able to get the prediction out. It's really just a data transformation process of transforming the image that's coming in into some kind of actionable insight. And then, nearing the end of our talk here, it's been, you know,
it is quite a process to go through, to make sure that you're getting everything right throughout this pipeline. But I was thinking that maybe we talk a little bit about what the motivation is for this process, and for trying to keep everything well connected along the way. What might be the motivation behind all this work?

Yeah, so the motivation behind all of this work is to do what it is that you want to do. You're trying to teach a computer to see for some purpose, so let's figure out what that purpose is, get the computer to do that, and not lose sight of that goal. For example, maybe you want to build an app that can be deployed to your phone: you download the app, you pull it up, and then you're able to scan something, like scanning a dog to see what kind of breed it is. Or you want to be able to scan a bunch of things in a factory very quickly and efficiently so that humans don't have to look at everything. There are literally an uncountably infinite number of things that can be done with computer vision, so really the sky's the limit in terms of the thing that you want to do. Once you've gone through all of these steps, you get to decide how you want to use that model. Do you deploy that model to an app, like I just said? Do you build that model and share it with somebody else so that they can use it? Do you want to deploy it on a website, so that with a video camera there could be a bounding box that you put around something? There are all sorts of different things that you could do there, and you can display your results, but that's the end goal.

To a related question that you asked, let's talk about all of those pieces and having a unified approach to all
of those pieces up front there's an adage in data science and in many other fields that we describe as garbage in garbage out so the idea is that if you put garbage in whatever you get out of it is going to be garbage so for example let's say that i wanted to be able to look at customer satisfaction let's say that i work for a company and i want to be able to understand how satisfied people are with my service so what i do is i collect data and let's say i do everything right i build the exactly correct type of model any statistical assumptions that are made are satisfied i've got a ton of data i cleaned the data in a way that preserves the original form of the data like i didn't make any mistakes i handled missing data properly did all of the things from start to finish correct but let me tell you i gathered my data from yelp what do you think jacob is the problem if i gather my data from yelp and then use it to understand customer satisfaction with my product or my service yeah i mean it might be a total mess you might be getting you know all kinds of different data and it may not be accurate i'm not entirely sure but when we think about yelp as a service and you know you think about people using yelp online yelp is not really a random sample of your customers yelp is generally where people go to complain now that's not always the case some people are just really really big yelpers they'll go on yelp and they'll review people and review services and products and things like that constantly um there are some people who might say hey i had a fantastic time and i'm going to go on yelp and i'm going to like give this five stars but in a lot of cases just based on what i've seen on yelp most frequently people go to yelp because they're complaining they have something to complain about so if you wanted to understand how pleased are people with my business with my organization and you go on yelp
that's probably going to be a pretty skewed view as to how people actually feel about your service the same sort of thing is happening here so what you see on your screen again you might think of it as like a garbage in garbage out so what we see on the screen here you've got this is considered an object detection problem you'll notice that there are bounding boxes around many different objects so you've got cars that are labeled you have motorbikes that are labeled you have people that are labeled as person you have buses uh it looks like those are all of the individual classes that we have here now this model does an okay job because you see for example oh there are a lot of bounding boxes and the bounding boxes seem to be pretty tight around the objects themselves like it the bounding box for a car pretty well matches the limits of the car it's certainly not perfect but it seems to you know those seem to be okay the problem with this is that when you look at this image here there's a lot of things that aren't labeled there's a lot of things that are missing and so that might indicate if you build your model and you take a look at it and you say this is what my output looks like you're not done with your process you need to go back and see what are certain choices that i made that um that prevented my model from being as accurate as it could be is it that i made the choice to build my model without enough data do i need to go back and collect more data do i need to augment my images more did i do a poor job of labeling my images do i need to be better about drawing bounding boxes very intentionally around each of the objects did i just not have enough data on motorbikes for example i see a lot of motorbikes in this image that are unlabeled so once you go back through your process and focus on that data collection that data organization that pre-processing and augmentation then what you might do is retrain your network and then you end up with a significantly 
better model something that looks like this so you'll notice that it's still not perfect there's a lot of things at the very very back of the street in the back of the image where uh you see cars and things like that but the computer isn't picking up on those you'll also see some people on the other side of the road some people and motorbikes that that are missing but you'll notice that the computer is doing a much better job of identifying things that are directly within its field of vision here in front especially that little cluster of people in front of the bus where you on the last image you saw a lot of that was blank the computer is now picking up these people these motorbikes um and that is a that's a good thing so we would say this model is performing better than it was in the previous image definitely definitely and i i think that that is you know just a really good example of the motivation behind the whole process which is just to not only build a model that will do the job that you you set out for it to do but that will do it well with high fidelity um and then yeah and then just reflecting a little further the the beauty of things here is that once you get this model finished um and you get it to a place where where you want it you can obviously continuously improve it and make it better and better but then you have something that you have locked in place that can make inferences you know as as many times as you want it will run you know just like just like a computer you know it runs it always runs so that's a really motivating fact i think yeah and i think it'd be remiss if i didn't bring up the idea of of ethics and of equity when it relates to this because this is a really important point to bring up as we think about like the the two images that we saw there where there was the image that didn't perform as well and there was a lot of there were a lot of annotations that were missing and then the the latter image where things looked to be 
performing much better it's important to keep in mind what your goal is and it's important to keep in mind what is the problem that you are trying to solve for example if you're thinking about a presidential classifier the fdr the abe lincoln the george washington example that is you know that's probably i don't know what the use case for that would be but i imagine that's fairly low stakes that might be a toy project that is something where if your model got a classification wrong then you know that probably wouldn't be the end of the world there might not be a ton of harm that's associated with your model getting that wrong in the case where we were looking at that street uh think about that in the context of self-driving vehicles and that is one example where obviously you need a model that is performing very very very well if that is the goal and so keeping that goal in mind having a self-driving vehicle for example um there are a lot of ethical implications and equity-based implications that are wrapped up in that and so we need to be cautious to not say well it's good enough for our purposes well we've got 95 percent accuracy or 99 percent accuracy people will often say like would you ever get on a plane if that pilot had a 99 percent success rate or something like that i think that a lot of people would say no um and so it's very very important to think about that as well as there are a lot of documented examples especially as it relates to computer vision where based on a number of different factors models tend to not perform as well for people of color people with darker skin and a lot of that has to go back to the beginning of the process are we collecting enough data from various groups of people for example if you're doing medical imaging and the training set for your data is overwhelmingly white patients that affects the performance of that machine learning model if you are building
a if you're trying to build a sensor that can detect uh like like the the hand sanitizer sensor uh you know you put your hand underneath and it detects and it shoots that out if it's generally tested in environments that are lit that you know that better augments or better calls out white skin than darker skin darker pigmented skin then that that technology may not perform as well for people of color and so i think that it's really important to understand how none of these pieces are like intractable from one another and so that all uh that all boils into when we think about building these types of models and just the the iterative nature of it if you go back to jacob if you go back to that slide where we were talking about like collect label organize and all of that the goal is to be able to design things in a way where you see from start to finish you go all the way from left to right you start with collect then you move into label then organize into all of that and that makes it easy for us to mentally digest but realistically there's a lot of continuous improvement that's got to take place there's a lot of opportunity for us to once we train that model say hey that model is not performing as well as we want so before we deploy it let's go back to the collection phase and get more data or let's go back and re-label and improve the labeling on that so i know that was long-winded but i think it's it i definitely would have been remiss if i hadn't brought that up and it's something that absolutely needs to be considered when when tackling computer vision problems most certainly most certainly and as matt pointed out here you know this whole continuous loop process is something where we're working heavily on making easier and faster at roboflow so if you're thinking about using computer vision consider checking us out and you know i just wanted to say you know thanks so much for joining matt and i today in this discussion and you know it's been really awesome so 
yeah thank you i'm excited to to do more of these fireside chats in the future and perhaps lead a few of my own uh but yeah excited to be here excited to be working with you uh and uh and i i'm looking forward to hearing from people who view this video and uh and understanding what are the types of of problems that you're solving with roboflow yes definitely definitely so thanks so much for joining us and uh we'll see you all on on the next video take care
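
Editor's sketch: in the street-scene discussion, Matt judges the detector on two things, how tightly each bounding box fits its object and how many objects are missed entirely. A common way to put numbers on those two ideas is intersection over union (IoU) for box tightness and recall for missed objects. The video names neither metric and shows no code, so the Python below is only a minimal illustration under assumptions: the (x_min, y_min, x_max, y_max) box format, the 0.5 matching threshold, the greedy matching strategy, and the function names are all hypothetical choices, not Roboflow's method.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union: 1.0 means identical boxes, 0.0 means no overlap."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(ground_truth: List[Box], predictions: List[Box],
           threshold: float = 0.5) -> float:
    """Fraction of labeled objects that some prediction overlaps well enough.

    Greedy matching: each prediction can account for at most one object.
    """
    unused = list(predictions)
    found = 0
    for gt in ground_truth:
        best = max(unused, key=lambda p: iou(gt, p), default=None)
        if best is not None and iou(gt, best) >= threshold:
            found += 1
            unused.remove(best)
    return found / len(ground_truth) if ground_truth else 1.0


# Made-up boxes standing in for four labeled objects on the street.
labeled = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50), (60, 60, 70, 70)]
first_model = [(0, 0, 10, 10), (21, 21, 30, 30)]
retrained_model = first_model + [(40, 40, 50, 51)]

print(recall(labeled, first_model))      # two of the four objects found
print(recall(labeled, retrained_model))  # three of the four objects found
```

A rising recall between the first and the retrained model is the numeric version of "the computer is now picking up these people these motorbikes", and the objects still missing at the back of the street are what keeps it below 1.0.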

Original Description

What is computer vision? How does a computer start to "see?" How does someone tackle a computer vision problem? All of this, and more, is discussed in this virtual fireside video. Share your thoughts below and be sure to subscribe to our channel!

0:00 - Introduction
1:02 - What is computer vision?
4:06 - How does a computer start to "see?"
9:09 - What are some of the different types of computer vision problems?
16:00 - How can someone start tackling a computer vision problem? (Includes collecting, labeling, organizing, pre-processing data.)
20:49 - What is image augmentation?
25:39 - What does it mean to build a computer vision model?
31:40 - Why does this process have so many steps?
39:42 - Acknowledging ethics and equity in computer vision

Check out some links:
Roboflow -- http://roboflow.com/
Intro to Computer Vision Post -- https://blog.roboflow.com/intro-to-computer-vision/
Original Source of Abraham Lincoln Pixel photo -- https://openframeworks.cc/ofBook/chapters/image_processing_computer_vision.html
Inspiration for Computer Vision Problem Types visual -- Stanford CS231n Course http://cs231n.stanford.edu/

This video introduces the basics of computer vision, including image classification, object detection, and segmentation, and explains why data collection, annotation, and model training matter in computer vision problems. It also covers tools such as Roboflow, CVAT, and VoTT, and the continuous improvement that building computer vision models requires.

Key Takeaways

Collect images of objects or scenes
Label and annotate the images
Pre-process the images
Train a computer vision model
Test and evaluate the model
Refine and improve the model

💡 Collecting diverse and high-quality data is crucial for building accurate and reliable computer vision models
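
One concrete way to act on this tip before training is to count how often each class appears in the labels, since the transcript asks "did i just not have enough data on motorbikes". The sketch below is an illustration only: the annotation format (a list of dicts with a "label" key), the 10 percent cutoff, and the function names are assumptions made for the example, not a Roboflow export format or API.

```python
from collections import Counter
from typing import Dict, List


def class_counts(annotations: List[Dict[str, str]]) -> Counter:
    """Count how many labeled objects each class has."""
    return Counter(a["label"] for a in annotations)


def underrepresented(counts: Counter, min_share: float = 0.10) -> List[str]:
    """Return classes whose share of all labels falls below min_share.

    The 10 percent default is an arbitrary illustrative cutoff.
    """
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(label for label, n in counts.items() if n / total < min_share)


# Hypothetical labels for a street-scene dataset like the one in the video.
annotations = (
    [{"label": "car"}] * 12
    + [{"label": "person"}] * 6
    + [{"label": "bus"}] * 2
    + [{"label": "motorbike"}] * 1
)

counts = class_counts(annotations)
print(counts.most_common())
print(underrepresented(counts))  # classes that may need more collected images
```

A class flagged here is a candidate for going back to the collection step, which matches the loop the speakers describe: train, inspect the output, then collect or relabel where the data is thin.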

Chapters (9)

0:00 Introduction

1:02 What is computer vision?

4:06 How does a computer start to "see?"

9:09 What are some of the different types of computer vision problems?

16:00 How can someone start tackling a computer vision problem? (Includes collecting, labeling, organizing, pre-processing data.)

20:49 What is image augmentation?

25:39 What does it mean to build a computer vision model?

31:40 Why does this process have so many steps?

39:42 Acknowledging ethics and equity in computer vision
