Simplifying Image Recognition using ApertureDB and Python

DataCamp · Beginner · 👁️ Computer Vision · 2y ago

Skills: CV Basics 90% · ML Pipelines 70%

Key Takeaways

This video demonstrates how to use ApertureDB and Python for image recognition, accessing the COCO dataset and performing image recognition tasks using deep learning and database infrastructure.

Full Transcript

Hello everyone, and thank you for joining today's code-along. My name is Reys and I'll be your moderator today. We're going to kick off the session in a couple of minutes; we're just waiting so everyone has a chance to join. In the meanwhile, though, we'd love to hear from you, so let us know where you're joining from using the chat or the comments, depending on what platform you're watching on, and tell us something that you'd like to get out of the session today. We are going to be using DataCamp Workspace today for the code-along, so if you don't have an account already, make sure you get signed up. I'll be sending a link through so you can code along with us on DataCamp Workspace. You won't need a paid account; you can do this all on a free account. Keep your eyes peeled for that link. Shortly I'm also going to be sending through a link so you can register for the event. Everyone who registers for the event will get sent the recording as well as the resources. You can keep an eye out for the link in the chat, but you can also head over to datacamp.com/webinars and you should be able to sign up there. Brilliant, I think that's everything from me for the moment. I will be back to repeat these messages for any new joiners shortly, but until then, enjoy the background music.

[Music]

Hello everyone, and thank you for joining today's code-along. My name is Reys and I'll be your moderator today. We're going to kick off today's session in about a minute or so; we're just waiting so everyone has a chance to join. In the meanwhile, though, we'd love to hear from you, so let us know where you're joining from using the chat or the comments, depending on what platform you're watching on, and tell us something that you'd like to get out of today's webinar. If you have any questions at any point throughout the session, technical or theory, then please let us know using the
chat as well, and we'll be saving them for the Q&A at the end. If you haven't registered for the event already, then please make sure that you do: head over to datacamp.com/webinars, or you can find the link that I've sent in the chat. I'll be sending it again shortly. Get registered and we'll send you the recording as well as any other resources. Brilliant. We are going to be using DataCamp Workspace today, so if you don't have an account already then please sign up for one. You'll only need a free one to code along with us today. Keep your eyes peeled: we'll be putting a QR code on screen as well as a link in the chat so you can code along with us shortly. Brilliant, I think that's everything from me, so now I'll hand you over to your host for today's session, Richie. Richie, please take it away.

Hi there, data scamps and data champs, this is Richie. Now, when you think of data analysis, the data type that first comes to mind is usually numbers, but advances in deep learning and database technology mean that images and videos are now fair game for analyzing. So I might go so far as saying that important computer vision tasks like image detection, image classification, and image recognition are well on their way to becoming mainstream activities for data scientists. So today we're going to do some image recognition on a very famous dataset, Microsoft's Common Objects in Context, and our guest is Luis Remis, a co-founder and the CTO at ApertureData, the maker of the image database ApertureDB. Luis leads the design and development of ApertureDB, and he previously worked as a research scientist at the software and systems research group at Intel Labs. Now, for those of you who were here two weeks ago, ApertureDB might sound familiar: we had a great webinar on computer vision in business, so Reys can post a link to the recording in the chat, and Luis's colleague and co-founder Vishakha joined a panel to talk
about different use cases of computer vision, so please do check out the recording when you get the chance. I believe Vishakha is watching us on LinkedIn, so please do say hi to her in the chat. With that, over to you, Luis.

All right, thank you. Let's get started. Today we're going to be running a code-along session where we're going to learn how to simplify image analytics using ApertureDB and Python. We're going to be using PyTorch, and the idea is to explore a large dataset with a lot of rich metadata and use that for analyzing how well some of the models that we're building are doing. So let's jump right into it. The first thing that we're going to be doing is setting up all the network and all the machinery so that we can use some pre-trained models to run object detection over a set of images. The original name of the session was "Simplifying Image Recognition". We started working on image recognition, but then we realized that a much more interesting and novel use case would be running object detection, that is, detecting all the different objects that are in a large image. That's part of image analytics in general, so we're just going to go for that, which is a much more fun activity to do. So in this code-along session what we're going to be doing is running three different pre-trained models to do object detection over certain images, and we're going to be using Python and PyTorch for retrieving and running the models. We're going to be using the COCO dataset, as Richie mentioned. Here's a link with more information about the COCO dataset. You can run these notebooks later on; they will always be accessible. And we're going to be retrieving the data of the COCO dataset from ApertureDB: rather than giving a copy of the dataset to each person participating here, we're
all going to be using a single copy of the dataset, and we're going to be retrieving that data from ApertureDB. At the end of the code-along session you will have learned three main things. One is how to run different object detection models using PyTorch and the pre-trained models that it offers. The other thing that we're going to be learning is how to create and retrieve datasets using ApertureDB, and not only at a dataset level: we will be able to create and manage different datasets as we need them. And finally, we're going to learn how to evaluate the performance of a model using the values of its bounding boxes: all the detections that we make on the image, we're going to compare to a ground truth.

Let's start by first setting up and running the three pre-trained models that we're going to be using today. The first one, which we call ResNet-50, is actually Faster R-CNN with a ResNet-50 backbone. This is the most accurate model that we have, but it's slower, as we will see. We'll also be using Faster R-CNN with a MobileNetV3 backbone, which is much faster but less accurate; it's less accurate for some of the labels and not for others, and that's part of the evaluation that we'll be doing. And finally we're going to be using RetinaNet, which is a good balance between speed and accuracy, even though, unless we run on GPUs, its speed is on the order of the first option, the ResNet-based option.

Oh, sorry to interrupt, can you just increase the text size a little bit? I'm getting old, me eyesight's failing.

Absolutely. Is that better?

Yeah, maybe one more.

One more, let's do it.

Thank you, awesome.

All right, good. The first thing we're going to be doing is making sure that we have all the dependencies installed, so we're going to be running this in the background. For this first part of
the session, which is just the first notebook, we don't really need ApertureDB; we're just going to be using PyTorch and OpenCV. But installing ApertureDB installs all the dependencies, so that's why I'm running this, and once that installation is done we should be good to go. The first thing we're going to be doing is running an object detector. We created a class for your convenience here so that you can analyze it and use it later for your own purposes, and I'm going to walk you through a little bit of how it works. We're going to cover it only briefly, because essentially what we're using are pre-trained models, and we're going to be using OpenCV and PIL just to display the images and run the different models over the images. As a starting point we're going to read an image from file. We have a couple of images here just for testing: this image of a tennis player and this image of a pizza on a table, and we're going to be running some inference over those images just to get started. The first thing we need to do is read the image from file, in this case, and do some color adjustment. This is just OpenCV machinery for the way that the network is configured. Then these are the three networks that we talked about, the three models that we're going to be running. The first thing that we're going to create is this bounding box detector, which is part of this infer.py file that has this class, and I'm just going to very briefly walk you through it. It has a very basic constructor, where we specify the model that we're going to be using. We're going to be using PyTorch, so what we need are these packages, which already come with these models pre-trained, and we can optionally use a GPU here if it is present. This is all you really need to set up the model, so you see how simple it is. Because we're using PyTorch and it already provides all these pre-trained models, we can instantiate the class that is going to be running the model and the detection with just a few lines of code. That's pretty much what happens when we create this detector here.

On our first step we're going to take this image of the tennis player, create the detector, and run the infer function, which is going to run the detection and detect all the different objects that are present in that image. So let's go ahead and run that. As you can see, we have this image of the tennis player here, and there are only two detections with their confidence: a person and a tennis racket. So far so good. We are trying with RetinaNet, which is supposed to be the model that has a good balance between accuracy and speed. For the sake of the example, let's change this network and run something else, and see if there is any difference. It doesn't look like there is much difference; it looks like the confidence for the person is higher in this case. But this is just one image. Let's try with another image here: let's try with the pizza image and see what we get. Let's try again with RetinaNet. All right, with RetinaNet we get a label for pizza, we get a label for dining table, and we also get a detection for a fork. You can see that the confidence,
especially the confidence on the pizza detection, is not that high, but that's part of the nature of these different networks. The important thing we're going to be doing today is some evaluation of all these different models so that we can iterate. Later on, this is part of a larger pipeline where we train a new model, we test how well it's doing, and then we do this iteration process again. So far so good: we have the model up and running, and now what we can do is compare the performance of the three models. We can do something very quick. What I'm going to do is run all the different detectors, and let's use the image of the tennis player first. So I'm going to run this cell again, and now I'm going to run the three detectors over that same image. Let's see what we get. The ResNet model took 1.3 seconds, MobileNet took 0.08, so it is much faster, and RetinaNet took pretty much the same amount of time as ResNet. Let's actually show the different images to see whether or not we are getting the exact same labels. With ResNet we are getting both the person and the tennis racket, with MobileNet the same, and with RetinaNet the same. So it looks like, at least for this image, all of the networks are giving us a really good detection. Let's try with the pizza image and see what happens. We're going to switch this to the pizza image and compare the performance of the three. All right, in this case, for ResNet, look: the pizza is not detected, or at least it's not detected with a confidence of at least 50%, which is what we consider a good detection. For ResNet we only have a dining table and a fork. And for MobileNet we only get... oops, where did it go? Here: in the case of MobileNet we also only
get the dining table and the fork, but in the case of RetinaNet the pizza was detected with 56% confidence. So we can already start seeing how, depending on the image, the content of the image, the lighting, and a lot of other factors, some of the models will work better than others. In some images the models will perform the same, but in other images they will perform completely differently, and that is the analysis that we want to do. Just to get an idea, what I'm going to do here is run infer in a loop over a number of images. Remember that we're reading the images from the file system here; it's just the image that is present right here, and the image is loaded in memory. We are running at a certain speed: right now we are doing somewhere between 10 and 13 images per second. This is an interesting piece of information to remember, because later we're going to be retrieving the images in real time from ApertureDB and seeing how well we're doing in terms of images per second. We're going to see that there is not much difference, and I'm going to explain why. But for now keep in mind that we are around these numbers. Of course, if we change the network for one that takes more time to run the inference, the number of images per second will drop, because we are pretty much bottlenecked by the inference pipeline. For the slower networks we're doing about one image per second, but on the fast network we're doing 10 to 13 images per second. These numbers are specific to the Jupyter notebook and this environment, which is a testing environment. In real life you would deploy this on a cluster, on machines with GPUs, and it's going to be batch processing: you're going to do a lot of batching and prefetching, so
you can get much higher speeds, but for the sake of this experiment we're taking the numbers that we get when doing the operations here. OK, you get the idea. So far so good: what we have is this detector class that, using PyTorch, runs the different pre-trained models. These are, again, the three pre-trained models that we're going to be using. Now we have an idea that some of the models work better than others on certain images and for certain labels, and we want to do a larger-scale evaluation to see how well these models perform on a much richer dataset.

With that, we're going to jump into the second part, which is notebook two, where we're going to be using ApertureDB to retrieve images from the COCO dataset directly, running the object detection over the images, and computing a metric of how good the detections are when compared to ground truth. You can go to this link, where you're going to be able to log in to ApertureDB. In your case, you can use the credentials that are provided a little bit below, in task one: the user is researcher and the password is this thing that you can copy and paste. So you should be able to log in by coming here and entering researcher, and you will be able to see what I'm seeing right here. I'm going to be using a login that has a few more permissions, because I'll be running a few things. What we have here is the ApertureDB web UI, which shows us all the data that is already prepopulated in the database. We ingested the COCO dataset here, so we have the full COCO dataset, which is composed of bounding boxes, polygons, images, feature vectors that we added, and a few other things that we have been adding while trying this object detection model. ApertureDB also offers a way to very quickly search and look at the images, so we're going to be running a really quick query
here, where you can see this image that we were playing with, the image of the tennis player, with its ground truth bounding boxes. These are bounding boxes that were generated by a person who actually did the labeling, so they are high quality. We're considering these bounding boxes as ground truth, and we're going to be retrieving their values and comparing them to what we're getting out of the three different models. We also have the image of the pizza here, where we have the ground truth bounding boxes for the pizza, the fork, and the dining table as well. What we're going to be doing is essentially retrieving this information, and the information associated with pretty much all of the images in the dataset, because we have ground truth bounding boxes for all of them, and comparing it to whatever we're getting from the models that we're going to be running.

All right, so let's get started. This is kind of an overview of the data that is there, and our goal is going to be to evaluate and improve an object detection model. This is a very common use case. You need to build a model because that model needs to run inference and make predictions, whether that's about the next movie that a person is going to want to watch or the product that a person is likely to buy if they bought a certain product. For all of this inference, we need to evaluate how well the models are doing. In the case of object detection with bounding boxes, because we have ground truth in the database, we can just compare whatever we're getting from the detection against what's in the database and come up with a metric to say whether or not they are doing well. We're going to go over these steps: essentially we're going to generate a dataset
using ApertureDB and PyTorch, and we're going to run object detection over the images in the dataset; this is the second step here. The object detection is going to generate a set of bounding boxes, the inferred bounding boxes, the detections. We're going to send those back to ApertureDB so that we store them there, and then we're going to compute the performance for each image: for each image we're going to see how well the model did based on the detections. We're going to compute a global performance for a specific model, and we're going to store that in ApertureDB. The idea in this workflow is that we then improve the model. This is a step that we're not going to cover today, because it involves training and that takes a lot of time. The process is essentially running over all these steps and repeating, doing iterations until we get to a model performance that we're happy with. So we're going to be working on all these different steps.

The first step that we're going to be covering is the creation of a PyTorch dataset, and the way that we're going to create it is through ApertureDB: we are going to create a dataset object that fetches the images from ApertureDB as they are needed. Let's go right into it. The first step is to create a connection to the ApertureDB instance. The process is pretty much the same as what you just did if you were following along and had a chance to go to the web UI and log in, but we also have a Python module for ApertureDB that allows you to create connections and run queries against ApertureDB. That's going to be the first thing that we run, and the way that it works is that we just need to import the connector. We're also going to be importing
some of the things that we're going to be using later, like the PyTorch dataset and ways to display images, and we're going to connect to the DB. Very straightforward: we just specify a host, a user, and a password. So let's run this. Everything went well, and the first thing we're going to be doing to test the connection is running a very simple query. ApertureDB has a query language that is specialized to deal with visual objects as first-class citizens, so we provide functionality to retrieve images, to store and retrieve bounding boxes, to store feature vectors, or embeddings, and to run nearest-neighbor computations. We have a lot of functionality. You can visit our documentation website, where you will find more information about the query language and all the different options that we have, for example for doing image operations. We also have a one-hour YouTube tutorial video that covers all the basics of the ApertureDB API. For this code-along session, I'm just going to show you how some of the queries work, we're going to be running them, and you can investigate the specifics of these queries a little bit more later. The most basic query is GetStatus, so we're just going to be running a query that checks whether or not we can connect and everything is fine. Let's run this. It's very quick; let me clear the outputs and run it again. The query is very quick because it's just a status query, and we're running version 19 of ApertureDB, which is exactly the same as what we are seeing on our status page here. So far so good: we were able to connect to ApertureDB and we should be ready to go.

The second part that we're going to be working on is retrieving images from ApertureDB, because we need to learn how to retrieve an
image, and how to retrieve the bounding boxes associated with the image, so that we can create a dataset and use that dataset to run the inference pipeline. As I mentioned before, this is a very short query that will allow us to retrieve an image from ApertureDB. In this case what we're going to be doing is running a FindImage, and we're going to constrain it based on the ID. I know that this ID corresponds to that image of the tennis player. I can come here and actually show you: if I open this image, you will see that one of the properties associated with the image is this ID, which is the one that we're going to be using for filtering. So what this query is doing is retrieving the one image that has that particular ID. Let's go ahead and run this. As you can see, we get a response, and then we display the actual image. This is the image that we're retrieving from ApertureDB, and the response that we get is, same as the query, a piece of JSON. Let's make it a little bit more interesting and retrieve all the metadata properties associated with that image. We can do that with all_properties equals true. If we run that, what we get is all the metadata information associated with that particular image; this is the same as what we are seeing on the web UI. And that's it: this is how easy it is to retrieve an image from ApertureDB. From then on, everything that we do is just going to be variations of this query. We're going to be running queries that are a little more complicated and richer, because we're going to be filtering images by their content based on the ground truth labels that we have, and we can create a dataset out of that. So let's go ahead and do it. As for the next thing: we have the images, and we have some
ground truth bounding boxes associated with the images. So let's retrieve, in a programmatic way, all of the bounding boxes associated with an image. In that case we're going to be running a query that is slightly different. This query runs the same FindImage as before, and we are declaring a reference that we're going to be using later; again, this is internals of the API. We are doing the same thing, retrieving this specific image, and in this particular case we're also retrieving all the bounding boxes associated with the image. If we run this query, you will see that we are getting the bounding boxes and all the properties associated with them, including the label. We have person and tennis racket as labels, and each bounding box also has a source. The source in this case is ground truth, because that's the only information that we have for those images in the DB. I think we also have a confidence somewhere... no. All right, the confidence: whenever we run the detection, we're going to get confidences for those images. So far so good: what we have done so far is retrieve these images from the ApertureDB instance that is running at this URL that we shared. In this particular case that database only stores the COCO dataset, but you can store as many datasets as you want on an ApertureDB instance.

The next step is going to be to run a query and generate a PyTorch dataset, so that we can use that PyTorch dataset either for running inference or, in the future, for running training. It is the same object. What we're going to be doing here is all the imports for PyTorch and everything that we need, and we're going to be using the PyTorch dataset class. This is just a wrapper class that will generate a PyTorch dataset that is available to use in PyTorch very
easily. In this case what we're going to be doing is creating a dataset out of a specific set of images. Each image in the COCO dataset has different properties, and one of the properties is the license. The license corresponds to one of the specific Creative Commons licenses; it's just an integer value. In the case of the COCO dataset, all the images that have a license are images that have the corresponding bounding boxes, so we're going to be filtering based on whether or not the images have that particular property. So the query that we're going to be running to generate a dataset uses a constraint based on the license value. Again, this is arbitrary, and later on we're going to be creating other datasets based on other things. We're going to limit it to a thousand, just for demonstration purposes. Here we're not really going to be running the query; we're just going to be giving this query to the PyTorch dataset object for ApertureDB, and this is going to generate a dataset object. The dataset object is something that we can then give to a data loader in PyTorch. This is terminology that is specific to PyTorch, but what data loaders allow you to do is specify a data source, and they provide multiprocessing and prefetching as part of the class. So when you are running inference or training, as you're passing a set of images through the model, in the background you are retrieving the next batch of images that you're going to be passing through the model. The time goes into running the inference, crunching the numbers, doing all those matrix multiplications when you're running the network, so usually the bottleneck is on
that computing side, so we have plenty of time to go to the file system, to memory, or even over the network to retrieve the next batch that is coming. This is why we provide this abstraction: it's very useful to be able to express everything as a PyTorch dataset and then use it for running training or inference. As you can see, the amount of code that you need for that is very little. This is all included in our Python packages, and all you need to specify is a query like this; what you get at the end is a PyTorch dataset. As the images are requested, the data loader will take care of doing the prefetching and have all the images ready when you need them. So let's just run this here. As you can see, it is very quick. Why? Because all we're doing is running a metadata operation on the database, and we're not retrieving any images at this point. The images will be retrieved when they are needed, as we start accessing the dataset, and they're going to be prefetched, so we're pretty much not going to notice that the images are coming over the network. Just for demonstration purposes, if we comment this limit out and rerun the query, you will see that the actual dataset is 123,000 images. This is the total number of images, and the dataset is already created and ready for us to use even though it has over 100,000 images. Again, the reason why it works so fast is that all we do when we create a dataset is a metadata operation, and as the images are needed they're going to be fetched from ApertureDB. So it's very straightforward to start using. You don't need to download any datasets, and you don't need to download labels and manage S3 objects and CSV files with metadata: everything is stored in ApertureDB and ready for you to use. So we're done here. Let's put the limit back, just for the sake of the
demo. Now that we know how to create a dataset from ApertureDB and access the images and the bounding boxes, and now that we also have a detector, with three different variations based on these three different pre-trained networks, we can start testing how well this is performing, and now the sky is kind of the limit. This is the pipeline that we're going to go over. The first step was to create a PyTorch dataset, which is done; we're going to be doing some variations if we have time at the end. The next step is running the detection. We have everything in place to run the detection; what we need to do is make sure that we ingest the bounding boxes into ApertureDB. So we have this step clear; we just need to build the machinery to run the ingestion into ApertureDB. As you can see here, we only have the ground truth bounding boxes; this is the only set of bounding boxes that we have for that image. But we know how to run the detector over the images, so what I'm going to do now is set up a detector, the MobileNet detector, and use the data loader, so the images are being retrieved from ApertureDB as they are needed. I'm going to run an example to see how many images we can process, just to get an idea. As you can see, we are between 10 and 13 images per second, which is the same as what we were seeing when we were running the detection on a local image. This is kind of the most interesting thing: because we are using PyTorch and the data loaders, which take care of doing the prefetching in different processes, by the time that we need to pass the image through the network the image is already there, even though it was retrieved from the system, and even though here it's being retrieved over the Internet. So you see
the potential here: you no longer have to keep your own copy of the dataset. You can have the dataset on a shared database instance, and all of the people on your team can be interacting with and using the same dataset at the same time, retrieving the images as they're needed. Again, the caveat here is that these numbers are not very scientific; this is just to show a point. In reality, when you are in a testing stage and trying to learn what's going on, you're going to be using a Jupyter notebook and all of these things, and you will notice no difference if you're retrieving the images from the network. Once you're done and you feel confident about the model, you will end up deploying this on a cluster that has GPUs, and it's going to run much faster. At the same time, the database is able to respond to hundreds of thousands of queries per second, so even if you have a more intensive workload you will be able to prefetch and retrieve those images at the speed they're needed, because again the bottleneck is on the processing, on the computing side, so you have plenty of time to go and retrieve data, even from the network. OK, so far so good. We ran over the dataset here using the data loader. The other thing we can do, since we have this dataset, is access it directly: every time we try to access an image in the dataset, that image is fetched from ApertureDB. So let's go ahead and run this here. This is showing the seventh image in the dataset, which happens to be an image you will be familiar with: our tennis player. And let's say we want to access the eighth element instead, which is another image familiar to you, the pizza, and we're showing both the pizza and the pizza with the
detections of MobileNet here. So we have a dataset, and we're retrieving the images from ApertureDB. Let's go back to our map here. We are ready to run inference over a large number of images, over the entire dataset, and push the detections into ApertureDB. That is step three: we're going to compute the detections for images in the dataset and push all those detections into ApertureDB. This is done by this file. We're leaving this file here for you to use later, but you don't have the right permissions to write into ApertureDB, which is why I'm going to run this separately. Here you can find all the logic that is needed, and I guess the most important thing to look at is this: we find an image, and for each detection on that image we create a bounding box. We have this API that allows us to create bounding boxes and associate them with images, so what we're going to run is a set of queries that look like this. I'm going to run these queries in the background, because I have write permissions on the database, and it's going to take a minute to run all those thousand images through the detector and insert the information back into ApertureDB. So let's go back here to step three. While step three completes, right now we're running the detection model that we learned how to use: we're retrieving the images from ApertureDB, running detection, and pushing those bounding boxes into ApertureDB. We're going to be done with this stage, but then we have to compare the bounding boxes; we have to learn whether the bounding boxes that we're getting as a result of the detection are good or not. That is part of this overall pipeline: we always want to be improving our models so that their
detections are better, so we need to come up with a metric for how well those detections are doing. For the sake of the code-along session, we're going to generate a metric based on something that is very commonly used, which is the intersection over union. Here you have a very good example of intersection over union: you have the ground-truth bounding box, which is the green one, and the predicted bounding box, which is the red one. The intersection over union is computed this way: it is the overlap area of the bounding boxes divided by the area of the union of the bounding boxes. So if the bounding boxes are exactly the same, the intersection over union is one; if the bounding boxes do not overlap at all, it is zero; and if there is some overlap, that generates a value between zero and one. That is what is commonly used for establishing whether an inferred bounding box, an inferred detection, is good when compared to ground truth. So let's see what's going on. We ingested the new bounding boxes, and if we come here we'll probably see, yes, do you see that? Now we have a few other bounding boxes already there, which are the result of the detection. We ran it for a thousand images, and it took less than a minute in this environment; again, we can go much faster if we do this on a deployment on a cluster. But right now we have the new detections, the new bounding boxes, and we're ready to compare them. So we prepared a function for you called compare bounding boxes, and it's based on a functionality in ApertureDB called intersection over union. Let me show you very quickly how that works. On ApertureDB you can run custom queries. First of all, let's check the bounding boxes that are present on the system: we have 800,000 bounding boxes
for ground truth, and we have just a few bounding boxes for MobileNet. This is because we only processed a thousand images; we could get all of them if we wanted, but we're good to go, we have those bounding boxes ready. The next thing I wanted to show you is how you can do intersection over union and what that means. What we're going to do here is find an image, constrained by the unique ID of an image (in this case I think this is the tennis player), and then find the bounding boxes that are ground truth and the bounding boxes that are MobileNet, everything associated with the image. All of this is part of the same query. Then we're going to run the region intersection over union operation in ApertureDB, which is supported natively. That means that for every image we can run a query that looks like this, where we compare the ground truth against MobileNet, or pretty much any other network that we evaluate, and we get a metric for how good those bounding boxes are. After running the query, I get the result, which shows me all the bounding boxes. As you can see here, for ground truth we have a person and a racket, and for MobileNet we also have a person and a racket. But when we compute the intersection over union, this is what we get. The highest value here corresponds to the maximum intersection over union, which we can assume is the same bounding box that has been predicted, but you see that it is not one: it is 0.87 and 0.67. They're not perfect. That is what we can do just by running one query in ApertureDB: after we have ingested all the metadata, we can get a value for the intersection over union and know exactly how well our detector is performing. So what we added here in the notebook is just a small function. The function is in the inference file that we shared with you, and what
this compare bounding boxes function is doing is pretty much running the same query that we just showed you, iterating over all of the bounding boxes in the image, and computing the average score. So let's see what that looks like. Let's go back here and run this for the tennis image and for the pizza image and see what we get. As you can see, here are the bounding boxes that are detected. The ground-truth label, person, is the same as the predicted label, person, and the ground-truth label, tennis racket, and the predicted label, tennis racket, are the same, so we average those and that gives us a score of 0.77 just for this image. When we run the same for the pizza image, we see that we have 90% for the fork. We have 84%, but with a ground truth of pizza and a predicted label of bowl, so this is a wrong detection. Then we have the other detection, which overlaps almost perfectly, which is the dining table, detected by both. But the score is much lower, because this is a detection that was missed: even though it's 84%, it's on the wrong label. That's why, even though we're averaging all of this, we're not taking this value into account. Again, all of this is arbitrary, and it really depends on the use case how you want to do this, but we get all the intersection over union values from ApertureDB, so we can do pretty much whatever we need after that. This is how we compare for one image. Now, for MobileNet, we're going to create a new dataset where we find all the images that have bounding boxes with this source. Let's run that. That gives us 958 images, and not a thousand, because of the thousand images that we ran the detection on, not all of
them had bounding boxes; some of them didn't have any detections at all. Now what we're going to do here is a little bit of Python and threading so that we run things in parallel, because we're going to be sending multiple requests to ApertureDB to retrieve those intersection over union values, and we're going to compute a global score for that model. So let's run that and see what we get. Again, this is pretty much a thousand queries that are going to be sent to ApertureDB and give us an overall score. If we average the score of all the images, of all the detections for MobileNet, we get 0.36, which is pretty low. So what I'm going to do, in parallel, is ingest another set of bounding boxes that will hopefully be better. But right now we have the global value for the MobileNet model, and we're ready to go to the final step: we're already here, computing the performance of the model and storing that into ApertureDB. How do we do that? We're going to find all the images that have a bounding box with that model (here we get the images), and we're going to connect all of those images to a new entity that we're adding; let's call it object model detection code-along. And we're going to store the model and the score. The internals of the query, again, I'm going to leave for later. But we run this query and, oh, of course, we're not authorized to perform these actions. I'm going to run this query somewhere else, because I have write permissions; give me one sec. All right, so we ran this query, and as you can see here we have an object detection class, which essentially links to all the images that were used to compute the performance of that particular model. And I guess
another thing that we can do: I have a query here that finds all the models that we have and their respective scores. This is something that I pre-computed beforehand. We were getting 0.36 for MobileNet, but for RetinaNet it was slightly higher, 0.46, because even though the model is slower, it gives us better performance in general. The other thing that we're going to run here: we computed for MobileNet, so let's change that and do it over RetinaNet, just for demonstration purposes. When we run the query we have slightly fewer images, 921. It doesn't really matter; it could be hundreds of thousands, because they are retrieved as they are needed, so it's really fast to compute. The final part is computing that global score for that specific network, which, based on what we computed beforehand, should be around 0.4. OK, good, that's what we get. And the cycle will be the same: after we compute the performance of the model, we store that in ApertureDB, both information about the model and all the images that were used to compute its performance, and we have our evaluation ready. We are ready to figure out why the detection is not as good as we want, run some training and inference, improve the model, and run this cycle again. With ApertureDB, one thing that you can very easily do: right now we're creating datasets based on a particular value, but we can also create a dataset of images that have only tennis rackets, for example. This is going to be the final example that I show you, where we create a dataset using images; well, the example that I have here is that I don't want all of the images, I just want images of dogs and cats, and I can generate a dataset for that specifically. So what I'm
going to do is find all the bounding boxes that have a ground truth of dogs and cats and retrieve all the images. Here I'm limiting to 10, just for demonstration purposes, but out of all of those images we can get exactly the images that correspond to either dogs or cats and use them specifically for either running training or evaluating the model. And again, you don't need to hold this dataset on your machine, because the images can be prefetched as they are needed directly from the database. With that, and slightly over time, I am going to stop here for any questions.

Thank you, that was magnificent. I have to say, whenever I see things like this it does feel like I'm living in the future; that was some pretty powerful stuff going on there, and pretty simple to do. Now, we have a ton of questions from the audience. Before we get to audience questions, I have one question for you. One thing that really surprised me was that the pizza was very hard to detect. In my mind, pizza is a very common object, and it has quite a distinctive visual appearance. So can you tell me a bit more about what makes an object easy or difficult to detect?

It's hard to tell, especially with these models, which are CNNs. We're using models that have some years; they have been out there for some time, so they're not the best models available. Usually what people do is take those models and use transfer learning to retrain them for their specific use case. Let's say you need to run a detector for finding chairs and tables: they will take this model and retrain it with more specific datasets to improve the performance. It depends on the lighting, and it depends on the example set that the network saw at training time, so it depends on a lot of factors. What happens
usually, and specifically with COCO and the models that are designed to train on COCO, is that COCO is a research type of dataset, so it has labels that are all over the place. Some of its images come from the YFCC dataset, which are images that people upload, so the lighting is all different and there are a lot of different conditions. And the other thing is t

Original Description

Advances in deep learning and database infrastructure have allowed the analysis of images and videos to become mainstream. Image recognition is one of the most important tasks in computer vision. In this session, you'll use ApertureDB to access the COCO dataset and run image recognition using Python. Key Takeaways: - Learn how to use image data stored in ApertureDB. - Learn how to perform image recognition. - Learn about workflows for image and video data. Code Along With Us! https://bit.ly/3sx2zAO [DATASET] The Common Objects in Context dataset is provided in ApertureDB, but you can read all about it here: https://bit.ly/3u1FeaV [DOCS] ApertureDB docs, for further learning https://bit.ly/3QrexUH [WEBINAR] AI for Visual Data: Computer Vision in Business: https://bit.ly/47qqOQd [COURSE] Image Processing in Python: https://bit.ly/470gsXu [CODE-ALONG] Sloth or Pastry? Using PyTorch and Deep Learning for Image Classification: https://bit.ly/40znf88 [TUTORIAL] What is Image Recognition?: https://bit.ly/3FNviEB

This video teaches how to simplify image recognition using ApertureDB and Python, covering image data access, image recognition, and workflows for image and video data. It provides a hands-on introduction to computer vision and deep learning for beginners.
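The session evaluates detections with intersection over union (IoU): the overlap area of two bounding boxes divided by the area of their union. In the session this value comes from a native ApertureDB query; the sketch below is only a plain-Python illustration of the same arithmetic. The `Box` type and the `iou` function are hypothetical names introduced here, and the (x, y, width, height) layout is an assumption.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box: top-left corner (x, y) plus width and height (assumed layout)."""
    x: float
    y: float
    width: float
    height: float


def iou(a: Box, b: Box) -> float:
    """Return overlap area divided by union area; 0.0 when the boxes are disjoint."""
    left = max(a.x, b.x)
    top = max(a.y, b.y)
    right = min(a.x + a.width, b.x + b.width)
    bottom = min(a.y + a.height, b.y + b.height)
    # A negative extent means no overlap along that axis, so clamp to zero.
    intersection = max(0.0, right - left) * max(0.0, bottom - top)
    union = a.width * a.height + b.width * b.height - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    ground_truth = Box(0, 0, 10, 10)
    predicted = Box(5, 0, 10, 10)
    # Intersection is 5 * 10 = 50 and union is 100 + 100 - 50 = 150.
    print(iou(ground_truth, predicted))
```

Identical boxes give 1.0 and disjoint boxes give 0.0, matching the description in the transcript.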

Key Takeaways

Access the COCO dataset in ApertureDB
Install and import necessary Python libraries
Load and preprocess image data
Perform image recognition using PyTorch
Evaluate and refine the image recognition model
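For the evaluation step, the session computes one score per image and then averages those scores into a global score for the model, using threads so that the per-image requests run in parallel. A minimal sketch of that pattern follows. The names `global_score` and `score_image` are hypothetical, and `score_image` stands in for whatever per-image request you use (in the session, a query to ApertureDB); it is not part of any library.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Iterable


def global_score(
    image_ids: Iterable[int],
    score_image: Callable[[int], float],
    workers: int = 8,
) -> float:
    """Average the per-image scores, computing them in parallel threads.

    score_image is a caller-supplied function (hypothetical here) that
    returns a score between 0 and 1 for one image ID.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order and re-raises any worker exception.
        scores = list(pool.map(score_image, image_ids))
    return mean(scores) if scores else 0.0


if __name__ == "__main__":
    # Made-up scores for illustration only; real values would come from the database.
    fake_scores = {1: 0.5, 2: 0.25, 3: 0.75}
    print(global_score(fake_scores.keys(), fake_scores.__getitem__))
```

Threads suit this workload because each call mostly waits on the network, so the requests overlap even though Python runs one thread at a time.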

💡 Using ApertureDB and Python can simplify image recognition tasks by providing easy access to image data and integrating with deep learning libraries like PyTorch.
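The easy access described here rests on lazy loading: creating the dataset only records which images match a query, and each image is fetched when it is accessed, with later fetches started ahead of time. The sketch below illustrates that idea with the standard library only. It is not the ApertureDB package or the PyTorch DataLoader (which prefetches in worker processes rather than threads), and `LazyImageDataset`, `prefetching_iter`, and `fake_fetch` are hypothetical names.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence


class LazyImageDataset:
    """Map-style dataset that stores only IDs and fetches image bytes on access."""

    def __init__(self, image_ids: Sequence[int], fetch: Callable[[int], bytes]):
        self.image_ids = list(image_ids)
        self.fetch = fetch  # hypothetical stand-in for a database request

    def __len__(self) -> int:
        return len(self.image_ids)

    def __getitem__(self, index: int) -> bytes:
        return self.fetch(self.image_ids[index])


def prefetching_iter(
    dataset: LazyImageDataset, workers: int = 4, ahead: int = 8
) -> Iterator[bytes]:
    """Yield items in order while up to `ahead` fetches run in worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = deque()
        next_index = 0
        while next_index < len(dataset) or pending:
            # Keep the queue of in-flight fetches topped up.
            while next_index < len(dataset) and len(pending) < ahead:
                pending.append(pool.submit(dataset.__getitem__, next_index))
                next_index += 1
            yield pending.popleft().result()


if __name__ == "__main__":
    def fake_fetch(image_id: int) -> bytes:
        # Placeholder payload; a real fetch would return encoded image data.
        return f"image-{image_id}".encode()

    dataset = LazyImageDataset(range(5), fake_fetch)  # no fetches happen here
    for blob in prefetching_iter(dataset):
        print(blob)
```

Because construction does no fetching, building a dataset over many thousands of IDs stays fast, which is the behaviour the transcript attributes to the metadata-only query.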
