Learning From Simulated & Unsupervised Images through Adversarial Training - TWiML Online Meetup

The TWIML AI Podcast with Sam Charrington · Advanced ·📄 Research Papers Explained ·8y ago

Key Takeaways

The video discusses a research paper on learning from simulated and unsupervised images through adversarial training, with a focus on generating realistic images of eyes and hand gestures. The paper uses a refiner network and a discriminator network to improve the quality of synthetic images.

Full Transcript

Hey everyone, it's Sam here from This Week in Machine Learning and AI. What you're about to watch is the recording from our first ever online meetup, where Josh Manila presented Apple's CVPR best paper award winner, Learning from Simulated and Unsupervised Images through Adversarial Training. We hold meetups monthly where we cover classic and hot papers in machine learning and AI, and we'd love for you to join us. If you haven't registered yet, hop on over to twimlai.com/meetup to sign up. Enjoy the meetup. [Music] Okay, cool. So, yeah, well, thanks everyone for coming today. I'm going to talk about Apple's paper at CVPR, whose talk I attended a couple weeks back, and it was really cool: Learning from Simulated and Unsupervised Images through Adversarial Training. This work is by Shrivastava and some others, and I'm going to present it. My name's Josh. So I guess we'll start with what we're going to talk about. First I'm just going to introduce myself. Then I'm going to talk about the motivations behind this paper and some of my own motivations: why I think it's cool and why machine learning in the future will benefit from this. Then some of the methods they use, like the actual ideas, the networks, and some of the equations that went into it. Then some of the experiments that they talked about in the paper. If you did read it, they noted two different experiments, an eye gazing experiment as well as a hand gesture experiment. I'm probably just going to talk about the eye gazing experiment, but I might move over to the paper itself and show some pictures from the hand gesturing as well. And then after that I have some introductory discussion topics that might kick some stuff off. But again, if other people have questions, please post them in the Slack. I'll try to have the Slack on my phone right next to me, so if any questions come up on the channel while I'm in the presentation, or if anything seems unclear, I'll try to, uh,
I'll try to answer it, and I'll be in the meetup channel, by the way. So just a quick introduction to myself. I'm a machine learning engineer at Argo. We're a self-driving car company, and I work on more or less the perception side of machine learning for self-driving cars. So really it's, when you look at the world as a car: where is a person, where are they going, stuff like that. Previous to this I worked on the Uber self-driving car, also in a more machine learning type role. Before that I was at Facebook, where I was building hardware switches and servers. It's very different from machine learning and AI, but there's actually quite a bit of knowledge I learned from that that carries over. And also Amazon, where I was writing software. I went to school at Michigan Technological University, with a degree in computer engineering. And besides a machine learning engineer, I also consider myself a roboticist, where I solve machine learning problems through the eyes of, like, a roboticist. A lot of my background is more on the robotics practitioner level, and some of those background experiences are why I really enjoyed this paper. So let's talk about some motivation. I guess first let's really define a problem, right? Let's define this problem called eye gazing. So let's say that you have an iPhone, and the iPhone camera is looking at your eyes, and you're moving your head around, and it finds your eyes, but it's trying to figure out where on the screen your eyes are looking. This is essentially the eye gazing problem that we're talking about. So you get all these little images over here of eyes, and you're basically trying to find where in space these eyes are pointing. Now it's really hard to get this data, right? You're going to have to get data to train a machine learning algorithm, or to train whatever algorithm you want, whether it's
machine learning or not. You pretty much have to take pictures of a bunch of people's eyes and also calculate where they're gazing, right? They'll have to look at a wall or something, and you'll have to figure out exactly where they're gazing, and that takes a long time to get data. So this bottom row over here, this is synthetic data. Imagine if you took a video game, basically, and took pictures of 3D characters' eyes in video games. That's what all these are. And it's a lot easier to generate that sort of data, right? Because with synthetic data I can make the 3D video game character do whatever I want, look in whatever direction I want, and get perfect eye gazing characteristics. But the really annoying part is that these up here, these three eye gazing image examples, these look real. As a person, these bottom synthetic images look pretty good, but you can tell pretty easily they're fake. The skin texture is very perfect, there are almost no imperfections. And what happens is, when a machine learning algorithm tries to learn from all these synthetic images, it might do poorly on real images. You can always mix real and fake images, but then you'll get kind of this bimodal distribution, you'll get uneven data in your data set, and it might confuse the algorithm. So really, what is this motivation? What do we want to do? We want to take this synthetic data, we don't want to throw it out, but can we make it look a lot like the real data somehow? Can we throw the synthetic data through an algorithm and make it look extremely realistic? Because we have this data generator already, we might as well make use of it. And we have some pieces of real images, and we want to use those real images. So how do we put these together? Well, that's what this GAN is for. So before I give you the answer, one of these rows is
actually real images of eyes, and the other row is actually fake images of eyes. So I'll let you take a second to try to guess which is real and which is fake, and I guarantee it's going to be pretty hard, because they both look very realistic. I'll give you a second if you want to really try to stare into the images, because I couldn't figure it out. So in three, two, one: this top is real, this bottom is fake. And this is really good, actually. Inside of the paper, Apple actually surveyed a bunch of people and gave them these real eyes and these fake eyes, and people, statistically speaking, could not tell the difference between the real and fake. It's almost like what they define as a visual Turing test, which I think is incredible. That's really awesome, and also for machine learning that's perfect. If people can't tell the difference, then I can add all this data into my algorithm and train much more efficiently with much more data, and hopefully have a better algorithm. So really what it all boils down to is: can we create a network or an algorithm or anything to refine these fake images? I just want to give a little bit of vocabulary first. These computer-generated images on the top here, these are what I will call synthetic images. These are coming out of some 3D video game character algorithm, right? And these on the bottom here, these come out of the algorithm talked about in the paper. These are called refined images. And then real images, which I haven't shown yet, are actually just real, right? That's what we saw before, these are real images. Okay, and then something else that's really neat about this, before we get into the specifics: notice how the gaze direction in the synthetic and refined images has been preserved. This gaze direction is kind of straight, this direction is kind of straight. This one's looking, like, right in the
middle. This one's not as good, but it's still not terrible. This one's looking up and this one's also looking up. And this is critical in this transfer, because if you change the gaze direction when you refine the image, then the label for that refined image is pointless. So it was really important that they made sure the images had to match, at least in gaze direction or whatever their business case is. So what they ended up coming up with was this thing called SimGAN, and the network that they used to actually create these refined images is what's called the refiner network. It refines. So essentially a refiner takes in synthetic data, it looks at real images, and, ba-da-boom, gives refined images. And notice how these real images are unlabeled. It only looks at real images just to get what an eye looks like, so you don't have to label these real images, which is really neat. I don't have to get all that fancy gaze direction anymore. But of course, how do you actually train it? You're not going to have a person telling the refiner network, hey, this is a good image, this is a bad image. How do you actually train a refiner network such that it knows it's done a good job? Well, really, the GAN, the generative adversarial network, does what I just did with you. We can create this discriminator algorithm or discriminator network, whether it's a neural network or not doesn't really matter, that tries to decide if the refined image is real, or if it's fake, or if some random image is real. So as a high-level pipeline, right: you have some simulator, some 3D video game simulator, that generates a synthetic image. It goes through the refiner, which creates a refined image. The discriminator is given just some random real image, and it has to decide: is the refined image real, or is the real image real? And then if it gets to the
point where it can't figure out whether the refined image is real or the unlabeled real image is real, it doesn't know which one, then you know the refiner is working. So essentially the discriminator tells the refiner which one it thinks is real, and the refiner takes that into account and says, oh, I did a good job, I fooled the discriminator, or, oh, I did a bad job, I didn't fool it. So really the discriminator's job, sorry, the refiner's job is to trick this discriminator, and that helps it learn what a good refined image looks like. But the question is, okay, this is a very high-level overview of just how this pipeline works. How do we actually train this thing? How do we actually train the refiner, and what do the loss functions look like? So let's go over to training the refiner. The refiner, like I said, is trying to map the synthetic image to the refined image and also keep the, now this is very important, this is what the subtraction sign is up here. Essentially, part of the refiner's loss function tries to align as many, I guess you could say, pixels as possible between the synthetic and the refined, to try to keep the gaze direction. And that's just this bottom part up here. This basically says: the subtraction between the refined image and its own synthetic image, and the L1 norm of the error. Plus, what is this other piece? The other piece is, how well did I do, did I trick the discriminator? So from a different perspective, the total loss, how badly the refiner did, is the sum of two things: how well did we trick the discriminator, plus how close are we to the original image. Where this is essentially the probability that the discriminator thinks it's a real image, or, you know, sorry, it's a fake image. So if this is one, or if this is very high, then you know the refiner did a bad job. And of course if this is very high,
the refiner did a bad job. So we want to minimize both of these things together. So now let's look at the discriminator. So far I've been talking about the discriminator as just this network, or this thing, that just gives out a single one or zero: is this image fake or is this real? But if you do that, the discriminator could be tricked by little parts of the image. Let's say there's a little part of the image that's not doing so well. It might figure out it's fake, but then the rest of the image looks perfect, and a person might not even know that. So what we essentially do is we discriminate on regions. So it says, for this part of the region there's a 20% chance the image is real, for this part of the region there's a 50% chance that the image is real or the image is fake. And this local loss basically helps the refiner, or, sorry, when I say D and R, I refer to the discriminator and refiner. This local loss keeps the refiner from missing small details in the image. So the discriminator can basically say, hey refiner, you're doing a good job making a fake, I don't know, pupil, or this pupil part of the image, but you're doing a really bad job trying to make a fake bottom-left part of the eye, or this little iris part. So it can help the refiner tweak the individual pieces of the eye image. So now, what does it mean for the discriminator to do that? What is the loss function on the discriminator? Well, if we take a step back, what's the discriminator doing? It's discriminating: is this a real image or a fake image? So the discriminator itself is doing well when it isn't tricked, right? The discriminator's job is to not be tricked. The discriminator needs to be really, really confident that it has not been tricked, and it needs to give the refiner as much information as possible about how bad it is. So we take the probability that the refined image is synthetic, right?
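The refiner loss described in this part of the talk (an adversarial term summed over local image regions, plus an L1 term that keeps the refined image close to its synthetic source) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the flat-list image representation, the weight `reg_weight=0.5`, and the convention that the discriminator returns one "probability of fake" per patch are all illustrative choices.

```python
import math


def l1_distance(refined, synthetic):
    """Sum of absolute pixel differences between two equally sized images."""
    return sum(abs(r - s) for r, s in zip(refined, synthetic))


def local_adversarial_term(patch_fake_probs, eps=1e-12):
    """Sum over patches of -log(1 - p_fake).

    patch_fake_probs holds one hypothetical discriminator output per image
    region: the probability that the region is fake. The term grows when
    the discriminator is confident that a region is fake.
    """
    return sum(-math.log(max(1.0 - p, eps)) for p in patch_fake_probs)


def refiner_loss(refined, synthetic, patch_fake_probs, reg_weight=0.5):
    """Adversarial term plus weighted L1 self-regularization (assumed weight)."""
    adversarial = local_adversarial_term(patch_fake_probs)
    regularization = reg_weight * l1_distance(refined, synthetic)
    return adversarial + regularization


# Toy 4-pixel "images" with values in [0, 1].
synthetic = [0.0, 0.5, 1.0, 0.5]
close_refined = [0.1, 0.5, 0.9, 0.5]

# Same refined image, two hypothetical discriminator verdicts per patch.
fooled = refiner_loss(close_refined, synthetic, [0.1, 0.2, 0.1, 0.2])
caught = refiner_loss(close_refined, synthetic, [0.9, 0.2, 0.1, 0.2])
print(fooled, caught)
```

With these toy numbers, flagging a single patch as very likely fake is enough to raise the loss, which is the point of scoring regions rather than the whole image at once.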
This basically says, is the refined image fake, plus the probability that the real image is real, which essentially says, am I sure about the real image? If it knows the real image is real, then the discriminator is doing a good job, and if the discriminator knows the refined image is synthetic or fake, then the discriminator's doing its job. But you want to minimize that, you want to make sure this is small. You want to trick the discriminator, you want the discriminator to do really badly. So let's go to a higher-level overview, back over here. I skipped this part just because I already went into the loss functions of the refiner and the discriminator. So I just want to revisit the overall training of this SimGAN system. Step one, we sample a small batch of synthetic images, where we just grab some images from the simulator, and then we run these synthetic images through the refiner. We get refined images, we calculate a loss on them, and we train the refiner, we make it better. Now note, we're not training the discriminator right now. We can't do that, because we have this circle, and you can't train a cyclical network. So we're just assuming the discriminator has been trained, it's all good, and we're just using it as is. Right now we're training the refiner using the discriminator. At the start, the refiner creates a refined image, it checks the differences between the synthetic and refined, and it asks the discriminator, hey, can you guess if this is a refined or real image? The discriminator says yes or no based on different pieces of the image and gives that information to the refiner, and then the refiner keeps training and training and training. I think they do two or three loops, some small number of loops. After we do that, then we sample a new batch of synthetic and real images, and then we train the discriminator, and now we hold the refiner
constant, because again we can't train the refiner and the discriminator at the same time. We have to make sure we train one of them at a time. So we hold the refiner constant and train the discriminator. We basically run a bunch of images through the refiner again, get a bunch of refined images, get a bunch of real images, and try to make the discriminator as good as possible at classifying images as, of course, real or fake. And we do that a couple times. Again, I don't remember exactly how many times, but it's only a few. And then after we do that, we go all the way back to the beginning and we repeat this process over and over until one of a few things happens. One, just some large amount of time. Two, you could look at the refined images as a person, right, and give a visual Turing test again and see how well they're doing. Or three, another metric you could use, which they actually don't show in the paper as much but they show in their blog, which I can post a link to: you can look at when the discriminator is 50 percent right and 50 percent wrong on whether images were refined or real. That's when the discriminator is completely fooled, completely baffled, and there's no point in keeping going anymore. I guess there's a little bit of a point, because the refiner could still train off the difference between the synthetic and the real information, but besides that you're pretty much done, because we're using this discriminator as the end-all for, is this image refined or is this image real? So this is kind of the overall training of this whole system. And now I'm going to go into some of the experiments that they did. And again, if any time while I'm talking you guys have questions, please post in the meetup Slack group. So let's skip over. So let's look at some eye tracking experiments. This is just one of the tables that they supplied
in the paper itself. You can see that this R/S is like whether it's real or synthetic data, and these are all just different algorithms trained for the eye tracking task on real data. And then notice how the synthetic data does extremely well on the error. Let's see, so we've got some questions popping in. Sam asks: is the idea here that we're eventually going to be able to create a ton of labeled refined images for each synthetic image? So probably, for now, for each, yeah, so we're assuming that the, I'm sorry, the question one more time: is the idea here that we're eventually going to be able to create a ton of labeled refined images for each synthetic image? So each synthetic image should create, in an ideal scenario, just one refined image. What we were talking about in this loop, this is purely just training. As soon as we know that the discriminator and the refiner are completely trained, we don't need a discriminator anymore. We assume this simulator is just going to be spewing out a bunch of random pieces of synthetic data, and each synthetic image is then going to be refined and then used in some new data set. So I don't know if that answers your question, Sam. Okay, but I guess, yeah, we can talk about that a little bit towards the end. Let's keep going. So again, on the synthetic data with this algorithm, the eye tracking task performed very well versus other k-NNs and convolutional networks and forests and whatever. And this algorithm with a convolutional network did well even compared to other convolutional networks with real images. Now, this is a CNN with UT Multiview. I don't remember, I think this is a data set, because this is real images, but I'm not too sure. But the CNN with UT Multiview on real images did much worse than the CNN with the UnityEyes refined images, although I'm
not, what I'm personally a little confused about is, I don't know if this UT Multiview is a public data set. If so, I would have wanted to see them run the CNN with UT Multiview synthetically altered images, but I'm not sure. That's a question I could probably ask them. Although, not to trail off too much, but when I went up to the booth to ask them questions at CVPR, they said in a very Apple way, like, oh, we can't answer that question, this whole public-facing thing is new to us. So anyways, some more eye tracking experiments. Again, like I said before, people weren't able to tell the difference. I really like this quote, I just copied and pasted it, I think it's really cool: in our aggregate analysis, ten subjects chose the correct label 517 times out of 1,000 trials, meaning they were not able to reliably distinguish real images from synthetic images. And I think somewhere in the paper they did another study where they directly asked the subjects, the same participants, to distinguish between the synthetic and the real images, and they were reliably, with high percentage, able to distinguish them. So you know, from a statistical standpoint, that this did work. I mean, it did work for whichever people they sampled. But as humans, regardless of how much they cherry-picked for the paper, it looks pretty cool. You can see again, these are the synthetic images, these are the refined images, and these over here are some real images. Now I guess maybe this example is not as good, because, you know, I think I could tell a little bit of difference, because here it's blocky. But these all look pretty real, I would not be able to tell the difference. But yeah, I think all this is pretty neat. Something else I wanted to show you before we get into the discussion topics is, when I switch, can you guys see my screen if I change screens
real quick? Yeah, we saw it. Okay, you did, okay, good. Let's see, so I'm going to switch over to the paper. I probably should have added this in. Thanks for saying yes in Slack. So let's see, where is it? Sorry about this, I just remembered I wanted to include one more thing. Oh, something else I didn't include: the refiner, actually, they found out that the refiner needed a history of images, not just a single image at a time. And the algorithm I talked about before was slightly altered, just because the algorithm also takes into account, how well have I been doing so far? You can read the paper for more specifics, but this image just shows, when the discriminator gets history, how the images change, and how they change without history. And you can see they're a lot better with history than without. And then besides that, I think there was one other thing, all this hand stuff. This is just the hand gazing, or sorry, the hand pose estimation experiment, which is another experiment they did that yielded very similar results. I think that's all I wanted to talk about right now, so let me get into my discussion topics. I don't know if I went too quickly. So some possible discussion topics that I have right now: what else could this be applied to? Really, I mean, to kick that off, anything where synthetic data is hard to come by, this can be applied to. Did anything confuse you? Please post that in Slack and again I'll try to clear that up. Just as a thought experiment, what do you think would happen if you got rid of that extra loss near the top of the refiner network? And then finally, what happens when you add memory to the discriminator? I talked about that a little bit, but for now we can leave that out because I didn't go too far into it. So I guess real quick, before we get too far along, I'm going to
look at the meetup stuff. Let's see, [unclear]: does anyone know if they trained on left eyes and right eyes separately? The refined image of figure six seems ambiguous to me. Is the nose at the bottom left or bottom right, or did the refiner learn to make it left-right ambiguous? Let's look. So I know in the paper itself, let's see, no, sorry, that was probably a little bit annoying to people. Yeah, so here's a bunch of images. So the question is, is it left-right agnostic? It looks like it does actually have left or right, personally, just because you see this little, I don't know what this is called, the small part of the eye that's near the nose. But the synthetic data definitely had left or right, so I would assume that the network would learn to differentiate that, but I don't know. I mean, there is some noise here, but yeah, I can see how that might be a little ambiguous or confusing. So the next question, Teddy asks: is the history stored separately from the refiner, i.e. is there a new refiner created for each image pass? There is not a new refiner created for each image pass. It is the same refiner that gets trained. I believe that the memory comes into play, let's see, yes, so the memory comes into play as, I don't know if they're storing images, or I think they're using like an LSTM layer to keep a history of the weights. Oh, sorry about that, yeah, I remember now, they actually use a buffer of refined images to help with the refiner, but the refiner stays the same every single time. Let's see what else we have. @k meters on a walker, oh yeah, so I'm not sure, yeah, K mater, I'll probably ask you in a little bit, because you had some visualization or a really quick example that you could walk us
through. Maybe we'll revisit that in a second. Let's see, [unclear] Wilson: this seems like a good way to create realistic character images in a gaming simulation. Start with synthetic royalty-free images, use the refiner together with [unclear] population, create synthetic characters. Yeah, no, that's definitely a pretty cool way to do it. I think this could be good for any time that you just need data and it's hard to get data. But I guess at this point, unless people have more straight-up questions, we're into a bit more of the discussion part of things. So I don't know, Sam, if you want to talk for a second and describe how you want the discussion process to go, or if I should talk about it. Oh, actually, I guess we have one more question: did they compare performance against just adding dumb Gaussian noise to synthetic images? I don't know, that might be good. I don't think they did. I think it could do a little bit better, but I don't know if it would do perfectly better. In my experience, I've come across that a lot, where I've tried to make images just look more realistic by changing the lighting, or adding Gaussian noise, or changing the hue values here and there, and they do do a little bit. But this did, I guess from a practical perspective I have no data to back up what I'm saying for this, but they just did so much better than the previous benchmarks that I think this would have been better than Gaussian noise. But maybe that would have helped. Yeah, so: how is the gaze direction fixed while generating data? So that's what I talked about over here. Let's look at this R, yeah. So when you're training the refiner, it takes into consideration, in the loss function of the refiner, the absolute pixel differences, like the
softmax pixel differences between the synthetic and the refined image, with the subtraction loss, and that inherently, for their application at least, tries to preserve the gaze direction. Because, you know, I guess at a high level, right, the pupil is dark, so the dark pixels should be together, versus like the white pixels. Let's see: I am a bit unsure of what is meant by adding history in R. I assume that it's a neural network, the training is iterative, [unclear], isn't the history stored implicitly in R itself? Let's see, I'm trying to remember exactly how they store the history. I think what they actually ended up doing, let's see, history: the stability of training by updating the discriminator using a history of refined images rather than the ones from the refiner network. I think they just had a buffer of images that they have the discriminator look at. Here we go, let's see: refined images with current R, we create a mini-batch for D. Oh, I see, as part of the mini-batch for D they added some of the refined images back into that mini-batch that it would iterate on. So I would read the paper for more specifics on that. I'm sorry if I couldn't answer that as well as you want, but from what this diagram shows and from what I remember from the paper, I should have read it a little bit more recently, I apologize, they keep that buffer of refined images from previously, and then they store some inside of the mini-batch that is created to train the discriminator. So let's see, is there a method for finding a, hold on, I'm trying to catch up on this question. If anyone would like to ask a longer question via voice, just type here and we can call on you for it. Oh sorry, yeah, so if anyone has been reading, if you type here into the Slack then I'll acknowledge you, you'll unmute yourself, and then we'll add you to the discussion, or you
can ask a question, make a comment, whatever. So Dwayne Corner says, at gentie, oh, I think he just answered you: my understanding is that R is changing over time and D is changing over time, so as R changes, D starts to learn those changes. Anything that ever came out of R should always be labeled as synthetic. However, if D is only seeing newer images from R, then it stops seeing some of the older synthetic images. Yeah, I mean, that's pretty much it. So, history equals refining the same pictures several times? No, history is just essentially keeping a buffer of previously refined images and then having D run its back propagation on them every so often, from what I understand. Again, I haven't read the paper in a couple days, I've just been very swamped. But yeah, as far as history is concerned, the paper goes into it in a little bit more detail. But let's see what else. I think we have some more people who are asking questions. Some other things that are interesting about this paper: you also have these synthetic and refined and real images, and you can see that, I guess, all the algorithm really did was just kind of make them a little bit blurry. So in this case it wasn't as impressive, which is why I didn't show it, but it's still a really interesting paper. Yes, correct, so Dwayne Corner just posted: history is a random selection of images that came out of R from previous iterations. Correct, yeah. So Dwayne's got it [unclear]. And some other stuff I didn't talk about in the paper were just some of the implementation details for D and R, but they're essentially just fully convolutional networks, the specifics of them don't matter as much. Ocampo 22 or 9, oh, sorry about that, yeah, I just missed it: is there a method for finding the optimal size of the localized adversarial loss for the D function? Or, like, how
the size one of the hyperparameters? So I'm also very nitpicky with this type of thing. Whenever a co-worker gives me an algorithm and there's a magic number, I'm always the first person to ask, "Where did that magic number come from?" I don't remember exactly where it came from, but it could come from anywhere: they just picked it and it worked, or they did k-fold cross-validation, where they had some sort of validation set and varied the size of those boxes until they found the minimum amount of error across the sizes. I'm unsure what is considered the optimal size for those localization methods. Usually people will just say k-fold cross-validation. In reality, a lot of people just say "this worked," unfortunately, but they could have done something smarter. I don't remember exactly. No problem.

"So I think history is used to avoid R reintroducing artifacts that tricked D in the past, and that D then forgot about after a while when R stopped using them." Yeah, I think that's some good insight, because if the refiner creates some weird artifact, you want to keep telling D about it, just so that D can keep learning the artifact and keep telling R, "Hey, this artifact you created is a bad artifact," and keep reminding R of that so that it can get to its destination faster.

Let's see, we have a few more questions coming in. Is there anything else that's interesting here? Oh, this is actually something Chubb talked about. This is in the hand gesture dataset, and these are two refined images, and this shows the difference between the global adversarial loss and the local adversarial loss. Notice how the local
adversarial loss gives you a much more refined image out of the refiner, just because the refiner learns about much smaller, nitpicky things, whereas with the global adversarial loss, where you say something like "I am 20 percent sure that this image is fake, or that this image is real," you get these much bulkier sections.

I guess this circles back to Sam's question: "Can't I just add some Gaussian noise?" You can, but sometimes machine learning algorithms are very particular about what type of noise you add. Over here you have a much cleaner edge with these little high-frequency bumps, and over here you have very low-frequency bumps and it's a lot less smooth. So even though these look very similar to us as people, a machine learning algorithm might pick up on these things and do a little bit worse.

Yeah, so they talked about L1 for self-regularization. I think they chose L1 just because they didn't want to kill the weights as fast. L1 is used very commonly in a lot of more classical image processing approaches. I think they gave their reasoning; here we go, I even highlighted it. Self-regularization: "When the synthetic and real images have significant shift in the distribution, a pixel-wise L1 difference may be restrictive." Oh right, because if you had an L2 norm in this case, you would know how far away you were from the right answer, but you don't know which way: you don't know if you should be darker or lighter. With the L1 norm, I know if I'm darker or lighter, just because I subtract a difference.

Are the
generator and discriminator deep neural networks? If so, what kind? Yes, the discriminator and the generator are both deep neural networks; they're just fully convolutional networks. They talk about their exact implementation in the paper somewhere here, but they're just standard fully convolutional networks. I don't remember if they used an AlexNet or an FCN or whatever specific type of network for the back end. I can just search for some words like AlexNet or FCN... let's see, ResNet maybe. Here we go: "We use a fully convolutional refiner network with ResNet blocks for all of our experiments." If you're unfamiliar, ResNet is a type of convolutional network that has these skip connections. I would just Google ResNet if you want to know a little more about that, or you can message me if you have other random questions.

Actually, that's really interesting to note, Casey: their blog is also really good. You should read their blog if you think the paper is a little dense, which really it's not terribly dense. Their blog gives a really good overview of everything I just said, with even some better descriptions, and it's a good high-level overview of the work that they did.

So let's see if anyone else has any comments. Again, you can just post here inside the Meetup. I don't know if I have too much else to talk about. I know that they mostly stay in a black-and-white world, but they did a little bit with color, and you can see that over here. I guess I can open up the floor here: did anyone else have any other ideas as to what this could be used for?

Hey Josh, why don't we let Kevin jump in and take a few minutes to walk us through the
implementation that he put together. Oh cool, yeah, sorry, I didn't know he was all set up. Are you on and ready for that, Kevin? Maybe he can't hear us, or maybe he's muted. Yeah, that's possible. Kevin? I just heard something. Okay, I can barely hear you. Is that better? That's a little better. Okay, I can try; I'm holding it right next to my mouth. Okay, so Kevin, just let us know when you're all set. Yeah, I can unshare my screen. Teddy, I'll respond to your message over the Slack personally. I'm just trying to figure out how to share my screen; I think it's at the top of the screen. Let's see, I can switch it here. Okay, so we should all be seeing Kevin's screen now. You see it? That's great. Yes, yep.

So the link is in the Slack. If you click on it, you go to the Kaggle dataset; basically it's this eye gaze data. What we put together was more or less all the data you need to reproduce, generally, the results that they had. When you look at this UnityEyes project, it is a Windows-only tool; you need to set it up with the right parameters so it produces all of these images, and you end up with hundreds of folders with thousands of JPEGs, which is not necessarily the best starting point for actually training these models. Then you have this data from the Max Planck in Germany, and you get that data in another format that's also not particularly easy to use. So what we did for this Kaggle dataset is really package that together as HDF5 files so that you can actually start looking at it and playing around with the data. HDF5 is just a file format that allows you to store really large arrays in one compact place and have a number of different fields that explain what's in there. For this we have the gaze.h5 and the real gaze files. If you go to the notebooks, or the kernels, you can look at one called Eye Overview, where you
basically open the HDF5 gaze file. You see that there's image and look_vec. You can then show what all the images look like and make figures like they had in the paper. Then you have this look_vec, which is 50,000, because that's the number of images here, by four. What we do is just go through the different look_vec components to show you: this is the left-right look, so this is the minimum value, this is the maximum value, and this is one in the middle. Then you have the up-down look: the down, the up, and somewhere in the middle. The other two I wasn't really sure about, but they're probably not very important for getting started.

We then have the actual SimGAN notebook, which is the much more interesting part. It's a slightly simplified implementation of what they had, but basically it's the models they used, built up in Keras, combined with this data to actually train it. You can see how the data is loaded, what models you use, what the batch size is, what these D and G coefficients are that we talked about earlier, and what happens when you change them, and then of course a lot of utility functions to get everything to work correctly. Here we have this image history buffer, which was mentioned a couple of times in the paper, to keep a record of the images that we had.

We then have the actual networks. Here you see it's very clearly a ResNet: you have these ResNet blocks with convolution, activation, convolution, and then these addition layers to combine them. You have four of these blocks sequentially, and then this final output. You then have the discriminator network, which does convolutions and max pooling. Then you combine the different models together, and this starts to get a little complicated because you're trying to bring both of the models together so that you can train a refiner and a discriminator model
and with Keras what's nice is you can just print .summary() and then you see exactly what you've combined together. If we go down here, after the error messages, you can see the input layer, what its size was, how many parameters there were, and what it connects to. You can quite easily go through and see that here you had 64 kernels; you could change that to 96 or 32 and see how the performance changes. You then have the other model, the discriminator, and then the whole model together, with the total parameters for the whole model. If you've done this on your own computer, you can run this visualization with model_to_dot and see the drawn-out topology; this unfortunately doesn't work on Kaggle.

Then you can define the loss functions. Here they're defined directly in TensorFlow, because you have a bit more flexibility there versus defining them in Keras, which would be the ideal way. We have the self-regularization loss, where we have these two terms together, and then the local adversarial loss. You can then select the optimizers and everything that's needed for compiling the model, then set up the data generator so that it provides the models with data to train on; this is the X and the y components that are fed into the fitting functions.

You then do a very quick pre-training on the data to see what the results look like initially, and then you can do the full batch of training, where you run it over many, many iterations and train everything in the right order. As the paper showed, there are a number of different steps to train the model: you don't just train everything at the same time. You improve the generator, then you improve the discriminator, and then run that iteratively, and this is more or less what's done there. You can see this output, where you see how the models looked at the very initial
stage and then after training for 20 steps, which is of course very little in the deep learning world, but as an example you see it starts to get better, looking at those final images. Then you can save all of the model weights so that you can reuse or load them. Kaggle obviously isn't ideal for training really large, complicated models, because you only have a limited amount of training time and the computers aren't very powerful, but it's a really good way to show that the model you built actually works and to share it with other people, because everyone starts at exactly the same point and has access to exactly the same data.

Here was a second notebook, which was just trying to get an idea of what the simple, basic models do. This one uses a random forest to try to predict the eye direction. It's a starting point, so that you can compare it to the SimGAN to see how well that works. So that was more or less what I did. I guess you can ask questions in Slack or whatever, and I can give the screen back to whoever wants to take it.

A question for Kevin? Yeah, I guess that was for Kevin. I was just curious: if you got rid of the self-regularization loss and generated a bunch of images, did you see what they looked like, or what happened? Just out of curiosity. Oh, I see what you mean. I haven't played that much with changing the terms. I've tried it with a couple of other datasets, but it's all there; it's very easy to tweak.

There was another question about whose code this is. It's of course not the authors' original code, because they didn't make a repository as far as I know, but it's a fairly accurate recreation based on the paper. So it's your recreation? I borrowed some things from a GitHub repository, but it's more or less my recreation.

Any more questions for either Josh or Kevin? All right then, I will end the video portion of the call, and I will be sticking around on
Slack. The Slack will remain for any further comments and feedback. Thanks, everyone. Yeah, thanks. If anyone has questions, just shoot me a message on Slack; I'm J. Manela.
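
The image history buffer that comes up repeatedly in the discussion can be sketched in a few lines. This is a minimal, framework-free illustration, not the code from Kevin's notebook or from the paper's authors: the class name `ImageHistoryBuffer`, the helper `discriminator_batch`, and the half-history, half-current split of each discriminator mini-batch are assumptions made for the example, and images are stood in for by plain strings.

```python
import random


class ImageHistoryBuffer:
    """Fixed-capacity pool of previously refined images (hypothetical helper)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.images = []
        self._rng = random.Random(seed)

    def sample(self, n):
        """Return up to n stored images, chosen at random without replacement."""
        n = min(n, len(self.images))
        return self._rng.sample(self.images, n)

    def add(self, new_images):
        """Store new images; once full, overwrite randomly chosen old entries."""
        for img in new_images:
            if len(self.images) < self.capacity:
                self.images.append(img)
            else:
                self.images[self._rng.randrange(self.capacity)] = img


def discriminator_batch(buffer, refined_now, batch_size):
    """Mix current refiner output with history for one discriminator step.

    Assumption: half of the batch comes from the buffer when it has enough
    images; any shortfall is filled with current refined images.
    """
    half = batch_size // 2
    from_history = buffer.sample(half)
    from_current = refined_now[: batch_size - len(from_history)]
    # Remember some of the current refiner output for future steps.
    buffer.add(refined_now[:half])
    return from_current + from_history


buf = ImageHistoryBuffer(capacity=4, seed=0)
batch1 = discriminator_batch(buf, ["r1_a", "r1_b", "r1_c", "r1_d"], 4)
batch2 = discriminator_batch(buf, ["r2_a", "r2_b", "r2_c", "r2_d"], 4)
```

On the first call the buffer is empty, so the whole batch comes from the current refiner. On the second call, half of the batch is drawn from images refined in the earlier step, which is how D keeps seeing artifacts that R produced in the past.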

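The two loss terms named in the walkthrough, the L1 self-regularization loss and the local adversarial loss, can also be written out by hand. The sketch below uses plain Python lists rather than TensorFlow, and it assumes the discriminator outputs one "this patch is refined" probability per local patch; the function names, the `weight` parameter, and that output convention are illustrative choices, not taken from the notebook or the paper's code.

```python
import math


def self_regularization_loss(refined, synthetic, weight=1.0):
    """Weighted sum of absolute pixel differences (L1) between two images."""
    total = 0.0
    for row_r, row_s in zip(refined, synthetic):
        for r, s in zip(row_r, row_s):
            total += abs(r - s)
    return weight * total


def local_adversarial_loss(patch_probs_refined, is_refined):
    """Sum of per-patch cross-entropy losses.

    patch_probs_refined: 2D grid of discriminator outputs, one probability
    per local patch that the patch comes from a refined (synthetic) image.
    is_refined: the true label shared by every patch of the image.
    """
    eps = 1e-12  # avoids log(0) for fully confident wrong predictions
    total = 0.0
    for row in patch_probs_refined:
        for p in row:
            p_true = p if is_refined else 1.0 - p
            total += -math.log(max(p_true, eps))
    return total


synthetic = [[0.0, 0.5], [1.0, 0.5]]
refined = [[0.0, 1.0], [0.5, 0.5]]
reg = self_regularization_loss(refined, synthetic)         # 0.5 + 0.5 = 1.0
adv = local_adversarial_loss([[0.5, 0.5]], is_refined=True)  # 2 * ln(2)
```

A global adversarial loss would correspond to a grid with a single entry for the whole image; summing over many patches is what lets the discriminator penalize small local artifacts, as described in the comparison of the hand gesture images.
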
Original Description

This is a recap of the first monthly TWiML Online Meetup, held on Aug 16 2017. The focus of the meetup was the CVPR best-paper-award-winner "Learning From Simulated and Unsupervised Images through Adversarial Training" by researchers from Apple (link below). Thanks again to community members Josh Manela, who did a great job presenting this paper, and Kevin Mader, for walking through a TensorFlow implementation of the model that he created! Make sure you Like this video, and Subscribe to our channel above!
- Full paper: Learning from Simulated and Unsupervised Images through Adversarial Training: https://arxiv.org/abs/1612.07828
- Apple blog post: Improving the Realism of Synthetic Images: https://machinelearning.apple.com/2017/07/07/GAN.html
To register for the next meetup, visit twimlai.com/meetup
Subscribe!
iTunes ➙ https://itunes.apple.com/us/podcast/this-week-in-machine-learning/id1116303051?mt=2
Soundcloud ➙ https://soundcloud.com/twiml
Google Play ➙ http://bit.ly/2lrWlJZ
Stitcher ➙ http://www.stitcher.com/s?fid=92079&refid=stpr
RSS ➙ https://twimlai.com/feed
Let's Connect!
Twimlai.com ➙ https://twimlai.com/contact
Twitter ➙ https://twitter.com/twimlai
Facebook ➙ https://Facebook.com/Twimlai
Medium ➙ https://medium.com/this-week-in-machine-learning-ai

This video discusses a research paper on learning from simulated and unsupervised images through adversarial training. The paper presents a novel approach to generating realistic images of eyes and hand gestures using a refiner network and a discriminator network. The video provides an overview of the paper and its key contributions.

Key Takeaways
  1. Train a refiner network to refine synthetic images
  2. Train a discriminator network to distinguish between real and fake images
  3. Use adversarial training to improve the quality of synthetic images
  4. Implement the refiner and discriminator networks using Keras and TensorFlow
  5. Use the Kaggle dataset and notebooks to reproduce and share the results
💡 The use of adversarial training can significantly improve the quality of synthetic images, making them more realistic and useful for various applications.
