Diffusion Models Live Event (David Ha)

HuggingFace · Intermediate · 🎨 Image & Video AI · 3y ago

Key Takeaways

Presents a framework for augmenting creative human expression with diffusion models, featuring David Ha from Stability AI

Full Transcript

Welcome, everyone, to the diffusion models live event. This is part of a course that we're offering with Jonathan Whitaker, and I'm really excited. Today we're going to have a range of speakers, from the creators of Stable Diffusion to David Ha from Stability AI to Devi from Meta AI and several other people. So if you've got any burning questions about, I don't know, schedulers or noise or anything like this about diffusion models, today is the chance to ask them. My name is Lewis. I'm from Hugging Face; I work in the open source team, on Transformers primarily, and I'm slowly getting converted to diffusion models, which is why I'm here today. I'd like to introduce Jono Whitaker, who's the creator of the course. He will talk a little bit now about what the course is about, and then we'll dive into the talk with David. So with that, I'll let you take it away.

Great, thank you, Lewis. Hello everybody, and welcome to the Hugging Face diffusion models class. This is something that sprung up predominantly because, obviously, you're all here because diffusion models are this very exciting, newly popular class of models for generating things, and so everybody's excited about them. What I think was a big gap, especially when the first wave of excitement rose, was that everyone's excited about this but not too many people are able to then take ownership of it and start building on it, because it's all quite new. So with this course I'm very excited just to try and introduce these ideas and get you up and running, specifically on using the Diffusers library in this case, but also more generally trying to understand the different concepts, from the basics all the way up to something like Stable Diffusion: taking it, fine-tuning it, adapting it, and really understanding all of the pieces that go into a big, complex system like this. At the moment it's just unit one that's out, very much the introductory material, but
hopefully enough to get you up to speed with the core ideas. From here we're just going to build on and build on, with a new unit probably every couple of weeks. I think the first four are listed, but there's hopefully more in the future as well, as things like diffusion for audio, for example, become more mature and we're able to capture that into something that's worth learning rather than something that's too rapidly changing. So I'm very excited to share this with you all, and also super excited for today, to see what all of our guests have to share with us. Which brings me, I guess, to introducing the first speaker: a very warm welcome to David Ha. Until recently David was a researcher at Google, on the Google Brain team in Japan, but just in the last few months he has switched over and is now head of strategy at Stability AI. So I'm very excited to hear from him. His research interests are complex systems and self-organization, and also all of these creative applications of machine learning. So, David, great to have you, and I look forward to seeing what you have to share with us.

Oh, thank you. I'm going to share my slides now. Can you see my full-screen slide?

Yes, we can.

OK, thanks. Thanks a lot, Hugging Face, for your kind introduction and for letting me give a talk. Just as a warning: the rest of today has many technical talks, but this talk is going to be a general, non-technical talk, because I've been invited to give a general talk about creative AI in this course. So, I'm David Ha. I currently work at Stability AI. I joined only around one and a half months ago, but somehow it feels like a whole year has gone by. There's so much going on in this space right now; things are going really fast. Before joining Stability AI I worked at Google Brain as a researcher for six years, and in this talk I
really want to dive into the topic of collective intelligence and how it is related to creative AI. The material in this talk sprang from a conversation between myself and my collaborator, Dr. Judy Fan, who is a machine learning researcher but also a psychologist at UCSD, and who is going to Stanford next year. I started doing machine learning research around seven years ago, and also started working on creative AI, on generative models for music and sketches and so on, but it's really this year that things started to explode, with the advent of text-to-image models and diffusion models. Things are happening at such a fast speed that I was discussing with Judith earlier this year: what's going on? We wanted to make sense of it. We had lots of discussions, we dived into some of the literature from behavioral studies in the past, and I think we really have to understand how people in general, the population or the collective intelligence, are using these technologies and interacting with them, to really understand why things are going so rapidly, why things are being adopted, and why this is such an exciting space. So in this talk I'm going to focus mainly on my experiences with this technology. Through understanding the collective intelligence, or how people collectively use it, we might develop a framework for understanding how these text-to-image models, or more broadly AI tools, augment human creative expression, and that may help us understand why things are really taking off this year. So let's dive in.

I really like to study complex systems and collective intelligence, because I think these systems are beautiful and they lead to lots of resilient and robust properties. One of my favorite quotes is from this book by Andrew Pickering, The Cybernetic Brain,
where he mentions: bridges and buildings are all designed to be indifferent to their environment, to withstand fluctuations and not to adapt to them; the best bridge is one that just stands there, whatever the weather. So I think the contrast between a really well-built bridge and a bridge that's built by a bunch of agents, in this case ants, formed through emergence, is a sharp one, and it's a metaphor for what's going on in the technology space right now. On one hand you have very large, structured models that are designed to do things really well; on the other hand you can have open source models, with hundreds or thousands of people hacking around on models, taking them apart, and making things out of them. I think these two approaches are at the core of what's happening in the open source community right now, and it's going to be a theme of this talk.

When I think about image generation from a collective intelligence perspective, we can think about GANs or VAEs or Stable Diffusion, but one experiment I really like is the image generation experiment using humans as the algorithm, on Reddit: the r/place experiment. If you haven't heard of r/place before, it's basically collective intelligence applied to image generation. It happens around once a year, in some years, and they basically set up a 1000-by-1000-pixel canvas, so there's a million pixels. Each Reddit user is only allowed to draw one pixel every five minutes, and the entire experiment lasts for a week. So the user, in terms of the exploration space, has something like a million times 256 possible color possibilities, so there's a wide exploration space from the user's point of view. The user also wants to share and collaborate with other users, and, what's more interesting, the network dynamics also affect what is drawn. So there are three
properties: there's the exploration space; there's the drive to share and to act in the environment; and there's the feedback the user gets from this complex network, in this million-pixel canvas. What happens is that the users on Reddit, without any explicit reward signal, even coordinate strategy on the Reddit discussion forums to defend some designs, to attack other designs, and to form alliances. This seemingly non-objective task leads to lots of these awesome patterns, with resilience and robustness going on, and I think this kind of epitomizes many things that are happening right now in the diffusion models space, in the open source creative community. So I think this is something that really resonates with what's going on in the machine learning space as well.

If we talk about machine learning, in particular generative AI algorithms, collective intelligence, or collaborative generative AI, is not something that's so new. Even back in 2007, Ken Stanley published a paper on Compositional Pattern Producing Networks, or CPPNs, which are like a method to evolve neural networks that can generate patterns. What he did back then was put this algorithm on the internet, back when it was based on Java applets rather than JavaScript, and he let it loose. So Ken Stanley and his collaborators put the CPPN algorithm on the internet, and users started to evolve networks that generate these cool patterns. What is interesting about these networks is that a user can take some of these networks and continue to evolve them in some way, so they can innovate and build on top of them to generate other algorithms as well. This leads to more and more complex but interesting patterns being discovered by a community of users around this platform, which he called Picbreeder. So
when I first got into machine learning, around seven or eight years ago, I played around with this CPPN algorithm as well. I had already re-implemented it in JavaScript, and this is kind of what it looks like: you evolve a network, on the right, and you feed in the x and y coordinates and the distance from the center, and the network outputs the RGB value, the color. In principle this can generate very high-resolution images, because you're not confined by how many pixels you have: you just put in the coordinates, which could be any coordinates you want, and you get a color. As these things complexified, I was able to generate very complicated images that look kind of cool. This was back in 2015, when GANs on, you know, CIFAR-10 were at 32-by-32 resolution, and I thought it was kind of cool to have a neural network generate 2048-by-2048-pixel images back then. So when I got into deep learning, the first thing I did was apply these CPPNs to MNIST, because I was sick and tired of MNIST; it was 28 by 28 pixels, so uninspiring and low resolution. So I immediately used these methods to generate thousand-by-thousand-pixel images of MNIST even, and also evolved all sorts of abstract patterns using these CPPNs. I was fascinated by looking at what has been created from this sort of collective or communal AI algorithm, and how it can be applied to deep learning.

Later on, when I joined Google, I also worked with a team called the Google Creative Lab, and they built something very inspiring: a viral game called Quick, Draw! They collected, basically, the collective knowledge of how people doodle things, using the Quick, Draw! game, where they have hundreds of categories and users are asked to doodle things that resemble a category in 20 seconds. Somehow they collected hundreds of millions of images. For some reason, people were addicted to,
or really enjoyed, playing with neural networks. This is a theme that's recurring right now as well: people like to have an interface to work with an artificial neural network, just to see what happens, and they like to get feedback from that network so they can play the game again. It's kind of like what's happening now with diffusion models, but back then it was a lot simpler. Google open-sourced the dataset of tens of millions of these doodles, and it was pretty awesome; it was like the first large collective dataset of doodles from all around the globe. Of course, the first thing I did was take this dataset and train a generative model on it. I trained a recurrent neural network called Sketch-RNN that can take doodles and continue to draw them, so you can continue to draw gardens, faces, insects, buses, fire trucks, and so on. I really liked playing around with these models interactively. Just like how now, when people use diffusion models, they like to have a very short feedback loop between seeing the output and typing another prompt, back then I made these models work in the web browser so you get instant feedback: after you finish drawing, the machine takes over right away and shows you what's going to happen next. I think this short feedback cycle is really important for human-machine interaction, for HCI, when we're trying to get people to creatively interact with these algorithms. I started playing around with extending this to multiple inputs, so that you can sample different seeds, like how, when you're using diffusion models now, you don't just generate one image; you want to generate 9 or 16 and choose the best. So this is kind of like that: back then you wanted to generate a lot of
these and get inspiration from as much randomness as possible. And just like now, back then I also tried to open-source these models and release them for everyone to use, and it was just amazing to see how the community took apart all of these neural networks in the wild. One of the things I saw was that people deployed them on robots that were drawing doodles of cats and birds, and someone even took the Sketch-RNN model and projected some outputs onto a building, which was pretty cool. What was really interesting back then was that, even in 2018 or 2019, there was a student at Berkeley, Forrest Huang, who looked at the Quick, Draw! dataset and Sketch-RNN and used the concept of prompts, putting them into embeddings. He had a paper that took a short text description and tried to create a doodle based on that description. This is a very rudimentary version of modern text-to-image, but it was text-to-doodles. One of the things I liked was that doodles are really simple, so he didn't need that many resources to train this model, and he was able to test out these early versions of text-to-image algorithms. Of course, three years later the world has completely changed, but I thought this was pretty cool work back then, before text-to-image really took off.

Around that time I was working on many research problems. Fast forward a few years, to April of this year, when OpenAI and Google started demoing their text-to-image models: I had a fantastic experience playing around with them. You may look at my Twitter; my Twitter feed was full of these examples. I really liked to play around with DALL-E 2 and Imagen at the beginning. When I started using DALL-E 2 and Imagen, I knew this was going to be a game changer; it's going to change the world. I actually thought they
were a bigger deal than pure text generation models like GPT, because somehow the human perception system, the visual system, is so attached to our senses, to our inner selves, that we feel a lot more motivated and have a stronger emotional response when we look at images compared to reading text. I think there were mainly two types of initial applications: people were making art using these models, and also making photos of weird stuff, like a photograph of a strawberry frog. And what was really weird, it was only this year but it seems like such a long time ago: remember, back in April only a small minority of people had access to DALL-E 2 or Imagen. It's not like now, when everyone can download Stable Diffusion or whatnot, or get invites to Midjourney. I was one of only a handful of people in the world who had access to DALL-E 2 and Imagen, and it felt weird. There was only a small number of users, and I felt like something was kind of wrong, because I think this amazing technology should just be available to everyone. But, you know, I had access to it, so I played around with it.

I started off, like many of you, making images with a known artistic style, like cyberpunk digital art, or Banksy images of Totoro. Then I tried to create images with some degree of realism but that are kind of fantasy, like robot romance, or bears drinking beer and eating ramen in Shinjuku. These were kind of cool. Then you can try to create images with some degree of emotion, which is possible because of the diverse datasets and embeddings, so you can have robots dating, or cats trading on the stock market, and so on. What captivated me about these models is their ability to learn abstraction. One of my motivations
for the earlier Sketch-RNN and Quick, Draw! experiments was that I wanted to understand how humans like ourselves develop abstract concepts of things, and then how we doodle very simple representations of those things. But somehow these models can just learn that: we can ask DALL-E 2 to create a minimal line drawing of New York City, and it does that. It can create Lego figures of Jay-Z, Jeff Bezos, Arnold Schwarzenegger, and Rihanna, and it has to learn some abstract representation of Rihanna or Jeff Bezos or Jay-Z to make Lego figures of them. So I think these are pretty convincing examples that they're able to learn abstractions.

As you know, I posted a lot of these images on social media earlier this year, and I kind of monitored the engagement. I noticed that the images with the highest engagement are often not the ones that look the most beautiful or realistic; good art doesn't get retweeted that much. It's the ones that involve cultural factors, or are kind of memes. One of my favorites is this "photograph of a bear stock market in the 1930s". As you know, this year sucked for stocks and everyone's losing money. I tweeted that out, and the fact that the bear market got interpreted as a bear, while the market was actually going down, made this image, with this kind of misinterpretation, really popular. Bitcoin was also tanking, and somehow people like Japanese art, so Japanese ukiyo-e scrolls with Bitcoin were also a trending thing. These models were also great at making novel combinations of very distant concepts: no one expects Darth Vader to be on the cover of Vogue magazine, and maybe no one expects Darth Vader at a tech conference, maybe trying to pitch some deal in a bad market environment. And one of the things that really shows the brilliance of the community is
the ability to reverse-engineer closed models. I'm sure many of you know that DALL-E 2 has some safeguards, or limitations, where you can't type in profanity. Some users, not me, typed things to get, you know, Elmo to generate that as text in the image, which is kind of funny. People also reverse-engineered how the system was inserting words into the prompt: they used a prompt that says "a person holding a sign that says", and then suddenly there would be a sign that says "female" or "black", words inserted as a way to de-bias some of the images. So the community is pretty smart; humans are pretty smart as a group at figuring stuff out.

By June of this year I was thinking: what's next? This is pretty amazing stuff. At that time I was wondering whether the future of text-to-image models would be dominated by big companies like OpenAI and Google, who have the resources to train bigger models on bigger datasets. And there's a plot twist: in the middle of this year there was a huge spike in the usage of a model called DALL-E mini. DALL-E mini is a community effort funded by Hugging Face; I think it came out of a hackathon, led by Boris. They tried to reproduce DALL-E 1, and they were successful: they created a model that can generate images from text. It's not high resolution, it's like 256 by 256, and Hugging Face served this model on Gradio, or whatever the demo was, and it became a viral hit. What was amazing is that at some point this year DALL-E mini, not DALL-E 2, became an absolute viral internet meme. You know, we're on ML Twitter and ML Reddit and so on, but none of that matters; this was much broader than that. People actually associated text-to-image with DALL-E mini. Many people were kind of surprised by that, but it turns out that DALL-E mini, because of the low resolution and the three-by-three grid, is
great at making memes. Like, you know, "Eminem as an M&M": it actually makes Eminem as an M&M. So people loved it. It got so popular at one point that the New Yorker made a cartoon that featured this iconic 3x3 grid, and it was written up in Newsweek, because you're able to make funny images of Trump and Biden using DALL-E mini, which was probably prohibited on DALL-E 2. People even made a meme about this: at the beginning, DALL-E 2 was rolled out to a selected group of artists, but at the same time people started using DALL-E mini to make Minions at a cross burning, Jar Jar Binks at the Nuremberg trials, and things like "gender reveal 9/11". It's kind of funny, and I think when DALL-E mini came out it really offered a hint of the creativity of the masses being unleashed. This sort of casual creativity is something that's really apparent in these text-to-image models: people can take a concept like a gender reveal and make "Death Star explosion gender reveal", or "a funeral at Walmart", or "neurosurgeons putting ramen inside a man's head" as a joke, and it would generate it, and it's hilarious.

I think what made these go viral is that it's not just about scaling the model size or compute. I mean, that's important to get some quality, but just as importantly, I think the success of DALL-E mini is about scaling the number of human users and developers using this stuff. It's like an evolutionary approach: people play with the model and upload their favorite results, the community upvotes the ones they like most, and hence you get these meme images that come out on top. What was also apparent at the time, as many of you know now, is that what works on one model does not work on others. When you type in "YouTuber doing a funeral unboxing" on
DALL-E mini, it's pretty funny stuff: you get this, you know, tech bro unboxing a funeral, like a coffin. But if you put the same prompt into DALL-E 2, it doesn't really work. So the best images are actually a result of not just the model, but of a complex feedback loop between a human, or a community of humans, and a neural network model. Another thing that made DALL-E mini popular is that it can be used to portray cultural items that reflect current issues. This year people generated a "Fisher-Price My First Hand Grenade", or the "Supreme Court dressed as clowns", which is probably self-explanatory.

When I was discussing this with my collaborator Judy, we thought we could distill it down to three processes. After exploring with a text-to-image model, people want to share what they made, and through sharing the images in multiple variants you get some feedback from the network that you share them to, and people build on top of it. So we think there are three stages in this framework: there's an individual exploration stage of using the tool or the model to create an artifact; there's a drive to share; and there are network dynamics that allow people to build on top of your innovation, just like the Picbreeder or CPPN experiment I discussed at the beginning of this talk, and then feedback comes back to the user. So the first is mainly enhancing a person's ability to efficiently explore a wide space of possibilities, the individual exploration. Second, the text-to-image model basically heightens a person's drive to share what they have created with other people. Maybe that's because text-to-image models are just so new and cool that people want to share these things; maybe it won't be the case 10 years from now,
or probably not. And number three, the important thing, is unlocking the feedback dynamics that accelerate innovation across large communities: you share things and you get feedback, whether on social media or whatnot. An example of this is that innovations with the model are learned, passed on to other users, and built upon. People discovered that DALL-E mini was good at drawing courtroom sketches, like one with a dinosaur, and people copied this concept and made "a bottle of ranch dressing testifying in court", or "an alien appearing in court for damages caused by abductions". This is pretty hilarious. People discovered that PlayStation is pretty good as well: "Grand Theft Auto: Tiananmen Square Edition" and "American Psycho for PS2".

This is only one part of the puzzle, I think. There's the individual creator or artist using these tools, but we also think this framework can be applied more broadly. In this space there are the users, creators, and artists using these text-to-image models or AI tools to create artifacts, and there are also the developers who are working on these models, or on new ways to use these models, or on applications, and hence creating applications as artifacts as well. Sometimes the artist can also be the developer, so they can wear multiple hats. So there are these two layers of loops going on, not just for users but also for developers. I'm going to talk about this part using Stable Diffusion as a case study, where we see things like creative applications, model hackers, and research. And boy, when we step from DALL-E mini to the Stable Diffusion community, it's basically going from AI memes to AI art. I think I have five or ten minutes left? OK. So the Stable Diffusion community is not just satisfied with text-to-image; they want to create art and fantastic images from it. And boy, this community is weird. You get, on the left,
the model hackers who want to really make things look perfect and kind of criticize every single flaw; you get the normal users who just want to make memes, like Paris Hilton and Albert Einstein being wed to each other; and you also get the developers, like people hacking on DreamBooth to make pictures of their friend's wedding. So I'm going to talk about these three categories of this group.

In the area of creative developers, because the model is open source, it led to plugins for hundreds of apps like Photoshop, Figma, and Blender; people were writing plugins for all of these apps. It's really difficult for one company to build plugins for everything; you really need a community of hundreds of thousands of developers. Within days of Stable Diffusion's initial launch, people built a Figma plugin where you can just use the Stable Diffusion model: type a text prompt in Figma and you get a realistic-looking design from the text prompt, conditioned on an SVG drawing, or a rendered SVG drawing. Blender as well: people were able to quickly develop various Stable Diffusion plugins for Blender. Because Blender is the open source standard tool for creating 3D scenes, people are able to take renderings and 3D wireframe images, type in a text prompt, and make them into various art pieces using Stable Diffusion.

The other part I was talking about is the model hackers. There are so many model hackers out there, and this community really discovered novel ways of using Stable Diffusion: things like negative prompts, how to combine models, and fine-tuning for model distillation. Negative prompts: I'm not sure where they were discovered, but they were certainly made popular with Stable Diffusion. You type in a prompt for what you don't want to see, and
suddenly it makes your burgers look much better. You can chain together DALL-E mini and Stable Diffusion models, because maybe DALL-E mini creates more meme-like images, and then you can create a higher-quality version of a "gaming toilet made from Nvidia RTX 3090 GPUs". There's also a whole bunch of hackers fine-tuning custom models, using DreamBooth or other techniques to customize models on a particular dataset. This is something only possible with open source models, because a closed source but very large model probably doesn't know who you are, or probably doesn't know some niche concept. Here you actually want a smaller model that's good enough, which you can just fine-tune on your small dataset to have a diffusion model that's suitable for your particular application. What's funny is that people actually tried to get Stable Diffusion to generate styles that look like Midjourney, by simply collecting Midjourney images and fine-tuning a Stable Diffusion model to mimic the prompt-to-image behavior of Midjourney. They kind of rediscovered the concept of model distillation, by fine-tuning Stable Diffusion on Midjourney data. That was kind of cool.

Lastly, I think I've got two minutes left, I'll talk about how I think open source diffusion models like Stable Diffusion will become a foundation for the research community. There are so many works right now that are actually using Stable Diffusion as the benchmark model to communicate results, in the academic community and in the research community as a whole. For instance, there's this amazing paper called DreamBooth; many of you may have used it to fine-tune your Stable Diffusion models. It came from this fantastic group at Google Research, I know everyone there, and it was originally written to be used with the Imagen model inside Google, but right
now, if you say DreamBooth, people will automatically think that you're using DreamBooth with Stable Diffusion, because it's what works out there, and because of the open source model.

Another recent cool work is InstructPix2Pix, written by Tim Brooks and his collaborators at Berkeley. They were able to combine a GPT-type language model with Stable Diffusion to instruct the model how to edit images using natural language. I think these are things that are easily doable with an open source model compared to a large closed source model. Even in the area of interpretability in machine learning, people are using Stable Diffusion as a base when communicating results; someone wrote a paper on interpreting Stable Diffusion with attention maps and so on. I think that Robin may talk about this later in this talk, but we think that Stable Diffusion is also going to become a popular model of choice for machine learning researchers working with diffusion models, and we think that the best model to conduct research on will become something like Stable Diffusion or an open source model, precisely because it's open source. We actually think Stable Diffusion will probably become something like a VGG or AlexNet model in the academic and industry ML research community, because people can communicate and speak in the same language.

And lastly, even some areas outside of ML research, like health, are using Stable Diffusion. There's a paper called RoentGen, where a team built a generative model by fine-tuning Stable Diffusion on a large X-ray and radiology report dataset, and they're generating new X-ray and radiology images by modifying the text prompt. I'm not sure how impactful that is, but this stuff is going on. I don't want to go over time, but I think this is
just the beginning, and I think with the trajectory of this open source and community-led development, on the user side, the developer side, and the research side, the future is going to be really bright. I don't want to go over time, so thank you.

There we go. Awesome talk, I think with maybe the most memes, like actual memes, I've seen in a talk, so really, really cool. We've got a few questions from the audience, if you've got a few minutes.

Sure. I don't want to go over time, but I think maybe we're still under time. Maybe you can choose one or two and I can answer them.

So the first question is from IQ mates, who's asking: as these models become more capable, how do we protect against deepfakes? I think this is a really important question that is on the mind of everyone worried about generative AI.

I think so too. There are many opinions out there, but my view is that we should release these models as frequently as possible to the community. I think the community at large will be able to look at the dangers of these models, such as deepfakes, collectively, and find ways to mitigate them. I think this approach is much preferable to having one institution take the responsibility of protecting the model, because there are so many instances in history where that doesn't work; look at cryptography and so on. While these models are not yet perfectly realistic, generating, you know, hands with six fingers, I say just release them, put them out there, and get people to work on safety aspects collectively. I think the red team has to be in the open source world as well, working together with the model development team to push forward. Otherwise you would just end up confining a model
inside an organization, and it'll never see the light of day. That model will eventually be released somehow, or people will reproduce it once it's very high quality, and the community will not have had enough time to develop the protection against it.

Yeah, that's totally great. I think also at Hugging Face this is something that we're very aligned with, doing things in the open, because what we learned in NLP is that when you have open source language models, people can actually inspect them and, you know, attack them, which is cool. It's not easy to do that through an API.

Yeah, totally. And as I discussed in the last part of the slides, interpretability researchers are using Stable Diffusion models as the base model, so this is actually happening right now.

Cool. So maybe we could take one more question, which is a bit of an eye to the future: will text-to-GIF or text-to-video become the natural extension to the current family of generative models, or is this still a difficult problem to solve?

I think it's a natural extension. If you follow the literature, Google has already published the Imagen Video work, and that's pretty awesome. I used it when I was there, and it works. It's only a matter of time before these results get reproduced by the community at large and extended upon. The way we approach research at Stability AI is not necessarily just reproducing stuff; we want to be able to innovate and optimize for efficiency, so that you don't actually need a TPU pod to run these models. We want to make sure that a decent GPU might work. The quality might not be state of the art, but at least people can try to make GIFs from it.

Exactly. I think it was you who said on Twitter that the metric today is: can it run on Colab, and then how many images per
second.

Yeah, so it's a multi-objective optimization, I think, when we're making these models, and we've just got to get the best bang for the buck.

Awesome. So maybe we can end the broadcast there. Thank you so much, David, for joining us; it's been a real pleasure.

Thank you.

I really enjoyed the talk. To everyone else in the audience, we're going to take a break now, and we'll be back at six o'clock central eastern time, so catch you later.

Original Description

To celebrate the launch of Hugging Face's Diffusion Models Class, we're hosting a live event with the makers and shakers of Stable Diffusion! In this broadcast we'll host David Ha from Stability AI, who will talk about "A framework for augmenting creative human expression"
