The AI-First Graphics Editor - with Suhail Doshi of Playground AI

Latent Space · Beginner · 🎨 Image & Video AI · 2y ago

Skills: Image Generation Basics (60%), AI Design Tools (50%)

Key Takeaways

Introduces the AI-first graphics editor with Suhail Doshi of Playground AI, exploring image generation and AI-powered tools

Full Transcript

[Music]

Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol AI.

Hey, and today in the studio we have Suhail Doshi. Welcome.

Yeah, thanks for having me.

Among many things, you're a CEO and co-founder of Mixpanel, and I think about three years ago you left to start Mighty. More recently, I think about a year ago, it transitioned into Playground, and you've just announced your new round. I'd just like to start by touching on Mixpanel a little bit, because it's obviously one of the more successful analytics companies. We previously had Amplitude on, and I'm curious if you had any reflections on the interaction of that amount of data that people would want to use for AI. I don't know if there's still a part of you that stays in touch with that world.

Yeah, I mean, the short version is that maybe back in 2015 or '16, I don't really remember exactly because it was a while ago, we had an ML team at Mixpanel. I think this is when deep learning really just started getting kind of exciting, and we were thinking that, given that we had such vast amounts of data, perhaps we could predict things. So we built two or three different features. I think we built a feature where we could predict whether users would churn from your product. We made a feature that could predict whether users would convert. We tried to build a feature that could do anomaly detection: if something occurred in your product that was just very surprising, maybe a spike in traffic in a particular region, could we tell you in advance? Could we tell you that that happened? Because it's really hard to know everything that's going on with your data. Could we tell you something surprising about your data?
And we tried all of these various features. Most of it boiled down to just using logistic regression, and it never quite seemed very groundbreaking in the end. So we had a four- or five-person ML team, and I think we never expanded it from there. And I did all these fast.ai courses trying to learn about ML.

That was the first time you did fast.ai?

Yeah, that was the first time I did fast.ai. I think I've done it now three times, maybe.

Oh, okay. I didn't know that was the third.

No, no, just me reviewing it is maybe three times, but yeah.

Yeah. I think you mentioned prediction, but honestly it's also just about the feedback, right? The quality of feedback from users, I think, is useful for anyone building AI applications.

Yeah, self-evident. I haven't spent a lot of time thinking about Mixpanel because it's been a long time, but sometimes I'm like, oh, I wonder what we could do now, given everything that's happened. And then I kind of move on to whatever I'm working on. But things have changed significantly since.

Awesome. And then maybe we'll touch on Mighty a little bit. Mighty was very, very bold. My framing of it was: you will run our browsers for us, because everyone has too many tabs open, I have too many tabs open, and it's slowing down our machines, and you can do it better for us in a centralized data center.

Yeah, we were first trying to make a browser that we would stream from a data center to your computer at extremely low latency. But the real objective wasn't trying to make a browser or anything like that. The real objective was to try to make a new kind of computer. The thought was just that we have these computers in front of us today and we upgrade them, or they run out of RAM, or they don't
have enough RAM or not enough disk, or there's some limitation with our computers; perhaps data locality is a problem. Why do I need to think about upgrading my computer ever? So we had kind of observed that, well, actually it seems like a lot of applications are just now in the browser. How many real desktop applications do we use relative to the number of applications we use in the browser? So there was just this realization that the browser was effectively becoming more or less our operating system over time. That's why we decided to go, hmm, maybe we can stream the browser. Unfortunately, the idea did not work, for a couple of different reasons. But the objective was to try to make a true new computer.

Yeah, very, very bold.

Yeah, and I was there at YC Demo Day when you first announced it. It was, I think, the last or one of the last in-person ones, like the Pier 34, Mission Bay one. How do you think about that now, when everybody wants to put some of these models on people's machines and some of them want to stream them in? Do you think there's maybe another wave of the same problem? Before it was browser apps too slow; now it's models too slow to run on device.

Yeah. We obviously pivoted away from Mighty, but a lot of what I believed at Mighty is somewhat very true, and it's maybe why I'm so excited about AI and what's happening. A lot of what Mighty was about was moving compute somewhere else, right? Right now applications get limited quantities of memory, disk, networking, whatever your home network has, et cetera. What if we could shift compute, and then these applications have vastly more compute than they do today? Right now it's just client and backend services, but what if we could change the shape of how applications could interact
with things? And it's changed my thinking. In some ways, AI is a bit of a continuation of my belief that perhaps we can really shift compute somewhere else. One of the problems with Mighty was that JavaScript is single-threaded in the browser. The reason why we kind of abandoned Mighty was because I didn't believe we could make a new kind of computer. We could have made some kind of enterprise business, probably; it could have made maybe a lot of money, but it wasn't going to be what I hoped it was going to be. Once I realized that most of a web app is just going to be single-threaded JavaScript, then the only thing you could do, notwithstanding changing JavaScript, which is a fool's errand most likely, is make a better CPU, right? And there are like three CPU manufacturers, two of which are the big ones, AMD and Intel, and then of course Apple made the M1. And it's not like single-core performance was increasing very fast; it's plateauing rapidly, and even these different companies were not doing as good of a job with the continuation of Moore's law. But what happened in AI was, if you think of the AI model as a computer program, just like a compiled computer program, it is literally built and designed to do massive parallel computations. So if you could take the universal approximation theorem to its logical completion point, you're like, wow, I can make computation happen really rapidly, in parallel, somewhere else. So you end up with these really amazing models that can do anything. It just turned out that perhaps the new kind of computer would simply be shifted into these really amazing AI models, in reality.

Yeah, I think Andrej Karpathy has been making a lot
of analogies with the LLM OS.

Yeah, I saw his video. I watched that maybe two weeks ago or something like that, and I was like, oh man, I very much resonate with this idea. Why didn't I see this three years ago?

Yeah. I think there still will be local models, and then there'll be these very large models that have to be run in data centers. I think it just depends on the right tool for the job, like any engineer would probably care about. But by and large, if the models continue to keep getting bigger, you're always going to be wondering whether you should use the big thing or the tiny little model. And it might just depend on, do you need 30 FPS or 60 FPS? Maybe that would be hard to do over a network.

Yeah, you tackled a much harder problem latency-wise than the AI models actually require.

Yeah, so you can do quite well. We definitely did 30 FPS video streaming, and did very crazy things to make that work. So I'm actually quite bullish on the kinds of things you can do with networking.

Yeah, right. Maybe someday you'll come back to that at some point. But for those who don't know, you're very transparent on Twitter. It's very good to follow you just to learn your insights, and you actually published a postmortem on Mighty that people can read up on if they're willing to. And so there was a bit of an overlap. You started exploring the AI stuff in June 2022, which is when you started saying, like, I'm taking fast.ai again. Maybe, was there more context around that?

Yeah. I think I was kind of waiting for the team at Mighty to finish up something, and I was like, okay, well, what can I do? I guess I will make some kind of address bar predictor in the browser. So we had
forked Chrome and Chromium, and I was like, one thing that's kind of lame is that this browser should be a lot better at predicting what I might do, where I might want to go. It struck me as really odd that Chrome had very little AI or ML inside this browser. For a company like Google, you'd think there's a lot, but the code is actually just a bunch of if-then statements, more or less, for the address bar. So it seemed like a pretty big opportunity, and that's also where a lot of people interact with the browser. So, long story short, I was like, hmm, I wonder what I could build here. So I started to take some AI courses and review the material again and get back to figuring it out. But I think that was somewhat serendipitous, because right around April was, I think, a very big watershed moment in AI, because that's when DALL-E 2 came out. I think that was the first truly big viral moment for generative AI.

Because of the avocado chair.

Because of the avocado chair, yeah, exactly.

It wasn't as big for me as Stable Diffusion.

Really?

Yeah, I don't know. DALL-E was like, all right, that's cool. I mean, they had some flashy videos, but it didn't really register.

Just that moment of images was such a viral, novel moment. I think it just blew people's minds.

Yeah. I mean, it's the first time I encountered Sam Altman, because they had this DALL-E 2 hackathon. They opened up the OpenAI office for developers to walk in, back when it wasn't as much of a security issue as it is today.

Yeah. Maybe take us through the journey to decide to pivot into this, and also choosing images. Obviously you were inspired by DALL-E, but there could be any number of AI companies and businesses that you could start, like,
why this one, right?

Yeah.

Well, there must be an idea maze from June to September.

Yeah, there definitely was. So at that time, at Mighty, OpenAI was not quite as popular as it is all of a sudden these days, and back then they had a lot more bandwidth to kind of help anybody. And so we had been talking with the team there around trying to see if we could do really fast, low-latency address bar prediction with GPT-3 and 3.5 and that kind of thing. So we were sort of figuring out how we could make that low latency. I think just being able to talk to them and being involved gave me a bird's-eye view into a bunch of things that started to happen. Obviously first was the DALL-E 2 moment, but then Stable Diffusion came out, and that was a big moment for me as well. I remember just sitting up one night thinking, what are the kinds of companies one could build? What matters right now? One thing that I observed is that I find a lot of inspiration when I'm working in a field on something, and then I can identify a bunch of problems. Like for Mixpanel, I was an intern at a company and I just noticed that they were doing all this data analysis, and so I thought, hmm, I wonder if I could make a product, and then maybe they would use it. In this case, the same thing kind of occurred. It was like, okay, there are a bunch of infrastructure companies that put a model up and then you can use their API; Replicate is a really good example of that. There are a bunch of companies that are helping you with training and model optimization; Mosaic at the time, and probably still, was doing stuff like that. So I just started listing out every category of every company that was
doing something interesting. Obviously Weights & Biases. I was like, oh man, Weights & Biases is this great company. Do I want to compete with that company? I might be really good at competing with that company because of Mixpanel, because it's so much of analysis. But I was like, no, I don't want to do anything related to that; I think that would be too boring now at this point. So I started to list out all these ideas, and one thing I observed was that at OpenAI they had a playground for GPT-3, right? All it was was just a text box, more or less, and then there were some settings on the right, like temperature and whatever.

Top-k?

Yeah, top-k, what's your end stop sequence. I mean, that was their product before ChatGPT. Really difficult to use, but fun if you're an engineer. And I just noticed that their product was evolving a little bit, where the interface was getting a little bit more complex. They had a way where you could generate something in the middle of a sentence, and all those kinds of things. And I just thought to myself, everything is just this text box, and you generate something, and that's about it. And Stable Diffusion had come out, and it was all Hugging Face and code. Nobody was really building any UI. So I had this kind of thing where I wrote "prompt", dash, question mark in my notes, and I didn't know what the product for that was at the time. It seems kind of trite now, but yeah, I just wrote "prompt": what's the thing for that?

Manager. Prompt manager. Do you organize them? Do you have a UI that can play with them?

Yeah, like a library. What would you make?

Yeah. And so then of course you thought about what the modalities would be, given that. How would you build a UI for each kind of modality? There are a couple people working on some pretty cool
things. And I basically chose graphics because it seemed like the most obvious place where you could build a really powerful, complex UI that's not just typing in a box, that would very much evolve beyond that. What would be the best thing for something that's visual? Probably something visual. So I think that progression kind of happened, and it just seemed like there was a lot of effort going into language but not a lot of effort going into graphics. And then maybe the very last thing was, I was talking to Aditya Ramesh, who is the co-creator of DALL-E 2, and Sam, and I just kind of went to these guys and I was like, hey, are you going to make a UI for this thing? Like a true UI? Are you going to go for this?

Are you going to make a product for DALL-E?

For DALL-E, yeah. Are you going to do anything here? Because if you are going to do it, just let me know and I will stop and I'll go do something else. But if you're not going to do anything, I'll just do it. So we had a couple of conversations around what that would look like, and I think ultimately they decided that they were going to focus on language primarily. And I just felt like it was going to be very underinvested in.

Yes, there's that sort of underinvestment from OpenAI, which I can see. But also it's a different type of customer than you're used to. Presumably, at Mixpanel you were very good at selling to B2B, to developers. With Playground, you're not. Was that not a concern?

Well, not so much, because I think that right now I would say graphics is in this very nascent phase. Most of the customers are just hobbyists, right? It's a little bit of a novel toy as opposed to being this very high-utility thing. But I think ultimately, if you believe that you could make it very high utility, then probably
the next customers will end up being B2B. It'll probably not be consumer. There will certainly be a variation of this idea that's in consumer, but if your quest is to make something that surpasses human ability for graphics, ultimately it will end up being used for business. So I think it's maybe more of a progression. In fact, for me it's maybe more like Mixpanel, which started out as SMB and then very much ended up starting to grow up towards enterprise. So for me, I think it will be a very similar progression. But yeah, the reason why I was excited about it is because it was a creative tool. I make music, and it's AI. It's something that I know I could stay up till 3:00 in the morning doing. Those are kind of very simple bars for me.

Yeah, it's good decision criteria. So you mentioned DALL-E and Stable Diffusion. You just had Playground V2 come out two days ago.

Yeah, two days ago.

So this is a model you trained completely from scratch, so it's not a cheap fine-tune on something. You open-sourced everything, including the weights. Why did you decide to do it? I know you supported Stable Diffusion XL in Playground before, right?

Yep.

What made you want to come up with V2, and maybe some of the interesting technical research work you've done?

Yeah. So I think that we continue to feel like graphics, and these foundation models for anything really related to pixels, but also definitely images, continue to be very underinvested in. It feels a little like graphics is in this GPT-2 moment, right? Even when GPT-3 came out, it was exciting, but it was like, what are you going to use this for? Yeah, we'll do some text classification and some semantic analysis, and maybe it'll sometimes make a summary of something, and it'll hallucinate. But no one really had a very significant business application
for GPT-3. And in images we're kind of stuck in the same place. We're like, okay, I write this thing in a box and I get some cool piece of artwork, and the hands are kind of messed up and sometimes the eyes are a little weird. Maybe I'll use it for a blog post, that kind of thing. The utility feels so limited. And then you look at Stable Diffusion, and we definitely use that model in our product, and our users like it and use it and love it and enjoy it, but it hasn't gone nearly far enough. So we were kind of faced with the choice of, do we wait for progress to occur, or do we make that progress happen? So we embarked on a plan to just go train these things from scratch. And I think the community has given us so much. The community for Stable Diffusion, I think, is one of the most vibrant communities on the internet. It's amazing. I hope this is what the Homebrew Club felt like when computers showed up, because it's amazing what that community will do, and it moves so fast. I've never seen anything in my life so far, and I've heard other people's stories around this, where an academic research paper comes out, and then two days later someone has sample code for it, and then two days later there's a model, and then two days later it's in nine products.

Yeah.

They're all competing with each other. It's incredible to see math symbols on an academic paper go to well-designed features in a product. So I think the community has done so much, and we wanted to give back to the community kind of on our way. We knew it wasn't going to be it; certainly we would train a better model than what we gave out on Tuesday. But we definitely felt like there needs to be some kind of progress in these open-source models. The last kind of milestone was in July, when Stable Diffusion XL came
out, but there hasn't been anything really since, right?

There's XL Turbo now.

Well, XL Turbo is this distilled model, right? So it's lower quality but fast. You have to decide what your trade-off is there.

And it's also a consistency model?

I don't think it's a consistency model. They did a different thing. I don't want to get quoted for this, but it's something called ADD, like adversarial something or another.

That's exactly right.

Yeah, I've read something about that. Maybe it's closer to GANs or something, but I didn't really read the full paper. But yeah, there hasn't been quite enough progress. There's no multitask image model; the closest thing would be something called Emu Edit, but there's no model for that, it's just a paper that's within Meta. So we did that, and we also gave out pre-trained weights, which is very rare. Usually you just get the aligned model and then you have to see if you can do anything with it. We actually gave out a 256-pixel pre-trained stage and a 512, and we did that for academic research, because we come across people all the time in academia who have access to one A100, or eight at best. So if we can give them a 512 pre-trained model, our hope is that there'll be interesting novel research that occurs from that.

What research do you want to happen?

I would love to see more research around things that users care about, which tend to be things like character consistency.

Between frames, for video?

More like, if you have a face. Yeah, basically between frames, but more just like, you have your face and it's in one image and then you want it to be in another. Users are very particular and sensitive to faces changing, because we know it, we're trained on
faces as humans. And that's something I'm not seeing a lot of innovation in. Not enough innovation around multitask editing. There are two things, InstructPix2Pix and then the Emu Edit paper, that are maybe very interesting, but we certainly are not pushing the envelope in that regard. Just all kinds of things around that: rotation, being able to keep coherence across images. Style transfer is still very limited. Just even reasoning around images, what's going on in an image, that kind of thing. Things are still very underpowered, very nascent, so therefore the utility is very limited.

On the 1K prompt benchmark, you are 2.5x preferred to Stable Diffusion XL. How do you get there? Is it better images in the training corpus? Can you maybe talk through the improvements in the model?

I think we're still very early on in the recipe, but I think it's a lot of little things, and every now and then there are some big important things. Certainly your data quality is really, really important, so we spend a lot of time thinking about that. But I would say it's a lot of things that you kind of clean up along the way as you train your model: everything from captions, to the data that you align with after pre-training, to how you're picking your datasets, how you filter your datasets. I feel like there's a lot of work in AI that doesn't really feel like AI. It just feels like dataset filtering and systems engineering. The recipe is all there, but it's a lot of extra work to do that. I think we plan to do a Playground V2.1, maybe either by the end of the year or early next year, and we're just watching what the community does with the model. And then we're just going to take a lot of the things that they're unhappy
about and just fix them. So for example, maybe the eyes of people in an image don't feel right; they feel a little misshapen, or they're kind of blurry-feeling. That's something that we already know we want to fix. So I think in that case it's going to be about data quality. Or maybe we want to improve the dynamic range of color; we want to make sure that that's got a good range in any image. So what technique can we use there? There are different things like offset noise, pyramid noise, zero terminal SNR. There are all these various interesting things that you can do. So I think it's a lot of tricks: some are tricks, some are data, and some is just cleaning.

Yeah. Specifically for faces, it's very common to use a pipeline rather than just train the base model more. Do you have a strong belief either way on, like, oh, they should be separated out into different stages for improving the eyes, improving the face, or the hands, whatever? Or do you think it can all be done in one model?

I think we will make a unified model.

Okay.

Yeah, I think we'll certainly in the end ultimately make a unified model. There's not enough research about this; maybe there is something out there that we haven't read. There are some bottlenecks, for example in the VAE. The VAEs are ultimately compressing these things, and so you might have a big information bottleneck. So maybe you would use a pixel-based model, perhaps. We've talked to people, everyone from Rombach to various people; Rombach trained Stable Diffusion. I think there's a big question around the architecture of these things. It's still kind of unknown, right? We've got transformers, and we've got a GPT architecture model, but then there's this weird thing that's also seemingly working
with diffusion. And so, are we going to use vision transformers? Are we going to move to pixel-based models? Is there a different kind of architecture? I don't think there have been enough experiments in this.

Oh my God.

Yeah, that's surprising. I think it's very computationally expensive to do a pipeline model where you're fixing the eyes and you're fixing the mouth and you're fixing...

That's what everyone does, as far as I understand.

Well, I'm not exactly sure what you mean, but if you mean you get an image and then you make another model specifically to fix a face, I think that's fairly computationally expensive, and I think it's probably not the right way.

Yeah.

And it doesn't generalize very well. Now you have to pick all these different things.

Yeah, you're just kind of glomming things on together. When I look at AI artists, that's what they do.

Ah, yeah. They'll do things like, I think a lot of artists will do ControlNet tiling to do kind of generative upscaling of all these different pieces of the image. I mean, to me these are all just hacks, ultimately, in the end. To me it's like, let's go back to where we were just three or four years ago with where deep learning was at and where language was at. It's the same thing. We were like, okay, well, I'll just train these very narrow models to try to do these things and kind of ensemble them or pipeline them to try to get to a best-in-class result. And here we are, where the models are gigantic and very capable of solving huge amounts of tasks when given lots of great data.

Makes sense. You also released a new benchmark called MJHQ-30K for automatic evaluation of a model's aesthetic quality. I have one question. The dataset that you use for the benchmark is from Midjourney.

Yes.

You have 10
categories. How do you think about the Playground model versus Midjourney?

You know, there are a lot of people in research who like to compare themselves to something they know they can beat, right? But maybe this is the best reason why it can be helpful to not be a researcher sometimes. I'm not trained as a researcher; I don't have a PhD in anything AI-related, for example. But I think if you care about products and you care about your users, then the most important thing that you want to figure out is... everyone has to acknowledge that Midjourney is very good. They are the best at this thing. I'm happy to admit that; I have no problem admitting that. It's just easy, it's very visual, to tell. So I think it's incumbent on us to try to compare ourselves to the thing that's best, even if we lose, even if we're not the best. And at some point, if we are able to surpass Midjourney, then we only have ourselves to compare ourselves to. But on first blush, I think it's worth comparing yourself to maybe the best thing and trying to find a really fair way of doing that. So I think more people should try to do that. I definitely don't think you should be comparing yourself on some Google model or some old Stable Diffusion model and be like, look, we beat Stable Diffusion 1.5. I think users ultimately care about how close you are getting to the thing that people mostly agree with. So we put out that benchmark for no other reason than to say, this seems like a worthy thing for us to at least try, for people to try to get to. And then if we surpass it, great, we'll come up with another one.

Yeah, no, that's awesome. And you kill Stable Diffusion XL and everything. In the benchmark chart, it says Playground V2
1024px-aesthetic. Do you have style fine-tunes, or what's the "-aesthetic" for?

We debated this; maybe we named it wrong or something. But we were like, how do we help people realize which model is aligned versus the models that weren't? Because we gave out pre-trained models, and we didn't want people to use those. That's why they're called "base." And then the aesthetic model, yeah, we wanted people to pick up the thing that we thought would make things pretty. Who wouldn't want the thing that's aesthetic? But if there's a better name, we're definitely open to feedback.

No, no, cool. I was using the product. You also have the style filter, and you have all these different styles, and it seems like the styles are tied to the model. So there are some SDXL styles, there are some Playground v2 styles. Can you give listeners an overview of how that works? Because in language there's not this idea of style, right, whereas in vision models there is, and you cannot get certain styles in different models. How do styles emerge, and how do you categorize them and find them?

Yeah, it's so fun having a community where people are just trying a model. It's only been two days for Playground v2, and we actually don't know what the model is capable of and not capable of. We certainly see problems with it, but we have yet to see what the emergent behavior is. We've discovered that it takes about a week before you start to see new things. I think a lot of that style emerges after that week. There are some styles that are very well known to us: pixel art is a well-known style, photorealism is another one. But there are some styles that cannot be easily named. It's not as simple as like, okay,
that's an anime style. Yeah. It's very visual, and in the end you end up making up the name for what that style represents, and so the community kind of shapes itself around these different things. If anyone is into Stable Diffusion and into building anything with graphics with these models, you might have heard of ProtoVision or DreamShaper, some of these weird names. They're just invented by these authors, but they have a sort of je ne sais quoi that appeals to users.

Because it roughly embeds to what you want?

I guess so. I mean, one of my favorite ones that's fine-tuned, not made by us, is called Starlight XL. It's just this beautiful model. It's got really great color contrast and visual elements, and the users love it, I love it. And yeah, it's so hard. I think that's a very big open question with graphics that I'm not totally sure how we'll solve. It's an evolving situation too, because styles get boring, right? They get fatigued. It's like listening to the same style of pop song. I try to relate graphics a little bit to music, because I think it gives you a different shape to things. In music, it's not as if we just have pop music and rap music and country music. The EDM genre alone has sub-genres. And I think that's very true in graphics and painting and art and anything that we're doing. There are just these sub-genres, even if we can't quite always name them. But I think they are emergent from the community, which is why we're always so happy to work with the community.

Yeah, that is a struggle. Coming back to this B2B versus B2C thing: in B2C you're going to have a huge amount of diversity, and then it's going to reduce as you get towards more sort of B2B
type use cases. I'm making this up here, tell me if you disagree. So you might be optimizing for a thing that you may eventually not need.

Yeah, possibly. I try not to... I think a simple thing with startups is that I worry sometimes that by trying to be overly ambitious, and really scrutinizing what something is in its most nascent phase, you miss the most ambitious thing you could have done. Just having very basic curiosity with something very small can lead you to something amazing. Einstein definitely did that. And then he basically won all the prizes and got everything he wanted, and then he kind of dismissed quantum and was still searching for the unifying theory. He had this quest. I think that happens a lot with Nobel Prize people; I think there's a term for it that I forget. I actually wanted to go after a toy, almost intentionally, so long as I could imagine that it would lead to something very, very large later. So yeah, like I said, it's very hobbyist, but you need to start somewhere. You need to start with something that has a big gravitational pull, even if these hobbyists aren't likely to be the people that have a way to monetize it or whatever. They're doing it for fun, so there's something there that I think is really important. But I agree with you that in time we will absolutely focus on more utilitarian things, things that are more related to editing, feats that are much harder. And so I think a very simple use case is just... I'm not a graphics designer, I don't know if you are, but it seems very simple: if we could give you the ability to do really
complex graphics without skill, wouldn't you want that? My wife the other day said, "Ah, I wish Playground was better. When are you guys going to have a feature where we could make my son (his name is Devon) smile when he was not smiling in the picture for the holiday card?" Just being able to highlight his mouth and say, "Make him smile." Why can't we do that with high fidelity and coherence? Little things like that, all the way to putting you in completely different scenarios.

Is that true? Can we not do that with inpainting?

You can do inpainting, but the quality is just so bad. It's really terrible quality. You'll do it five times and it'll still look kind of crooked, or just the artifacts. The lips on the face... there's such little information there, it's so small, that the models really struggle with it.

Make the picture smaller and you won't see it. I think that's my trick, I don't know.

Yeah, that's true. Or you could take that region and make it really big, and then say it's a mouth, and then shrink it. It feels like you're wrestling with it more than it's doing something that surprises you.

It feels like you are very much the internal tastemaker. You carry in your head this vision for what a good art model should look like. Do you find it hard to communicate it to your team and other people? Because it's obviously hard to put into words, like we just said.

Yeah, it's very hard to explain. Images have such a high bit rate compared to just words, and we don't have enough words to describe these things. It's difficult. I think everyone on the team, if they don't have good judgment, taste, or an eye for some of these things, they're steadily
building it, because they have no choice, right? So in that realm I don't worry too much, actually. Everyone is learning to get the eye, is what I would call it. But I also have my own narrow taste. I don't represent the whole population either.

True. So when you benchmark models, like this benchmark we're talking about, we use FID, Fréchet Inception Distance. Okay, it's one measure, but it doesn't capture anything you just said about smiles.

Yeah, FID is generally a bad metric. It's good up to a point, and then it's kind of irrelevant.

And so are there any other metrics that you like, apart from vibes? I'm always looking for alternatives to vibes, because vibes don't scale.

It might be fun to talk about this, because it's actually kind of fresh. Up till now we haven't needed to do a ton of benchmarking because we hadn't trained our own model, and now we have. So what does that mean? How do we evaluate it? We're living with the last 48, 72 hours of going, did the way that we benchmark actually succeed? Did it deliver? I think Gemini just came out; they just put out a bunch of benchmarks. But all these benchmarks are just an approximation of how you think it's going to end up with real-world performance, and that's very fascinating to me. If you fake that benchmark, you'll still end up in a really bad scenario at the end of the day. So one of the benchmarks we did was, we curated like a thousand prompts. That's what we published in our blog post, all these tasks, some of them curated by our team, where we know the models all suck at it. My favorite prompt that no model is really capable of is a horse riding an astronaut.

Yeah, the inverse one.

And it's really, really hard to do. Not
in the data, you know. Another one is a giraffe underneath a microwave. How does that work, right? There are so many of these little funny ones. We have prompts that are just misspellings of things, just to see if the models will figure it out.

So that's easy, that should embed to the same space.

Yeah. And all these very interesting, weirdo things. We have so many of these, and then we evaluate whether the models are any good at it. The reality is that they're all bad at it, and so then you're just picking the most aesthetic image. But I think we're still at the beginning of building the best benchmark we can, one that aligns most with just user happiness, because we're not putting these in papers and trying to win, I don't know, awards at ICCV or something, if they have awards. Sorry if they don't.

That's absolutely a valid strategy.

Yeah, you could. I don't think it would necessarily correlate with the impact we want to have on humanity. I think we're still evolving whatever our benchmarks are. So the first benchmark was just very difficult tasks that we know the models are bad at. Can we come up with a thousand of these, whether they're hand-curated or generated, and then can we ask the users how we did? And then we wanted to use a benchmark like PartiPrompts; we mostly did that so people in academia could measure their models against ours versus others. But yeah, FID is pretty bad. And in terms of vibes, you put out the model and then you try to see what users make. My sense is that we're going to take all the things we notice that we're failing at and try to find new ways to measure them, whether that's a smile or color contrast or lighting. One benefit of
Playground is that we have users making millions of images every single day, and so we can just ask them.

And like a form for post-generation feedback?

Yeah, we can just ask them. We can just say, how good was the lighting here? How was the subject? How was the background?

Oh, like a proper form?

It's just, you come to our site, you make an image, and then maybe randomly we just say, "Hey, how was the color and contrast of this image?" And you say it was not very good, and you just tell us. I think we can get tens of thousands of these evaluations every single day to truly measure real-world performance, as opposed to just benchmark performance. Hopefully next year we will try to publish a benchmark that anyone could use, that we evaluate ourselves on, and that we think does a good job of approximating real-world performance, because we've tried it and noticed that it did. Yeah, I think we will do that.

I think we're going to ask a few more producty questions. I personally have a few categories that I consider special. You have animals, art, fashion, food, and there are some categories which I consider a different tier of image. The top among them is text in images. How do you think about that? One of the big wow moments for me, something I've been looking out for the entire year, is just the progress of text in images. Can you write in an image? Ideogram, I think, came out recently, which had decent but not perfect text in images. DALL-E 3 had improved some, and all they said in their paper was that they just included more text in the dataset and it just worked. I was like, that's just lazy. But anyway, do you care about that? Because I don't see any of that
in your samples.

Yeah, the v2 model was mostly focused on image quality versus the feature of text synthesis.

Well, as a business user, I care a lot about that.

Yeah, I'm very excited about text synthesis. I think Ideogram has done a good job, maybe the best job. DALL-E kind of has a hit rate. You don't want just text effects. I think where this has to go is that you could write little tiny pieces of text, like on a milk carton, that's maybe not even the focal point of a scene. I think that's a very hard task, and if you could do something like that, then there are a lot of other possibilities.

Well, you don't have to zero-shot it. You can just be like, here, focus on this.

Sure, yeah, definitely. So I think text synthesis would be very exciting.

And I'd also flag that Max Woolf, minimaxir, whose work you must have come across, has done a lot of stuff using logo masks that then map onto food or vegetables, and it looks like text, which can be pretty fun.

Yeah, that's the wonderful thing about the open source community: you get things like ControlNet, and then you see all these people do just amazing things with ControlNet. From our point of view, we go, that's really wonderful, but how do we end up with a unified model that can do that? What are the bottlenecks? What are the issues? Because the community ultimately has very limited resources, and so they need these kinds of workaround research ideas to get there.

Are techniques like ControlNet portable to your architecture?

Definitely. We kept the Playground v2 architecture exactly the same as SDXL, not out of laziness but just because we knew
that the community already had tools. All you have to do is maybe change a string in your code and then retrain a ControlNet for it. So it was very intentional to do that. We didn't want to fragment the community with different architectures.

I have more questions about that. I don't want to DoS you with topics, but okay, I was basically going to go over three more categories. One is UIs, like app UIs, mock UIs. Then not-safe-for-work, obviously. And then copyrighted stuff. I don't know if you care to comment on any of those.

The NSFW kind of safety stuff is really important. I kind of think that one of the biggest risks going into maybe the US election year will probably be very interrelated with graphics, audio, video. I think it's going to be very hard to explain to a family relative who's not in our world. Our world, we sometimes think it's very big, but it's very tiny compared to the rest of the world. There's still lots of humanity who have no idea what ChatGPT is. And I think it's going to be very hard to explain to your uncle, aunt, whoever: "Hey, I saw President Biden say this thing on a video, I can't believe he said that." I think that's going to be a very troubling thing going into the world next year, the year after.

Oh, that's more of a risk thing, or like deepfakes, political faking. But there are a lot of studies on how, for most businesses, you don't want to train on not-safe-for-work images, except that it makes you really good at bodies.

Yeah, I mean, we personally filter out NSFW-type images in our dataset so that our safety filter stuff doesn't have to work as hard.

But you've heard this argument that it makes you worse, because obviously
not-safe-for-work images are very good at human anatomy, which you do want to be good at.

Yeah, it's not necessarily a bad thing to train on that data. It's more about how you go and use it. That's why I was talking about safety, in part because there are very terrible things that can happen in the world if you have an extremely powerful graphics model. You can kind of imagine: now if you can generate nudes, and you can do very character-consistent things with faces, what does that lead to? I think it's more what occurs after that. Even if you train on, let's say, nude data, if it does something to help, there's nothing wrong with the human anatomy. It's very valid for a model to learn that. But then it's, how does that get used? I won't bring up all of the very unsavory, terrible things that we see on a daily basis on the site. I think it's more about what occurs. We just recently did a big sprint on safety internally, and it's very difficult with graphics and art, right? Because there is tasteful art that has nudity. It's all over museums. There are very valid situations for that. And then there are things in the gray line of that: what I might not find tasteful, someone else might say is completely tasteful. And then there are things that are way over the line. And then there are things that maybe you or I would be okay with, but society isn't. I think it's really hard with art. Sometimes, even if you have things that are not nude, if a child goes to your site and scrolls down some images... classrooms of kids, you know, using our product. It's a really
difficult problem, and it stretches across culture, society, politics, everything.

Okay. Another favorite topic of our listeners is UX in AI, and I think you're probably one of the best all-inclusive editors for these things. You don't just have the prompt, images come out, you pray, and if not you do it again. First, you let people pick a seed so they can have semi-repeatable generation. You can pick how many images, and then you leave all of them in the canvas, and then you have this box, the generation box, and you can even cross between them and outpaint. There are all these things. How did you get here? Most people are kind of like, give me text, I give you an image. You're like, these are all the tools for you, even though you're trying to make a graphics foundation model.

I think we're also trying to reimagine what a graphics editor might look like given the change in technology. I don't think we're trying to build Photoshop, but it's the only thing we could point to that people are largely familiar with. Oh, okay, there's Photoshop. What would Photoshop compare itself to pre-computer? I don't know, right? It's kind of like a canvas, but there are these menu options, and you can use your mouse. What's a mouse? So I think we're trying to reimagine what a graphics editor might look like, not just for the fun of it but because we kind of have no choice. There's this idea in image generation where you can generate images. That's a super weird thing. What is that in Photoshop, right? You have to wait, for the time being, but the wait is often worth it for a lot of people because they can't make that with their own skills. So I think it goes back to
how we started the company, which was looking at GPT-3's Playground. The reason we're named Playground is an homage to that, actually. And it's like, shouldn't these products be more visual? These prompt boxes are like a terminal window, right? We're at this weird point where it's just a CLI. It's like MS-DOS. I remember my mom using MS-DOS, and I memorized the keywords, like dir, ls, all those things. It feels a little like that.

Right, prompt engineering... I mean, the shirt I'm wearing, you know, it's a bug, not a feature.

Yeah, exactly. Parentheses to say "beautiful" or whatever, which weights the word token more in the model. That's super strange. I think a large portion of humanity would agree that that's not user-friendly. So how do we think about making the products more user-friendly? Well, sure, it would be nice if, say I wanted to get rid of the headphones on my head, I could mask them and then say, "Can you remove the headphones?" If I want to expand the image, how can we make that feel easier without typing lots of words and being really confused? And by no stretch of the imagination do I think we've nailed the UI/UX yet. Part of that is because we're still experimenting, and part of that is because the model and the technology are going to get better, and whatever felt like the right UX six months ago is going to feel very broken now. So that's a little bit of how we got there: does everything have to be a prompt in a box, or can we do things that make it very intuitive for users?

How do you decide what to give access to? You have things like expand prompt, which DALL-E 3 just does; it doesn't let you decide whether you should or not. It just, like,
rewrites your prompts for you.

Yeah. For that feature, I think once we get it to be cheaper, we'll probably just give it away. But we also decided something that might be a little bit different. We noticed that most of image generation is just kind of casual. It's in WhatsApp, it's in a Discord bot somewhere with Midjourney, it's in ChatGPT. One of the differentiators I think we provide, at the expense of lots of users, necessarily mainstream consumers, is that we provide as much power and tweakability and configurability as possible. So the only reason it's a toggle is because we know that users might want to use it and might not want to use it. There are some really powerful power-user hobbyists that know what they're doing, and then there are a lot of people that just want something that looks cool but don't know how to prompt. And so I think a lot of Playground is more about going after that power-user base that has a bit more savviness in how to use these tools. The average DALL-E user is probably not going to use ControlNet; they probably don't even know what that is. And so as the models get more powerful, as there's more tooling, hopefully you can imagine a new sort of AI-first graphics editor that's just as powerful and configurable as Photoshop, and you might have to master a new kind of tool.

Wow. There are so many things I could bounce off of that. One, you mentioned waiting. We have to somewhat address the elephant in the room: consistency models have been blowing up the past month. How do you think about integrating that? Obviously there are a lot of other companies also trying to beat you to that space as
well.

I think we were the first company to integrate it. Well, we integrated it in a different way. There are like 10 companies right now that have tried to do interactive editing, where you can draw on the left side and then you get an image on the right side. We decided to wait and see whether there's true utility in that. We have a different feature that's unique in our product, called preview rendering. We were like, what is the most common use case? The most common use case is you write a prompt and then you get an image. But what's the most annoying thing about that? The annoying thing is that it feels like a slot machine, right? You're like, okay, I'm going to put it in and maybe I'll get something cool. So we did something that seemed a lot simpler but a lot more relevant to how users already use these products, which is preview rendering. You toggle it on and it will show you a render of the image. Graphics tools already have this: if you use Cinema 4D or After Effects or something, it's called viewport rendering. So we tried to take something that exists in the real world, that has familiarity, and say, okay, you're going to get a rough sense, an early preview of this thing, and then when you're ready to generate, we're going to try to be as coherent as possible with the image that you saw. That way you're not spending so much time pulling down the slot machine lever. So yeah, I think we were actually the first company to ship a quick LCM thing. We were very excited about it, so we shipped it very quick.

Yeah. The other demos I've been seeing, I guess it's not a preview necessarily; they're almost using it to animate their generations, because you can kind of move shapes
over.

Yeah, they're animating it, but they're sort of showing, like, if I move a moon, you know, can I...

Yeah, I don't know. To me it unlocks video in a way.

But the video models are already so much better than that.

There's another one, which is the general ecosystem of LoRAs, right? Civitai is obviously the most popular repository of LoRAs. How do you think about interacting with that ecosystem?

Yeah, I mean, the guy that did LoRA, not the guy that invented LoRAs but the person that brought LoRAs to Stable Diffusion, actually works with us on some projects. His name is Simo. Shout out to Simo. And I think LoRAs are wonderful. Obviously fine-tuning all these DreamBooth models and such is just so heavy. And it's obvious from our conversation around styles and vibes that it's very hard to evaluate the artistry of these things. LoRAs give people this wonderful opportunity to create sub-genres of art, and I think they're amazing. Any graphics tool, anything that's expressing art, has to provide some level of customization to its user base that goes beyond just typing "Greg Rutkowski" in a prompt, right? We have to give more than that. It's not that users want to type these real artist names; it's that they don't know how else to get an image that looks interesting. They truly want originality and uniqueness, and I think LoRAs provide that, and they provide it in a very nice, scalable way. I hope that we find something even better than LoRAs in the long term, because there are still weaknesses to LoRAs, but I think they do a good job for now.

And so you would never compete with Civitai? You would just kind of let people...

Civitai is a site where all these things
get hosted by the community, right? And so yeah, we'll often pull down some of the best things there. I think when we have a significantly better model, we will certainly build something that gets closer to that. Again, I go back to saying I still think this is very nascent. Things are very underpowered, right? LoRAs are not easy for people to train. They're easy for an engineer, okay, but they're not easy. It sure would be nicer if I could just pick five or six reference images, right? And there might even be five or six different reference images that are very different, actually. They communicate a style, but it's like a mood board. You have to be kind of an engineer almost to train these LoRAs, or go to some site and be technically savvy at least. It seems like it'd be much better if I could say, "I love the style here," with five images, and you tell the model, this is what I want, and the model gives you something that's very aligned with your style, with what you're talking about. And it's a style you couldn't even communicate, right? There's no word for it. If you have a Tron image, it's not just Tron, it's Tron plus four or five different weird things. Even cyberpunk can have its sub-genres, right? But I just think training LoRAs and doing that is very heavy, so I hope we can do better than that.

Cool. We had Sharif from Lexica on the podcast before. Both of you have a landing page with just a bunch of images where you can explore things.

Yeah, we have a feed.

Is that something you see more and more of, in terms of coming out with these styles? Is that why you have that as the starting point, versus a lot of other products where you just go in, you have
the generation prompt, and you don't see a lot of examples?

Our feed is a little different than their feed. Our feed is more about community. We have kind of a Reddit thing going on, where it's a competition every day, a loose competition, mostly a fun competition of making things. And there's just this wonderful community of people where they're liking each other's images and showing genuine interest in each other. I think we definitely learn about styles that way. One of the funniest polls: if you go to the Midjourney polls, they'll sometimes put these polls out and they'll say, what do you wish you could learn more from? And one of the things that people vote the most for is learning how to prompt, right? So if you put away your research hat for a minute and put on your product hat for a second, you're like, well, why do people want to learn how to prompt? It's because they want to get higher-quality images. Well, what's higher quality? Composition, lighting, aesthetics, so on and so forth. I think we might have the biggest community, and the feed gives all of the users a way to learn how to prompt, because they're just seeing this huge rising tide of all these images that are super cool and interesting, and they can take each other's prompts and learn how to do that. I think that'll be short-lived, because I think the complexity of these things is going to get higher. But that's more about why we have that feed: to help teach users, and then also just celebrate people's art.

You run your own infra?

We do.

That's unusual.

It's necessary.

What have you learned running DevOps for GPUs? You had a tweet about how many A100s you have, but I feel like it's out of date, probably.

Yeah, we, uh, I
think, I mean, it just comes down to cost. These things are very expensive, so we just want to make it as affordable for everybody as possible. I find the DevOps for inference to be relatively easy. It doesn't feel that different; I think we had thousands and thousands of servers at Mixpanel just for dealing with the API, which had such huge quantities of volume, so I don't find it particularly different. I do find GPU model optimization and performance very new to me, so I find that very difficult at the moment, but that's very interesting. Scaling inference is not terrible. Scaling a training cluster is much harder than I perhaps anticipated.

Why is that?

Well, it's just a very large distributed system. If you have a node that goes down, then your training run crashes, and you have to somehow be resilient to that. I would say training infra software is very early. It feels very broken. I can tell that in 10 years it will be a lot better.

Like a Mosaic or whatever?

We don't, yeah, no, we don't. We use very basic tools, like Slurm for scheduling and just normal PyTorch, PyTorch Lightning, that kind of thing. I think our tooling is nascent. I talked to a friend that's over at xAI; they built their own scheduler, and they're doing things with Kubernetes.

When people are building out tools because the existing open source stuff doesn't work, and everyone's doing their own bespoke thing, there's a valuable company to be formed.

Yeah, I think it's Mosaic. I don't know.

Well, with Mosaic, yeah, it's tough with Mosaic because, anyway, I won't go into the details why, but we found it difficult to do. It might be worth wondering why not everyone is going to Mosaic. I just think it's nascent, and perhaps Mosaic
will come through.

Cool. Anything for you?

No, no, this was great. And just to wrap, we talked about some of the pivotal moments in your mind with DALL-E and whatnot. If you were not doing this, what's the most interesting unsolved question in AI that you would try and build in?

Oh man, coming up with startup ideas is very hard on the spot.

You have to have them. I mean, you're a founder, you're a repeat founder.

I'm very picky about my startup ideas, so I don't have any great ones. I don't have an idea per se as much as a curiosity, and I suppose I'll pose it to you guys. Right now we sort of think that a lot of the modalities are, you know, vision, language, audio, that's roughly it, and somehow all this will turn into something, it'll be multimodal, and then we'll end up with AGI, perhaps. I just think that there are probably far more modalities than meets the eye, and it seems hard for us to see it right now because we have tunnel vision on the moment.

We're just like code, image, audio, video.

Yeah, I think very, very broad categories. I think we are lacking imagination as a species in this regard.

I see it.

And I don't know what company would form as a result of this, but there are some very difficult problems, like a true, actual, not a meta world model, but an actual world model that truly maps everything that's going on in terms of physics and fluids and all these various kinds of interactions. What does that kind of model look like, a true physics foundation model of sorts that represents Earth? That in and of itself seems very difficult. But we're kind of stuck on thinking that we can approximate everything with, you know, a word or a token, if you will. I had a dinner last night where we were kind
of debating this philosophically, and I think someone said something that I also believe in, which is that at the end of the day it doesn't really matter that it's a token or a byte; it's just some unit of information that it emits. But I do wonder if there are far more modalities than meets the eye, and if you could create that, then what would that company become? What problems could you solve? I don't know yet, so I don't have a great company for it.

I know, but maybe you just inspire somebody to try.

Yeah, hopefully.

My personal response to that is I'm less interested in physics, I'm more interested in people. Like, how do I mind upload? Because that is right tation, that is immortality, that is everything.

Yeah. Can we model our own? Rather than trying to create consciousness, could we model our own, even if it was lossy to some extent?

Yeah. Well, we won't solve that here.

If I were to take a Bill Gates book trip and I had a week, what should I take with me to learn AI?

Oh man, oh gosh. You shouldn't take a book. You should just go to YouTube and visit Karpathy's class and just do it. Do it, grind.

Was that actually the most useful thing for you?

I wish it came out when I started back last year. I'm bummed that I didn't get to take it at the beginning, but I did do a few of his classes regardless. Every time I buy a programming book, I never read it. I always find that just writing code helps cement my internal understanding.

Yeah. So, more generally, advice for founders who are not PhDs and are effectively self-taught, like you are: what should they do, what should they avoid?

Same thing that I would advise if you're programming: pick a project that seems very exciting to you, it doesn't have to be too serious, and build it and learn every detail of
it while you do it.

Should you train, or can you go far enough not training, just fine-tuning?

It depends. I would just follow your curiosity. If what you want to do is something that requires fundamental understanding of training models, then you should learn it. You don't have to become a five-year, whatever, PhD, but if that's necessary, I would do it. If it's not necessary, then go as far as you need to go. But I would pick something that motivates you. I think most people tap out on motivation, but they're deeply curious.

Yeah. Cool. Thank you so much for coming out, man.

Thank you for having me. Appreciate it. [Music]

Original Description

Before language models became all the rage in November 2022, image generation was the hottest space in AI. In our interview with Sharif Shameem from Lexica we talked through the launch of Stable Diffusion and the early days of that space. At the time, the toolkit was still pretty rudimentary: Lexica made it easy to search images, you had the AUTOMATIC1111 Web UI to generate locally, some HuggingFace spaces that offered inference, and eventually DALL-E 2 through OpenAI’s platform, but not much beyond basic text-to-image workflows. Today’s guest, Suhail Doshi, is trying to solve this with Playground AI, an image editor reimagined with AI in mind: https://playgroundai.com/

Timestamps:

0:00 - Introductions
0:56 - Suhail's background (Mixpanel, Mighty)
9:00 - Transition from Mighty to exploring AI and generative models
10:24 - The viral moment for generative AI with DALL-E 2 and Stable Diffusion
17:58 - Training Playground v2 from scratch
27:52 - The MJHQ 30K benchmark for evaluating model's aesthetic quality
30:59 - Discussion on styles in AI-generated images and the categorization of styles
43:18 - Tackling edge cases from UIs to NSFW
49:47 - The user experience and interface design for AI image generation tools
54:50 - Running their own infrastructure for GPU DevOps
57:13 - The ecosystem of LoRAs
1:02:07 - The goals and challenges of building a graphics editor with AI integration
1:04:44 - Lightning Round

Chapters (13)

0:00 Introductions

0:56 Suhail's background (Mixpanel, Mighty)

9:00 Transition from Mighty to exploring AI and generative models

10:24 The viral moment for generative AI with DALL-E 2 and Stable Diffusion

17:58 Training Playground v2 from scratch

27:52 The MJHQ 30K benchmark for evaluating model's aesthetic quality

30:59 Discussion on styles in AI-generated images and the categorization of styles

43:18 Tackling edge cases from UIs to NSFW

49:47 The user experience and interface design for AI image generation tools

54:50 Running their own infrastructure for GPU DevOps

57:13 The ecosystem of LoRAs

1:02:07 The goals and challenges of building a graphics editor with AI integration

1:04:44 Lightning Round
