A Brief History of the Open Source AI Hacker - with Ben Firshman of Replicate

Latent Space


Key Takeaways

In this episode, Ben Firshman, CEO of Replicate, an AI inference provider, traces the company's history: CLI design principles, arXiv Vanity and the problems with research papers as PDFs, packaging machine learning models in containers, the company's experience at YC, and its evolution into a platform where researchers and developers share and publish models.

Full Transcript

Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-residence at Decibel Partners, and I'm joined by my co-host swyx, founder of Smol AI.

Hey, and today we have Ben Firshman in the studio. Welcome, Ben.

Hey, good to be here.

Ben, you're co-founder and CEO of Replicate. Before that you were most notably creator of Fig, or founder of Fig, which became Docker Compose. You also did a couple of other things before that, but that's what a lot of people know you for. What should people know about you outside of your LinkedIn profile?

Yeah, good question. I think I'm a builder and tinkerer in a very broad sense, and I love using my hands to make things. So I work on things maybe a bit closer to tech, like electronics, but I also build things out of wood, and I fix cars, and I fix my bike and build bicycles, and all this kind of stuff. And there's so much I think I've learned from transferable skills, from just working in the real world to building things in software. There's so much about being a builder, both in real life and in software, that crosses over.

Is there a real-world analogy that you use often when you're thinking about a code architecture or problem?

I like to build software tools as if they were... something I like to imagine. So I wrote this thing called the Command Line Interface Guidelines, which was a bit like the Mac Human Interface Guidelines, but for command-line interfaces. I did it with the guy I created Docker Compose with, and a few other people. And I think something in there, I think I described that your command-line interface should feel like a big iron machine, where you pull a lever and it goes clunk, and things should respond within like 50 milliseconds, as if it was a real-life thing. Another analogy here is in real life, you know when you press a button
on an electronic device and it's like a soft switch, and you press it and nothing happens, and there's no physical feedback about anything happening, then like half a second later something happens? That's how a lot of software feels. But instead, software should feel more like something that's real, where you pull a physical lever and the physical lever moves. I've taken that lesson of human interface to software a ton. It's all about low latency, things feeling really solid and robust, both for command lines and user interfaces as well.

And how did you operationalize that for Fig or Docker?

A lot of it's just low latency. Actually, we didn't do it very well for Fig and Compose in the first place. We used Python, which was a big mistake, where Python's really hard to get booting up fast, because you have to load up the whole Python runtime before it can run anything. Go is much better at this, where Go just instantly starts.

So what's the, like, you have to be under 500 milliseconds to start up?

Yeah, effectively. I mean, human perception of things being immediate is something like 100 milliseconds, so anything like that is good enough.

Yeah. And also I should mention, since we're talking about your side projects, one thing is I am maybe one of a few fellow people who have actually written something about CLI design principles, because I was in charge of the Netlify CLI back in the day and had many thoughts. One of my fun thoughts, I'll just share in case you have thoughts, is I think CLIs are effectively starting points for scripts that are then run, and the moment one of the script's preconditions is not fulfilled, typically they end. So the CLI developer will just exit the program. And the way that I really wanted to create the Netlify Dev workflow was for it to be kind of a state machine
that would resolve itself. If it detected a precondition wasn't fulfilled, it would actually delegate to a subprogram that would then fulfill that precondition, asking for more info or waiting until a condition is fulfilled, then it would go back to the original flow and continue that. I don't know if that was ever tried, or if there is a more formal definition of it, because I just came up with it randomly. But it felt like the beginnings of AI, in the sense that when you run a CLI command you have an intent to do something, and you may not have given the CLI all the things that it needs to execute that intent. So that was my two cents.

Yeah, that reminds me of a thing we thought about when writing the CLI guidelines, where CLIs were designed in a world where the CLI was really a programming environment, and it was primarily designed for machines to use all of these commands and scripts, whereas over time the CLI has evolved to humans. It was back in a world where the primary way of using computers was writing shell scripts, effectively. And we've transitioned to a world where actually humans are using CLI programs much more than they used to. And the current best practices about how Unix was designed, there are lots of design documents about Unix from the '70s and '80s where they say things like command-line commands should not output anything on success, it should be completely silent. Which makes sense if you're using it in a shell script, but if a user is using that, it just looks like it's broken. If you type copy and it just doesn't say anything, you assume that it didn't work, as a new user. And yes, so I think what's really interesting about the CLI is that it's actually, to your point, a really good user interface, where it can be like a conversation, where instead of just you telling the
computer to do this thing, and it either silently succeeding or saying no, you failed, it can guide you in the right direction and tell you what your intent might be, and that kind of thing, in a way that's actually almost more natural to a CLI than it is in a graphical user interface, because it feels like this back and forth with the computer. Almost, funnily, like a language model. So I think there's some interesting intersection of CLIs and language models actually being very closely related and a good fit for each other.

Yeah, I would say one of the surprises from last year, you know, I worked on a coding agent, but I think the most successful coding agent of my cohort was Open Interpreter, which was a CLI implementation. And I have chronically, even as a CLI person, I have chronically underestimated the CLI as a useful interface.

Yeah, totally.

You also developed arXiv Vanity, which you recently retired after a glorious seven years.

Something like that.

Something like that. Which is nice, I guess, HTML PDFs.

Yeah, that was actually the start of where Replicate came from.

Okay, we can tell that story.

So when I quit Docker, I got really interested in science infrastructure, just as a problem area, because science has created so much progress in the world. The fact that we can talk to each other on a podcast, and we use computers, and the fact that we're alive is probably thanks to medical research. But science is just completely archaic and broken, and there are these 19th-century processes that just happen to be copied to the internet, rather than taking into account that we can transfer information at the speed of light now. And the whole way science is funded, and all this kind of thing, is all very broken. There's just so much potential for making science work better. And I realized that I wasn't a scientist, and I didn't
really have any time to go and get a PhD and become a researcher, but I'm a tool builder, and I could make existing scientists better at their job. And if I could make a bunch of scientists a little bit better at their job, maybe that's the equivalent of being a researcher. So one particular thing I dialed in on is just how science is disseminated. It's all of these PDFs, quite often behind paywalls on the internet.

And that's a whole thing, because it's funded by national grants, government grants, then put behind paywalls.

Yeah, exactly, that's a whole thing I could talk about. But the particular thing I got dialed in on was... interestingly, there's a bunch of open science that happens as well. So math, physics, computer science, machine learning notably, is all published on the arXiv, which is actually a surprisingly old institution. It was just somebody at Cornell who started a mailing list in the '80s, and then when the web was invented they built a web interface around it. It's super old.

And it's kind of like a Usenet group thing, right? That's why there are all these numbers and stuff.

Yeah, exactly. And that's where basically all of math, physics, and computer science happens, but it's still PDFs published to this thing, which is just so infuriating. The web was invented at CERN, a physics institution, to share academic writing. There are figure tags, there are author tags, there are heading tags, there are cite tags. Hyperlinks are effectively citations, because you want to link to another academic paper. But instead you have to copy and paste these things and try and get around paywalls. It's absurd. And now we have social media and things, but still academic papers are PDFs. It's just like,
why? This is not what the web was for. So anyway, I got really frustrated with that, and I went on vacation with my old friend Andreas. We used to work together in London at somebody else's startup, and we were just on vacation in Greece, and he was trying to read a machine learning paper on his phone. We had to zoom in and scroll line by line on the PDF, and he was like, this is [ __ ] stupid. And I was like, I know. We discovered our mutual hatred for this. And we spent our vacation sitting by the pool making LaTeX-to-HTML converters, making the first version of arXiv Vanity. Anyway, that was then a whole thing. And the story is, we shut it down recently because it caught the eye of arXiv, who were like, oh, this is great, we just haven't had the time to work on this. And what's tragic about the arXiv is it's this project of Cornell that can barely scrape together enough money to survive. I think it might be better funded now than it was when we were collaborating with them. And compared to these scientific journals, this is actually where the work happens, but they just have a fraction of the money that these big scientific journals have, which is just so tragic. But anyway, they were like, yeah, this is great, we can't afford to do it, but do you want to, as a volunteer, integrate arXiv Vanity into arXiv?

Oh, you did the work?

We didn't do the work. We started doing the work. I think we worked on this for a few months to actually get it integrated into arXiv, and then we got distracted by Replicate. So a guy called Deyan picked up the work and made it happen, somebody who works on one of the libraries that powers arXiv Vanity.

Okay. And relationship with arXiv Sanity?

None.

Did you predate them? I actually don't know the
lineage.

We were after. We were both users of arXiv Sanity, which is sort of like Andrej [Karpathy]'s recsys on top of arXiv. We were both users of that, and I think we were trying to come up with a working name for arXiv Vanity, and Andreas just cracked a joke of, oh, let's call it arXiv Vanity, it's making the papers look nice. And that was the working name and it just stuck.

Got it, got it. And then from there, tell us more about why you got distracted, right? So Replicate maybe feels like an overnight success to a lot of people, but you've been building this since 2019. So what prompted the start?

And we've been collaborating for even longer. So we created arXiv Vanity in 2017, so in some sense we've been doing this almost six, seven years now.

A classic seven-year overnight success.

Yes. We did arXiv Vanity and then worked on a bunch of surrounding projects. I was still really interested in science publishing at that point. And I'm trying to remember, because I tell a lot of the condensed story to people, because I can't really tell a seven-year history, so I'm trying to figure out the right length.

Oh, we've got room. We want to nail the definitive Replicate story here.

One thing that's really interesting about these machine learning papers is that they're published on the arXiv, and a lot of them are actual fundamental research, so should be prose describing a theory. But a lot of them are just running pieces of software that a machine learning researcher made that did something. It was like an image classification model or something, and they managed to make an image classification model that was better than the existing state of the art, and they've made an actual running piece of software that does image segmentation. And then what they had to do is they then had to take that piece of software and write it up as prose
and math in a PDF. And what's frustrating about that is, so Andreas was a machine learning engineer at Spotify, and some of his job was pure research as well. He did a PhD, and he was doing a lot of stuff internally, but part of his job was also being an engineer and taking some of these existing things that people have made and published and trying to apply them to actual problems at Spotify. And he was like, you get given a paper which describes roughly how the model works. It's probably missing lots of crucial information. There's sometimes code on GitHub. More and more there's code on GitHub, but back then it was relatively rare, and it was quite often just scrappy research code that didn't actually run. And there were maybe the weights that were on Google Drive, but they accidentally deleted the weights off Google Drive. And it was really hard to take this stuff and actually use it for real things. We just started talking together about his problems at Spotify, and I connected this back to my work at Docker as well. I was like, oh, this is what we created containers for. We solved this problem for normal software by putting the thing inside a container so you could ship it around and it kept on running. So we were hypothesizing about, hmm, what if we put machine learning models inside containers, so they could actually be shipped around, and they could be defined in some production-ready format, and other researchers could run them to generate baselines, and people who wanted to actually apply them to real problems in the world could just pick up the container and run it. And normally in this part of the story I skip forward to be like, and then we created Cog, this container standard for machine learning models, and we
created Replicate, the place for people to publish these machine learning models. But there's actually like two or three years in between. The thing we then got dialed into was, Andreas was like, what if there was a CI system for machine learning? Because one of the things he really struggled with as a researcher is generating baselines. So when he's writing a paper, he needs to get like five other models that are existing work and get them running on exactly the same evals, so you can compare apples to apples, because you can't trust the numbers in the paper.

Or you can be Google and just publish them anyway.

So he was like, what if you could... I think this was coming from the thinking of, there should be containers for machine learning, but why are people going to use that? Okay, maybe we can create a supply of containers by creating this useful tool for researchers. And the useful tool was, let's get researchers to package up their models and push them to a central place where we run a standard set of benchmarks across the models, so that you can trust those results and you can compare these models apples to apples. And for a researcher, for Andreas, doing a new piece of research, he could trust those numbers, and he could pull down those models, confirm it on his machine, use the standard benchmark to then measure his model, and all this kind of stuff. And so we started building that. That's what we applied to YC with. We got into YC and we started building a prototype of this. And then this is where it all starts to fall apart. We were like, okay, that sounds great, and we talked to a bunch of researchers and they really wanted that, and that sounds brilliant, that's a great way to create a supply of models on this research platform. But how the hell is this a business? How are we even going to make any money out of this? And we're like, oh
[ __ ], that's the real unknown here, of what the business is. So we thought it would be a really good idea to, okay, before we get too deep into this, let's try and reduce the risk around this turning into a business. So let's try and research what the business could be for this research tool, effectively. So we went and talked to a bunch of companies, trying to sell them something which didn't exist. So we're like, hey, do you want a way to share research inside your company, so that other researchers, or say the product manager, can test out the machine learning model? They're like, uh, maybe. And we were like, do you want a deployment platform for deploying models? Do you want a central place for versioning models? We were trying to think of lots of different products we could sell that were related to this thing. And, terrible idea. We're not salespeople, and people don't want to buy something that doesn't exist. I think some people can pull this off, but we were just a bunch of product people, product and engineering people, and we just couldn't pull this off. So we then got halfway through our YC batch. We hadn't built a product, we had no users, we had no idea what our business was going to be, because we couldn't get anybody to buy something which didn't exist. And actually it was quite a way through, I think it was like two-thirds of the way through our YC batch or something, and we were like, okay, well, we're kind of screwed now, because we don't have anything to show at demo day. And then we tried to figure out, okay, what can we build in like two weeks that'll be something? So we desperately tried to, I can't remember what we tried to build at that point. And then two weeks before demo day, I just remember this, we were going down to Mountain View every week for dinners, and we got
called onto an all-hands Zoom call, which was super weird. We were like, what's going on? And they were like, don't come to dinner tomorrow. And we looked at the news and we were like, oh, there's a pandemic going on. We were so deep in our startup we were just completely oblivious to what was going on around us.

Was this Jan or Feb 2020?

March 2020.

March 2020, yeah. Because I remember Silicon Valley at the time was early to COVID. They started locking down a lot faster than the rest of...

Exactly. And I remember soon after that there were the San Francisco lockdowns, and then the YC batch just stopped. There wasn't a demo day. And it was in a sense a blessing for us, because we just kind of could raise money.

In the normal course of events you're actually allowed to defer to a future demo day.

Yeah, so we didn't even technically defer, because it just kind of didn't happen.

So was YC helpful?

Yes. We completely screwed up the batch and that was our fault. I think where YC has become incredibly valuable for us has been after YC. I think there was a reasonable argument that we didn't need to do YC to start with, because we were quite experienced, we had done some startups before, we were well connected with VCs. It was relatively easy to raise money because we were a known quantity. If you go to a VC and be like, hey, I made this piece of...

It's Docker Compose for AI.

Exactly, yeah. And people can pattern match like that, and they can have some trust you know what you're doing. Whereas it's much harder for people straight out of college, and that's where YC's sweet spot is, helping people straight out of college who are super promising figure out how to do that.

Yeah, no credentials.

Yeah, exactly. So in some sense we didn't need that. But the thing that's been incredibly useful for us since YC has been, this was actually, I think, so Docker
was a YC company, and Solomon, the founder of Docker, I think told me this. He was like, a lot of people underestimate the value of YC after you've finished the batch, and his biggest regret was not staying in touch with YC. I might be misattributing this, but I think it was him. And so we made a point of that, and we just stayed in touch with our batch partner, Jared at YC, who has been fantastic.

Jared Harris?

Jared Friedman. And all of the team at YC, there was the growth team at YC when they were still there, and they've been super helpful. And two things that have been super helpful about that: one is raising money. They just know exactly how to raise money, and they've been super helpful during that process in all of our rounds. We've done three rounds since we did YC, and they've been super helpful during the whole process. And also just reaching a ton of customers. The magic of YC is that there are thousands of YC companies, I think on the order of thousands, and they're all your first customers. And they're super helpful, super receptive, really want to try out new things. You have a warm intro to every one of them, basically, and there's this mailing list where you can post about updates to your product, which is really receptive. And that's just been fantastic for us. We've just got so many of our users and customers through YC.

Yeah, well, so the classic criticism, or the sort of pushback, is people don't buy you because you are both from YC, but at least they'll open the email, right?

Yeah, effectively. So that's been a really, really positive experience for us.

And sorry, I interrupted with the YC question. You just made it out of YC, survived the pandemic, and you...

Yeah, I'll try and condense this a little bit. Then we started building tools for COVID,
weirdly. We were like, okay, we don't have a startup, we haven't figured out anything, what's the most useful thing we could be doing right now?

Save lives.

So yeah, let's try and save lives. I think we failed at that as well. We had a bunch of projects that didn't really go anywhere. We worked on a bunch of stuff like contact tracing, which didn't really turn out to be a useful thing. Andreas worked on a DoorDash for people delivering food to people who are vulnerable. What else did we do? The meta problem of helping people direct their efforts to what was most useful, and a few other things like that. It didn't really go anywhere, so we were like, okay, this is not really working either. We were considering actually just doing work for COVID. We have this decision document early on in our company, which is like, should we become a government app contracting shop? We decided no.

Because you also did work for the US... for the gov.

Yeah, exactly. We had experience doing some...

And The Guardian and all that.

Yeah, for government stuff. And we were just really good at building stuff. We were product people. I was the front-end product side and Andreas was the back-end side, so we were just a product... and we were working with a designer at the time, a guy called Mark, who did our early designs for Replicate, and we were like, hey, what if we just team up and build stuff? Yeah, we gave up on that in the end, I can't remember the details. So we went back to machine learning, and then we were like, well, we're not really sure if this is going to work. And one of my most painful experiences from previous startups is shutting them down. When you realize it's not really working and having to shut it down, it's a ton of work, and people hate you, and it's just sort of, you know. So we were like, how can we make something we don't
have to shut down? And even better, how can we make something that won't page us in the middle of the night? So we made an open source project. We made a thing which was an open source Weights & Biases, because we had this theory that people want open source tools, there should be an open source version control, experiment tracking thing. And it was intuitive to us, and we're like, oh, we're software developers and we like command-line tools, everyone likes command-line tools and open source stuff. But machine learning researchers just really didn't care. They just wanted to click on buttons. They didn't mind that it was a cloud service. It was all very visual as well, you need lots of graphs and charts and stuff like this. So it just wasn't right. We were actually rebuilding something that Andreas made at Spotify for just saving experiments to cloud storage automatically, but other people didn't really want this. So we kind of gave up on that. That was actually originally called Replicate, and we renamed it out of the way, so it's now called Keepsake, and I think some people still use it. Then we looped back to our original idea. So we were like, oh, maybe there was a thing in that thing we were originally thinking about, of researchers showing their work, and containers for machine learning models. So we just built that. And at that point we were kind of running out of the YC money, so we were like, okay, this feels good though, let's give this a shot. So that was the point we raised a seed round.

You raised seed pre-launch?

We raised pre-launch. Pre-launch and pre-... it was an idea, basically. We had a little prototype. It was just an idea and a team. But we were like, okay, bootstrapping this thing is getting hard, so let's actually raise some money. And then we made Cog and Replicate. It initially didn't have APIs, interestingly. It
was just the bit that I was talking about before, of helping researchers share their work. So it was a way for researchers to put their work on a web page such that other people could try it out, and so that you could download the Docker container. We cut the benchmarks part of it, because we thought that was just too complicated, but it had a Docker container that Andreas in a past life could download and run with his benchmark, and you could compare all these models apples to apples. So that was the theory behind it. And that kind of started to work. It was still long before the AI hype, and there was lots of interesting stuff going on, but it was very much in the classic deep learning era, so image segmentation models and sentiment analysis and all these kinds of things that people were using deep learning models for. And we were very much building for research, because all of this stuff was happening in research institutions, the sort of people that would be publishing to arXiv. So we were creating accompanying material for their models, basically. They wanted a demo for their models, and we were creating accompanying material for it. And what was funny about that is they were not very good users. They were doing great work, obviously, but the way that research worked is that they just made one thing every six months and they just fired and forgot it. They published this piece of paper and, done, I've published it. So they output it to Replicate and then they just stopped using Replicate. They were like once-every-six-months users, and that wasn't great for us. But we stumbled across this early community. This was early 2021, when OpenAI created CLIP and people started smushing CLIP and GANs together to
produce image generation models. And this started with, it was just a bunch of tinkerers on Discord, basically. There was an early model called Big Sleep by Advadnoun, and then there was VQGAN+CLIP, which was a bit more popular, by RiversHaveWings. And it was all just people tinkering on stuff in Colabs, and it was very dynamic, and it was people just making copies of Colabs and playing around with things and forking. And to me, I saw this and I was like, oh, this feels like open source software, so much more than the research world where people are publishing these papers.

Yeah, you don't know their real names, and it's just like a Discord.

Yeah, exactly. But crucially, people were tinkering and forking, and things were moving really fast, and it just felt like this creative, dynamic, collaborative community in a way that research wasn't really. It was still stuck in this six-month publication cycle. So we just latched onto that and started building for this community. And a lot of those early models were published on Replicate. I think the first one that was really primarily on Replicate was one called Pixray, which was sort of mid-2021, and it had a really cool pixel art output. They weren't crisp images, but they were quite aesthetically pleasing, some of these early image generation models. That was published primarily on Replicate, and then a few other models around that were published on Replicate, and that's where we really started to find our early community, and where we really found, oh, we've actually built a thing that people want. And they were great users as well, and people really wanted to try out these models, so people were running the models on Replicate. We still didn't have APIs though, interestingly. And this is another really complicated part of the story. We
had no idea what our business model was still at this point. I don't think people could even pay for it; it was just these web forms where people could run the model. And just before this API bit continues, just for historical interest: which Discords were they, and how did you find them? Was this the LAION Discord? Yeah, Eleuther. Yeah, it was the Eleuther one. Eleuther... I particularly remember there was a channel where VQGAN+CLIP, this is early 2021, was set up as a Discord bot, and I just remember being completely captivated by this thing. I was just playing around with it all afternoon, and, like, the sort of thing, oh [ __ ], it's 2 a.m. You know, yeah, this is the beginnings of Midjourney. Yeah, exactly, and Stability. It was the start of Midjourney, and it's where that kind of user interface came from. What's beautiful about the user interface is you see what other people are doing and you could riff off other people's ideas, and it was just so much fun to play around with this in a channel with over 100 people. And yeah, that just completely captivated me, and I was like, okay, this is something, so we should get these things on Replicate. And yeah, that's where that all came from. Yeah, okay, sorry, I just wanted to capture that. And then you moved on to... so was it APIs next or was it Stable Diffusion next? It was APIs next. And the APIs happened because one of our users... our web form had an internal API for making the web form work, an API that was called from JavaScript, and somebody reverse engineered that to start generating images with a script. You know, they did, like, web inspector, copy, figured out what the request was. And it wasn't secured or anything. Of course not. And they started generating a bunch of images, and we got tons of traffic, like, what's going on? And
I think the usual reaction to that would be, hey, you're abusing our API, and to shut them down. Instead we were like, oh, this is interesting, people want to run these models. So we documented the API in a Notion document, our internal API in a Notion document, and messaged this person being like, hey, you seem to have found our API, here's the documentation, that'll be 1,000 bucks a month please, with a Stripe form that we just clicked some buttons to make. And they were like, sure, that sounds great. So that was our first customer. A thousand bucks a month. It was a surprising amount of money. Yeah, that's not casual. It was on the order of a thousand bucks a month. So was it a business? It was the creator of Pixray. He generated NFT art, so he made a bunch of art with these models and was selling these NFTs, effectively. And I think lots of people in his community were doing similar things, and he then referred other people who were also generating NFT art with generative models. That was the start of our API business. And then we made an official API and actually added some billing to it, so it wasn't just a fixed fee. And now people think of you as the hosted models API business. Yep, exactly. But that just turned out to be our business. What ended up being beautiful about this is it was really fulfilling the original goal of what we wanted to do, which is that we wanted to make this research that people were making accessible to other people, and for it to be used in the real world. And this was ultimately the right way to do it, because all of these people making these generative models could publish them to Replicate, and they wanted a place to publish them, and software engineers, you know, like myself, I'm not a machine learning
expert but I want to use this stuff, could just run these models with a single line of code. And we thought, oh, maybe the Docker image is enough, but it's actually super hard to get the Docker image running on a GPU and stuff. So it really needed to be the hosted API for this to work and to make it accessible to software engineers, and we just wound our way to this. Two years to the first customer. Yeah, exactly. Did you ever think about becoming Midjourney during that time? You had so much interest in image generation. I mean, you're doing fine, for the record, but, you know, it was right there, you were playing with it. Yeah, I don't think it was our expertise. I think our expertise was dev tools, whereas Midjourney is almost like a consumer product, you know? So I don't think it was our expertise. It certainly occurred to us. I think at the time we were thinking, oh, maybe we could hire some of these people in this community and make great models and stuff like this, but we ended up more being at the tooling. Like I was saying before, I'm not really a researcher, I'm more like the tool builder behind the scenes, and I think both me and Andreas are like that. Yeah, I think this is also an illustration of the tool builder philosophy, something you latch on to in dev tools, which is: when you see people behaving weird, it's not their fault, it's yours, and you want to pave the cow paths, is what they say, right? The unofficial paths that people are making, make them official and make it easy for them, and then maybe charge a bit of money. And now, fast forward a couple of years, you have 2 million developers using Replicate, maybe more. That was the last public number that I found. 2 million... I think that got mangled, actually. It's 2 million users. Not all those people are developers, but a lot of them are developers.
Yeah. And then 30,000 paying customers was the number. That's awesome. Latent Space runs on Replicate, so we have smol-podcaster, and we host transcription, Whisper diarization, on Replicate. And we're paying, so Latent Space is in the 30,000. You raised a $40 million Series B. I would say that maybe the Stable Diffusion time, August '22, was really when the company started to break out. Tell us a bit about that and the community that came out, and I know now you're expanding beyond just image generation. Yeah, I think we kind of set ourselves... we saw there was this really interesting generative image world going on, so we were building the tools for that community already, really. And we knew Stable Diffusion was coming out. We knew it was a really exciting thing; it was the best generative image model so far. I think the thing we underestimated was just what an inflection point it would be. I think Simon Willison put it this way, where he said something along the lines of: it was a model that was open source and tinkerable and just good enough, such that it just kind of took off in a way that none of the models had before. And what was really neat about Stable Diffusion is it was open source, so compared to DALL-E, for example, which was sort of equivalent quality, you could fork it and tinker on it. And the first week we saw people making animation models out of it. We saw people make game texture models that use circular convolutions to make repeatable textures. What else did we see? A few weeks later, people were fine-tuning it so you could put your face in these models, and all of these other... Textual inversion? Yep, yeah, exactly. That happened a bit
before that. And all of this sort of innovation was happening all of a sudden, and people were publishing on Replicate because you could just publish arbitrary models on Replicate. So we had this sort of supply of interesting stuff being built. But because it was a sufficiently good model, there was also just a ton of people building with it. They were like, oh, we can build products with this thing. And this was about the time where people were starting to get really interested in AI, so tons of product builders wanted to build stuff with it, and we were just sitting there in the middle as the interface layer between all these people who wanted to build and all these machine learning experts who were building cool models. And that's really where it took off. There was just sort of incredible supply, incredible demand, and we were just in the middle. And then, yeah, since then we've just kind of grown and grown, really. We've been building a lot for the indie hacker community, these individual tinkerers, but also startups and a lot of large companies as well who are exploring and building AI things. And then kind of the same thing happened middle of last year with language models and Llama 2, where the same kind of Stable Diffusion effect happened with Llama. Llama 2 was our biggest week of growth ever, because tons of people wanted to tinker with it and run it. And since then we've just been seeing a ton of growth in language models as well as image models, and we're just kind of riding a lot of the interest that's going on in AI and all the people building in AI. Yeah, kudos. Right place, right time, but also, you know, it took a while to position for the right place before the wave came. I'm curious if you have any insights on these different markets. So Pieter Levels, notably a very loud person, very picky about his tools. I
wasn't sure actually if he used you. He does, because you cited him on your Series B blog post, and Danny Postma as well, his competitor, all in that wave. What are their needs versus, you know, the more enterprise or B2B type needs? Did you come to a decision point where you're like, okay, how serious are these indie hackers versus the actual businesses that are bigger and perhaps better customers because they're less churny? They're surprisingly similar. Okay. Because I think a lot of people right now want to use and build with AI, but they're not AI experts and they're not infrastructure experts either. So they want to be able to use this stuff without having to figure out all the internals of the models and, you know, touch PyTorch and whatever, and they also don't want to be setting up and building up servers. And that's the same all the way from indie hackers just getting started, because obviously you just want to get started as quickly as possible, all the way through to large companies who want to be able to use this stuff but don't have all of the experts on staff. I think some big companies, like Google and so on, do actually have a lot of experts on staff, but the vast majority of companies don't, and they're all software engineers who want to be able to use this AI stuff but just don't know how to use it. It's like you really need to be an expert, and it takes a long time to learn the skills to be able to use that. So they're surprisingly similar in that sense. And I think it's kind of also unfair on the indie community: they're not churny or spiky, surprisingly. They're building real, established businesses, which is kudos to them, building these really large, sustainable businesses, often just as solo developers. And it's kind of
remarkable how they can do that, actually, and it's credit to a lot of their product skills. And we're just there to help them, being like their machine learning team, effectively, to help them use all of this stuff. So a lot of these indie hackers are actually some of our largest customers, alongside some of our biggest customers that you would think would be spending a lot more money than them. Yeah, and we should name some of these. You have them on your landing page: you have BuzzFeed, you have Unsplash, Character.AI. What do they power? What can you say about their usage? Yeah, totally, it's kind of various things. I'm trying to think... let me actually think what I can say about which customers. Well, I mean, I'm naming them because they're on your landing page, so you have logo rights. It's useful for people. Like, I'm not imaginative; monkey see, monkey do, right? If I see someone doing something that I want to do, then I'm like, okay, Replicate is great for that. Yeah. So that's what I think about case studies on company landing pages: it's just a way of explaining, like, yep, this is something that we are good for. Yeah, totally. I mean, these companies are doing things all the way up and down the stack at different levels of sophistication. So Unsplash, for example, they actually publicly posted this story on Twitter where they're using BLIP to annotate all of the images in their catalog. You know, they have lots of images in the catalog and they want to create a text description of each so you can search for it, and they're annotating images with, you know, an off-the-shelf open source model. We have this big library of open source models that you can run, and we've got lots of people running these open source models off the shelf. And then most of our
larger customers are doing more sophisticated stuff, so they're fine-tuning the models, they're running completely custom models on us. A lot of these larger companies are using us for a lot of their inference, but it's a lot of custom models, and they're writing the Python themselves because they've got machine learning experts on the team, and they're using us for their inference infrastructure, effectively. So it's lots of different levels of sophistication, where some people are using these off-the-shelf models, some people are fine-tuning models, so Pieter Levels is a great example, where a lot of his products are based off fine-tuning image models, for example, and then we've also got larger customers who are just using us as infrastructure, effectively, as servers. So yeah, all things up and down the stack. Let's talk a bit about Cog and the technical layer. So there are a lot of GPU clouds. I think people have different pricing points, and I think everybody tries to offer a different developer experience on top of it, which then lets you charge a premium. Why did you want to create Cog? You worked at Docker; what were some of the issues with traditional container runtimes? And maybe, what were you surprised with as you built it? Cog came right from the start, actually, when we were thinking about this evaluation, the sort of benchmarking system for machine learning researchers, where we wanted researchers to publish their models in a standard format that was guaranteed to keep on running, that you could replicate the results of; that's where the name came from. And we realized that we needed something like Docker to make that work. And I think it was just natural from my point of view that obviously that should be open source, that we should try and create
some kind of open standard here that people can share, because if more people use this format, then that's great for everyone involved. I think the magic of Docker is not really in the software; it's just the standard that people have agreed on: here are a bunch of keys for a JSON document, basically. And that was the magic of the metaphor of real containerization as well. It's not the containers that are interesting, it's just the size and shape of the damn box, you know? Right, yeah. And it's a similar thing here, where really we just wanted to get people to agree on: this is what a machine learning model is, this is how a prediction works, this is what the inputs are, this is what the outputs are. So Cog is really just a Docker container that attaches to a CUDA device, if it needs a GPU, that has an OpenAPI specification as a label on the Docker image. And the OpenAPI specification defines the interface for the machine learning model, the inputs and outputs, effectively, or the params in machine learning terminology. And we just wanted to get people to kind of agree on this thing. And it's general purpose enough. Some of the existing things were at the graph level, but we really wanted something general purpose enough that you could just put anything inside this, and it was just arbitrary software, and it would be future compatible with future inference servers and future machine learning model formats and all this kind of stuff. So that was the intent behind it. It just came naturally that we wanted to define this format, and that's been really working for us. A bunch of people have been using Cog outside of Replicate, which is kind of our original intention: this should be how machine learning is packaged and how people should use it. It's common to use Cog in situations where
like maybe they can't use the SaaS service because I don't kn
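
The packaging idea described in the transcript (a container image that carries an OpenAPI document describing a model's inputs and outputs) can be illustrated with a small standard-library sketch. This is not Cog's implementation or its file format: `describe_interface`, the example `predict` function, and the idea of storing the result in `label_value` are all hypothetical names chosen for illustration. The sketch only shows how a typed prediction function could be turned into a machine-readable description of "what the inputs are" and "what the outputs are".

```python
import inspect
import json
from typing import get_type_hints

# Map Python annotations to JSON Schema type names (illustrative subset only).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_interface(predict):
    """Build an OpenAPI-style schema for a predict function's inputs and output."""
    hints = get_type_hints(predict)
    properties, required = {}, []
    for name, param in inspect.signature(predict).parameters.items():
        prop = {"type": _JSON_TYPES[hints[name]]}
        if param.default is inspect.Parameter.empty:
            # No default value means the caller must supply this input.
            required.append(name)
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {
        "Input": {"type": "object", "properties": properties, "required": required},
        "Output": {"type": _JSON_TYPES[hints["return"]]},
    }


# Hypothetical stand-in for a model: any callable with typed inputs would do.
def predict(prompt: str, steps: int = 50, guidance: float = 7.5) -> str:
    return f"image for {prompt!r} after {steps} steps"


schema = describe_interface(predict)

# In the design described in the transcript, a document like this would be
# attached to the container image as a label; here it is only serialized.
label_value = json.dumps(schema, sort_keys=True)
print(label_value)
```

The point of the sketch is the one made in the conversation: the useful part is the agreed-upon description of the interface, not the code inside the box, so any tool that can read the schema can call the model without knowing how it was built.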

Original Description

Replicate is one of the most popular AI inference providers, reporting over 2 million users as of their $40m Series B with a16z. But how did they get there? Ben Firshman, CEO of Replicate, came on the pod to talk about why ML researchers are bad users, going through YC during COVID, how AI art communities influenced their product roadmap, and where Replicate is going next.

Full show notes: https://latent.space/p/replicate

Timestamps:
00:00:00 Introductions
00:01:22 Low latency is all you need
00:04:39 Evolution of CLIs
00:06:47 How building ArxivVanity led to Replicate
00:13:13 Making ML research replicable with containers
00:19:47 Doing YC in 2020 and pivoting to tools for COVID
00:23:11 Launching the first version of Replicate
00:29:26 Embracing the generative image community
00:31:58 Getting reverse engineered into an API product
00:35:54 Growing to 2 million users
00:39:37 Indie vs Enterprise customers
00:42:58 How customers uses Replicate
00:44:30 Learnings from Docker that went into Cog
00:52:24 Creating AI standards
00:57:49 Replicate's compute availability
01:02:38 Fixing GPU waste
01:10:58 What's open source AI?
01:15:19 Building for AI engineers
01:17:33 Hiring at Replicate


Key Takeaways

Build a CLI with low latency and physical feedback
Design a state machine for resolving preconditions
Use containers to package machine learning models
Create a platform for researchers to publish and share models
Apply to YC and start building a prototype
Use Docker to package and run models
Fine-tune models for specific tasks

💡 The video highlights the importance of building software tools with low latency and physical feedback, and the need for a standardized way to compare and trust results from different models.
