No Priors Ep 56 | With Baseten CEO and Co-Founder Tuhin Srivastava

No Priors: AI, Machine Learning, Tech, & Startups

Key Takeaways

The video discusses the importance of speed in AI computing. Baseten CEO Tuhin Srivastava argues that speed is the number one advantage for early-stage AI companies, and the conversation covers tools and techniques for fast, scalable AI infrastructure, including Baseten's inference platform, GPU clusters, and Nvidia's LLM engine, TensorRT-LLM.

Full Transcript

[Music] Hi listeners, welcome to another episode of No Priors. Today Elad and I are catching up with Tuhin Srivastava, the CEO and co-founder of Baseten, which gives teams fast, scalable AI infrastructure, starting with inference. They're one of the players at the center of the battle heating up around AI computing. Welcome, Tuhin.

Hi, thanks for having me. Good to see you guys.

Let's start at the beginning. For any listeners who don't know, what is Baseten and how did you start working on it?

Baseten is an infrastructure product. We provide fast, scalable AI infrastructure for engineering teams working with large models. Currently we're focused on inference, and we want to do a lot more after that. For the past, say, four and a half years (actually, oh, that's a long time), we've been cutting our teeth and trying to build this thing. I think it's been pretty rewarding over the last 12 months seeing the market kind of show up and everyone get equally excited about AI infrastructure. We started this, honestly, because firstly we thought ML was pretty cool in 2019. We thought it was going somewhere, and we wanted to build a picks-and-shovels business and solve the problems that we were running into. I think the side note here is that I wanted to start a company with my friends.

You often say that Baseten isn't no code, it's efficient code. Why does that difference matter?

That wasn't always the case. I'd say there were times when we had elements which were definitely a bit no-code-y. I think what we've learned over the last three or four years is that code is just incredibly powerful and engineers want to write code. Even in its best form, you want to build really, really tight abstractions, but I think the ability to turn the knobs under the hood is very, very important. I think no code kind of makes that a lot harder. I don't think it removes it, but it makes it a lot
harder. So what we do is just build very strong, intuitive abstractions that try to make the easy things super easy but still make the hard things possible, so you can get a lot of value really quickly. But I'd say, unlike a lot of other infrastructure products that have been built over the last 10 years, we're trying to solve against the graduation problem, which is that we're able to support teams as they grow in scale.

And just to make it a little bit more visceral for our listeners: what are the types of applications that run on Baseten? What's the scale of the platform? Do you have a favorite application?

Everything from tiny side projects on weekends all the way to companies that are pretty AI native. We've supported foundation model companies. We work with companies like Descript, where AI is very, very core to the product experience. We power a lot of AI features that Patreon has shipped. But I'd say some of the more interesting use cases, from my perspective at least, are either the really small teams that we're giving a lot of leverage to so that they can ship things very quickly. A really good example of that might be a company like Bland AI, which is basically building an SDK for call centers, is how I'd describe it. They're able to ship models and colocate workloads so that they can get sub-300-millisecond or sub-200-millisecond responses without months and months of infrastructure effort. On the other hand, it's really exciting to see companies become AI enabled, because that's where we see a lot of the value is going to be over the next decade. A company like PicnicHealth, which has actually been around for a decade, is starting to do a very, very interesting thing with the corpus of data that they've gathered over the last 10 years, and supporting
those use cases. I think their model is called Picnic GPT, which extracts information from medical records. To me those are the really exciting use cases, where you're giving leverage to companies that are good at the domain they are working in. The model might be proprietary, the data might be proprietary, but the infrastructure doesn't necessarily need to be proprietary, and we can give them just an easy way to deploy that stuff without many, many person-months.

It's become in vogue to compare the size of your GPU cluster. People are spending a lot of money on GPUs. We hear about 600,000 H100 equivalents and lots of venture rounds being raised, often to train large models in some domain or another, or even more and more expensive post-training. Are training and inference workloads different?

Yeah, I think so. I think they just have very different, almost like, SLAs for the customer. The things that matter for inference are things like your clusters being somewhat colocated with where you're doing your work, whereas for training that matters a bit less. It doesn't really matter where that training is happening as long as all your GPUs are somewhat together. Even the GPU clusters themselves: for training, networking is a very, very important piece, to have networking on the racks themselves, whereas for inference it matters a little less because you're doing a little bit more on individual GPUs and less across GPUs. I think from a user perspective there's a lot more workflow in inference that's repeated across customers. You guys work with a bunch of companies training models; the state of the art there really is "give me some SSH keys and let me go at it." Whereas with inference there is definitely a repeated workflow where people are trying to get
similar things out of the inference infrastructure, whether that be version management, whether that be the way they deploy, hooking it up into CI/CD, cold starts, and so on and so forth. So I'd say it seems a bit more repeatable today how the problem is being solved by customers. I'd say the hardware requirements are quite different. They're probably a little less, to some degree, but I think resiliency and reliability matter a lot more. Downtime is unacceptable from an inference perspective; nodes get terminated all the time from a training perspective.

You were quite early to this market, and I think you folks pioneered a lot of the early ML infrastructure for these sorts of use cases and applications. What has been the most surprising thing, or what did you least expect, relative to how things evolved?

I think I can answer that question at two different altitudes. You can answer it from a market perspective. Elad, you have some old writing, which is that markets are basically all that matter. I think we've felt that very viscerally in some sense, which is that you can build all this cool stuff, and then when markets show up, you feel that, and that really pushes the customer forward and the needs for your product forward. So that's one thing: the acceleration we saw through the end of 2022 and in 2023 definitely took us by surprise. I can be really honest and say that 2019 to 2022 was pretty quiet. We had happy customers, but the demand wasn't necessarily there. From a practitioner perspective, how fast some of these teams move has really shocked me. I think what is really clear in AI, at the early stage and in general, and I think the enterprises are waking up to realize
this right now, is that speed is actually your number one advantage. Things are moving so fast that if you're not competing on speed, you're going to be left behind. And so there's actually a lot of propensity to buy versus build, I'd say. People are happy to buy technology, whereas in the past people were pretty hesitant to buy infrastructure. We talk with companies all the time where we think, oh, they probably have something built out, where they're a lot less sophisticated than you think and they're handling a lot more scale, so they need to be able to have the infrastructure to support that. So I think that's probably one of the larger things that we've been surprised by: how fast people need to move to be relevant. I think the other thing is just how GPU needs have evolved. At the end of 2022, most of our customers were using T4s and A10Gs. After that it changed to A100s. Now it's going to H100s. The compute needs aren't necessarily going down. They're really going up, especially as these services scale up.

You guys were just highlighted for leading on the independent Artificial Analysis
benchmarks for highest throughput and lowest latency serving. Congrats. Can you give us some intuition for what is driving that? What makes inference hard to run fast?

I think there are multiple things which are quite difficult about inference. There's the workflow headaches, which we've talked about a bit and I could talk more about. There are the scalability and reliability bottlenecks. I think the stuff you're talking about is performance optimization, which is really: how long does one generation take? There's a lot of work and research that's being done to run generations as fast as possible, to get the maximum throughput and the minimal latency. Historically (well, it's funny when I say historically, because I mean over the last six months), a lot of work has been done in the research community to get these things to move faster. Stuff like speculative decoding, which came out, I can't even remember when, sometime in the last six months, started really being used. For us, what it means is how well you can use the GPU, how you can scale across multiple GPUs, and how you can honestly be really, really up to date with the latest things that are happening in open source and research. We have partnered with Nvidia and worked really closely with their LLM engine called TensorRT-LLM, and that's actually driven a lot of the performance gains that we've seen. We've contributed to that, we've forked that. But the hard thing there is that a lot of the optimization you're doing is pretty low level and there's no real abstraction, so you either have to learn how to use open source very well or rewrite some of these
kernels by yourself. If you look at something like OpenAI, what do people complain about a lot of the time? It's speed. That's probably one of the core performance advantages of open source: you can get these smaller models to run faster. I think that will continue to be a massive focus for us going forward as well. On the benchmarks, it's pretty crazy how that's evolved as well. We've gone from state of the art being 90 tokens a second, then it got over 100, now it's over 200, and now we're talking, for some people, up to 300 or 400. I think that's going to continue to be a very, very important place to innovate. We think over time the performance will get somewhat commoditized, especially for language models. To be honest, I think more and more of that stuff should run locally, to some degree. But being on top of it and making sure that we're attached to the state of the art is... if we're not, it's an existential risk to the business, and so we have to do it.

How much optimization have you been seeing for other types of models? So diffusion models, some of the text-to-speech models, other areas like that. There seem to be different types of optimizations happening across different foundation model types as well, so I was curious what's state of the art there and how you're thinking about it.

I don't have the metrics on hand, but we're seeing that and also pushing limits there as well. Just yesterday, for example, we were able to get Whisper running... so there's Whisper, there's Faster Whisper, and there's Whisper on TRT, which again is an Nvidia thing. I think what we are seeing is that there's more and more
focus on bringing these experiences as close to real time as possible. One of our customers is a company called Gamma, AI storytelling software. They use Stable Diffusion image models to generate images. Not waiting four or five seconds there and still having high-quality images is again core to their business, making it very, very fast and easy to use. We are definitely seeing that from a customer requirement perspective. Have we been able to juice as much there? Not yet, but I think we're getting there.

Speaking of the applications driven by these models, they're still generally startlingly slow, including the really amazing, capable ones, from ChatGPT to Cognition to things like Pika and Midjourney. In a way that consumers have not seen in many years, we are waiting seconds for interactions. Is your view that it'll change because the models are getting smaller, or because smaller models will get more powerful, people do distillation, people just get better at running these things, we'll get better hardware? What's the path to not waiting 10 seconds a generation, 2 minutes a generation?

If you look at so many of the gains of running Mistral fast, let's take that as an example, the step-function gains come from a few things. The first one is running on H100s and not running on A100s. As hardware gets better and better, you get this almost like leg up. So hopefully, as those prices go down and that gets more available, that'll be one thing. Hopefully the H200 comes up next and we do that. I think the next piece is around software optimization: stuff like continuous batching, dynamic batching, and speculative decoding, which basically makes it easier to either parallelize or batch
process a bunch of things, or makes each individual generation faster, or offloads it to another model. There's lots of stuff in that space. I think these models are also just going to get smaller, to be honest. The really, really powerful small model that does one thing, that's pretty exciting. Stuff like Ollama is very interesting. And Sourcegraph basically announced that Cody's now running locally for a lot of customers. I think that's actually a pretty exciting proposition. We have to figure out where we sit in a world like that. It's not necessarily amazing for cloud providers like ourselves, because we want everything to run on us, but I think things do get smaller, things do get more efficient, and we have more distilled, more powerful, sharper models to use.

Yeah, it's the unbundling of models. How and when do customers choose to deploy their own models, or open source models, or fine-tuned open source models on their own infra, versus use public model endpoints? What guidance would you give people?

This is a general trend we see, right, which is: you've got OpenAI, you've got Anthropic, you have an API that works really quickly. Then you're like, that's too slow, or that's too expensive, or I don't need something that powerful. We often see that if customers have a lot of money, they end up paying for private deployments in Azure. If they're still stuck on this issue, then they go to open source models. When you're starting out with open source models, you're going to go to a shared endpoint. There are lots of great shared endpoint providers, but there are things that might matter to you which shared endpoint providers can't give you. Firstly, you want your own SLAs, so you don't want this noisy neighbors problem: if Elad and I have two competing apps
and Elad's app gets slammed, I don't want my app to slow down. My model calls still need to be fast. So in that case you might want dedicated compute. But there's also data and privacy stuff: maybe you don't want your data going through the same infrastructure that other folks' data is going through. And honestly, maybe at some scale it's even cheaper to run it yourself. Those are the three things, I think. Then when you go to larger companies, it becomes a no-brainer. Shared endpoints don't really work for large companies, and they're definitely not going to work for enterprises. A lot of times you might want Mistral 7B not coming off a shared endpoint provider, maybe not even on a dedicated endpoint. You might want it self-hosted within your own AWS or GCP. And that's actually where we see a lot of the world going as well: larger customers especially are going to have pretty good compute deals, and they're going to have their own credit system or spend commit with the marketplaces. Actually running it on their own infrastructure solves a lot of problems and has a massive cost advantage for them as well. So I think there are three stages to it: you start with shared endpoints, at a certain point you go to dedicated in the cloud, but I think for some customers that's not enough either and you want to go into your own cloud.

One prediction on the enterprise side would be that if ChatGPT only launched 15 or 16 months ago and GPT-4 just came out a year ago, then most enterprises are still in a planning cycle and they haven't really adopted AI at any real scale. Which means for infrastructure providers like Baseten, it's a huge opportunity that's about to come, right? Already you're cresting this giant wave, and the wave is about to get 10 times bigger, potentially. What do
you view as the timeline for really mass-scale enterprise adoption of AI, and where do you think things will be in terms of order-of-magnitude usage overall a year from now, two years from now? I'm just curious what your view of that future is.

I think it's a good question. I think we've been so wrong with time scales here. But what we see right now when we go to talk to enterprises is that copilots, especially codegen stuff, have already made their way into the enterprise. Most enterprises we talk to, when you ask them how advanced they are in their AI strategy, will tell you, "Well, everyone uses Copilot," and that's the first big foray. I think the next piece is using OpenAI or Anthropic in some way as well. I think that is going on right now, and people are starting to experiment with that. The fear, I would say... someone was just telling me that, I think it was [unclear], like tens of millions of dollars for ML investment or AI investment over the next 12 to 18 months. That's kind of frightening to me to some degree. It's great for us, and I love to hear it as a business builder here, but to me that means the pressure is coming from above. That's kind of, I'd say, the ML trap we fell into in 2018 to 2020, when CIOs were buying software and it wasn't really attached to real user value or product value. So I would actually say that we're probably overestimating how big enterprise will get in the next 12 to 18 months, but we're underestimating where it will be three to five years from now. I think 10x is a pretty massive underestimate. I'd say, like, we're working with a customer right now that
has four engineers and by the end of this year will have mid hundreds of thousands of dollars in annual spend. That's for one use case with sub-thousand users for them, and they're already cash-flow positive as a business, which is insane to me in general. I can't even imagine what these workloads are going to look like when we get to the enterprise. When you start to think about it, take customer service: chatbots are probably the number one place where people think about efficiency, and the volume that some of these customers have... My brother's the head of AI at Sunrun, which is a public company that does solar panels, and I think that's another really good example of a company where there's so much opportunity for AI just to eat away at processes. And the volume is just so much higher than what we're thinking from a traditional business perspective, with harder requirements, which drives spending even higher.

Yeah, I think one reset in stance that people should have on spending here is that traditionally, when people were looking at software companies, you got really concerned as a software investor if your cost of goods sold was affected by a lot of data processing, basically. You have this expectation that your average SaaS business at maturity might have 80% gross margins. And I think now people understand that the training businesses have a big upfront capex investment that may or may not pay off. But one of the things that you're pointing out is that you can actually spend a lot on the inference and the core intelligence and end up with a very valuable business on the other end, with perhaps fewer people. I think people have talked about that shape of company, but they don't really think
of it as a norm yet. At least in my portfolio, we are seeing more efficiency on headcount and a lot more compute spend, and I know for some of the Baseten customers that compute spend for inference is actually one of the largest items on the P&L.

Oh, we were working with this customer, and this was a challenge for us. We asked for an upfront payment for the year of compute, and the CTO came back to us and said, "Hey, I appreciate what you're doing, but no. After payroll this is our second biggest expense for the year. We're not going to do that." I think that's somewhat indicative of how much spend there is here, and it's probably also somewhat indicative of how big I personally think the market can be once we start seeing mass-scale adoption. But I think that's a good point, Sarah, which is that it's somewhat of a reset. I don't know if this looks like a normal SaaS business. Actually, I know for sure it does not. And I think even with traditional multiples, it's really, really hard to think about. What's crazy about it, though, is that I think the most efficient businesses, through markups and through software optimization, can actually drive pretty healthy margins and still have these really aggressive consumption contracts. I think that's rare. I feel like you guys see more businesses than I do, I hope, so you guys will be able to chime in on that more. Is that unique to this industry? Where else have you seen that?

It's been a while since I've seen so many companies ramping so quickly, and sometimes they were fake ramps. So in the internet wave
of the 90s, it was kind of startups selling to each other and bootstrapping off of venture capital, and then there were giant telecom build-outs on like a five-year cycle that caused huge revenue uplift, and then suddenly there was a glut and things dropped dramatically. Here it feels like things are ramping really fast off of products that are a couple of months old, which sometimes suggests that there's not defensibility. So then the question starts to become: how do you build defensibility, what does that mean, and how do things get commoditized, and do they? There are a couple of different markets where suddenly you see three companies all go from zero to five or zero to 10 million of revenue in a year. And then you're like, okay, there's three of these companies and they all ramped at the same time, the same amount, so there's enormous demand. But what does that mean in terms of: do they cannibalize each other? Can three more entrants come in and do the same thing? What is the basis for competition in that market? I think there's a lot of that happening too, which is, at least for me, pretty unexpected. I think it's just because we have such a big technology capability shift that suddenly you can do things you literally couldn't do a year ago.

It's kind of amazing. I think it's particularly exciting when you go and apply that to... I think, Elad, you've cut your teeth on a bunch of different healthcare initiatives. Healthcare is a really interesting place. If you remember Nuance Technologies, they had the stranglehold over this market for years, and honestly it always looked like they were kind of struggling all along the way as well. And then Whisper comes along, and you see that market now of medical note-taking. It's insane how fast it's grown, and there clearly is real
value there. And then I think the question actually goes, maybe this does look like a SaaS business again, when you're like, okay, what's the workflow, and is the power in the defensibility of the workflow you're powering?

Yeah, it's a really good point. The other thing I think people often forget is that many markets are not monopolies. Many of them are oligopolies. That's payments, with Stripe and Adyen and PayPal and all these things, right? So it's also possible some of these market structures are oligopoly markets, and then it's possible it actually ends up being winner-take-all and there's some network effect or data effect. But if you look at some of these types of companies, healthcare, to your point, is a great example where it's deal driven, right? You have large deployments with big customers and you lock them in for multi-year deals, and if you're actually able to lock down customer bases, you can effectively fragment into an oligopoly market more easily than if you have renewals every year. So I think part of it is also just: what's the contractual structure of a market? People really don't talk about that kind of stuff, but I think it's really fascinating to think about through the lens of what actually is a sustainable business in each one of these categories.

Beyond healthcare, it's like, where are these businesses going to get disrupted? I think financial services is an obvious one, and funnily enough, I think they've been at the cusp of this stuff in the past. I don't actually think they are at the cusp of it as much as you'd think. If you think about a lot of the big data stuff 10 years ago, the hedge funds were all over that, right? They're like, "Hey, there's alpha here." And I know that some of them are starting to look at large models and language models and whatnot, but I do
feel like they're actually lagging a bit in terms of their adoption of these things. It might actually be because they were so deep in the other sphere, in the old ML world, that it's hard to really quickly turn things around.

It's just such a different capability set. I think old-school machine learning, where you're just effectively doing regressions and pulling out patterns in data, is kind of different from some of the generative stuff in terms of what it does and what it can do for you. One of the things I've been thinking about recently, related to what you just said, is: what are the companies that just don't care about this? That may be a very good thing, because they're defensible, right? In the era of AI eating everything, what can't be eaten? Maybe those are really good things to get involved with or to work on, because you're not threatened by a dozen different new startups.

That becomes really hard when you start to think about some of the demos we've seen. I thought my job was safe until yesterday.

Yes. One of my partners asked me how long I thought venture capital was going to last, in terms of agent-based automation taking over, because he was all excited that he got out of software engineering at exactly the right time, before his skills became useless. But I tried to give him a real answer, which is that at the early stages a lot of the data doesn't exist, right? You'd have to capture real-world data. You increasingly have meetings over Zoom, but you would want to capture a lot of information about people. So much of it is access, and the information about who is leaving, who is a 100x engineer and entrepreneurial and product oriented and works with velocity. A lot of that is not collected today, right? There's no digital trail for it, and so you have
this big inputs problem. I think the decisioning, if you think about what is actually structurally predictable... maybe if you have all that data, the people are the most identifiable piece. Maybe you have a model that is doing continuous learning and can learn meta structures, like Elad is talking about: "Oh, this is a market that operates as an oligopoly, where these are the core drivers of differentiation and these are the dimensions of competition," and such. But I think that feels quite hard when you're investing in a technology landscape that is always changing, right? You're kind of always out of distribution, and you don't have the data on the people. And I don't know how you make decisions on whether products are any good, because you'd have to have all the customer point of view, or you'd have to have taste. Maybe models would have taste.

So I think Sarah is saying that her job is defensible, which I think is what everybody says.

No, no, no, not my job. I'm just saying this morning I gave it a good think, and I was like, should we hire people to go work on this? And I was like, nah, that doesn't feel like a tenable problem this year. But I commit that if it is feasible, we're going to be first. I just feel like it'll be, you know, another six months or so.

So that means it'll be next month, given what... it's actually in about 20 minutes. It was a company launching.

Wait, I've got to ask you one more question, because you're working closely with Nvidia, you work with the hardware providers. People are really interested in this topic now. Do you believe in hardware heterogeneity? There are some strong opinions on this from Databricks and others. And do you still see the same supply and demand dynamics around GPU shortage from your customers that you did maybe at the beginning of
last year?

I think the chips have just changed. Before, it felt like there was a shortage of everything: if you wanted anything except a T4, you could not get it. That was hard. Right now we're seeing two things. It is now possible for us to acquire compute pretty quickly. We have big spend, though, and everything should be conditioned on that: we're making long-term commitments with providers, and that gives us negotiating power. Customers, I think, are still struggling with availability for the most premium chips, whether that's H100s or A100s. Even when there is availability, you're often looking at three to six weeks of negotiating with cloud providers, and then your rep calling in favors in exchange for something or other. The number of times we have escalated conversations with a cloud provider just to get people moving faster is unreal. So I think customers are still running into it. I do think it's getting better, and it seems like it will go away.

On the heterogeneity argument, it's probably a good thing if there is more than one provider of chips. That being said, I personally think it's pretty overstated how easy it is to run something that looks like CUDA in some form on an AMD chip. It seems like a challenge to me. I know there are people who believe that they've got it done. But the amount of time we spend debugging bad nodes we get, in a place where you have a lot of information about existing infrastructure, is challenging as it is. I can't imagine what that'll be on these chips that are
untested. So over time, yes, I hope so; that would be great. Short term, it's really hard for me to see how we make investments beyond that, especially when there's a crunch on the other side from customers who are like, hey, we need this now.

The other thing is that Baseten doesn't give you raw access to GPUs. It's conditioned on this inference problem today. If I gave you a GPU to use on Baseten, there's not that much you could do on it except inference, just in terms of the access control that we give you. That being said, our customers do end up fiddling with Nvidia drivers, they do end up installing new versions of PyTorch, and they have custom Docker images. And running those things on hardware that isn't Nvidia, and especially doing it in an abstracted way... I think it'd be easier if I were building a service on top of these AMD chips and saying, take this service. But our customers do interface with the GPUs in some way, and building an abstracted service with this heterogeneity just sounds very, very challenging. I'm sure there are people much smarter than me who could figure it out, but for us I think that would add a lot of complexity and really slow down how fast we can move. But I think it is a good world when there is more than one option.

How do you see customers thinking about build versus buy, and how do you think that's going to evolve over time?

Speed is the only thing that matters in this market. What this means is that if you spend time fiddling around with your infrastructure and your service goes down when you launch it, I think that actually hurts the end user experience a
lot, and it's something you just don't want to mess around with. We see this from customers, even customers where AI is their core thing: they are understanding that what is proprietary to them is models, data, and workflow, and what is repeated for them is infrastructure. The number of times in the last 12 months that we have heard "we're going to build this ourselves," which is very much how infrastructure engineers were thinking a decade ago, only for them to come back three months later saying "I have a Docker dumpster fire somewhere," we can't count. It's our super qualifier: "have you built it yourself?" is how we know someone's going to be a great Baseten customer, because they empathize with the pain and they know that this is going to allow them to move a lot faster.

We had a company with a four-person AI infrastructure team that had been building this for two years migrate all their workloads over to Baseten in 36 hours. I think that is a pretty amazing case study for them, which is like, holy crap, we can now take these four engineers and focus on what is actually our competitive advantage. And the way we think about our business is that we don't need to scoop something off every single customer either. We offer options where you can run this in your own environment and pay us a license fee, and I think there it is very, very cost-effective. Honestly, even with the way other providers are doing this, I think it's crazy to try to build this yourself, especially at the scale that some of these customers are operating at. I was looking at one of the customers that we chatted with this morning who was tinkering around. They said they were doing a billion tokens a day. This is a, you know, six-person
chatbot company that has a billion tokens a day going through it. To build the infrastructure that supports that, with the elasticity, the reliability, and the performance, and then build the product experience around that, is impossible for a six-person team. I think you should try to take away the things that other people can do just as well, if not better. That's my take, and I think that's what I see the market converging around as well: speed is a competitive advantage, and we can spend our way to it; we can buy that competitive advantage without a long build cycle.

This was an awesome conversation. Thanks for doing it, Tuhin.

Thanks for joining. Thanks, Sarah. Thanks a lot.

Find us on Twitter at @NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces. Follow the show on Apple Podcasts, Spotify, or wherever you listen; that way you get a new episode every week. And sign up for emails or find transcripts for every episode at no-priors.com.

Original Description

At a time when users are being asked to wait unthinkable seconds for AI products to generate art and answers, speed is what will win the battle heating up in AI computing. At least according to today’s guest, Tuhin Srivastava, the CEO and co-founder of Baseten, which gives customers scalable AI infrastructure starting with inference. In this episode of No Priors, Sarah, Elad, and Tuhin discuss why efficient code solutions are more desirable than no code, the most surprising use cases for Baseten, and why all of their jobs are very defensible from AI. Show Notes: (0:00) Introduction (1:19) Capabilities of efficient code enabled development (4:11) Difference in training inference workloads (6:12) AI product acceleration (8:48) Leading on inference benchmarks at Baseten (12:08) Optimizations for different types of models (16:11) Internal vs open source models (19:01) Timeline for enterprise scale (21:53) Rethinking investment in compute spend (27:50) Defensibility in AI industries (31:30) Hardware and the chip shortage (35:47) Speed is the way to win in this industry (38:26) Wrap

The video teaches the importance of speed in AI computing and provides insights into various tools and techniques for achieving fast and scalable AI infrastructure, including Baseten, GPU clusters, and Nvidia's LLM engine, TensorRT-LLM (TRT-LLM). It also discusses the challenges and opportunities in AI adoption, including the need for efficient code, no-code platforms, and scalable infrastructure. By watching this video, viewers can learn how to build and deploy scalable AI infrastructure, optimize AI model

Key Takeaways

Build scalable AI infrastructure using Baseten
Configure GPU clusters for optimal performance
Deploy AI models on cloud platforms using Nvidia's LLM engine, TensorRT-LLM (TRT-LLM)
Optimize AI models for real-time performance using speculative decoding and continuous batching
Manage AI projects effectively by prioritizing speed and scalability
Coordinate with engineering teams to deploy AI models on scalable infrastructure

💡 Speed is a key advantage in AI computing, and achieving fast and scalable AI infrastructure is crucial for early-stage companies to compete effectively.
