Accelerating Large Scale Optimization Workflows with NVIDIA cuOpt and Metaflow

Outerbounds · Advanced ·🏗️ Systems Design & Architecture ·4mo ago


Key Takeaways

Accelerating large scale optimization workflows using NVIDIA cuOpt and Metaflow, covering systems design and optimization techniques.

Full Transcript

As well, this session is being recorded, so be aware of that. It will be available on YouTube, hopefully by the end of the week, maybe early next week.
>> Hey, Sandeep from the Silicon Valley area. Very nice. I'm also in the Bay Area, currently in the East Bay.
>> Nice to meet you, Sandeep.
>> I suppose on that note, Burcin, maybe we just go ahead and get started with introductions.
>> Sure.
>> So I'll go ahead and stop my screen share for a sec so I can see everybody. Oh, it's only us and the camera. No worries. [laughter] Okay. My name is Eddie. I work at Outerbounds and have been here for about four years. I'm very lucky to get to work with Burcin, our DevRel contact at NVIDIA. We've done many projects together, and a recent project coming out of NVIDIA is cuOpt, focused on mathematical optimization on GPUs. We're lucky today to have Burcin Bozkaya, who is a senior DevRel manager at NVIDIA, has a broad portfolio of technical expertise, and also very much understands the business angles of NVIDIA. So this is very relevant to Metaflow users and Outerbounds customers in many different regards. Burcin, anything you wanted to add as an intro there?
>> No, that's great. Thanks for the intro. I've been at NVIDIA for about four years now, and previously I was a professor of data science and business analytics. So I have a lot of background and experience in optimization: decision optimization, solving mathematical programming and mathematical optimization problems, a lot of research, a lot of industry projects. So yeah, happy to be here to share some of that insight with you guys.
>> Awesome. And as I said earlier, this session is going to be recorded, so we'll put that up on YouTube. We have a pretty small group here that's live on the call, so if you have any questions, feel free to interject them.
I'll try and monitor the chat as we're going through demos and having a discussion, so please feel free to interject. We'll preserve time for Q&A at the end, but if a question is relevant, we'll try to drop it in as we go. So without further ado, I'll go ahead and screen share and we can get started. Okay. So as Burcin noted, we are talking about mathematical optimization. One type of problem in mathematical optimization you might have been introduced to, whether in a data science program or a mathematics program, is linear programming. I think this is a good way to start, just to orient us on what types of problems we are even talking about. There's all sorts of optimization in deep learning and data science that we talk about, but the core that I think a lot of these problems emanate from is this linear programming formulation, where we're trying to find the optimal setting for some number of variables in a vector, subject to some cost measure and some constraints around the problem. Burcin, how would you frame it? What is your perspective on how mathematical optimization relates to data science and what we're doing here?
>> Well, first and foremost, mathematical optimization arises in a lot of very diverse contexts. We're really talking about decision making, decision optimization under constraints and limited resources, trying to figure out unknown decision variables. This has huge areas of application: production planning, data center optimization, last mile delivery, supply chain optimization, machine scheduling, personnel scheduling.
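For readers following along without the slides, the linear programming formulation mentioned above is usually written in a standard form like the one below. This is a generic textbook sketch, not the slide from the session; the symbols $x$ (decision variables), $c$ (cost vector), $A$ (constraint matrix), and $b$ (resource limits) are notation assumed here for illustration.

```latex
\[
\min_{x \in \mathbb{R}^n} \; c^\top x
\quad \text{subject to} \quad
A x \le b, \qquad x \ge 0
\]
```

Requiring some entries of $x$ to be integers turns this into a mixed integer program, and adding a term such as $\tfrac{1}{2} x^\top Q x$ to the objective gives a quadratic program.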
So a lot of these problems can be, and are, formulated as linear programs, mixed integer programs, or quadratic programs. And this is not new. It's been around for decades. Some of these modeling techniques and algorithms have been around since the Second World War. So operations research, the field of OR, decision making and decision optimization by means of mathematical programming and optimization solvers, that's been around for a long time. And recently data science, machine learning, generative AI, large language models, and agents are starting to play a role in all of this. Optimization used to be in a vacuum in that sense until the age of AI arrived, but now we definitely have a lot more context and tool sets to employ towards solving these optimization problems.
>> Mhm.
>> So yeah, it's pretty much everywhere. If our participants could put in the chat what kind of optimization problems you are actually dealing with in your professional or personal life, feel free to write those down. I'm sure we're going to see a wide variety of application areas here.
>> Yeah. Please let us know what you're working on and what interests you have as we go here. Very helpful intro. I'm very excited, towards the end of this chat maybe, to get into the connections with these more emerging AI use cases, agent use cases, and so on. But as Burcin noted, this stuff has been around for a long time. There are lots of solvers in the wild that have existed for decades. There are lots of different open source licenses.
Well, the project we're talking about today is cuOpt, which I would say fits nicely into this permissive licenses category. It has an Apache 2 license, so you can use it in your business as you see fit. Burcin, any commentary or anything you wanted to say about the solver ecosystem and how people turn some of these abstract math concepts into real software?
>> Yeah, I mean the ecosystem is definitely out there in terms of commercial solvers and open source solvers. Commercial solvers are obviously proprietary, and some of them have been around for decades. NVIDIA chose to go the open source route, particularly because the ecosystem has been running primarily on CPUs, all these commercial as well as open source solvers, and we wanted to energize the ecosystem towards GPU adoption and GPU acceleration. So the goal here is to share what we do on the decision optimization front with the cuOpt library. cuOpt is an open source, GPU-accelerated optimization library for solving LP, mixed integer programming, routing problems, and quadratic problems. By making this open source and open to the community, our goal is to energize folks to innovate with new algorithms on GPU, in an ecosystem where everything has traditionally been on CPU. I think the time has come for folks to develop new algorithms, new techniques, new approaches that run on GPU and take advantage of parallelization with thousands of cores on the GPU. So that's how cuOpt came to be. And we also started seeing other examples of open source solvers, or even commercial solvers, now introducing their own GPU-accelerated optimization algorithms, currently primarily for linear programming.
But I'm sure, as things progress, we're going to be seeing more of these solvers on mixed integer and other types of problems as well.
>> Mhm. A couple of questions came to mind from what you were just saying. Firstly, when these open source solvers use a GPU backend, is that calling cuOpt directly, or is the relationship a bit different there?
>> So cuOpt itself is actually based on a number of libraries. We call them CUDA-X. There's cuDSS, cuSPARSE, cuBLAS, cuSOLVER, and a few more. These are like the building blocks, Lego pieces if you will, which cuOpt or any other GPU-accelerated solver can build upon. Most of our partnering open source and commercial solver companies or initiatives have been using these building blocks to build their own solvers. So in that sense there is still GPU acceleration, there is still an NVIDIA library running under the hood. cuOpt itself is used more as part of SaaS platforms out there, like business planners or ISVs offering optimization solutions to their end users and customers. Those are the ones that actually choose to adopt and integrate cuOpt in its entirety. But the other solver builders typically choose to build their own solvers with the building blocks that I mentioned.
>> Yeah. Very interesting, very helpful as well. Okay. And one reason I was thinking about this is: what is the need for doing any of this on GPU in the first place? There's the obvious reason you use GPUs at all, which is parallelization. You want more scale, both vertically and horizontally, which is an obvious use case for GPUs.
And that relates to the different problem types you were mentioning, like linear programs versus mixed integer programs. Routing has clearly been a focus of cuOpt and NVIDIA since the beginning of this project. I was just wondering if you could comment: you'd mentioned there are new algorithms that need to be developed for GPU. What's the state of these different algorithms in the cuOpt ecosystem? Are they all equally production ready, or are there some that are ahead of the others in terms of problem types on GPU?
>> Well, it's all in progress as we speak. There are definitely some new algorithms that emerged recently that are highly oriented towards GPU acceleration. One of them is PDHG, or PDLP as we call it, the primal-dual hybrid gradient algorithm. It performs a whole bunch of matrix multiplication operations under the hood, which is highly suitable for GPU, as you know. So that's one new algorithm that came out. What we have done is take some of these algorithms and implement them as part of cuOpt as native GPU implementations. There are some other algorithms out there that can be partially accelerated, depending on the underlying operations. Some algorithms are more sequential in nature, and the complexity under the hood, in terms of matrix density and matrix size, may not be entirely suitable for acceleration. But we look into some of these existing algorithms as well, for instance the interior point method, or barrier method, for solving the LP problem. We have also put that on the GPU.
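To make the "lots of matrix operations" remark concrete, here is a minimal, unoptimized sketch of the basic primal-dual hybrid gradient iteration for an equality-form LP (minimize c·x subject to Ax = b, x ≥ 0). It is a textbook-style illustration in pure Python, not cuOpt's PDLP implementation, which adds refinements such as restarts and scaling. The function name `pdhg_lp`, the step sizes, and the tiny test problem are all assumptions made for this example.

```python
def pdhg_lp(c, A, b, tau, sigma, iters=500):
    """Minimize c.x subject to A x = b, x >= 0 with the basic PDHG iteration.

    Convergence requires tau * sigma * ||A||_2^2 < 1. Each iteration is
    dominated by one product with A^T and one product with A, which is the
    part that maps well onto GPU hardware.
    """
    m, n = len(A), len(c)
    x = [0.0] * n
    y = [0.0] * m
    for _ in range(iters):
        # Primal step: move against c - A^T y, then project onto x >= 0.
        grad = [c[j] - sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
        x_new = [max(0.0, x[j] - tau * grad[j]) for j in range(n)]
        # Dual step, evaluated at the extrapolated point 2 * x_new - x.
        x_bar = [2.0 * x_new[j] - x[j] for j in range(n)]
        y = [
            y[i] + sigma * (b[i] - sum(A[i][j] * x_bar[j] for j in range(n)))
            for i in range(m)
        ]
        x = x_new
    return x, y


# Hypothetical toy LP: minimize x0 + 2*x1 subject to x0 + x1 = 1, x >= 0.
# Here ||A||_2^2 = 2, so tau = sigma = 0.5 satisfies the step-size condition.
x_opt, y_opt = pdhg_lp(c=[1.0, 2.0], A=[[1.0, 1.0]], b=[1.0], tau=0.5, sigma=0.5)
```

On this toy problem the iterates settle at x = (1, 0), putting all the weight on the cheaper variable. Unlike simplex, there is no factorization or pivoting step, only repeated matrix-vector products, which is why this family of methods parallelizes so naturally.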
So in cuOpt, for LP problems, we have this PDLP solver running on GPU, which is an entirely new thing, and then the barrier solver, which has been around for a while but is now accelerated on the GPU. In addition to that, we have other classical algorithms like the simplex method or the dual simplex method. Under the hood, we take all these algorithms and accelerate as much of them as we can. These algorithms actually run all together concurrently, racing against one another, and whichever finishes first, whichever finds the optimal solution first, that's what cuOpt will report, which is similar to what some other solvers are doing as well.
>> Yeah. Okay.
>> And on the integer programming side, that's a bit more complicated, because these are definitely harder problems to solve due to their combinatorial nature. You have integer variables, binary variables. There's a well-known technique called branch and bound, which is not entirely parallelizable. I mean, it could run on multiple threads
when you explore the branch and bound tree, but if you get into the weeds of this, cuOpt is still running branch and bound primarily on CPU, because there are inherently sequential features in these kinds of algorithms at some point, right? So, new algorithms, existing algorithms, we accelerate as much as we can given different problem types, different contexts, and also the complexity and size of the problem. That is another aspect where the GPU might prove useful: as the complexity and the size increase, there's probably more room to scale up and solve these problems faster.
>> Yeah, for sure. Before we start segueing into a demo session and how to start operationalizing some of this stuff, Daniel had a really nice question: are there any plans to extend cuOpt to support Julia APIs? We're going to be looking at the Python APIs, but I was curious, Burcin, if you have any sense of the road map for other programming languages.
>> No, that's a great question. Since we started cuOpt, we've done a lot of integrations with different APIs, different languages, different model building platforms. For Julia, things are still in progress. We have done other Python APIs like PuLP, and what else did we have? AMPL, GAMS, CVXPY. I guess most of that stuff revolves around Python, different platforms, different APIs, but Julia is on the way. It should be coming up soon.
>> Awesome. Okay, so to segue into this demo, I'll tee it up, and then I'm curious about your reflections on how customers actually do this stuff in the real world.
There are two modes that Burcin and I have been discussing for how to run cuOpt, and one of them obviously connects to Metaflow, per the title of this discussion. The first one is what they're calling server mode here in the documentation, which is essentially running cuOpt as an API service, where you have an always-up deployment, you send API requests over the network, and it sends you results. We'll show a demo of running that. I'm going to use Outerbounds as the deployment mechanism here, but this could run on any infrastructure you'd like. The other mode, which we'll discuss afterwards, is a more scalable batch pipeline way of invoking cuOpt, and for that we'll use Metaflow, of course. Before I dive in though, Burcin, I'm just curious: what's your sense of the general split? What are the more serious enterprises in the cuOpt ecosystem doing in that regard?
>> Yeah, I mean from an enterprise perspective, what we see is a lot of on-prem implementations in addition to cloud-based ones. cuOpt can be consumed in a variety of ways, like you were describing. The server setup is definitely one of the options, and we see some of our partners and customers use that as part of whatever infrastructure they have. Another way to use cuOpt is to go through the microservice setup on whatever GPU node or cluster they're operating. So that definitely is another valid way of using this.
I've also seen some cases where some of our partners and customers actually take the source code and make certain changes. This appears, I guess, not so much on the LP and MIP side but more on the VRP routing solver side of things, because you want to add additional features. So they do some customizations, they build cuOpt from source, and they integrate those libraries into whatever application they have. So that is another way of usage that we see.
>> Yeah, interesting concepts. Very helpful. Cool. Okay, so without further ado, I'll show that first mode. Here I'm running this, again, as I mentioned, on Outerbounds deployments, but the point is that this is all open source code. When you construct your environment, there are Docker images, and here we're using the channel dependencies that are distributed by NVIDIA, through the RAPIDS AI channel in addition to the NVIDIA channel. Essentially these are the dependencies that we need inside our Python virtual environment. Then we're able to run this entry point command, and that's going to stand up the server. To not waste your time during the demo, I've already done this on an Outerbounds deployment. But again, this really has nothing to do with Outerbounds. I just did this because I had GPUs on this deployment already. What you'll see here, if I zoom in a little bit on the standard out of this server, is that once I set things up, I'm able to start sending these requests in. Things are getting a bit truncated on this server startup because I've already run a few jobs.
But what I'll want to do is come back to this area, and we'll see how the API is actually working: the service inside cuOpt has its own system for maintaining these different requests and then returning results, or solutions in the terminology of optimization problems, back to the client. So what lane are we in here? It seems like folks on the call are pretty familiar with optimization, so I'll gloss over the details. For this toy example that I'll send over to the server, imagine we're solving a resource planning problem where we have different types of energy that we can use. Think of yourself as a municipal planner or something, and you're trying to find the cheapest way to power a city given some constraints around renewable energy requirements, emission caps, things like this. The problem is all set up inside of this file, which I'll share in the chat quickly if folks want to follow the code. The main thing that we're doing here is solving a pretty simple planning problem where we have constraints that are different for different types of energy production. The main thing I'll do to show this is change around the cost and run this example a couple of times so we can see what happens with the problem. It's very easy to do once I have the cuOpt server deployed. We'll very quickly be able to send this, again, really small version of the problem. So obviously we're not really flexing the GPU here. But the point is that this can be quite a nice, interactive development way to get used to GPUs, and you can really scale these problems out to be quite massive even over this client-server model.
The results here, just to point out the baseline, are showing us the following. We're trying to pick how far to fill up these bars in the left side plot, that is, how much of each type of energy source to use. Then on the right we can see another histogram showing the different ways that we're coming up against the constraints of the problem. We can see here that we're exceeding the amount of renewables that we need to meet as a constraint. We're not hitting the emissions budget, so there is still budget to pollute more, I guess, if you want to look at it that way. One thing we can do to get a sense of this problem, if you're not familiar with optimization, is run sensitivity analyses over these different cost vectors. That is a very common thing you might want to be doing. So as we're in this interactive mode, you might just be changing one variable around. I just changed the cost of nuclear megawatt hours from 90 to 10. Obviously kind of ridiculous, but these are the kinds of analyses you might want to be doing. So if I run the problem again, the solution that cuOpt returns back to me over the network should be different. Before, we were using 100 megawatts of gas; now we're not using it at all, and we are using nuclear. So this is how the problems work. And then you can imagine that in real systems I probably don't want to be changing these variables around a bit. So that's how I think about the motivation now to go into the batch processing with Metaflow use case.
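The cost sweep described above can be sketched without any solver at all if the problem is stripped down to a single demand constraint with per-source capacity limits. In that reduced case a cheapest-first ("merit order") fill is the exact LP optimum. The sketch below is that simplification only: it drops the renewable and emission constraints from the demo, and every number except the nuclear cost change from 90 to 10 is a hypothetical placeholder rather than a value from the session's code.

```python
def dispatch(costs, caps, demand):
    """Cheapest-first fill: optimal for min cost s.t. sum(x) = demand, 0 <= x <= cap."""
    plan = {name: 0.0 for name in costs}
    remaining = demand
    for name in sorted(costs, key=costs.get):
        take = min(caps[name], remaining)
        plan[name] = take
        remaining -= take
    if remaining > 1e-9:
        raise ValueError("demand exceeds total capacity")
    return plan


# Hypothetical capacities (MW) and base costs ($/MWh) for illustration only.
caps = {"solar": 50.0, "wind": 50.0, "gas": 100.0, "nuclear": 100.0}
base_costs = {"solar": 40.0, "wind": 50.0, "gas": 60.0, "nuclear": 90.0}

# Sensitivity sweep: re-solve once per candidate nuclear cost.
scenarios = {
    nuclear_cost: dispatch({**base_costs, "nuclear": nuclear_cost}, caps, demand=200.0)
    for nuclear_cost in (90.0, 10.0)
}
```

With these placeholder numbers, the 90 scenario dispatches 100 MW of gas and no nuclear, and the 10 scenario flips that. Each scenario is independent of the others, which is what makes this kind of sweep a natural fit for fanning out as parallel tasks in a batch workflow instead of editing one number at a time.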
But I want to pause there and see if there are any questions, or Burcin, toss it to you.
>> I think you made a great point there, Eddie, because this whole sensitivity analysis, or what-if analysis, that you just mentioned is extremely valid in energy as well as many other problem types and sectors. The problem here is very small, of course. You have five decision variables or something. But if you're dealing with thousands or millions of variables, obviously the combinations are probably more than the atoms in the universe. In such cases, being able to run these scenario analyses fast makes a huge difference, and that's where GPU acceleration becomes useful. You want to be able to solve and re-solve these problems many, many times, and you want to do that fast. We have some really good experience, for instance in the energy sector, with unit commitment problems and power flow problems. Some of these problems are of a stochastic nature, so the parameters are not deterministic, they can change. That's where GPU acceleration actually makes a huge difference.
>> Interesting. Would you say in general that there's something categorically different about how these problems run on the GPU in this sensitivity analysis context, or is it more that we're just doing more compute in that context, so the more parallel the better?
>> I think the structure of the problem is more suitable for running the operations on the GPU, dense operations or sparse operations like matrix multiplication. So the structure lends itself to GPU acceleration in that manner.
>> Yeah, actually that's a very good point. As I was
getting used to cuOpt over the last couple of weeks, one thing I noticed is that originally I was getting very bad results on the GPU speedup. Even just building the problem variables was taking so long on GPU when I started scaling it up, until I realized that I wasn't following what you'd mentioned to me about GPUs working better with sparse data inputs. So anyway, I'm highlighting on the screen here a way that you can use the CSR representation to give the GPU the sparse representation of the problem. But I'm curious if you could comment on why sparsity matters for GPUs, and why we frame these problems in the sparse form.
>> Well, actually we have libraries that can handle both sparsity and higher density of the model under the hood. What you are showing here is the CSR, or sparse, representation of the matrix, which means you are passing only the required elements from the matrix. You're not passing a huge dense matrix, so that obviously has memory implications. With models that have millions of variables, rows, and columns, that actually matters. Occasionally we run into situations where the end user says, "I ran out of memory on the GPU, this problem is too large," which may happen. But there is a right way of populating the GPU memory with these representations, and of course under the hood we have algorithms and libraries that are suited to processing this sparse representation.
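For anyone unfamiliar with CSR (compressed sparse row), the idea is to store only the nonzero entries along with two index arrays. The pure-Python sketch below shows the layout on a small made-up matrix; the helper name `dense_to_csr` and the array names `values`, `indices`, and `offsets` are chosen for this illustration and are not taken from the session's code.

```python
def dense_to_csr(rows):
    """Convert a dense list-of-lists matrix into CSR arrays.

    values  : the nonzero entries, row by row
    indices : the column index of each nonzero entry
    offsets : offsets[i]..offsets[i+1] delimits row i inside values/indices
    """
    values, indices, offsets = [], [], [0]
    for row in rows:
        for col, entry in enumerate(row):
            if entry != 0:
                values.append(entry)
                indices.append(col)
        offsets.append(len(values))
    return values, indices, offsets


# Small made-up constraint matrix with 5 nonzeros out of 9 entries.
dense = [
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 3.0],
    [4.0, 5.0, 0.0],
]
values, indices, offsets = dense_to_csr(dense)
# values  -> [1.0, 2.0, 3.0, 4.0, 5.0]
# indices -> [0, 2, 2, 0, 1]
# offsets -> [0, 2, 3, 5]
```

Storage grows with the number of nonzeros plus the number of rows, not with rows times columns. For a constraint matrix with millions of rows and columns where each constraint touches only a handful of variables, that is the difference between fitting in GPU memory and not. In practice it is better to build the three arrays directly while generating constraints than to materialize a dense matrix first.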
So with all these things combined, that's where you get the value out of the acceleration. Otherwise, you either run out of memory or your matrix operation is not very efficient on a full matrix structure, and that's when you don't really get good results.
>> Yeah, definitely makes sense. Are there any heuristics that you keep in mind, or that you would suggest practitioners keep in mind? Say I'm used to CPU solvers and this is my first time trying GPU solvers for these problem sets. Is there something different about sparsity as you move into the new domain of GPU programming for optimization, or is it the same mental model as you see it?
>> Well, CPU solvers work a little bit differently. You can obviously take a model and pass it on to a GPU-based solver like cuOpt, as the mathematical representation, the MPS file, or whatever representation you want to send. It's actually the algorithm under the hood that matters, because these models are more or less the same representation, unless you come up with a different formulation of the model. There are different ways of formulating the same problem with different mathematical notation, a different set of variables, and whatnot. That has an impact on the actual problem structure, the matrix, the sparsity of it, and everything. So you could take a model and pass it to a GPU-based solver like cuOpt and it will solve it. But sometimes analyzing the model structure, maybe reformulating if necessary, could have an impact on the complexity, which will then have an impact on how fast you can solve this on the GPU.
>> Well said. Okay. All right. So I'll keep moving here.
One thing I wanted to show, though I won't run it because I ran a little bit longer in the previous section: there are also implementations that you can reference here if you are a Metaflow user, or if you work with a different workflow orchestration tool, you can extract these patterns from the different examples. We've used the cuOpt example sets in the documentation to build up a basic example for linear programming, mixed integer programming, quadratic programming, and then of course the vehicle routing problem, which I think I saw in the chat that some folks are working on as well. That is definitely one of the main reasons for cuOpt's existence. There's also a benchmark that I ran, which we can look at the results of. Let me see if I've already run the charts. Yeah. Okay. So, this is the one I wanted to show. Obviously not every case is going to work like this. As Burcin mentioned earlier, different types of problems have big effects on how much the GPU speedup matters. Even the structure within a problem type is going to have a lot of effect on how much of an acceleration boost the GPU parallelization gives you. But consider this example. It's a linear program, and on the x-axis we are scaling the size of the problem on a log scale. There are a couple of different dimensions that are moving. I've just used the regions here, but you can think of everything as proportional as we scale up to the right on what I'm considering the problem size. And then on the left axis, the y-axis, we're seeing, also on a log scale, the total time to solve the problem. Now this red line on the bottom is how the GPU solver scales, and this is using the PDLP algorithm that Burcin mentioned earlier.
>> So we can see that when the problem has this size of 1,000 regions (imagine that problem we just solved in the smoke test, but you're solving it for a thousand different municipalities, something like that), there's really no benefit at all to using the GPUs. But as we move to the right, the benefit becomes obvious very quickly. All the way to the right of this chart, we can see that it's over a 100 to 200 times speedup in comparison against these different CPU versions of the same algorithm, or, in the green and the blue cases, different algorithms that are solving the same problem. Okay, so I'll pause here. I know benchmarks should always be taken with a grain of salt, but Burcin, do you have any commentary or thoughts on this performance improvement, and how it generalizes? What should a practitioner take away from this?
>> Yeah, this is pretty typical of what we observe with GPU versus CPU. I think it was one of the questions in the chat: are there any benchmarks comparing GPU performance with CPU on these types of problems? I think this picture clearly shows this remarkable speedup on GPU. I should say that it's primarily due to the nature of the algorithm that's used here, PDLP, which is highly oriented towards GPU, because under the hood it does a lot of primal-dual iterations that involve matrix multiplications. We have benchmarked this, and we actually published a number of tech blogs over the last year that show it, and I can share some of those links.
I have them handy, maybe later in the chat,
>> that show LP speedups going maybe as high as 3,000x on certain types of problems, like multicommodity flow problems, essentially. Mhm.
>> So yeah, it could go even higher than what we are seeing in this graph here. We definitely have seen a lot of speedups with LP, and with MIP. I don't know if you have a MIP example here.
>> I do have an implementation, but I don't have a benchmark plot handy.
>> Okay.
>> I mean, we have also seen a lot of speedups on the MIP front as well. Yeah, maybe I'll let you present that first and then
>> This one's just the implementation, so I don't want to take too much time
>> producing. I mean, the MIP solver that we have is a heuristic algorithm accelerated on the GPU, plus the classical branch and bound, branch and cut, running on CPU. These two algorithms run together, and they also share information with one another to improve each other's search path, if you will. We have heard from some of our partners that they report something like 60x, 70x speedup over what they're doing on CPU with the cuOpt MIP solver.
>> The VRP solver is a whole different beast, I should say. It's actually the very first solver that we developed under cuOpt. And
>> Is that this one, under routing optimization?
>> Routing. Yes, exactly.
>> Yeah.
>> So it's a heuristic algorithm, but highly parallelized, running so many kernels and cores doing parallel search for better and better solutions. Right.
>> Yeah. And recently there was a LinkedIn post: one of our partners reported a 1,400x speedup.
>> Wow.
>> Over the CPU solution that they heard about from their client.
So, and another partner of ours reported something like 250x speedup. So with the VRP problem it's especially remarkable. I saw a few folks mentioning VRP, pathfinding, and routing optimization. So yeah, you should give this a try. The speedups we get out of the VRP solver are remarkable.
>> Nice. Very nice. Awesome. Okay, so before we open things up to a general Q&A, I wanted to hear your sense of what's next, what's on the roadmap for cuOpt. If folks are seriously interested in getting into the vehicle routing space, or any of these problem types that we've discussed, what should they be keeping on their radar? And then maybe we can segue into the AI angles that we talked about.
>> Sure. Yeah, thanks. Good questions. We definitely continue to invest in the cuOpt library of optimization solvers. This has been in development for three or four years now. It started with VRP, we added LP and MIP, and lately we added a quadratic programming
>> solver. So one thing we see as the cuOpt team is that most of the optimization problems out there are actually mixed integer programming problems.
>> I would say maybe 80%. So that's
>> Is that in the enterprise, or is it the same across all industries and use cases that you see?
>> I think across the board that's a relatively good estimate, applying to different verticals and different types of problems.
>> So we are spending more time improving and enhancing the performance of the MIP solver
>> that we have. I mean, some of the solvers out there are running on CPU.
The commercial solvers have been around for decades, and they have implemented so many amazing algorithms and additional work that
>> provide the speed and accuracy that they do,
>> which is remarkable. And the cuOpt MIP solver also runs on GPU and CPU, so we are working on making that better and better. So that's definitely one area we're following. We have recently made very significant improvements to the LP solver as well. The new version is actually coming out today, such a coincidence: the 26.02 version of cuOpt is coming out today with all these improvements.
>> I guess that's the best sign. Development is active. It's always the best sign.
>> Development is very active. There's a decent-sized team of engineers behind this, and we continue to hire occasionally. So yeah, we're investing along those lines. One other area, and you mentioned it briefly by saying AI, is how we can democratize model building, problem solving, and decision optimization with the help of AI: generative AI, large language models, or agents, right? So
>> that's definitely another direction that we're exploring.
>> I'm actually curious: do you see it as the agents getting access to these kinds of things as tools, or is it more that in the actual process of running the agent itself there are optimization problems popping out? Or maybe a mix of both. I'm not sure.
>> I guess both, right? The immediate use that comes to mind is agents orchestrating what kind of workload to pass on to which tool, and cuOpt would be one of them, solving optimization problems as they come across them.
But there are other places where optimization can be applied, like orchestrating the workloads of mixture-of-experts algorithms, for instance: how do you allocate tokens optimally to different LLMs, different experts? DeepSeek has recently published some LP-based optimization on that.
>> Right, right.
>> Or orchestrating agents distributing workloads amongst themselves. That is another topic for optimization. I mean, optimization is everywhere in that sense.
>> But one goal that we have is this: optimization so far has required operations research expertise, being able to build these models, define those decision variables, and define those constraints properly and effectively, because there are different possible formulations for a single given problem. That requires some expertise, and I think generative AI can actually address that, right? Just like you ask a question
>> to a chat
>> Natural language to problem.
>> Exactly. Yeah, tell it in natural language what your problem is and what your constraints are,
>> and it should be able to build the model
>> and obviously solve it using cuOpt or any other solver under the hood. So something like an end-to-end workflow that starts with natural language and ends with solving the problem. But then there's what-if analysis, which can also be in natural language, like: what if my supplier is not supplying anymore, or
>> things like that, right? So
>> all of that, AI and GenAI playing into this entire thing, I think that's another direction that we would like to follow.
>> Yeah, I think overall... Oh, nice. We have a two-part question from Ragavendra. Do you have any comparison of how cuOpt compares? Oh, we're getting into the bake-off.
cuOpt compares against Gurobi version 13, which uses GPU for optimization. And the second part, which could be related or unrelated: is there anything coming up on mixed integer nonlinear programming from cuOpt?
>> Sure. Yeah, we always get that question: how does cuOpt compare to not just open source but commercial solvers as well? The thing to remember is that GPU acceleration, and all these algorithms oriented towards GPU, are still an emerging field. Gurobi is one of our partners. We work with them, and we want them to also accelerate their solvers. I mean, NVIDIA is happy whenever our partners use the GPU for any reason, right?
>> And they have been working with us as well. Gurobi and FICO Xpress recently released their GPU solvers for LP, and we don't want to say we compare or compete or anything like that. Instead, we work with them. There are cases and problems where a GPU-accelerated cuOpt solver might work better,
>> and other cases where Gurobi's decades of implementation of branch and bound algorithms will likely work better. So we tend to think of them as complementary, right? So
>> you can get both solvers, run them, and see what you get out of them for a given type of problem that you have. If Gurobi works better or faster, by all means, because it's also on GPU, it's also accelerated, and NVIDIA is happy. And if cuOpt works for some other types of problems, we're still happy. [laughter]
>> Yeah. Yeah, I suppose reflecting on the kind of infrastructure decisions Outerbounds customers are often asking us about.
Is it correct to say, though, that it's still not exactly an apples-to-apples comparison, in the sense that with Gurobi you're getting a license, more like using a vendor API for LLMs or something, as opposed to hosting your own GPU infrastructure and running the compute directly? Is that the right way to think about it? Or is it still like, I'm going to go to Amazon and spin up my own GPU instance and then run the Gurobi system there? How do you see people actually doing this in a way that makes it comparable?
>> I mean, yeah, there are multiple options here. People can still go to the cloud and spin up instances, or they can come to you, because your platform offers a variety of GPU alternatives,
>> and with the integration of cuOpt, or any other solver for that matter, you could provide this decision optimization capability. Different enterprises prefer to go different routes. Some of them have their own clusters, right? They run their own workloads on-prem, with Gurobi or any other GPU solver integrated. We have recently worked with a number of business planners, whom I'm not able to name today, but it's coming up in the news sometime soon, who have integrated cuOpt as part of their SaaS platforms. So I guess there's a variety of combinations here that different enterprises prefer, from an infrastructure perspective as well as a solver perspective.
>> Yeah, I suppose, like with all benchmarks, "it depends" is a valid answer.
>> Sure. And speaking of which, we have a major conference coming up, right? So some of that news, and some of the most recent things that we're working on, will be unveiled at the GTC conference.
>> I don't know if our audience here is familiar with the GTC conference. Hopefully they are.
>> Yeah, GTC [laughter] is great.
>> And yeah, we have a number of sessions there. So let me actually
>> Do we have any links that we could put in the chat?
>> Yeah, if you haven't registered yet, the link I shared is for registration, or the general GTC web page. We have a talk by Ernst & Young. They have a cuOpt implementation, and they're going to talk about that. There's a panel happening in the, like, factory setting. So maybe this could also be interesting
>> to the group. We're actually offering a DLI (Deep Learning Institute) training lab, so we're going to have a hands-on session. Let me paste that here as well. It's a hands-on session where the participants can take the notebooks and run the exercises for pretty much all the solvers that we have under cuOpt. And then there's another training lab for portfolio optimization. Let me paste that in there as well. All of these links require people to create an account first to see the GTC catalog, but it's all out there. You can take a closer look, and we have all kinds of other resources as well: some tech blogs that we have recently published on the NVIDIA tech blog page, talking primarily about the MIP solver, the heuristic solver, or the barrier algorithm we implemented on GPU for the LP solver. Do I have those as well? Yeah, let me copy and paste them as well. I don't want to flood the chat with lots of links, but yeah, I guess people can.
>> We're wrapping up. Yeah, I don't see any new questions coming in.
>> Awesome.
On that note, if you do have any last-minute questions before we sign off, please add them in the chat in the next 30 seconds or so.
>> Let me put this one here as well. If you want to get your hands dirty with cuOpt, obviously you can do that through the Metaflow platform. We also have a separate examples repo. I think that's what Eddie was referring to earlier.
>> So this repo's got a lot of different examples from different problem domains. Feel free to take a look at them.
>> Awesome. Awesome. And I will drop one more link at the bottom of the chat, if you happen to be interested in joining the Metaflow and Outerbounds community. That's the link to our Slack channel. It's about 10,000, not quite 10,000 but almost, ML engineers and data scientists. People ask general questions and Metaflow-specific questions. I do see a last-minute one: are there any gotchas with respect to cuOpt plus Metaflow? Nothing really, as far as gotchas. I mean, granted, I've only been working with the tools together for three or four weeks, so I'm not the most experienced person in that regard. But what I like about the combination is, as always with Metaflow, it separates your workflow logic from your business logic and your actual data science work, in a way that makes it very clean to understand the optimization problem. But at the same time you get these infrastructure-level tooling integrations. For example, Metaflow can run nvidia-smi on a loop and profile the utilization across all these different batch workers.
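Editor's sketch: the idea of running nvidia-smi on a loop to profile GPU utilization can be illustrated with a small standalone poller. This is not Metaflow's own integration, which the talk does not show; the function names here are illustrative assumptions. It relies only on the `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` interface and returns an empty sample when no NVIDIA driver is present.

```python
import shutil
import subprocess
import time

# Fields requested from nvidia-smi: GPU index, utilization (%), memory used (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def _to_float(field):
    """Convert a CSV field to float, or None for values such as '[N/A]'."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_sample(line):
    """Parse one CSV line such as '0, 87, 10240' into a dict."""
    idx, util, mem = [field.strip() for field in line.split(",")]
    return {"gpu": int(idx), "util_pct": _to_float(util), "mem_mib": _to_float(mem)}


def sample_gpus():
    """Return one reading per visible GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return []
    return [parse_sample(line) for line in result.stdout.splitlines() if line.strip()]


def profile(duration_s=5.0, interval_s=1.0):
    """Poll the GPUs every interval_s seconds for duration_s seconds."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append((time.time(), sample_gpus()))
        time.sleep(interval_s)
    return samples


if __name__ == "__main__":
    for timestamp, gpus in profile(duration_s=3.0):
        print(round(timestamp), gpus)
```

Run in a background thread alongside a solve, the collected samples show whether the GPU was actually saturated, which is the kind of data that helps answer "which GPU type should I use?".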
Things like this become very helpful when you're trying to actually answer some of these questions that I lobbed very sloppily Burchin's way, like, okay, which GPU type should I use? You can actually see the data, which in my view is the reason to connect it to the workflow orchestration side of things. And yeah, it just works. The biggest thing, I guess, is just finding the right conda channels, which luckily the NVIDIA folks take good care to package correctly. I know there are often issues with data science packages, but with the NVIDIA and RAPIDS AI channels everything works out of the box as expected.
>> Yep. Speaking of GPUs, we also encourage developers to try cuOpt on whatever GPU they might have, with some bare minimum of course, because we want developers to get their hands dirty with cuOpt and GPU acceleration of optimization, right? Not everybody has a data center GPU, of course, but for development purposes, for testing these algorithms on your own use case, we have set a minimum level for GPUs that you can use, and a variety of GPUs satisfy that, not just data center GPUs. And if you need higher-end GPUs, basically for performance benchmarking and for production deployment, NVIDIA certainly offers Blackwell and Hopper GPUs, and the next generation is also coming up this coming year. So yeah, we want this to be available to as many people as possible.
And just reach out on our GitHub repo discussions if you have any questions about running cuOpt on a particular GPU or getting access to GPUs. Just let us know. And of course you're always welcome to contribute to the repo if you are into programming on GPU. We definitely welcome all kinds of contributions in that sense.
>> Yeah. Yeah. Nice. Okay, one more question comes in: what is the benefit of using Metaflow and cuOpt together? I'll probably refer back to the beginning of this chat when we put it on YouTube, but I'd say it depends on what you're doing. There are some workflow modes that you might be in where there is no benefit to using Metaflow on top of cuOpt, I'd say, and there are other modes where it definitely makes a lot of sense. In that way it's no different from any other kind of data science or data programming problem, in my view, as it pertains to Metaflow as a workflow orchestration solution. The modes where it's not useful are if you're doing ad hoc analyses, or you're just logged directly into the GPU server, running the analysis and exporting a JSON file somewh

Original Description

Code: https://github.com/outerbounds/cuopt-project/tree/main
GTC Registration: https://www.nvidia.com/gtc/
EY talk: https://register.nvidia.com/flow/nvidia/gtc26/ap/page/catalog/session/1768237564768001mmxt
Panel: https://register.nvidia.com/flow/nvidia/gtc26/ap/page/catalog/session/1764953367009001IbXV
DLI Training Lab: https://register.nvidia.com/flow/nvidia/gtc26/ap/page/catalog/session/1764979279977001yWb0
DLI Training Lab: https://register.nvidia.com/flow/nvidia/gtc26/ap/page/catalog/session/1765225823733001O0Vj
MIP heuristics Tech Blog: https://developer.nvidia.com/blog/learn-how-nvidia-cuopt-accelerates-mixed-integer-optimization-using-primal-heuristics/
Accelerated Barrier Tech Blog: https://developer.nvidia.com/blog/solve-linear-programs-using-the-gpu-accelerated-barrier-method-in-nvidia-cuopt/
Portfolio Optimization Tech Blog: https://developer.nvidia.com/blog/accelerating-real-time-financial-decisions-with-quantitative-portfolio-optimization/
cuOpt Examples repo: https://github.com/NVIDIA/cuopt-examples
Slack: http://slack.outerbounds.co/

This video teaches how to accelerate large scale optimization workflows using NVIDIA cuOpt and Metaflow, covering systems design and optimization techniques. It provides a comprehensive overview of the tools and techniques used to optimize workflows, including MIP heuristics, Accelerated Barrier, and Portfolio Optimization. By watching this video, viewers can learn how to design and implement large scale optimization systems using GPU acceleration.

Key Takeaways

Install NVIDIA cuOpt and Metaflow
Formulate and solve optimization problems (LP, MIP, QP, VRP) using cuOpt
Orchestrate GPU-accelerated solves as workflows using Metaflow
Test and optimize workflows
Deploy optimized workflows

💡 GPU acceleration can significantly improve the performance of large scale optimization workflows, and NVIDIA cuOpt and Metaflow provide a powerful toolkit for designing and implementing these workflows.
