Devin Robison - Optimizing model performance | JupyterCon 2020

JupyterCon · Intermediate ·🏭 MLOps & LLMOps ·5y ago

Skills: ML Maths Basics 90%, Supervised Learning 80%, ML Pipelines 80%, Fine-tuning LLMs 60%

Key Takeaways

The video demonstrates optimizing model performance using GPU-accelerated libraries such as RAPIDS, Optuna, and xfeat, and managing the machine learning lifecycle with MLflow. It showcases a Jupyter Notebook walkthrough of these tools and techniques to improve model performance.

Full Transcript

Hey guys, welcome to this talk on optimizing model performance with feature engineering and hyperparameter optimization. I'm Nanthini and I have Devin with me. We work for NVIDIA on the RAPIDS team, and we'll be talking about how you can use the RAPIDS framework and a few other GPU-accelerated libraries to get a performance boost in your models. This is the outline of the talk: we'll talk about feature engineering, we'll talk about hyperparameter optimization, we'll talk about the libraries, and we'll walk through a demo notebook to see how these can be used together to get a performance improvement.

So you might be wondering why there's a need to optimize performance. You've built a model and it gives you some accuracy, and that's it, right? But that's not the end of a data science workflow, and anybody who's worked in the field knows that there is more to it than just building a model. You have to think about whether the features are relevant, whether you have used them optimally, and whether there is anything you can do to improve the model performance. More often than not, the estimator with its defaults is not going to give you the best performance, so you have to think about ways in which you can better the performance itself.

Feature engineering is one such process to enable a better-performing model. A feature is a property of the dataset, and everything that follows in the machine learning process is based on the features of the dataset. We need to think about how we can best represent these features so that we get good performance going forward. There are different ways in which we can engineer features. Transforming is generally used for numerical data. Encoding is generally used for categorical data, with label encoding, target encoding, and processes like that. Combining and splitting features is used for date-time features, where you can split or combine different columns into one to get a better
representation for the data. This process can boost your model performance to a good extent, and it uses domain knowledge as well, so that's also useful.

Now that we've generated a bunch of features, how do we know that they're meaningful? It's possible that we've generated gibberish, so we need to parse through that, and that's what feature selection does for us. It evaluates how well each of the features represents the target variable and how useful it is in predicting it. One of the popular methods is the chi-square test, which tests for the independence of two events. In the case of feature selection, we want a feature that is highly dependent on the target variable, and this is what we'll be using in the demo notebook.

The second process is hyperparameter optimization, which sits at the other end of the machine learning spectrum: after you've constructed the model, you want to optimize the performance by tuning its parameters. This uses a process of cross-validation after it chooses a bunch of parameters, and we use cross-validation to overcome overfitting. You can think about this in terms of a random forest, where you have maximum depth, n_estimators, and many such parameters, and we want to tune these in order to get good performance. These can complement a model to a good extent and can improve the performance and accuracy to a great degree, because the default parameters generally don't work very well with real-world data.

Now that we know that tuning does yield good performance improvements, how do we do it? There are two options. The first one is to manually tune everything. This means you have to enter the different parameters for the different options, and this can get tedious very quickly because the number of parameters increases, especially for a classifier like random forest. So we have the other option, which is to automate this, and this is where
the library or tuner that we'll be looking at comes into play. It programmatically selects new parameters and evaluates the performance in order to give better results, and it discards parameters that are not very useful.

As we've discussed, there are a lot of problems with applying feature engineering and hyperparameter optimization. It's time-consuming, it requires a lot of thought, and it requires resources, time, and even money. Sometimes the performance improvement is just not worth the wait and the money that you spend. This is where the GPU-accelerated libraries come into play.

Let's look at the first library that we'll be discussing, which is RAPIDS. RAPIDS is basically a data science pipeline that aims to accelerate the entire workflow on the GPU. You load the data from disk and then you forget about it until it comes back with the results, and this can yield huge performance improvements, as we'll see in the next slide. Some of the libraries available in RAPIDS are cuDF, which is similar to pandas, and cuML, which is similar to scikit-learn, and we have cuSignal and many such libraries. The biggest goal for us is to maintain the same APIs as pandas and scikit-learn, so that adapting CPU pipelines to GPU pipelines is very simple.

These are some of the performance improvements that we're able to achieve. As you can tell, the GPU performance is much, much faster than the CPU, and this is extremely useful, especially when you are working with huge data. These can give you results within 20 minutes instead of you having to wait for three or four days.

And this is how you use RAPIDS. Here you see a simple scikit-learn workflow, which is fitting a logistic regression model. The way you use RAPIDS is to just change the import statements: all you have to do is change the sklearn import to a cuML import and you're set to
execute the same workflow. This is why we want to maintain the same interfaces that scikit-learn has, so that all you have to do is change a few lines of code and you're set. Now Devin will be talking about MLflow and how it is used in the notebook.

All right, so what is MLflow, who should care, and why should we want to use it? The MLflow project describes itself as an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. Okay, so far so good. All this sounds like something we probably want, but how does it work, and what is required to integrate these features with our own projects?

MLflow consists of the mlflow binary and a set of API libraries, which create an overall ecosystem that supports generalized machine learning workflows, both for teams and individuals. For our purposes, these components can be described as follows.

First, MLflow lets you define a consistent project structure. This utilizes a central configuration file called MLproject, which is placed in the base directory of your project. It lets an end user define the environment in which their project can be executed, such as conda requirements in the form of a YAML file, Docker container specifications, and, more recently, deploying Docker containers in Kubernetes. It also allows for defining labeled entry points and associated launch variables for our project, which lets us package everything up and execute training processes in a nice fashion. These tools provide a standardized training and deployment workflow, which can make it significantly easier to maintain consistency within a team, or, if you're working at an individual level, it can help streamline your own development process and how you distribute your work.

Next, we have the tracking framework. This lets you store all sorts of information related to the model and its training. I'll discuss this a bit more in detail in
the next few slides, but in general this includes the mlflow binary and API elements which allow for recording experiments, configurations, hyperparameters, data, and other important aspects of your project that you'd like to keep track of and package up with each experiment. It relies on two primary components: the backend store, which is a database for storing metrics and parameters of an experiment, and an artifact store, which is used for larger data objects such as files, folders, and registered models.

The last piece that we're going to look at is the model registry. This provides tools for storing and annotating deep learning and machine learning models. It lets you assign tags or labels to these models, search them later on, and deploy them for inference. Next slide, please.

I'll focus on a subset of the total functionality for the purposes of this demo, but if you want to know more you should check out mlflow.org. They have a lot there and the documentation is pretty solid. MLflow provides APIs for Python, R, and Java, as well as their own mlflow tool and the ability to publish to target REST endpoints. In general, we can run from any platform, whether it's a local workstation, a cloud notebook, or something like a Databricks instance. We just need to be sure we've configured MLflow to use the correct experiment label and that we've pointed it at the proper backend and artifact stores.

The API layer provides interfaces for logging general key-value metrics, configuration data such as hyperparameters, inputs, and training results. These are stored in a predefined backend such as a relational database or a local file system. Something to be aware of, though, is that if you want to use the features of the model registry, you will need to use a database backend. It also allows for storing files and directories, referred to as artifacts. Artifacts can be written to the local file system or to more long-term storage such as S3, GCS, HDFS, an FTP server, and various others, and
finally, it supports saving, registering, and reloading supported models from frameworks such as sklearn. Conveniently, this also works for API-compatible frameworks such as RAPIDS, so we can leverage the sklearn interface for that purpose.

On the development side, MLflow provides an excellent tool set for tracking progress, recording the development cycle, and having the ability to reproduce previous results in the future. On the production side, MLflow provides a standardized workflow for transitioning models to a production state, as well as tools to quickly deploy models as self-contained REST endpoints, Docker containers, and Kubernetes deployments. Next slide, please.

MLflow wraps each of the main iterative elements of the development cycle, such as training, validation, and model updates, into the context of an experiment, as visualized here. It also allows for nested experiments, which is a feature that I find quite valuable for a variety of applications. One of those, which we'll illustrate next, is where you have a model with a number of tunable parameters, each of which you want to explore using some HPO framework such as Optuna, and you want to keep a record of each combination that's been explored. Next slide, please.

For a given run of an experiment, this process is very straightforward. First we start the run, we log related metrics and parameters, and then possibly save an instance of our model and register it for use later. From the Python-based illustration here, we can see that this process doesn't require much in terms of code changes. It's roughly what we might expect for adding logging to our training process. Here we scope our run using a with statement, we log a model parameter (learning rate), a runtime parameter (Python version), and then we record the reported accuracy of the model. Finally, we log our model itself using the mlflow.sklearn interface. Next slide, please.

Next, we have a slightly more robust example, which is part of the code that's used
for our demo. In this case, we have code that sets the tracking URI and the experiment name before we initiate our run. Note that this is going to be a nested run, with the top-level ID being optuna-hpo concatenated with our study name. Here we set a callback function, mlfcb, which is passed to the Optuna study.optimize call. This is going to be responsible for actually interfacing with MLflow during each of the HPO runs and logging the associated parameters. Then, finally, we have a nested start_run call, which logs one additional run that we label as final classifier. What this does is store the RAPIDS model that's associated with the best run from the HPO tuning process and register it with the rapids-optuna-airline tag. Additionally, we store a conda.yaml file that's associated with the environment this model was trained in, so that later on MLflow will know how to create a local conda environment that can run this model if we want to deploy it, or something like that.

So what is the end result after we run this? Let's see. This is our next slide. Here we have the results of the HPO run. Notice that this is indeed a nested run, where each of the HPO runs, which are bracketed in red, is logged under the top-level experiment instance, and we have one additional run for logging the model artifact associated with our best run, labeled as final classifier. Note that each HPO trial contains a number of runtime parameters associated with it, an accuracy metric which indicates how performant the model was that came from those HPO parameters, and a variety of searchable tags such as start and completion dates. Next slide, please.

Clicking on the final classifier entry, we're able to drill down into the specifics of that run and see the artifacts that are logged with it, the associated files, and their location on disk. If we had been using something like S3 or a GCS endpoint, that would be reflected here, so we'd know how to get to it. And then we move
on, and we can take that information and our stored model and actually deploy it. Next slide, please.

The final step in this development cycle is to deploy our model as an inference service. One way we can achieve this is with MLflow's serving interface, which we can see run in the console screenshot above. Here we tell MLflow to use an SQLite database that's located in /tmp, and we want it to serve version one of the model that we registered previously with the tag rapids-optuna-airline. MLflow then goes out and creates a REST endpoint that will serve our trained model, and you can see it running here on the local system. Then, if we look at the simple example of the Python query code that's going to reach out to this REST service and run an example inference using only the requests library, we can see that when it comes back it indicates that our example flight is going to be late. And that's about as simple as it gets. Next slide, please.

All right, so to recap: MLflow is good, and you should use it in your workflows. We've glossed over a number of details here, but this should be a good starting point to motivate MLflow and how it could be integrated into your environment. I'm going to give it back to Nanthini.

Let's now look at two other libraries, Optuna and xfeat, and how these can be used to do hyperparameter optimization and feature engineering. Optuna is a hyperparameter optimization library that provides a lot of sampling algorithms, so it's useful in automating our HPO runs. Here we see a simple example of what an Optuna workflow would look like. You just have to define an objective function that returns either your score or your loss, and it maximizes or minimizes depending on what you've passed, which you have to specify with create_study as well. We see here that this is going to run for 100 trials. A trial refers to one single run of the objective function, and a study is a bunch of these trials grouped together, and
Optuna is extremely easy to parallelize. They provide a lot of options to parallelize with, and I think HPO is one of those embarrassingly parallel problems you can have. It also provides visualization tools in order to visualize the results of the HPO runs that we've had.

This is how you integrate RAPIDS with Optuna. The objective includes the logistic regression, with the fitting and predicting, and it returns the accuracy score. As you can see here, we're maximizing the accuracy score. We select the parameters and we run the workflow that you normally would. With just a few changes you'll be able to integrate Optuna with any existing workflow, and you can do this with CPU workflows as well.

xfeat is a feature engineering library that provides a lot of functions, like arithmetic combinations and encoding, to engineer your features. It also provides feature selection capabilities, with good integration with Optuna as well, and it provides the chi-squared test that we were talking about, which is what we'll be using in the notebook. This is a simple way to integrate RAPIDS with xfeat. You can see here that we have a numerical feature engineering task, where r specifies the number of columns that we want to add and the operator specifies the arithmetic operation we want to perform. And yes, we want to make sure that we exclude the target column here, because we don't want to do any pre-processing on the target column, since the target is going to be the best deterministic feature for the target itself. So we want to exclude that, and this is what we're using in the code in the notebook as well.

Let's jump into the notebook and see how these libraries can be used together. This is a notebook that you can find in the repository. These are the libraries; you just have to uncomment and run this to install them. We have a bunch of imports, timing functions, and data
acquisition, where you set this to true and set the path to your local path, and you'll be able to download the file automatically. And this is the MLflow configuration that Devin was talking about, so he can explain a little bit further.

Sure. So what we have here is just a little bit of setup that helps us automate this process. First, there's this helper function that uses the MLflow tracking client, which talks to the MLflow registry. All it's really doing is letting us look up the most recent version of a model we've created and return a string describing how we can use MLflow serving to launch that model. The second element, which is a class, is a little bit more complicated, but essentially all we're doing here is taking Optuna's native MLflow callback that we pass to the study optimize run, and we've augmented it a little bit so that we're able to support nested runs and add a bit of additional information in terms of logging and versioning to each trial run. Other than that, it's the same as you'd get from the standard Optuna MLflow callback, but it's also a nice example of how you can augment that for your own purposes.

Okay, now that we have MLflow set up, let's look at the feature engineering function we have. What we currently use is cuML's label and target encoders for label encoding and target encoding, and we ensure that we're not encoding on the entire dataset; rather, we have a train and a test set so that it can generalize as well. For the arithmetic feature engineering task, we have xfeat with SelectNumerical and ArithmeticCombinations, and note that we are also excluding the target column with arithmetic combinations in this case. So that is our feature engineering pipeline.

Now that we have that, how do we evaluate the model? The train-and-eval function is a typical data science workflow, where you have the train and test data frames and a bunch of parameters for
the logistic regression model that we're going to be using. It also takes in a selector, which is the feature selection algorithm that Optuna passes in. When the feature selector is set, it transforms the data by selecting a bunch of features which should be used for training, and if it's not set, it's basically selecting the appropriate columns to perform training and evaluation. Here we're using the ROC AUC score. And return model here is basically for the MLflow callback, where it accepts the classifier and the signature of the data.

We have the objective function here, which takes the train and test data frames along with the selector and the trial. This is the typical Optuna workflow that we saw in the slides as well, where we select the parameters and pass them to the train-and-eval function with the selector. The selector is defined here with chi-squared and k-best, so these two in conjunction select the k highest-scoring features, and we can use that subset to train and evaluate the performance.

This is the MLflow setup, and this is the optimize call. This is where the magic happens: we're passing the objective function with the data frames, along with the selector and the MLflow callback. Once it's optimized, we want to get the selected columns from the selector and also the best-performing parameters. With these two, we evaluate the performance and log it to MLflow so that we can get it back later on in the production stage.

So we have all of this set up. Let's go and run the model. Here are just the experiment variables: the number of rows you want to select, how many trials you want to run, all the metadata that we have. Before we do any of the feature engineering, let's see what the performance is like. We notice that it's 0.5, and these are the columns that we have, which is 12 columns excluding the label. With feature engineering, we're going to see what the performance improvement is. The other
thing to note here is that we are casting the categorical columns back to object, because logistic regression does not accept object columns, and in the feature engineering step, because of label encoding, these become numerical columns. So let's see what the performance is going to be like. We see an improvement of two percent here, and we also have extra columns: the plus refers to the arithmetic operations and the TE refers to the target encoding columns that we have.

Now that we have that, let's see how we can select the best performance from it. You can see that it starts with 0.56 and it's probably improving to 0.58 and 0.6. While it does that, let's take a quick look at the CPU notebook. This is also available in the repository for comparison, and this is the notebook that we used for getting the numbers that we'll be discussing in the next few minutes. It's the same code, which basically uses the pandas version instead of cuML and cuDF. The only main difference is the encoder, where we use the label and target encoders coming directly from xfeat instead of the cuML label and target encoders. So this is the entire notebook. Again, this is available in the repository, so feel free to play around with it as well.

Let's go back and have a look at what the best performance was. We see that the best performance was 0.6, and we'll see how much of an improvement there has been. From the Optuna study object that we got, let's visualize these results and see if we've actually done anything useful or if we just ran for a long time. As you can see here, this is the hyperparameter importance graph, and k seems to be of the highest importance, which makes sense, as that determines the number of features that we're going to select. Among the model parameters, it seems like penalty and l1_ratio have the highest impact, although it's comparable to the other two, but penalty seems to
have a good impact on the performance. Let's look at the slice plot. This will tell you how the performance changes as we change the different parameters. As you can see, there are some low-performing C values and some high-performing ones, which is why it's not impacting the performance too much. Finally, we have the optimization history, which shows how the performance improved over the trials. As we can see, we started around 0.56 and it went all the way to above 0.6. This is not a great improvement. Again, you can get better improvements if you have more data and you run it for more trials, but for the purposes of this demo and the time constraints I've decided to go with this, so feel free to run it with more trials and more data.

Now let's look at how we can get the model back from MLflow. It tells you what command you have to run on the command line, so in the same folder as the notebook, execute the command and you should get this setup, which is similar to what it says here. Now, this is the query code, where it has a bunch of data and we're going to see whether this flight is going to be late or on time, and it's going to be late. So this is a simple example of how we can get back a model that we saved through MLflow and have it predict on new data.

Let's go back to the presentation. These are the numbers that we got. As we can see, there's a five times improvement from the GPU compared with the CPU for 100k rows. As you can see, the GPU for 1 million rows is still faster than the CPU for 400k, and as you go further the improvements are going to be more pronounced, so you are encouraged to try larger datasets.

That's what we have for you today. Hopefully this was good in introducing you to how you can optimize performance, and maybe it even gets you started on optimizing performance. A good place to look is the RAPIDS AI cloud-ml-examples repository, where we have a bunch of
these different frameworks with different cloud providers, to see how best you can use them with all of the different options that we talked about. You're encouraged to check these out to get started, and feel free to connect with us on GitHub or Twitter via RAPIDS AI. Hope you enjoyed the talk. Thank you so much.
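
Editor's sketch: the chi-square, k-best feature selection described in the talk can be illustrated in a few lines of plain Python. This is not the notebook's code and not the xfeat or scikit-learn implementation, which may compute their scores differently. It is a minimal sketch that assumes every feature is categorical, computes the Pearson chi-square statistic from the feature/target contingency table, and keeps the k highest-scoring columns. The function names, column names, and toy flight data are all hypothetical.

```python
from collections import Counter


def chi_square_statistic(feature, target):
    """Pearson chi-square statistic for the contingency table of two
    equally long sequences of categorical values."""
    n = len(feature)
    observed = Counter(zip(feature, target))  # joint counts per (feature, target) pair
    feature_totals = Counter(feature)         # row totals
    target_totals = Counter(target)           # column totals
    stat = 0.0
    for f_value, f_count in feature_totals.items():
        for t_value, t_count in target_totals.items():
            # Expected count if feature and target were independent.
            expected = f_count * t_count / n
            diff = observed.get((f_value, t_value), 0) - expected
            stat += diff * diff / expected
    return stat


def select_k_best(columns, target, k):
    """Return the names of the k columns with the largest chi-square
    statistic against the target, plus all the scores."""
    scores = {name: chi_square_statistic(values, target)
              for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: is a flight late (1) or on time (0)?
columns = {
    "carrier": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "weekday": ["mon", "tue", "mon", "tue", "mon", "tue", "mon", "tue"],
}
late = [1, 1, 0, 0, 1, 0, 1, 0]

selected, scores = select_k_best(columns, late, k=1)
print(selected, scores)
```

In this toy data the carrier column determines the label exactly, so it receives the larger statistic and is the one kept. In the talk's setup, k is itself one of the parameters that the tuner searches over.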

Original Description

Brief Summary

Optimizing performance of a machine learning model can be a labor-intensive process. It is often overlooked in real-life applications. In this talk, we'll see a Jupyter Notebook walkthrough of GPU-accelerated libraries - RAPIDS, Optuna and xfeat as a potential solution to address some of the constraints of Feature Engineering and Hyperparameter Optimizations, and use MLflow for experiment tracking.

Outline

This talk will walk through a demo Jupyter notebook on how we can use RAPIDS, Optuna, xfeat, and MLflow to illustrate the use of feature engineering and hyperparameter optimisation on a classification problem, in conjunction with experiment tracking and eventual production deployment. Feature Engineering is a process to transform raw data into features that can represent the underlying patterns of the data better. Hyperparameter optimization is a process that can complement a good model by tuning its parameters. These can significantly boost a model's accuracy. RAPIDS framework provides a suite of libraries that can execute end-to-end data science pipelines entirely on GPUs. Optuna is a lightweight framework for automatic hyperparameter optimization, and xfeat is a feature engineering and exploration library using GPUs and Optuna. MLflow is a framework for tracking experiment state, ensuring reproducibility, and model storage / deployment. We'll utilize xfeat for performing feature engineering operations to add more features to the dataset using Numerical and Categorical encoding strategies, like arithmetic combinations, target encoding, etc. cuML, a library in RAPIDS, has a set of Machine Learning models that are GPU-accelerated. Optuna will be used to select the most pertinent features among the original and the newly added features, along with the hyperparameters for the model we use. Lastly, MLflow will be used to record the entire process, and publish the final model as a REST service. Using the combination of the libraries, we will be abl

Playlist

Uploads from JupyterCon · JupyterCon · 42 of 60

← Previous Next →

Interview Joshua Patterson NVIDIA

Interview Joshua Patterson NVIDIA

Dave Stuart - Jupyter as an Enterprise “Do It Yourself” (DIY) Analytic Platform | JupyterCon 2020

Dave Stuart - Jupyter as an Enterprise “Do It Yourself” (DIY) Analytic Platform | JupyterCon 2020

Jeffrey Mew - Supercharge your Data Science workflow | JupyterCon 2020

Jeffrey Mew - Supercharge your Data Science workflow | JupyterCon 2020

Michelle Ufford- Supercharging SQL Users with Jupyter Notebooks | JupyterCon 2020

Michelle Ufford- Supercharging SQL Users with Jupyter Notebooks | JupyterCon 2020

Alan Yu - What we learned from introducing Jupyter Notebooks to the SQL community | JupyterCon 2020

Alan Yu - What we learned from introducing Jupyter Notebooks to the SQL community | JupyterCon 2020

Chris Holdgraf- 2i2c: sustaining open source through hosted Jupyter infrastructure | JupyterCon 2020

Chris Holdgraf- 2i2c: sustaining open source through hosted Jupyter infrastructure | JupyterCon 2020

Yiwen Li - Intro to Elyra - an AI centric extension for JupyterLab | JupyterCon 2020

Yiwen Li - Intro to Elyra - an AI centric extension for JupyterLab | JupyterCon 2020

Luciano Resende - What's new on Elyra - A set of AI centric JupyterLab extensions | JupyterCon 2020

Luciano Resende - What's new on Elyra - A set of AI centric JupyterLab extensions | JupyterCon 2020

Alan Chin - Explore and Extend AI Pipeline Runtimes with Elyra and JupyterLab | JupyterCon 2020

Alan Chin - Explore and Extend AI Pipeline Runtimes with Elyra and JupyterLab | JupyterCon 2020

Eduardo Blancas- Streamline your Data Science projects with Ploomber | JupyterCon 2020

Eduardo Blancas- Streamline your Data Science projects with Ploomber | JupyterCon 2020

Thorin Tabor - Democratizing the accessibility of computational workflows | JupyterCon 2020

Thorin Tabor - Democratizing the accessibility of computational workflows | JupyterCon 2020

Simon Willison- Using Datasette with Jupyter to publish your data | JupyterCon 2020

Simon Willison- Using Datasette with Jupyter to publish your data | JupyterCon 2020

Brendan O'Brien - Using Qri (“query”) to fetch, query, combine and publish datasets.|JupyterCon 2020

Brendan O'Brien - Using Qri (“query”) to fetch, query, combine and publish datasets.|JupyterCon 2020

Georgiana Dolocan - Putting the JupyterHub puzzle pieces together | JupyterCon 2020

Georgiana Dolocan - Putting the JupyterHub puzzle pieces together | JupyterCon 2020

Yuvi Panda- Running nonjupyter applications on JupyterHub with jupyter-server-proxy| JupyterCon 2020

Yuvi Panda- Running nonjupyter applications on JupyterHub with jupyter-server-proxy| JupyterCon 2020

Richard Wagner- The Streetwise Guide to JupyterHub Security | JupyterCon 2020

Richard Wagner- The Streetwise Guide to JupyterHub Security | JupyterCon 2020

TamNguyen- Handling Custom Jupyter Data Sources | JupyterCon 2020

TamNguyen- Handling Custom Jupyter Data Sources | JupyterCon 2020

Immanuel Bayer- ipyannotator - the infinitely hackable annotation framework | JupyterCon 2020

Rebecca Kelly- A shared Python, R and Q Jupyter Notebook - A Quant Sandbox Dream |JupyterCon 2020

Itay Dafna - Leap of faith: Transitioning from Excel to Jupyter-based applications | JupyterCon 2020

Damián Avila - Using the Jupyterverse to power MADS | JupyterCon 2020

Chiin Rui Tan- From Zero to Hero | JupyterCon 2020

Firas Moosvi- Teaching an Active Learning class with Jupyter Book| JupyterCon 2020

Daniel Mietchen- Jupyter in the Wikimedia ecosystem | JupyterCon 2020

Qiusheng Wu- How Jupyter and geemap enable interactive mapping and analysis | JupyterCon 2020

Stephanie Juneau- Jupyterenabled astrophysical analysis for researchers and students|JupyterCon 2020

Denton Gentry- The Care and Feeding of JupyterHub for Climate Solution Models| JupyterCon 2020

Tingkai Liu- FlyBrainLab: Interactive Computing in the Connectomic/Synaptomic Era | JupyterCon 2020

Kunal Bhalla- A Notebook Style Guide| JupyterCon 2020

Julia Wagemann - How to avoid 'Death by Jupyter Notebooks' | JupyterCon 2020

David Pugh - Best practices for managing Jupyter-based data science | JupyterCon 2020

Karla Spuldaro - Debugging notebooks and python scripts in JupyterLab | JupyterCon 2020

Shreyas Dalia - assert browserTest == True # Frontend Testing JupyterLab | JupyterCon 2020

Chris Holdgraf - The new Jupyter Book stack | JupyterCon 2020

Hamel Husain - Fastpages - A new, open source Jupyter notebook blogging system | JupyterCon 2020

Marc Wouts - Jupytext: Jupyter Notebooks as Markdown Documents | JupyterCon 2020

Sheeba Samuel- ProvBook |JupyterCon 2020

Philipp Rudiger - To Jupyter and back again | JupyterCon 2020

Jacob Tomlinson - What is my GPU doing? | JupyterCon 2020

Afshin Darian - A visual debugger in Jupyter | JupyterCon 2020

Eric Charles - Jupyter Real Time Collaboration| JupyterCon 2020

Devin Robison - Optimizing model performance | JupyterCon 2020

Junhua zhao - PayPal Notebooks: ML & Data Science experience | JupyterCon 2020

April Wang - Redesigning Notebooks for Better Collaboration | JupyterCon 2020

Bryan Weber - Distributing and Collecting Jupyter Notebooks for Manual Grading| JupyterCon 2020

Georgiana Dolocan - The Littlest JupyterHub distribution | JupyterCon 2020

Tim Metzler - Electronic Examination using Jupyter Notebook | JupyterCon 2020

Blaine Mooers - Why develop a snippet library for Jupyter in your subject domain? | JupyterCon 2020

Ryan Abernathey - Cloud Native Repositories for Big Scientific Data | JupyterCon 2020

Tanya Rai - Introducing Bento: Jupyter Notebooks @ Facebook | JupyterCon 2020

Kenton McHenry - From Papers to Notebooks | JupyterCon 2020

Ryan Herr - After model.fit, before you deploy| JupyterCon 2020

Ana Ruvalcaba - Community building is a sustainability strategy | JupyterCon 2020

Martin Renou - Xeus: an ecosystem of Jupyter kernels | JupyterCon 2020

Michael Wilson - Teaching teenagers to understand Dark Energy | JupyterCon 2020

Davide De Marchi - Voilà dashboards for policy support | JupyterCon 2020

Marcos Lopez Caniego - ESASky's JupyterLab widget| JupyterCon 2020

Praveen Kanamarlapud - Kernel Life Cycle Management | JupyterCon 2020

Aaron Bray - Pulse Physiology Engine | JupyterCon 2020

Aaron Watters - Using WebGL2 transform/feedback in Jupyter widgets | JupyterCon 2020

This talk shows how to improve model performance through feature engineering and hyperparameter optimization with GPU-accelerated libraries, and how to manage the machine learning lifecycle with MLflow. It walks through a demo notebook that combines RAPIDS, Optuna, and xfeat.

Key Takeaways

Use RAPIDS for GPU-accelerated computing
Implement hyperparameter optimization with Optuna
Perform feature engineering with xfeat
Manage machine learning lifecycle with MLflow
Deploy models using MLflow

💡 GPU-accelerated libraries can significantly improve model performance, and managing the machine learning lifecycle with MLflow can streamline the model development and deployment process.
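
The hyperparameter optimization takeaway can be illustrated with a minimal sketch. It deliberately uses plain random search from the Python standard library, not Optuna's API or the notebook from the talk, and the objective function, parameter names, and ranges are hypothetical stand-ins for a real train-and-validate step.

```python
import random


def objective(params):
    """Hypothetical validation score; a real objective would train and score a model."""
    # Toy surface that peaks at max_depth=8 and learning_rate=0.1 (assumed values).
    depth_penalty = (params["max_depth"] - 8) ** 2 / 100.0
    lr_penalty = (params["learning_rate"] - 0.1) ** 2 * 10.0
    return 1.0 - depth_penalty - lr_penalty


def sample_params(rng):
    """Draw one candidate configuration from an assumed search space."""
    return {
        "max_depth": rng.randint(2, 16),
        "learning_rate": rng.uniform(0.01, 0.5),
    }


def random_search(n_trials, seed=0):
    """Evaluate n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)  # seeded so repeated runs give the same trials
    best_params, best_score = None, float("-inf")
    history = []
    for _ in range(n_trials):
        params = sample_params(rng)
        score = objective(params)
        history.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, history


best_params, best_score, history = random_search(n_trials=50)
print("best params:", best_params)
print("best score:", round(best_score, 4))
```

A dedicated library such as Optuna replaces the uniform sampling here with smarter strategies, and a GPU-accelerated estimator shortens each trial, so more of the search space can be covered in the same time budget.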
