Accelerating model deployment with MLOps
Key Takeaways
The video discusses accelerating model deployment with MLOps using Google Cloud's Vertex AI platform, highlighting its capabilities in automating the end-to-end process of extracting and preparing data, training, and deploying models. It also covers model monitoring, data drift detection, and artifact management.
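The takeaway about automating data preparation, training, and deployment corresponds to the demo's three-component pipeline (CSV ExampleGen, Trainer, and Pusher). In that pipeline, components hand artifacts to each other as storage paths. The sketch below illustrates the idea in plain Python. It is hypothetical and framework-free, not TFX or Vertex AI Pipelines code:

- The function names, the JSON-lines format, and the "mean label" placeholder model are assumptions made for illustration.
- Local temporary directories stand in for the Cloud Storage bucket.

```python
import csv
import json
import tempfile
from pathlib import Path


def example_gen(csv_path: Path, out_dir: Path) -> Path:
    """Hypothetical ingest step: read a CSV and emit a JSON-lines 'examples' artifact."""
    examples = out_dir / "examples.jsonl"
    with csv_path.open(newline="") as src, examples.open("w") as dst:
        for row in csv.DictReader(src):
            dst.write(json.dumps(row) + "\n")
    return examples


def trainer(examples: Path, out_dir: Path) -> Path:
    """Hypothetical training step: the 'model' is just the mean of the label column."""
    with examples.open() as f:
        labels = [float(json.loads(line)["label"]) for line in f]
    model = out_dir / "model.json"
    model.write_text(json.dumps({"mean_label": sum(labels) / len(labels)}))
    return model


def pusher(model: Path, serving_dir: Path) -> Path:
    """Hypothetical push step: copy the model artifact to a file system destination."""
    serving_dir.mkdir(parents=True, exist_ok=True)
    pushed = serving_dir / model.name
    pushed.write_bytes(model.read_bytes())
    return pushed


def run_pipeline(csv_path: Path, root: Path) -> list:
    """Run the steps in order; record (step, input path, output path) as simple lineage."""
    lineage = []
    examples = example_gen(csv_path, root)
    lineage.append(("example_gen", csv_path, examples))
    model = trainer(examples, root)
    lineage.append(("trainer", examples, model))
    pushed = pusher(model, root / "serving")
    lineage.append(("pusher", model, pushed))
    return lineage


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "data.csv"
        data.write_text("feature,label\n1,0\n2,1\n3,1\n")
        for step, source, output in run_pipeline(data, root):
            print(f"{step}: {source.name} -> {output.relative_to(root)}")
```

Each step's output path is the next step's input. This mirrors the demo's point that an artifact is "no more than a path" to the bucket. The recorded tuples are a toy version of the input/output lineage that ML Metadata tracks automatically.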
Full Transcript
Hi everyone, I'm Alena Osipova, a customer engineer. And I'm Wan Qi Ang, also a customer engineer. We both work here at Google Cloud. Welcome to the technical series for startups, where we create videos for technical enablement to help startups start, build, and grow their businesses successfully and sustainably on Google Cloud. In our previous video, we learned how you can innovate with machine learning solutions such as Contact Center AI and Document AI. They are pre-packaged solutions designed for common use cases, and companies can start using them without relying on AI experts. However, once you start implementing custom solutions with AI experts on your team, you will face new challenges. Today we are going to dive a little deeper into how you can implement machine learning operations (MLOps) on Google Cloud.

We'll start by looking at some of the challenges that startups face with their machine learning workloads. Following that, we will introduce the concept of MLOps and how it can be achieved on Google Cloud. We'll then walk through a demo of MLOps with Vertex AI in the Google Cloud console, and finally we will learn about one of our customers and their MLOps journey.

Let's explore common challenges our customers face when dealing with ML workloads. Imagine that we are working with an advanced startup with AI and ML technical skills. Its employees have experience with frameworks such as TensorFlow or PyTorch. They were even able to deploy their models on one of the compute options we covered in previous sessions: Cloud Run or Google Kubernetes Engine. The common challenges this company could face include models that never make it into production, a high degree of manual work, no reusable or reproducible components, and many others. Next, we are going to explore how Vertex AI Pipelines can help our customers overcome those challenges. Why are ML workloads becoming so challenging to handle over time? Is it all because
of the complexity of the code? In reality, only a small fraction of a real-world ML system is composed of ML code, as shown by the small blue box in the middle. The required surrounding infrastructure is vast and complex. Given all the challenges that come with managing machine learning workloads, how can we best address them and eliminate the management overhead they bring? Here we want to introduce the concept of MLOps.

But what exactly is MLOps? Just by looking at the name, we can guess that MLOps is related to DevOps to a certain extent. We can define it as an ML engineering culture and practice that aims at unifying ML system development and ML system operations. In other words, MLOps prepares the infrastructure to streamline the production process of ML systems, such as training, testing, deployment, and monitoring. Most importantly, it ensures that the process is consistent, scalable, and repeatable.

MLOps involves multiple stages, and they often go in this sequence:

1. You start with ML model development.
2. Once you have decided on your model, you operationalize your training by building a training pipeline. This includes gathering training data, running your training jobs, and evaluating the performance of your model.
3. When your pipeline is up and running, you can conduct continuous training whenever there is fresh data, or based on any custom metric that you set.
4. You deploy the model to a serving environment and prepare it to serve predictions.
5. Once the model is in production and serving clients in real time, you continuously monitor it for data drift or performance degradation so that you can intervene when needed.

Finally, at the core is data and model management. It is essential for governing ML artifacts to ensure auditability, traceability, and compliance. At the same time, it can also promote shareability, reusability, and discoverability of ML
assets.

To get started with MLOps on Google Cloud, we have Vertex AI. Vertex AI is Google Cloud's unified machine learning platform that allows you to create, deploy, and manage models over time and at scale. It is built on top of Google's robust and secure foundation. It provides flexibility for users of every level of ML expertise, whether you are new to the domain or an ML expert.

Imagine that we have done our exploratory analysis and built our model in a Jupyter notebook. We are now ready to operationalize and productionize our machine learning workflow. So what can we do to automate the process of extracting and preparing data, as well as training and deploying our model? For this, we have Vertex AI Pipelines, which can help you automate the end-to-end process. This allows for repeatability and scalability when building your machine learning models. With pipelines, you can easily retrain your models on a regular schedule based on new data that comes in, without having to worry about infrastructure or management overhead. If you are interested in learning more about building your own pipelines with Vertex AI Pipelines, check out the link in the description box for a more detailed video in the AI Simplified playlist.

Before you can start serving predictions to your end users, you have to deploy your trained model to an endpoint they can access. For this, we have Vertex AI Prediction, which can handle both online endpoints and batch prediction requests. With Vertex AI Prediction, you can deploy models trained with either AutoML or custom code to serve online and batch prediction requests, depending on your use case. Vertex AI Prediction scales automatically based on traffic, so it abstracts away infrastructure overhead from machine learning developers. On top of that, you can split traffic between models and customize the endpoint machine types based on your model's needs. Now let's move on to managing and tracking the model
quality. What is model monitoring, and what challenges does it help to solve? Model monitoring helps data scientists and machine learning engineers answer the question: why does a deployed model perform in a certain way?

Modern applications rely on a well-established set of capabilities to monitor the health of their services. Examples include:

- software versioning
- rigorous deployment processes
- event logging
- alerting and notification of situations requiring intervention
- on-demand and automated diagnostic tracing
- automated performance and functional testing

Vertex AI provides a host of products to monitor and govern your models, all of which help drive both successful and responsible AI deployment. Those services include:

- monitoring signals for model predictive performance and alerting when those signals deviate
- diagnosis, to identify the cause of a deviation
- model updating, to trigger a model retraining pipeline
- integration with Feature Store

Model monitoring is only one piece of the MLOps puzzle. I would like to explain a bit more about training-serving skew and data drift detection. They help answer the following question: why does the model perform differently in the production environment than it did at training time? This is called training-serving skew. The skew can have many causes, for example:

- a discrepancy between how you handle data in the training and serving pipelines
- a change in the data between when you train and when you serve

Data drift is one of the common causes of skew. It is defined as variation in the production data from the data that was used to test and validate the model before deploying it into production. You therefore have to monitor how significantly serving requests evolve over time. This is called drift detection.

So how do you manage and govern your ML models? Here are some products that you might find helpful. First, we have Feature Store, which allows you to share and reuse ML features across use cases and to serve ML features
at scale with low latency. On top of that, Google Cloud also offers ML Metadata, which automatically tracks the inputs and outputs of all your components. With it, you can visualize, analyze, and compare detailed ML lineage. Lastly, Model Registry is a central repository where you can manage the lifecycle of your ML models. The registry gives you an overview of your models so you can better organize and track them and train new versions. When you have a model version that you would like to deploy, you can assign it to an endpoint directly from the registry.

Explainable AI is a fully managed service on Vertex AI. It enables users to generate feature attributions, also called feature importance values, for their model's predictions. Feature attributions are an explainability method. They show users how much each input feature contributed to their model's predictions and to the model's overall predictive power.

Explainable AI is built into multiple Vertex AI services. Users can currently get feature attributions in Vertex AI Prediction, AutoML Tables, and Vertex AI notebooks. As a fully managed service, Explainable AI (XAI) is flexible, fast, and scalable:

- It works on models trained on tabular, image, or text data.
- It works on models trained with any ML framework, not just TensorFlow.
- It supports both online and batch prediction use cases.

Explainable AI provides insights for improving model quality by refining the training data. These insights let you:

- identify mislabeled examples
- support active learning and misclassification analysis
- provide decision support to your stakeholders

On top of that, XAI supports tabular, image, and text data from any TensorFlow model, and is fully managed, serverless, and offers faster queries. Let's explore how to manage and govern a model in the next slide.

I hope the presentation above helped you understand more about MLOps, and now I would like to show it to you in action. For this example, we are going to run a simple pipeline to train an ML
model. But please remember that Vertex AI Pipelines is not only for training. It can also automate the entire lifecycle of an ML model: training, testing, deploying, and monitoring.

Before creating the pipeline, make sure that:

- the necessary APIs are enabled
- a Google Cloud Storage bucket has been created

Let's create the bucket first. Go to the navigation menu, then Cloud Storage, and click Create. Name your bucket and choose where to store the data, regional or multi-regional. Let's create a multi-regional bucket. The bucket is going to contain the data, the pipeline, and the serving model. After creating the bucket, make sure that the Vertex AI API is enabled and that your account has the permissions needed to run the pipeline.

Let's move on to the Vertex AI dashboard and create a user-managed notebook. From the Vertex AI dashboard, go to Workbench. The code repo gives me an option to deploy in Vertex AI Workbench, and that's what I'm going to do. After initiating the process, I can see that a user-managed notebook has been created with a TensorFlow 2.8 environment pre-installed. You can do the same manually from the console and then upload your notebook with the pipeline code.

Next, let's open JupyterLab and execute the code. The pipeline is defined using Python APIs and consists of three components:

- CsvExampleGen, which brings data into the pipeline
- Trainer, which uses a user-provided Python function to train a model
- Pusher, which pushes the model to a file system destination

Let's take a look at the pipeline's function. We then define a runner to run the pipeline, and finally submit the job. After the job runs successfully, let's explore the visual output. We can see our pipeline components. Let's select the CsvExampleGen component and go to Node Info, where we can see our input parameters and our output parameters. The
current pipeline is a good simulation of a production-like pipeline. In a pipeline, you can also define function-based or container-based components. Google also provides predefined pipeline components that you can reuse in your pipeline. When you build your pipeline, I would encourage you to check whether the components you are looking for already exist. For more information, please follow the link in the description.

The next part is artifacts. Artifacts pass complex data, like training data, between components. Let's expand one of the artifacts. As we can see, it's no more than a path to our previously created Cloud Storage bucket. You can monitor each component's resource usage from the pipeline UI by selecting a component and opening its metrics. These include CPU, GPU, and the network traffic sent and received by each component. You can use them to further optimize your workload at the component level.

Now that we have our trained model, let's import it into Vertex AI Model Registry, a central repository where we can manage the lifecycle of our ML models:

1. Head over to Model Registry and click Import.
2. Select Import as new model. Alternatively, if you have trained a new version of an existing model, you can choose Import as new version.
3. Give the model a name and a description, and select a region.
4. My model is trained with TensorFlow 2.8, so I select Import model artifacts into a new pre-built container. Then I choose the corresponding model framework and version for the container.
5. Specify the Cloud Storage directory that contains your saved_model.pb file.

You can set up Explainable AI here, but we'll skip this for now and carry on to import the model. With our model in Model Registry, we are now ready to deploy it to an endpoint and start serving predictions. Let's head over to Endpoints and click Create endpoint. Give your endpoint a name. You'll be able to select either Standard access mode, which makes the endpoint available
through a REST API, or Private access mode, which uses a VPC network to create a private connection to the endpoint. I want to be able to access it through a REST API, so I'll select Standard here. Next, we select the model that we imported.

If we have multiple versions of our model, we can deploy them to one endpoint and split the traffic between the different versions. Suppose you found a way to increase the accuracy of your current model with new training data. You can add the new model to the same endpoint to serve a small percentage of traffic, then gradually increase its share of the traffic to 100 percent. Since I only have one version, I will leave the traffic split at 100 percent.

Next, we select the compute resources to serve prediction traffic. I'll leave the minimum number of nodes at the default of one. Then select the appropriate machine type and the service account.

In the next section, we can set up model monitoring to track the performance of our model as it starts to serve prediction traffic:

- Name the monitoring job for easy identification.
- Set how frequently the job runs. I'll leave the default of 24 hours.
- Optionally set the monitoring window, which determines how far back to pull prediction traffic from. I'll leave it blank so that it defaults to the monitoring interval.
- Set up alerting so that relevant stakeholders are notified when a model exceeds the alerting threshold.
- Specify a sampling rate of 10, the percentage of prediction requests within the monitoring window that we want to sample.
- Choose the monitoring objective: either training-serving skew or prediction drift. For demo purposes, I will select training-serving skew.
- Provide the training data so that the monitoring job has something to compare against, and specify the target column name.
- Leave the alert threshold at its default of 0.3.

Finally, we click Create, and we will see that the status is shown
as Deploying model. Once it is successfully deployed, you will be able to monitor metrics related to this endpoint in this dashboard. And that brings us to the end of our demo.

Let's look at one of our customers, Apna, and their machine learning journey on Google Cloud. Apna is a smartphone app that leverages proprietary algorithms to drive AI-enabled matches between candidate profiles and employers at a hyperlocal level. In just three years, Apna built one of India's leading professional marketplaces on the cloud, growing to 22 million users across 70+ cities. With a combination of Cloud SQL, Cloud Pub/Sub, and BigQuery, Apna created a pipeline for millions of daily data points that are ready for Vertex AI to perform ML modeling.

The results of running the platform on Google Cloud managed infrastructure services and Vertex AI have been substantial. It is estimated that Apna is compressing the time to build AI models by 20 compared to building from scratch with a traditional data analytics engine. As a result, Apna can crunch up to 500 million user interactions every day to power its Vertex AI-enabled algorithms. Fraudsters constantly evolve their methods, but thanks to rapid ML modeling on Vertex AI, Apna can identify and remove up to 60 percent of inappropriate content on the platform daily.

In this session, we learned about MLOps and looked at the Vertex AI tools that can help us incorporate MLOps practices into our ML systems. Through a demo, we took a deep dive into running pipelines and managing our models with model monitoring in Vertex AI. Lastly, we reviewed a customer success story, Apna, and saw how Google Cloud and Vertex AI helped them automate their ML workflow to build a successful AI-enabled platform. If you are interested in learning more, please click on the links in the description box below, where you can read more about Vertex AI, check
out our AI Simplified YouTube playlist, get hands-on with the platform through our step-by-step guided tutorials, and reach out to get connected and learn more. That brings us to the end of this video. In the next video, we will go over how you can architect a retail startup with Google Cloud, how Google can power your retail business, and how you can accelerate the retail lifecycle with Google. And that's a wrap! Don't forget to like and subscribe to our YouTube channel, and click the bell icon to be notified whenever a new video is posted. We'll see you very soon in the next video.
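The monitoring job configured in the demo works as follows:

- It compares sampled serving traffic against the training data.
- It raises an alert when a feature's distance crosses the threshold, which defaults to 0.3.

The sketch below illustrates that comparison for categorical features in plain Python. It uses L-infinity distance, the largest per-category difference in probability, as the skew score. This is an illustration of the idea under stated assumptions, not Vertex AI's implementation. The function names and the `device`/`region` sample data are hypothetical.

```python
from collections import Counter

# Default alert threshold mentioned in the demo for training-serving skew.
DEFAULT_THRESHOLD = 0.3


def distribution(values):
    """Turn raw categorical values into a {category: probability} map."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def l_infinity(p, q):
    """Largest absolute probability difference across all categories seen in p or q."""
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def skew_alerts(training, serving, threshold=DEFAULT_THRESHOLD):
    """Return {feature: distance} for features whose skew exceeds the threshold.

    `training` and `serving` map feature names (hypothetical here) to lists
    of categorical values; every training feature must also appear in serving.
    """
    alerts = {}
    for feature, train_values in training.items():
        distance = l_infinity(distribution(train_values),
                              distribution(serving[feature]))
        if distance > threshold:
            alerts[feature] = distance
    return alerts


if __name__ == "__main__":
    # Made-up example data for illustration only.
    training = {"device": ["android"] * 8 + ["ios"] * 2,
                "region": ["north"] * 5 + ["south"] * 5}
    serving = {"device": ["android"] * 7 + ["ios"] * 3,
               "region": ["north"] * 9 + ["south"] * 1}
    print(skew_alerts(training, serving))
```

In this example, `region` shifts from a 50/50 split to 90/10, a distance of about 0.4, so it is flagged. `device` moves by only about 0.1 and stays below the 0.3 threshold.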
Original Description
Identify Objects from Images using AI on Google Cloud → https://goo.gle/unicorn-ObjectLoc
Here to bring you the latest news in the startup program by Google Cloud are Wan Qi Ang and Alena Osipova!
Welcome to the third season of the Google Cloud Technical Guides for Startups - the Grow Series.
Grow Series - Episode 8: Accelerating model deployment with MLOps
Tune into our new series for a new episode each time and let us know what you think in the comments below!
Chapters:
0:00 - Intro
1:00 - Agenda
1:24 - Challenges of ML Workloads
2:40 - Introduction of MLOps
4:43 - MLOps on Google Cloud
6:48 - Managing models and tracking model quality
11:31 - Demo: Create a Vertex Pipeline
15:38 - Demo: Manage Models with Model Registry
16:42 - Demo: Deploying to Endpoints and Model Monitoring
19:28 - Customer success story
20:49 - Summary
21:22 - Want to find out more?
21:42 - Coming next
MLOps on Google Cloud → https://goo.gle/3QAQaFP
Vertex Pipelines → https://goo.gle/3qxDiWl
Model Registry → https://goo.gle/3s9Tyx7
Model Monitoring → https://goo.gle/45tA44X
Explainable AI → https://goo.gle/3OX6TSh
AI Simplified YouTube playlist → http://goo.gle/AISimplified
Step-by-step guided Vertex AI tutorials → https://goo.gle/45vpfzy
Google Cloud Pipeline Components List → https://goo.gle/3KEmqUE
Check out our website → https://goo.gle/3w2uyGB
Google Cloud Technical Guides for Startups playlist → https://goo.gle/3lBtYvu
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#GCPStartupGuides