Accelerating model deployment with MLOps

Google Cloud Tech · Beginner · 🏭 MLOps & LLMOps · 2y ago

Skills: LLM Engineering 90% · ML Pipelines 80% · AI Systems Design 70%

Key Takeaways

The video discusses accelerating model deployment with MLOps using Google Cloud's Vertex AI platform, highlighting its capabilities in automating the end-to-end process of extracting and preparing data, training, and deploying models. It also covers model monitoring, data drift detection, and artifact management.

Full Transcript

Hi everyone, I'm Alena Osipova, a customer engineer. And I'm Wan Qi Ang, also a customer engineer, and we both work here at Google Cloud. Welcome to the technical series for startups, where we are creating a series of videos for technical enablement to help startups start, build, and grow their businesses successfully and sustainably on Google Cloud.

In our previous video, we learned how you can innovate with machine learning solutions such as Contact Center AI and Document AI. They are pre-packaged solutions designed for common use cases, so companies can start using them without relying on AI experts. However, if you start implementing custom solutions with AI experts in your team, you will start to face new challenges, and today we are going to dive a little deeper into how you can implement machine learning ops on Google Cloud.

We'll start by looking at some of the challenges that startups face when it comes to their machine learning workloads. Following that, we will introduce the concept of MLOps and how MLOps can be achieved on Google Cloud. We'll then walk through a demo of MLOps with Vertex AI in the Google Cloud console, and finally we will learn about one of our customers and their MLOps journey.

Let's explore common challenges our customers are facing when dealing with ML workloads. Let's imagine that we are working with an advanced startup company with AI and ML technical skills. Employees have experience with frameworks such as TensorFlow or PyTorch, and were even able to deploy their models on one of the compute options that we covered in the previous sessions, Cloud Run or Google Kubernetes Engine. The common challenges this company could potentially face include models that don't make it into production, a high degree of manual and one-off work, no reusable or reproducible components, and many others. Later on, we are going to explore how Vertex AI Pipelines can help our customers overcome those challenges.

Why are ML workloads becoming so challenging to handle over time? Is it all because of the complexity of the code? In reality, only a small fraction of a real-world ML system is composed of ML code, as shown by the small blue box in the middle. The required surrounding infrastructure is vast and complex. Given all the challenges that come with managing machine learning workloads, how can we best address them and help eliminate the management overhead that comes with these workloads?

Here we want to introduce the concept of MLOps. But what exactly is MLOps? Just by looking at the name, we can guess that MLOps is related to DevOps to a certain extent. We can define it as an ML engineering culture and practice that aims at unifying ML system development and ML system operations. In other words, MLOps aims to prepare infrastructure to streamline the production process of ML systems, such as training, testing, deployment, and monitoring. Most importantly, it aims to ensure that the process is consistent, scalable, and repeatable.

Within MLOps there are multiple stages involved, and they often go in this sequence. First, you start with your ML model development. Once you have decided on your model, you will have to operationalize your training by building a training pipeline. This includes gathering training data, running your training jobs, and evaluating the performance of your model. When your pipeline is up and running, you can conduct continuous training whenever there is fresh data, or based on any custom metric that you set. The next step will be to deploy a model to a serving environment and prepare your model to serve predictions. Once the model is in production and serving clients in real time, you will need to continuously monitor your model for any data drift or performance degradation so that you can intervene when needed. Finally, at the core is data and model management, which is essential to governing ML artifacts to ensure auditability, traceability, and compliance. At the same time, this can also be used to promote shareability, reusability, and discoverability of ML assets.

To get started with MLOps on Google Cloud, we have Vertex AI. Vertex AI is Google Cloud's unified machine learning platform that allows you to create, deploy, and manage models over time and at scale. It is built on top of Google's robust and secure foundation and provides flexibility for users of all levels of ML expertise, whether you are new to the domain or an ML expert.

Imagine that we have done our exploratory analysis and built out our model in our Jupyter notebook. We are now ready to operationalize and productionize our machine learning workflow. So what can we do to automate the process of extracting and preparing data, as well as training and deploying our model? To do so, we have Vertex AI Pipelines, which can help you automate the end-to-end process, allowing for repeatability and scalability when it comes to building your machine learning models. With pipelines, you can easily retrain your models at regular schedules based on new data that comes in, without having to worry about infrastructure or management overhead. If you are interested in learning more about building your own pipelines with Vertex AI Pipelines, do check out the link in the description box for a more detailed video under the AI Simplified playlist.

Before you can start serving predictions to your end users, you will have to deploy your trained model to an endpoint for your users to access. For this we have Vertex AI Prediction, which can handle both online endpoints and batch prediction requests. With Vertex AI Prediction, you can deploy models trained with either AutoML or custom code to serve online and batch prediction requests, depending on your use case. As Vertex AI Prediction scales automatically based on traffic, it abstracts away any form of infrastructure overhead from machine learning developers. On top of that, you will be able to split traffic between models and also customize the endpoint machine types based on your model's needs.

Now let's move on to managing and tracking model quality. What is model monitoring, and what challenges does it help to solve? Model monitoring helps data scientists and machine learning engineers answer the question: why does a deployed model perform in a certain way? Modern applications rely on a well-established set of capabilities to monitor the health of their services. Examples include software versioning, rigorous deployment processes, event logging, alerts and notifications for situations requiring intervention, on-demand and automated diagnostic tracing, and automated performance and functional testing. Vertex AI provides a host of products to monitor and govern your models, all of which help drive both successful and responsible AI deployment. Those services include monitoring signals for the model's predictive performance and alerting when those signals deviate, diagnosis to identify the cause of the deviation, model updating to trigger a model retraining pipeline, and integration with Feature Store. Model monitoring is only one piece of the MLOps puzzle.

I would like to explain a bit more about training-serving skew and data drift detection. It helps to answer the following question: why does the model perform differently in the production environment than it did at training time? This is called training-serving skew. This skew can be caused by a discrepancy between how you handle data in the training and serving pipelines, a change in the data between when you train and when you serve, and many other reasons. Among them, data drift is one of the common causes of skew. Data drift is defined as variation in the production data from the data that was used to test and validate the model before deploying it into production. So you have to monitor how significantly serving requests are evolving over time. This is called drift detection.

So now, how do you manage and govern your ML models? Here are some products that you might find helpful. First up, we have Feature Store, which allows you to share and reuse ML features across use cases and to serve ML features at scale with low latency. On top of that, Google Cloud also offers ML Metadata, which enables you to automatically track inputs and outputs to all your components. With this, you will be able to visualize, analyze, and compare detailed ML lineage. Lastly, Model Registry is a central repository where you can manage the life cycle of your ML models. From the registry, you can have an overview of your models so you can better organize, track, and train new versions of your model. When you have model versions that you would like to deploy, you can simply assign them to an endpoint directly from the repository.

Explainable AI is a fully managed service on Vertex AI that enables users to generate what are called feature attributions, or feature importance values, for their model's predictions. Feature attributions are an explainability method that shows users how much each input feature contributed to their model's predictions and to the model's overall predictive power. Explainable AI is built into multiple Vertex AI services. Users can currently get feature attributions in Vertex Prediction, AutoML Tables, and Vertex Notebooks. As a fully managed service, XAI is flexible, fast, and scalable. You can use XAI on models trained on tabular, image, or text data. It works on models trained using any ML framework, not just TensorFlow, and we support both online and batch prediction use cases. Explainable AI provides insights to improve the quality of your models: by refining the training data, you can identify mislabeled examples, and it also supports active learning, misclassification analysis, and decision support for your stakeholders. On top of that, XAI supports tabular, image, and text data from any TensorFlow model, and it is fully managed, serverless, and fast to query. Let's explore how to manage and govern a model in the next slide.

I hope the presentation above helped you understand more about MLOps, and now I would like to show you these tools in action. For this example, we are going to run a simple pipeline to train an ML model, but please remember that it's not only for training. With Vertex AI Pipelines, it's also possible to automate the entire life cycle of an ML model, such as training, testing, deploying, and monitoring.

Before creating the pipeline in Vertex Pipelines, make sure that the necessary APIs are activated and a Google Cloud Storage bucket is created. To create a Cloud Storage bucket, click the Create button in the Cloud Storage dashboard. First of all, let's create a Cloud Storage bucket. Go to the navigation menu, then Cloud Storage, and click Create. Name your bucket and choose where to store the data, regional or multi-regional. Let's create a multi-regional bucket. The bucket is going to contain the data, the pipeline, the model, and the serving model. After creating a bucket, make sure that the Vertex AI API is enabled and the necessary permissions are granted to your account to run the pipeline.

Let's move on to the Vertex AI dashboard. Let's create a user-managed notebook: from the Vertex AI dashboard, go to Workbench. From the code repo I have an option to deploy in Vertex AI Workbench, and that's what I'm going to do. After initiating the process, I can see that a user-managed notebook has been created with a TensorFlow 2.8 environment pre-installed. You can do the same manually from the console and then upload your notebook with the pipeline code.

Next, let's open JupyterLab and execute the code. The current pipeline is defined using Python APIs. The pipeline consists of three components: CsvExampleGen, Trainer, and Pusher. Let's take a look at the pipeline's function. The ExampleGen component brings data into the pipeline. The Trainer component uses a user-provided Python function that trains a model. The Pusher component pushes the model to a file system destination. Then we define a runner to run the pipeline, and finally submit the job.

After the job runs successfully, let's explore the visual output. We can see our pipeline components. Let's select the CsvExampleGen component and go to the node info. Here we can see our input parameters and our output parameters. The current pipeline is a good simulation of a production-like pipeline. In the pipeline, you can also define function-based or container-based components. Google also provides predefined pipeline components that you can reuse for your pipeline. When you build your pipeline, I would encourage you to check whether the components you are looking for already exist. For more information, please follow the link in the description.

The next part is artifacts. Artifacts pass complex data, like training data, between components. Let's expand one of the artifacts. As we can see, it's no more than a path to our previously created Cloud Storage bucket. You can monitor each component's CPU utilization from the pipeline UI by selecting the component and going to [unclear]. There are a few metrics you can see: CPU, GPU, and the network traffic that has been sent and received by each component. Those metrics can be leveraged to further optimize your workload at the component level.

Now that we have our trained model, let's import it into Vertex AI Model Registry, a central repository where we can manage the life cycle of our ML model. To do so, head over to Model Registry and click on Import. Select Import as new model. Alternatively, if you have trained a new version of an existing model, you can also choose to import as a new version. Give the model a name and a description, and select a region. Since my model is trained with TensorFlow 2.8, I will select Import model artifacts into a new pre-built container, and then choose the corresponding model framework and version for my container. Next, you will need to specify the Cloud Storage directory that contains your saved_model.pb file. You can set up Explainable AI here, but we'll skip this for now and carry on to import the model.

So with our model in Model Registry, we are now ready to deploy it to an endpoint and start serving predictions. Let's head over to Endpoints and click on Create endpoint. Give your endpoint a name. You'll be able to select either standard access mode, which makes the endpoint available through a REST API, or private access mode, which uses a VPC network to create a private connection to the endpoint. I want to be able to access it through a REST API, so I'll select Standard. Here we will select the model that we imported. If we have multiple versions of our model, we can deploy them to one endpoint and split the traffic between the different versions. Suppose you found a way to increase the accuracy of your current model with new training data. You can add a new model to the same endpoint to serve a small percentage of traffic, and gradually increase the traffic split to the new model to 100 percent. Since I only have one version, I will leave the traffic split at 100 percent. Next, we select the compute resources to serve prediction traffic. I'll leave the minimum number of nodes at the default of one. Next, select the appropriate machine type and the service account.

In the next section, we can set up model monitoring to monitor the performance of our model as it starts to serve prediction traffic. We will name the monitoring job for easy identification and set how frequently we want the job to run. I'll leave it at the default of 24 hours. We can also set the monitoring data window, which determines the length of the window to pull prediction traffic from. I'll leave it blank so that it will default to the monitoring interval. Next, we can set up alerting so that relevant stakeholders can be notified when a model exceeds the alerting threshold. Lastly, we specify a sampling rate of 10, which is the percentage of prediction requests within the monitoring window that we want to sample. For the monitoring objective, we can choose to detect either training-serving skew or prediction drift. For demo purposes, I will select training-serving skew. We will need to provide our training data so that the monitoring job can compare against it. Then we specify the target column name. I'll leave the alert threshold as default, so it will be 0.3. Finally, we'll click on Create, and we will see that the status is shown as Deploying model. Once it is successfully deployed, you will be able to monitor metrics related to these endpoints in this dashboard. And so that brings us to the end of our demo.

Let's look at one of our customers, Apna, and their machine learning journey on Google Cloud. Apna is a smartphone app that leverages proprietary algorithms to drive AI-enabled matches between candidate profiles and employers at a hyperlocal level. In just three years, Apna built one of India's leading professional marketplaces on cloud, growing to 22 million users across 70-plus cities. With a combination of Cloud SQL, Cloud Pub/Sub, and BigQuery, Apna created a pipeline for millions of daily data points that are ready for Vertex AI to perform ML modeling. The results of running the platform on Google Cloud managed infrastructure services and Vertex AI have been substantial. It is estimated that Apna is compressing the time to build AI models by 20 percent compared to how long it would take to build from scratch using a traditional data analytics engine. As a result, Apna is able to crunch up to 500 million user interactions every day to power its Vertex AI-enabled algorithms. Thanks to the possibility of rapid ML modeling on Vertex AI, in a situation where fraudsters constantly evolve their methodologies, Apna is able to identify and remove up to 60 percent of inappropriate content on the platform daily.

In this session, we learned about MLOps and looked at the various tools available in Vertex AI that can help us incorporate MLOps practices into our ML systems. We also took a deep dive into how to run pipelines and manage our models with model monitoring in Vertex AI through a demo. And lastly, we reviewed a customer success story, Apna, and saw how Google Cloud and Vertex AI helped them automate their ML workflow to build out a successful AI-enabled platform.

If you are interested in learning more, please click on the links in the description box below, where you will be able to read more on Vertex AI, check out our AI Simplified YouTube playlist, get hands-on with the platform by trying our step-by-step guided tutorials, and lastly, reach out and get connected to learn more.

That brings us to the end of this video. In the next video, we will go over how you can architect a retail startup with Google Cloud, how Google can power your retail business, and how you can accelerate the retail life cycle with Google. And that's a wrap. Don't forget to like and subscribe to our YouTube channel, and also click on the bell icon to be notified whenever a new video is posted. Hang tight, and we'll see you very soon in the next video.
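The data drift discussion in the transcript can be made concrete with a small, self-contained sketch. It compares how often each category of one feature appears in training data versus serving data, and raises an alert when the largest difference exceeds a threshold. The 0.3 threshold is the default mentioned in the demo. The distance used here (L-infinity, the largest per-category gap) is one common way to compare categorical distributions, and whether it matches what the monitoring service computes for a given feature is an assumption. The function names and sample data are hypothetical.

```python
from collections import Counter


def category_distribution(values):
    """Return the relative frequency of each category in `values`."""
    if not values:
        raise ValueError("need at least one value to build a distribution")
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(baseline, current):
    """Largest absolute difference in frequency across all categories."""
    categories = set(baseline) | set(current)
    return max(
        abs(baseline.get(c, 0.0) - current.get(c, 0.0)) for c in categories
    )


def check_skew(training_values, serving_values, threshold=0.3):
    """Return (distance, alert) for one categorical feature.

    `threshold` mirrors the default alert threshold from the demo.
    """
    baseline = category_distribution(training_values)
    current = category_distribution(serving_values)
    distance = l_infinity_distance(baseline, current)
    return distance, distance > threshold


# Hypothetical feature: payment type seen at training time vs. serving time.
training_feature = ["card"] * 70 + ["cash"] * 30
serving_feature = ["card"] * 30 + ["cash"] * 70

distance, alert = check_skew(training_feature, serving_feature)
print(f"distance={distance:.2f} alert={alert}")
```

In this made-up sample the share of "card" falls from 70 percent to 30 percent, so the distance is about 0.4 and the check reports an alert. Running the same check on identical samples gives a distance of zero and no alert.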

Original Description

Identify Objects from Images using AI on Google Cloud → https://goo.gle/unicorn-ObjectLoc

Here to bring you the latest news in the startup program by Google Cloud is Wan Qi Ang and Alena Osipova! Welcome to the third season of the Google Cloud Technical Guides for Startups - the Grow Series.

Grow Series - Episode 8: Accelerating model deployment with MLOps

Tune into our new series for a new episode each time and let us know what you think in the comments below!

Chapters:
0:00 - Intro
1:00 - Agenda
1:24 - Challenges of ML Workloads
2:40 - Introduction of MLOps
4:43 - MLOps on Google Cloud
6:48 - Managing models and tracking model quality
11:31 - Demo: Create a Vertex Pipeline
15:38 - Demo: Manage Models with Model Registry
16:42 - Demo: Deploying to Endpoints and Model Monitoring
19:28 - Customer success story
20:49 - Summary
21:22 - Want to find out more?
21:42 - Coming next

MLOps on Google Cloud → https://goo.gle/3QAQaFP
Vertex Pipelines → https://goo.gle/3qxDiWl
Model Registry → https://goo.gle/3s9Tyx7
Model Monitoring → https://goo.gle/45tA44X
Explainable AI → https://goo.gle/3OX6TSh
AI Simplified Youtube playlist → http://goo.gle/AISimplified
Step-by-step guided Vertex AI tutorials → https://goo.gle/45vpfzy
Google Cloud Pipeline Components List → https://goo.gle/3KEmqUE
Check out our website → https://goo.gle/3w2uyGB
Google Cloud Technical Guides for Startups playlist → https://goo.gle/3lBtYvu
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech

#GCPStartupGuides


This video teaches how to accelerate model deployment with MLOps using Google Cloud's Vertex AI platform, covering topics such as automating the end-to-end process of model deployment, model monitoring, and artifact management. It provides a comprehensive overview of MLOps practices and how to implement them using Vertex AI. By watching this video, viewers can learn how to streamline their model deployment process and improve their overall ML workflow.

Key Takeaways

Create a user-managed notebook from the Vertex AI dashboard
Import a model into Vertex AI Model Registry
Deploy a model to an endpoint
Select compute resources to serve prediction traffic
Reuse pipeline components
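
The "Deploy a model to an endpoint" takeaway includes the gradual rollout described in the demo: a new model version first serves a small share of traffic, and the share grows until it reaches 100 percent. A minimal sketch of such a rollout plan follows. It only builds the percentage splits and does not call any Google Cloud API. The model IDs, the function name, and the step sizes are hypothetical.

```python
def rollout_schedule(current_id, candidate_id, steps=(10, 25, 50, 100)):
    """Build one traffic split per rollout stage.

    Each split maps a model ID to an integer percentage, and the
    percentages in every split add up to 100.
    """
    if current_id == candidate_id:
        raise ValueError("current and candidate models must differ")
    schedule = []
    for candidate_share in steps:
        if not 0 <= candidate_share <= 100:
            raise ValueError("traffic percentages must be between 0 and 100")
        split = {candidate_id: candidate_share}
        if candidate_share < 100:
            # The existing model keeps whatever the candidate does not take.
            split[current_id] = 100 - candidate_share
        schedule.append(split)
    return schedule


# Hypothetical IDs for the deployed model and its new version.
schedule = rollout_schedule("model-v1", "model-v2")
for stage, split in enumerate(schedule, start=1):
    print(f"stage {stage}: {split}")
```

Between stages, you would check the monitoring signals for the new version before moving on. With a single model version, as in the demo, the only split is 100 percent to that model.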

💡 Vertex AI provides a unified platform for creating, deploying, and managing models, and its pipelines can automate the end-to-end process of model deployment, making it easier to implement MLOps practices.
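
To illustrate what "automating the end-to-end process" means, here is a plain-Python sketch that mirrors the three-step shape of the demo pipeline: ingest CSV data, train a model, and push the model to a file system destination. It is not the pipeline code from the video and does not use any pipeline SDK. Each step is an ordinary function, the "model" is a simple least-squares line, and all names and data are hypothetical.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def example_gen(csv_text):
    """Ingest step: parse CSV text into a list of (x, y) float pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(row["x"]), float(row["y"])) for row in reader]


def trainer(examples):
    """Training step: fit y = slope * x + intercept by least squares."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in examples)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    slope = cov_xy / var_x
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def pusher(model, destination):
    """Push step: write the trained model to a file system destination."""
    path = Path(destination) / "model.json"
    path.write_text(json.dumps(model))
    return path


def run_pipeline(csv_text, destination):
    """Run the steps in order, passing each output to the next step."""
    examples = example_gen(csv_text)
    model = trainer(examples)
    return pusher(model, destination)


# Hypothetical training data where y = 2x + 1.
data = "x,y\n1,3\n2,5\n3,7\n"
with tempfile.TemporaryDirectory() as tmp:
    model_path = run_pipeline(data, tmp)
    pushed_model = json.loads(model_path.read_text())

print(pushed_model)
```

Because every step takes its input from the previous step, the whole sequence can be rerun unchanged whenever new data arrives, which is the repeatability that a managed pipeline service provides at larger scale.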

