ML Ops Best Practices

Data Skeptic · Intermediate ·🏭 MLOps & LLMOps ·3y ago

Key Takeaways

The episode discusses MLOps best practices, highlighting the importance of experiment tracking, model monitoring, and collaboration between data scientists, ML engineers, and DevOps engineers, with a focus on Neptune.ai as a tool for tracking and managing machine learning model metadata.

Full Transcript

Deploying a machine learning model in production is a fundamentally different thing than deploying regular software in production. Any organization serious about machine learning needs to have proper procedures for change management and a way of monitoring those models while they're in production. In this episode I speak with Piotr Niedźwiedź from Neptune.ai. We explore the current state of MLOps, experiment tracking, and how organizations of all sizes are achieving success.

My name is Piotr Niedźwiedź, co-founder and CEO at Neptune.ai. At Neptune we help data scientists, data science teams, and organizations track and manage their machine learning model metadata.

Well, obviously we're going to get into some details about that, but I'd love to hear a little bit about your background. Where's the first line of code for you?

Oh, it was early. I was seven, and it was in BASIC. So engineering is, I would still say, my number one background. After I started coding early, as I said, I got into classical algorithms, starting from high school, and during my university time I was doing a lot of competitive programming, with some global successes in college competitions and on Topcoder. I graduated from a double degree program in mathematics and computer science. But I also have an entrepreneurial background, I think I can say. After two internships, at Google and Facebook, I got enough money to establish my own company. It was something like 13 years ago. Long story short, besides Neptune I co-own three companies; in total there are over 600 people. All the companies are in tech. Those companies were self-funded, so you can say that it is, to some extent, a reasonable-scale success business-wise, but of course not a Google type of success.

Well, we heard about when you started with code. When was your first introduction to machine learning?

It is a good question, because we would need to set a boundary here between statistics and machine learning, so it
is a boundary that is not so clear for me. Something like 16 years ago. Of course it is hard to compare with what is possible today, but the foundations of machine learning are rather older than a few years.

In the earliest days machine learning was very custom. First of all, there weren't any tools or platforms just yet, and everyone was investing a lot of time reinventing the wheel. Then of course we got libraries and the field progressed. But I'm curious, since you see and interact with so many people who are using your platform: obviously there are a few more standards, and it's not quite the Wild West, but what is the current state? Are people centralizing on a standard tech stack for machine learning, or do you still see a great variety?

Definitely the change versus 16 years ago is that today there are a lot of options. From one perspective it is great; on the other hand it is pretty problematic. If you are a data science team building models that you eventually want to operate in production as a business, I would say there are two options at a high level. On one hand, you can go and buy one of the end-to-end machine learning platforms that provide almost everything. The alternative approach, one that I need to admit I am biased towards because I think it is closer to software engineering, is building your tech stack from best-in-breed, best-in-class tools. Today the challenge with building your tech stack is that, even for me, a person who follows this market on a daily basis, it is quite challenging to know which tools and products you should use and how they work together, because today there are too many. We do not have clearly defined categories in the MLOps space. But I think it's getting better. If I compare the situation today with three years ago, when we were starting at Neptune, we had a hard time explaining what we do. Today it is an order of magnitude easier, because we
as a community of people in this field started establishing some common vocabulary and naming things. But it is still in progress.

Well, when is the right time for a new startup to engage with a tool like Neptune.ai? Let's say we're going to dabble in some machine learning. Do I start with you on day one, or do I have to hit a certain milestone?

I would start from day zero, assuming that day zero is the day I start to develop and train a machine learning model. Of course, day zero in practice means you first need to know that you have a problem that can be solved using machine learning techniques.

And that it's worth solving yourself, right?

Very good point, yeah, because I am a big fan of buying what you can buy and building what you really, really need to build: what makes you different, what is essential to your business. Very often the problems where you may think "maybe some artificial intelligence would do some magic" are not machine learning problems, or they can be solved using already existing tools, libraries, and APIs. So the very first thing is to define the problem. Then, of course, it is hard to apply machine learning without data; that is the next stage. But once you get to the point where you know that you are dealing with a problem that is important to your business, you have the relevant resources (I am talking mainly about data), and you are committed to creating a machine learning model to solve it, this is day zero for Neptune. It is super easy to start. Just think about Neptune at the beginning as a more sophisticated print function in your code. Instead of printing stuff to the console and losing it, you will print it to Neptune: it is sent to Neptune servers and visualized, and it is richer than just logs. You will start logging things, tracking things, and not losing things from day zero. I think today some people may say maybe it's too early. It was maybe 13 years ago, I
remember discussions on Stack Overflow about similar questions, but for software: when should you start using Git or another version control system?

Oh, day zero for sure.

Yeah, exactly. Why? Because it helps you not to do the same things many times and helps you get more organized, but at the same time it is not so hard. Doing a git commit and push is not a big task or practice to follow.

Well, your comparison to a print statement is, I think, a very useful shortcut for someone who develops code and develops models, because they get that: I'm constantly printing things out and, as you say, they're lost. But we've always had logging tools where I could just send arbitrary messages and they're stored in the cloud. There are a lot more features here, especially aimed at machine learning development. What are some of the things people can take advantage of?

In machine learning, the first thing to log would be the evolution of your training metrics. That helps you understand how your training is doing, whether your model is getting better or not. You would also like to log a reference to your code. You can log code, but I'm talking about a Git reference to your source code. You would like to log a reference, again not the data set itself but a reference to your data set, in order to know what data you used to train this model. From here it all depends on the problem you're working on, but if you are working on image processing, quite likely you would like to log some debug images from different epochs of your training. And of course hyperparameters, the parameters that you change when testing different model training versions. Those would be the first things that come to my mind. Very often people also log snapshots of model weights from different epochs, from time to time, just to be able to get a particular model version or, if they're using spot instances, so trying
to save money on compute, to be able to resume training of the model. So there are a lot of special types of data that are a little bit richer, or even significantly richer, than the standard logs you would write when you're running software.

Well, when I think about times I've been in a software engineering role, typically if I'm looking at my logs and metadata it's because I have a bug and I'm trying to trace down a problem, or maybe once in a while I want to reduce latency. Usually it's reactive. Whereas as a data scientist I really want to take a whole different approach. Technically all of this is just logging data, but it seems like a much more interactive process. Could you compare and contrast a little bit how software engineers use logs and how data scientists do?

Sure. You're absolutely right, but the context is different. I think the difference comes from the number of iterations. Very often when you code something (of course it all depends on your skill set and how hard the things you code are) you are not getting it right at the very first iteration. Initially it doesn't compile. Then it compiles, but quite likely doesn't work: you have bugs, it doesn't pass unit tests. But in a day or two you're good. That's why you don't spend so much time on debugging, comparing, and looking at the logs: in software it is way easier to have a plan at the very beginning, before you start coding a function or a class, for how it should work, and usually it works like you thought. In data science you really cannot know. We have intuition, but it is statistics: you need to prove it, you need to test it. That's why, when you're working on a new model, before you get to a level of quality that is acceptable to move to production, you will run hundreds or thousands of experiments, of
trials, and you are in debugging mode, or rather looking for improvements, because "debugging" sounds like you have a bug in the code. Here it is not about bugs most often, or maybe equally often, but also about what people sometimes call meta ML: finding ideas for what I can change in the architecture or in the hyperparameters. You can always do hyperparameter optimization, but you still have to limit it to some space, otherwise quite likely you won't have the resources to try every potential permutation or setup of your hyperparameters. So it is a way more iterative process, and that's why you need debugging more often. In this context you're absolutely right. But there is a second context that I would say is pretty common to software and to data science. This context is closer to production. At least for me, it is hard to imagine having software in production without any monitoring system. Once there is some bug in production, which can be caused by things that we didn't predict when writing the tests or have not observed so far, it is safer to be able to understand what happened, to have those logs. In software it is obvious that you need those logs if you want to operate software in production, in order to fix it later if you observe something that causes problems. In data science you also need to track and monitor your models in production, and I'm not only talking about drift of your inputs and outputs. I'm also talking about more software-related aspects that you monitor, because a model in production is in fact a piece of software, right? So in this context you will need those logs more retroactively, like in software.

While I'm going through that iterative process, building up a machine learning experiment of some kind, there's a lot, like you say, where I'm looking for insights. What do I need to tweak? What do I need to improve? On that path I
can definitely understand how experiment tracking helps me. I'm curious, if you look beyond just one practitioner, do you see teams finding good ways to collaborate, and organizations getting more transparency, through tools like this?

Definitely. It is easier to share something in order to discuss it, because you can just grab a link to your training process and pass it over Slack, and somebody can go check your hyperparameters, code, and learning curves to see how your model was doing. Maybe that person can help you debug or give you an idea for improvement, or can compare it with her own work. I remember a practice from two years ago, when we were competing on Kaggle as a team of three. Every week we had a sync-up where we compared our best models and explained how we built them, and the next week we usually started from the few best models: two out of three people were picking up someone else's model training code and trying to iterate from it. So it is helpful in a team, but I think it is even more fundamental if you think about the whole organization. If you don't have a place where you keep all the, okay, I will use this word, metadata about your machine learning models' life cycle, it is pretty risky. If someone decides to leave your company, which in this field happens from time to time, you may lose a lot. You will lose a lot anyway, but at least you will have some artifacts, some models, in a form that lets you pick them up if needed and understand the whole picture: quite likely you will know how to retrain the model and how it was built. From this perspective it is very important, actually, if you are thinking about operating models in production. From a technical point of view it is not so problematic, because it is not so far from how we operate; there are differences,
but not such significant differences, in my opinion, from how we operate software today using DevOps practices. The challenge is in the confidence: that you know the model that was built is something that should be in production, that you really know what the data was and whether the model was properly tested after training. It all depends on the use case, but you would like to see that this model passes your quality checks. I don't want to go into legal aspects; in some fields this is a must-have from an auditing perspective, but I prefer to look at this set of problems from a tech perspective.

Yeah, one challenge I've had in delegating a little bit is that someone else could take over a model I've built, and when we want to do an update things could go wrong. Maybe they sent me the wrong model binary, or it's using the wrong features, or lots of different communication issues could go on. Are there ways in which these processes can help make fewer production mistakes like that and give me more confidence that I know what's going on?

Of course. They help you organize yourself as an organization. We at Neptune try not to force you to get organized, because if you try to force somebody, quite likely you will need to limit somebody, and that is something we don't want to do. But you can establish what I would call a protocol of collaboration: within the team of data scientists, between data scientists and ML engineers, and between ML engineers and DevOps engineers. In software it is like that in most cases. If someone wants to do something bad, then in a really sophisticated, big organization that has a lot of checks it is hard, but in most companies, if a software engineer or DevOps engineer wanted to do something like that, quite likely she would succeed. But most people are not like that, and it will help you establish a clean
protocol of collaboration, definitely.

Well, as a machine learning practitioner I have a number of tools I could pick. I could use some open source things like sklearn myself, I could build it in Spark, I could use a commercial tool like SageMaker. How does the Neptune experience work in all of these different settings?

In most cases it works out of the box next to them. We are not an end-to-end platform, right? We are not here to replace your stack. I think that is too complex, and there are too many smart people in the field, so we don't even have such an ambition. We are a component that integrates with pipelining frameworks, with different options for data storage and data processing, and of course with all the machine learning frameworks. If you go and play with Neptune, you will figure out that it is pretty simple: maybe a little bit more complex than a print statement, but not by much.

Well, in order for you to get wired into an organization, you have to have some advocate on the inside, someone who says "I want to use this tool." Does that tend to be the machine learning practitioner, their manager, or the C-level? Who's the first to say "let's use it"?

Of course it depends on the organization. In a small startup the C-level is a hands-on person, like at Neptune; it's even funny to call me a C-level guy. But it is mostly bottom-up. In most cases it comes from data scientists and data science team leads. Almost always it is bottom-up: a practitioner who really understands what she does and what she needs. I think it is very hard today, and a little bit counterproductive, to approach this market top-down, going to CTOs. Of course there are CTOs who are hands-on, but in bigger organizations CTOs have a lot of problems, and quite likely this MLOps field is too complex to follow on a daily basis. And we at Neptune are, in my opinion, an important component, but only one of the components of
the broader setup. So at this stage of the market's development, our go-to-market and our efforts are targeted at data scientists, ML engineers, and team leads.

Well, in comparison to something like source control, it would be shocking to me if a serious software group wasn't on it. Git or SVN or whatever, you have source control; I would think it's 100% adoption. I don't think we're at 100% adoption with experiment tracking and model registry. Do you have a sense of where things are in the growth life cycle?

I think it is not so obvious. Eventually, of course, bigger players will play in this field as well, and of course I hope that Neptune will be a bigger player by that time. But from today's perspective, let's think a little bit about GitHub and GitLab. I would say they are not only source control systems; today they also deal with task management and issue management, and I think the direction is to go deeper into CI/CD processes. So there is a lot of work here. To some extent it will, because I think MLOps has to be closely integrated with DevOps. If you are releasing a model with slightly different assumptions around the data, such as the signature of inputs and outputs, your application also needs to be aware of that, so sometimes you need to coordinate deployment of a model version with an application version. So it is natural that they have to be closely integrated. But when it comes to the aspects that we are dealing with at Neptune, we are quite far from that. Think about Datadog, Splunk, Sumo Logic, software monitoring tools like ELK, Elastic. Of course you can say that they are capturing metadata about software, but not about the static form of software, not necessarily about the code, but about the behavior of the running software, right? Experiment tracking, tracking of the process
of retraining, automatically retraining models, and model-registry functionality are something close to that. But source code is just one part; it is just metadata, like data sets. Are we going to store data sets in Git? Quite likely not. Of course it is hard to predict, but I don't believe this will be packed into one product in the next four or five years.

And when teams adopt these sorts of services, what are some of the low-hanging fruit they get by turning them on?

Instantly they free up some cognitive memory. Team members don't need to SSH to a console to see how the model is performing; they can see it on mobile, nicely presented as a chart. They very often save compute power. Those are the low-hanging fruit that come within a week. Reproducibility is not the thing that is important in a week, because usually you remember what you've been doing this week, right? But reproducibility matters after a month.

And what about the long term? What does that team appreciate six to 12 months out?

This is not about Neptune; it is more about experiment tracking in general, the set of jobs that tools like this are doing. I have heard a couple of times from people who had not been using a tool like this and used it for the first time that there is no going back, and they said they feel that things are finally under control. It is a kind of comfort that it is not so easy to break things. I try to find an analogy to source version control, but there is one aspect that is not necessarily so relevant in the MLOps space. Source version control comes with two things: one is the backup, but the second, even more important, is merging. Without source version control and merge functionality it is almost impossible to build bigger software as a team
together. Yeah, because it was hard to merge stuff; source version control and merge functionality help here. In data science we are not necessarily merging. You can always do some model ensemble, but it is not this way of merging. So I would say that in the long run it is this confidence that you know what is happening and how things were done, and that your team has established a protocol of collaboration, besides of course some efficiency benefits on a daily basis.

Can you tell me what you mean? I'm not familiar with the phrase "reasonable scale MLOps."

It is all about a simple observation: 99% of companies are not doing machine learning at hyperscale, like Google, Uber, or Airbnb. It is a simple observation, right? But at the same time, most of the really good blog posts, white papers, and conference talks are created by people working at such companies and showing what an incredible job they are doing. People then try to apply good practices for the tech stack and for the processes around operating models in production that are naturally biased toward hyperscale, but they are not even close to Google and have different needs. There is a lot of over-engineering in the field.

Yeah, I mean, the average small business that could benefit from machine learning in production doesn't need to do so at 5,000 transactions per second.

And not only what we can call small, but even medium level. When you are processing millions of transactions on a daily basis, it is still not so huge, and quite likely you wouldn't need to retrain your model every eight hours. A lot of things can be way simpler. So going with the architecture and good practices coming from the biggest players is quite likely not a good idea for most.

So does that open opportunities in the market for a team to help with solutions that are more
addressable for everyone else?

I think so, definitely. We are trying to do that, and not only with our product. We are pretty active on our blog, where we interview some of our customers, and not only them but reasonable-scale teams in general, and ask them what their tech stack looks like. Usually those teams know how the big players are doing it, and their own stack is way smaller. It sounds small and not so impressive, but actually it is super useful, because what most teams need is to understand how similar teams dealt with similar problems.

Well, that sounds like a really good resource. We'll have a link in the show notes, but for listeners at their terminal now, it's neptune.ai/blog. Well, Piotr, thank you so much for taking the time to come out and share your experience.

Thanks a lot, it was a pleasure.

That's it for today's episode.

You know, the main reason a lot of people run experiments in production is because they have great uncertainty about the future. There are different kinds of queries arriving over a period of time. The future is unknown and uncertain, and when, let's say, a platform wants to make some decisions about ads, it doesn't really know what's going to come in the future. So that's really where my interest arises from; it's more from the mathematical connection with these problems that I'm interested in.

That's next time on Data Skeptic: Ad Tech.

Original Description

Today, we are joined by Piotr Niedźwiedź, Founder and CEO of Neptune.ai. Piotr discusses common MLOps activities by data science teams and how they can take advantage of Neptune.ai for better experiment tracking and efficiency. Listen for more!

The episode covers MLOps best practices, including experiment tracking, model monitoring, and collaboration, and introduces Neptune.ai as a tool for tracking and managing model metadata. It highlights the need for a protocol of collaboration between data scientists, ML engineers, and DevOps engineers to reduce production mistakes such as shipping the wrong model binary or the wrong features. Following these practices gives a team confidence that it knows how each model was built, tested, and released.
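One way to picture the "protocol of collaboration" and the quality checks mentioned in the episode is a small release gate that compares recorded model metadata with what is about to be deployed. The sketch below is only an illustration under stated assumptions: `ModelRecord`, `release_problems`, the field names, and the accuracy threshold are all hypothetical and are not part of Neptune.ai or any tool named in the episode.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    """Hypothetical metadata a team might record for each trained model."""
    version: str
    binary_sha256: str          # checksum of the model file that was tested
    features: list              # input features the model was trained on
    dataset_ref: str            # reference to the training data, not the data itself
    metrics: dict = field(default_factory=dict)


def release_problems(record, binary, expected_features, min_accuracy):
    """Return the reasons (if any) why this model should not be released."""
    problems = []
    # Guards against "they sent me the wrong model binary".
    if hashlib.sha256(binary).hexdigest() != record.binary_sha256:
        problems.append("model binary does not match the recorded checksum")
    # Guards against "it's using the wrong features".
    if list(record.features) != list(expected_features):
        problems.append("feature list differs from what the application sends")
    if not record.dataset_ref:
        problems.append("no reference to the training data set")
    # Assumed quality check; a real team would agree on its own metrics.
    if record.metrics.get("accuracy", 0.0) < min_accuracy:
        problems.append("accuracy is below the agreed threshold")
    return problems


if __name__ == "__main__":
    binary = b"fake-model-bytes"  # stand-in for the contents of a model file
    record = ModelRecord(
        version="1.2.0",
        binary_sha256=hashlib.sha256(binary).hexdigest(),
        features=["age", "income"],
        dataset_ref="s3://example-bucket/train-v1.csv",  # hypothetical location
        metrics={"accuracy": 0.91},
    )
    print(release_problems(record, binary, ["age", "income"], min_accuracy=0.9))
```

An empty list means every agreed check passed. The point of the design is that the checks read from recorded metadata rather than from anyone's memory, so a handover between data scientists, ML engineers, and DevOps engineers does not depend on a single person.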

Key Takeaways
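  1. Start experiment tracking (for example with Neptune.ai) from day zero, the day you begin training a model
  2. Log training metrics, hyperparameters, and how the model evolves across epochs
  3. Log references to code (such as a Git commit) and to data sets, not the data itself
  4. Think of Neptune.ai as a more sophisticated print function for logging and tracking
  5. Establish a protocol of collaboration between data scientists, ML engineers, and DevOps engineers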
💡 Experiment tracking and model monitoring are crucial for efficient ML ops, and collaboration between data scientists, ML engineers, and DevOps engineers is essential for preventing production mistakes.
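
The "more sophisticated print function" idea from the takeaways can be sketched with the standard library alone. The example below is a minimal local stand-in, not Neptune.ai's client API: `LocalRun`, `current_git_commit`, the JSON Lines file layout, and the data set location are all assumptions made for illustration. It records hyperparameters, a Git reference, a data set reference, and per-epoch metrics so they are not lost the way console output is.

```python
import json
import subprocess
import tempfile
import time
from pathlib import Path


def current_git_commit() -> str:
    """Return the current Git commit hash, or "unknown" outside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


class LocalRun:
    """Hypothetical local experiment tracker; not Neptune.ai's API."""

    def __init__(self, log_dir: Path, params: dict, dataset_ref: str):
        log_dir.mkdir(parents=True, exist_ok=True)
        self.path = log_dir / f"run-{time.time_ns()}.jsonl"
        # Store references to code and data, not the code or data themselves.
        self._write({
            "type": "meta",
            "params": params,
            "git_commit": current_git_commit(),
            "dataset_ref": dataset_ref,
        })

    def _write(self, record: dict) -> None:
        # Append one JSON object per line so a crashed run keeps its history.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")

    def log_metric(self, name: str, step: int, value: float) -> None:
        self._write({"type": "metric", "name": name, "step": step, "value": value})

    def metrics(self, name: str) -> list:
        """Read back the logged values of one metric, in logging order."""
        lines = self.path.read_text(encoding="utf-8").splitlines()
        records = [json.loads(line) for line in lines]
        return [r["value"] for r in records
                if r["type"] == "metric" and r["name"] == name]


def demo_run() -> LocalRun:
    run = LocalRun(
        Path(tempfile.mkdtemp()),
        params={"learning_rate": 0.01, "epochs": 3},
        dataset_ref="s3://example-bucket/train-v1.csv",  # hypothetical location
    )
    for epoch in range(3):
        loss = 1.0 / (epoch + 1)  # placeholder for a real training loss
        run.log_metric("train/loss", epoch, loss)
    return run


if __name__ == "__main__":
    run = demo_run()
    print(run.path, run.metrics("train/loss"))
```

A hosted tracker adds what this sketch lacks, such as charts, shareable links, and storage that outlives one machine, but the logging calls sit in the same places in the training loop.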
