Jennifer Prendki Interview - Agile Machine Learning - TWiML Talk #46

The TWIML AI Podcast with Sam Charrington


Key Takeaways

Jennifer Prendki discusses agile machine learning, data science, and team building, covering topics such as machine learning lifecycle management, model performance tracking, and data quality, with tools like Hadoop, Cloudera, TensorFlow, and Kubernetes.
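Two of the pre-production checks Prendki describes in the interview below can be sketched in a few lines of Python. The first monitors a model's inputs against what it saw at training time. The second retrains the same model on smaller slices of data to see how much accuracy the extra data actually buys. This is a minimal, hypothetical illustration, not Walmart's tooling. The function names, the three-standard-deviation drift tolerance, the 1% accuracy budget, and the toy numbers are all assumptions. Data fraction also stands in for the CPU cost she plots against accuracy.

```python
import statistics
from typing import Callable, Dict, Sequence


def input_drift_alert(train_values: Sequence[float],
                      live_values: Sequence[float],
                      tolerance: float = 3.0) -> bool:
    """Return True if the live mean of an input (e.g. daily add-to-carts)
    sits more than `tolerance` training standard deviations away from the
    training mean. The tolerance is a hypothetical default."""
    train_mean = statistics.mean(train_values)
    train_std = statistics.stdev(train_values)  # needs >= 2 training values
    live_mean = statistics.mean(live_values)
    return abs(live_mean - train_mean) > tolerance * train_std


def smallest_sufficient_fraction(train_and_score: Callable[[float], float],
                                 fractions: Sequence[float],
                                 max_accuracy_loss: float = 0.01) -> float:
    """Retrain the same model on each data fraction (via the caller-supplied,
    hypothetical `train_and_score` callback) and return the smallest fraction
    whose accuracy is within `max_accuracy_loss` of the largest fraction's."""
    ordered = sorted(fractions)
    scores: Dict[float, float] = {f: train_and_score(f) for f in ordered}
    reference = scores[ordered[-1]]  # accuracy with the most data
    for fraction in ordered:
        if reference - scores[fraction] <= max_accuracy_loss:
            return fraction
    return ordered[-1]


if __name__ == "__main__":
    # Toy numbers, invented for illustration only.
    training_add_to_carts = [100, 104, 98, 101, 97, 103]
    print(input_drift_alert(training_add_to_carts, [99, 102, 100]))   # False
    print(input_drift_alert(training_add_to_carts, [150, 160, 155]))  # True

    toy_accuracy = {0.25: 0.905, 0.5: 0.908, 1.0: 0.91}
    print(smallest_sufficient_fraction(lambda f: toy_accuracy[f],
                                       [1.0, 0.25, 0.5]))  # 0.25
```

The mean-versus-spread check is deliberately coarse, in line with her point that inputs need not be monitored very closely. A per-feature distribution test would be a stricter alternative.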

Full Transcript

[Music] Hello and welcome to another episode of TWiML Talk, the podcast where I interview interesting people doing interesting things in machine learning and artificial intelligence. I'm your host, Sam Charrington. The details are set for the next TWiML Online Meetup, so mark your calendars. The September meetup will be held on Tuesday the 12th from 3 to 4 p.m. Pacific time. The discussion will be led by Nicola Kuchar Dava, who will be presenting "Learning Long-Term Dependencies with Gradient Descent Is Difficult" by Yoshua Bengio and company. This is one of the classic papers on recurrent neural networks, so you won't want to miss it. For additional details or to join the meetup, head over to twimlai.com/meetup. If you missed the first meetup, the recording is available on that page as well. My guest this week is Jennifer Prendki. That name might sound familiar, as she was one of the great speakers from my Future of Data Summit back in May. At the time, Jennifer was a senior data science manager and principal data scientist at Walmart Labs, but she has since moved on to become head of data science at Atlassian. Back at the summit, Jennifer gave an awesome talk on what she calls data mixology, the slides for which can be found on the show notes page at twimlai.com/talk/46. Our conversation this time begins with a recap of that talk, after which we shift our focus to some of the practices she helped develop and implement at Walmart around the measurement and management of machine learning models in production, and more generally, building agile processes and teams for machine learning. Before we jump in, I want to give a big thank you to our friends at Cloudera for sponsoring this show. You probably think of Cloudera primarily as the Hadoop company, and you're not wrong for that, but did you know they also offer software for data science and deep learning? Yep, they do. The idea is pretty simple: if you work for a large enterprise, you probably already have Hadoop in place, and your Hadoop
cluster is filled with lots of data that you want to use in building your models. But you still need to easily access that data, process it using the latest open-source tools, and harness bursts of compute power to train your models. This is where Cloudera's Data Science Workbench comes in. With the Data Science Workbench, Cloudera can help you get up and running with deep learning without massive new investments, by implementing an on-demand, self-service deep learning platform on existing CDH clusters. From a tech perspective, the Data Science Workbench is pretty neat. It uses Kubernetes to transparently schedule workloads across the cluster, supporting R, Python, and Scala, and deep learning frameworks like TensorFlow, Keras, Caffe, and Theano. And as of last month's 1.1 release, GPUs on the Hadoop cluster are fully supported. The folks at Cloudera are so confident that you're going to like what you see that, for a limited time, they're offering a drone to qualified participants who register for a demo of the Data Science Workbench. For your demo and drone, visit twimlai.com/cloudera. And now on to the show.

All right, everyone, I am on the line with Jennifer Prendki. Jennifer is a senior data science manager at Walmart Labs specializing in machine learning, and I am super excited to have her on the line with me. Jennifer, welcome to the show.

Hi, nice to be here.

Yeah, nice to have you here, and it is so nice to speak with you again. If folks recognize Jennifer's name, it's because Jennifer was one of the speakers at the Future of Data Summit, and she so graciously offered to spend some time with us to talk a little bit about what she's doing at Walmart Labs. Before we jump into that, Jennifer, why don't we have you spend a little bit of time talking about your background and how you ended up working in machine learning at Walmart?

Sure. So actually, when I tell people what my background is, they're a little bit surprised, because I'm actually a particle physicist originally. The reason why it's not
as crazy as you might think at first is that I was doing the type of particle physics where you have lots of data to process. So I was working with huge amounts of data even before the term data science became as trendy as it is today. The reason why I eventually switched to pure data science, and specifically retail data science, is that I was looking for lots of interesting data to work with, and it turns out that retail has lots of very interesting challenges for someone passionate about data. So here I am.

Really fantastic. Can you tell us a little bit about the talk that you gave at the summit? What were your goals for that presentation?

Yes. So my topic for the summit was something I call data mixology. My goal was to sensitize people to the fact that the real challenge with big data today is not necessarily velocity or volume, as people think; it's really about variety. Because when you start plugging in several data sources, sometimes you have to rethink your model entirely, and you have to deal with all the challenges related to data silos and understanding the quality of the data coming from different sources. I really thought that this topic was not covered enough at the different conferences I had been to recently, so I thought it was an interesting topic to cover.

It was definitely an interesting topic, and it was clear as you were delivering it that it came from your experience. How did these issues of silos manifest themselves in your world?

So the way it came about in my experience is that I recently started a new team where essentially the goal is to use both store data and online data from Walmart and bring them together. In the real world, the Walmart e-commerce business and the Walmart stores business are essentially separate. It's not the same people
and even the data lives in separate places. It's not necessarily trivial for an e-commerce data scientist at Walmart to access the store sales data, for example. So as we were trying to bring these two worlds together, I came to discover firsthand all the different challenges you have when bringing different data sources together, even when they come from the same company. This is exactly how I came to speak about this topic.

Hmm. And a lot of companies are pursuing ideas like data lakes, or that idea by various different names. Is that something that you guys ended up doing, or did you take a different approach to integrating all this data?

We're absolutely taking that direction. But as you can imagine, the challenge for Walmart is really that on one hand you have Walmart e-commerce, which is a more recent tech company, really like a typical Silicon Valley company, and on the other hand you have the huge Walmart stores company, a legacy company that has lots of data. They've been gathering data for a long time now; I think they were one of the first companies to realize that data was so important. So you really have to deal with different types of systems altogether, not necessarily using the same technology. We're definitely after the creation of a data lake where all the data scientists across the company would be able to come and look at the same data, but it's a long road. I think every company that is trying to tackle this challenge knows that it is a long road, and it requires a lot of different skill sets and lots of different people and expertise to actually achieve the goal.

It's funny, I think the way that some of the vendors in the space talk about it is that you just set up a Hadoop cluster and run some ETL jobs and you'll have a data lake. What are some of the challenges that you ran into, and what makes the road long?

Well, I'll give
you a specific example. One of the very interesting data sets that everybody across the company wants to work with is the online engagement data: essentially, which items the customer actually clicks on and what they eventually buy. This is a data set that the stores side, for example, doesn't have access to, because they don't have engagement; they just have the final purchases. So they don't have any way to properly measure the interest of a customer unless the customer purchases something. So people keep getting this data from us, but they get a data dump, and they don't necessarily create exhaustive signal pipelines to get it in real time. So there are lots of different versions of this data living across the company, and whenever we at Walmart e-commerce make a change to this data, it's not easy to communicate these changes to other teams. One of the challenges is that you don't necessarily know anymore which is the original source of truth. In that specific case it may be easier, because you know who the owner is, but in some other cases we don't necessarily even know where the data is coming from. So everybody's interested in the same data, but this data exists in multiple versions, and it's actually very hard to come up with a procedure to figure out which one is the best one and which one is the accurate source of truth.

That actually gives us a really interesting segue to one of the main topics that I wanted to dig into with you here on the podcast. One of the interesting aspects of your role is leading a team that's focused on measuring and auditing the various machine learning models at Walmart, and you mentioned the source of truth; data provenance is kind of one small aspect of that. Can you tell us a little bit about your role and some of the type of work that you're focused on in that role?

Right, definitely. So I'm actually part of a
group called the Search Algorithms team. We're essentially the group of data scientists and machine learning experts that take care of all the machine learning algorithms that you would see at work on the walmart.com page. That includes learning-to-rank algorithms and everything related to understanding the customer. We actually split the responsibilities on my team into three different portions. There is something called the Perceive team, which is essentially in charge of trying to understand what the customer wants, and that understanding involves a lot of natural language processing: auto-completion algorithms and spell-checking algorithms would be their responsibilities. Then there is the Guide team. The Guide team is about learning to rank and showing the right items once you think you understand what the customer is looking for. And then we have the Measure team, which is my team, which essentially takes care of helping the others understand their weaknesses, suggesting new data sets that they can use, suggesting best practices, making sure that the other algorithms are retrained properly at the proper frequency, and catching problems early on. So we're essentially creating models to take care of other models. We create specific measurement and scoring systems that range from data quality to customer satisfaction. Essentially, we're the team that gets a really profound understanding of the other algorithms in order to help the others understand what they need to do to make them even better.

And are you primarily focused on helping the search teams, or do you also work with teams outside of search that are doing data science and machine learning?

That's an interesting question, because my original mission was definitely to help the search team, but it turns out that we are the only measure team within the company, and so once
people started understanding what we're doing, we actually got lots of requests from other teams to help them as well. Search is obviously an area where lots of different teams are involved with us, so we're focusing on search, but you can imagine that the team in charge of the inventory and the catalog is also a team we're working very closely with, so it's pretty natural that we also bring measurements for them. Another area where we're partnering with other teams is that we actually created an entire process we call machine learning lifecycle management, which is essentially a checklist of things that we believe people who work on machine learning models should do before pushing something to production. It actually turns out that we have a pretty efficient system now. We essentially require data scientists to provide a very clear view of what the accuracy is, but also what the performance of the algorithm is in terms of the amount of CPU their model consumes when they're retraining, and so forth. We are now trying to expand this process to the entire e-commerce section of Walmart, and it actually turns out that lots of people are interested in that. Because the challenge in data science, oftentimes in a company like ours, is that you have machine learning engineers who are really engineering people, who don't necessarily understand the limitations of data science, properly speaking. They are not necessarily trained to think in terms of evaluating the accuracy and making the proper checks before sending something to production. Then there are other people who are really eager to see their model in action, and they don't necessarily take the time to evaluate the statistical performance of the models. So creating this process is really about making sure
that everybody's on the same page and that things are running properly in production.

It's interesting. I think of a few years ago, when the software development community went through this process of industrializing the delivery of software, and that resulted in ideas like lean and agile methodologies and DevOps and things like that. It sounds like you guys are on the cutting edge of an industrialization wave of machine learning, not to be confused at all with the industrial AI line of inquiry that we've talked about here on the podcast recently. But I love this idea of a machine learning lifecycle model. What can you tell us about that model and the various steps and stages and requirements that you've put in place for the teams there?

Right. You're definitely right about this being a new wave of agile, agile for data science or machine learning. This is exactly what we're after. As we were putting the first steps together, I came to realize that it is really a cultural problem. Because if you want to reach the stage where things are done properly, you're really trying to fix tech debt, but people usually think of tech debt as code debt. I think this is the way that people came to know tech debt, but the truth is, tech debt is much more than this. There are actually more pieces to tech debt than just code debt. There is definitely data debt, related to the quality of your data, but also to the datasets that you may not be using but your competitors are using. So if you are in a situation where, for example, we know that Amazon is using a specific data set that we have but are not currently using, we are in data debt. Then there is the notion of system debt: the case where you're using legacy systems and you're not improving and getting to use the
latest versions of a specific software, or the newest cutting-edge software that is leading the industry. And then you have machine learning debt. Machine learning debt is really when you're not using machine learning to the best of its ability. For example, if you don't understand at which frequency you should be retraining a model, or you don't monitor the inputs and outputs, that's definitely also a situation that you have to take care of. So when somebody asks me what they should do to get started and basically audit their models, my answer is that it's not necessarily something very complicated. It's really about a process, and also creating the culture in your company where everybody understands that doing things right is important. It really depends on the kind of model you're dealing with, but one thing I suggest everybody should do is make sure that you document everything that you're doing. It may sound like a cheesy answer, but it's definitely super important. It actually turned out that most of the time, when we didn't have a model performing well enough, it wasn't necessarily because of the model itself; it was because we didn't have a clear understanding of what the model was doing. So we were not able to reproduce the same model; there was a lack of transparency. For example, you would have a new engineer coming in and trying to take over the project, and they wouldn't even know how the model was built. The other thing is, it's extremely important that you have a clear understanding of what your failures and weaknesses were. People tend to forget that in the concept of machine learning lifecycle management, there is the word "cycle." So there is an opportunity for everybody to
learn about their weaknesses in order to make sure that the next iteration of the model is better. So definitely think about the culture that you have to bring to your company, and make sure that you're keeping track of everything you're doing: that it is very clear what data you're using, that you understand the quality of your data, and that you understand your challenges.

On the various teams there, can you tell me a little bit about the relationship between data scientists and people with a statistical orientation, and developers and engineers?

Yeah, I can absolutely tell you about that. So actually, my team has statistical analysts, data scientists, and machine learning engineers, and people sometimes struggle to understand what the difference is. In our view, statistical analysts are people who know how to play with the data very well. They can essentially give you a very clear understanding of whether your data has sufficient entropy and sufficient variance for you to build a model, and they can give you answers very quickly to get started. The data scientist is the person who would, I would say, prototype the model. Once you have an understanding that your data is good enough to solve a specific problem, the data scientist will come up with the solution and essentially try to assess which is the best type of machine learning model to solve that problem. So we don't necessarily expect the statistical analyst to be an expert in machine learning. Of course they have some understanding, but they are not the person in charge of creating a model. And then the machine learning engineer is someone who knows how to optimize this machine learning model and make it work at scale. They're really focusing on making everything efficient, and they really have the ability to push that to
production. So having all these skill sets together in one team has been really helpful for us, because it helps us move things to production really quickly.

One of the things I've seen in the past with organizations that have a model similar to yours, although I think less sophisticated in the way you are managing it with the machine learning lifecycle processes you've introduced, is a little bit of friction in the interface between the data scientists and the machine learning engineers. You would have a data scientist create a model and code it up, maybe even using a set of tools that are not the tools the ML engineers are working with, kind of throw it over the wall, and then have a machine learning engineer, who is maybe less sophisticated in understanding the model, try to implement it, often going from Python, for example, to Java or something like that. That both creates an opportunity for the introduction of errors and slows cycle time and iteration time, just because of the back and forth over this barrier. Have you guys seen that at all, and how have you addressed it?

I definitely see how that problem can arise. At the very beginning, when this team was still very nascent, we definitely had that problem. The way we kind of solved it is that there is actually a very decent overlap between the data scientist and the machine learning engineer. Usually the data scientist would code something which is pretty close to what would end up being in production, except that it is not necessarily functioning at scale. Usually they use the same language, so that's for sure. The other thing is, I have my machine learning engineers and my data scientists work in pairs. So the machine learning engineer is actually involved in the early
stages as well, but he's not the lead for that portion. So he gets to be involved and immersed with the model very early on, which gives him a more sophisticated understanding of the model that makes it easier for him or her to push it to production later. So we don't really have this friction interface. It's really the entire pair working through the process, except that the first phase is the phase where the data scientist is in charge, and the last phase is the phase where the machine learning engineer is in charge.

Okay. That's another really interesting adaptation of the agile idea, or at least the pair programming notion of agile, to...

Absolutely.

...this machine learning lifecycle. Interesting. So you develop these models, you get them in production, and then you are tasked with tracking and measuring and auditing their performance, not just when you're putting them into production but over time. Tell us a little bit about that cycle.

Yeah, sure. The interesting thing is that at first, when we came up with this new model of having an external team measuring things, it was a pretty interesting ride, because up to that point our other teams were used to coming up with a success metric that they would use to build a model, and they would use the same success metric for measuring and auditing this model themselves. So the value proposition was that otherwise you are in a situation where you have a conflict of interest, no matter how truthful you want to be. If the same person is coming up with the measurements and assessing their own models, they don't necessarily see things in a different light. So the value proposition here is that you have a different person, who doesn't know or knows very little about the model, auditing things and actually coming up
with their own definition of what success means for that model. So we had a little bit of tension at the beginning, as you can imagine. You used the word auditing, and that's definitely what we do. So when you're in a situation where everybody's wondering, what is the status of my model, are these guys going to find anything wrong with my model, it took some time for us to make it very clear that we are not here to judge your work; we're here to help you improve it. But I think everybody's very comfortable now with the fact that we're in charge of making sure of the quality of the models, so we have a very good dynamic with the other teams right now.

When it comes to measuring the performance of the models that you guys are using, are you focusing on business metrics or technical model performance metrics, or a combination of both?

It's definitely a combination of both. The reason why we believe that there should be an entire team focused on this is that, as you can imagine, there is not one single metric per model. Some models use several metrics, or several tens of metrics, to make sure that we have a comprehensive view of how the model is performing. It ranges from how accurate the model is, to how efficient the model is, in terms of, is it using too much CPU, as I mentioned earlier, and is it impacting the customer in the proper way. So our belief is that you should have specific metrics for every single model separately. In retail, it is pretty traditional to use the number of add-to-carts, the number of clicks, or even the revenue as a measurement of your success when you run, for example, an A/B test. Our belief is that because you have these two steps, understanding the customer through Perceive and guiding the customer through
Guide, you should have metrics specific to each one of these portions. Otherwise you are looking at all models in terms of add-to-carts, and that doesn't really make sense, because the perception phase is really about understanding the customer. For example, if I have a drop in add-to-carts, it is possible that my new Perceive algorithm is working really well, but because there is a bottleneck in the Guide phase, I won't see that this model is performing well. So making sure that you have very narrow and very specific metrics, even if it means having many of them, is definitely working very well for us.

I guess I have mixed feelings hearing that. I wonder about local minima and local maxima, or probably a better way to put it is unit tests versus integration tests or system tests. What if you have a measure that the Perceive team is able to maximize, but it doesn't maximize the overall metric of something like revenue or add-to-cart? How do you manage that?

That's a very good question. We actually see that problem very often. Basically, you would have a new Perceive algorithm that performs really well, but you see that it causes the Guide performance to drop. So you definitely have this kind of cannibalization problem. Let me give you an example. We figured out at some point that when you improve the accuracy or the efficiency of your auto-completion algorithms, it essentially drops the performance of the spellcheck algorithm. Why? Because if people can use the auto-completion algorithm, they're not going to finish typing the query by hand, which means that the spell-checking algorithm is not called that often. It's pretty logical if you think about it. This is exactly the kind of thing you want to observe, because in that specific scenario
that essentially allows us to say, you know what, it's worth investing more time in making a perfect auto-completion algorithm rather than a perfect spellcheck algorithm. So you can actually use these same measurements to determine which algorithms you should focus on.

Interesting. A related question that I've had for folks in the retail space is around short-sighted versus long-sighted models. An example here might be, as we talked about, it's pretty common to optimize your models around add-to-cart, or even short-term, even immediate, revenue creation, or even something like profitability, to be one level higher in business impact. But I wonder if, when you're doing that, it's possible that you are suboptimizing a broader metric like customer lifetime value, or something along those lines. Is that something that you think about there at all?

We definitely have that as a metric. You suggested customer lifetime value: this is one of the metrics we monitor across the entire process, which is why I say that you need to have several metrics for every model. So we make sure that we keep track of all the different aspects and dimensions of the problem. But as always in business, at the end of the day you have to follow the business decision as well. If the goal of the company is to increase revenue drastically over the next quarter, then at the end of the day you align your decision based on this as well. The final choice of which algorithm you should improve comes down to a business decision. Our goal is really to make sure that the business has all the information in hand to make that decision. So whatever they decide to do, we make sure that they are aware that if they take a specific decision, it may impact customer lifetime value and all these kinds of things.

Are there other
instances where you're working to balance short-term versus long-term optimization targets?

Well, as far as I've seen things at Walmart so far, this kind of optimization really comes down to a business decision. And I don't think we've reached the level where we can predict the future well enough to arrive at a comprehensive knowledge that brings everybody onto the same page, for sure.

Right, right. In the process of auditing these various teams, do you have a list, either formal or in your head, of the top N things that people tend to do wrong? Or put another way, what's your advice for folks who want to learn from what you've learned from your teams on how they should approach modeling?

Right, definitely. I would say there are three things that I believe are key takeaways for everybody who's trying to tackle this problem. The first one is definitely what I said before: make sure you document everything, especially in large organizations where employee turnover is really high. You want to make sure that if something went wrong with a past model, at least you know what went wrong and you have the ability to fix it the next time. So make sure that anyone can grab that model and reproduce the same results. That's one thing. The other thing is, I noticed that many times when our models are unsuccessful, it is essentially not due to a performance issue on the model side; it's actually a problem with the inputs, like a failure in one of the systems, or, typically in retail, something you could see happening is a seasonality pattern. Basically, your model was meant to function well with your inputs in a specific range, and you have to make sure that they are still in that range. So actually
monitoring the inputs and the outputs goes a very long way. It doesn't necessarily mean that you have to monitor things very closely, but essentially get a sense that the average number of add-to-carts you see on a specific day is pretty close to what you would expect, and to what it was when you actually trained your model. The last thing, I would say, is one issue I've seen as well. It's not necessarily overfitting in the way you would think about it, but data scientists tend to use too much data for their models. So something that we are actually requiring from all our data scientists now is that when they suggest a specific amount of data to retrain the models, we ask them to train the exact same model but with a smaller amount of data, and they actually do that for several data points. And we build this curve of, essentially, CPU consumption versus accuracy of the model. And it actually turned out that in our case, many people were using about four times too much data compared to what was actually needed. Wow. So that means that you're using four times too much CPU, or that it's taking you four times longer to train these models. Of course it's better to use more data, but if you're going to increase your accuracy by just one percent by throwing four times as much data at it, it doesn't really make sense for me. I think that data scientists are not trained to think in terms of money optimization, and so we've made sure now that everybody is actually aware, conscious of the amount of CPU they're using to train their models. And have you developed a set of rules of thumb? Is there a way to generalize that, or is the right way for them to do it, do they always need to
run the models with four different data points, understand the shape of that utility curve, and pick the right point on it? This is the way we're functioning right now. Of course, for the future I have some hope of coming up with that. It's a very iterative process. As we ask people to do things, people actually start creating their own scripts and their own tools to perform these tasks. So when something comes across as being easy to generalize, we try to make sure that it is also accessible to other team members, and so over time we're actually building this database of tools that everybody can use for their specific problems. So we're moving towards automation; it's just a very slow process because, as you may guess, we have very different types of models, and not everything can be reused across models. Mm-hmm. Do you have some kind of tool or platform in place for deploying and managing the various models, or do the individual teams do that themselves for their own services? Part of the question is, thinking about what's happening on the dev side of things, folks are forming DevOps teams around microservices that have full lifecycle responsibility for those services. Are you doing similar things around models? Yes, we're moving in that direction. We're developing our own compute platform where essentially all models will be trained, and that platform would actually be talking to the data lake directly. But then again, it's a very slow process, because you have to train people to use that new platform; there are some paradigms that are not necessarily obvious to everybody. We try to make sure that everybody gets to use their favorite language on that platform, but essentially we're also loading that compute platform with the tools I was mentioning before, so that
everything is in one place and everybody's aware of what tools exist to make your life easier as a data scientist or machine learning engineer. Mm-hmm. And is this a homegrown platform or something that you're... Yes. When you talked about monitoring the inputs and outputs of the models, that struck me as really interesting, and I imagine some platform that you would tie into a monitoring system where, as part of your documentation phase, you're able to describe the expected bounds of a given model. Then this thing is monitoring the inputs, and if you start seeing inputs outside of the bounds, it would shoot off red flags and start paging people. Have you gotten there yet, or is that part of what... Yes, absolutely, this is exactly what we're trying to do right now. The challenge with this specifically is that when you have supervised models and a numerical data set, it's fairly easy to monitor the inputs. But for some other models, especially NLP-based models, how do you keep track of the language changing among customers? So it may be more or less complicated to actually monitor these inputs, but we are definitely developing that. For some of the models it's actually already in place: whenever an input goes outside of a minus-2-sigma to plus-2-sigma band, it shoots an email to the person in charge of monitoring the model, and they would know that something is potentially about to happen. And so we're definitely geared towards this. One thing we definitely want to achieve in the near future is something that allows you to understand that your model is expiring before its time. Right now, I think most companies are thinking of retraining models in terms of
a regular cycle, right? Basically, "I retrain my model every other..." The question is, when you're in retail, there may be lots of things happening, during the holidays for example, and sometimes you have to retrain things faster. Unless you have something in place to let you know that the model is about to change or needs to be updated, you would actually learn that from a customer complaining about getting wrong results, or something that's not accurate or relevant to their search. You don't want that to happen, because it essentially means the customer needs to have a bad experience for you to be aware that something's wrong with your model. So we want to make sure that we can catch these problems early in the process, before they actually impact the customer. Hmm. And so what are some of the methodologies that you use to identify these expiring models? Well, again, it's about finding the right metric to actually assess the satisfaction of the customer. I don't think there is one true metric that works for all cases, but again, it's the mission of the measurement team to measure things, and so... Sorry, sorry for cutting you off, just to paraphrase: in other words, the model is expiring when it stops performing, if there's not some other dimension to it? Yep, right. Right, okay, interesting. As part of this measurement team, you're also chartered with specifically looking for weaknesses in other people's models. What does that look like, and how do you approach it? I guess I'm thinking of looking for corner cases, or cases in the data that these teams might not have thought about that, based on your experience, you could foresee causing poor model performance. How do you approach that part of the role? Now, there are definitely two components to it. There are definitely weaknesses that you will see, like a specific model
that requires frequent retraining or is extremely sensitive to seasonality. That would be something we would like to look at, to try to figure out what is causing it. The way we do that is we essentially keep track of, for example (assuming your model is a logistic regression model, because that's easier to explain), which parameters are extremely stable over time and essentially don't change even when you retrain the model, and which of these parameters are actually extremely volatile and have a very big error on them. And so we would understand which parameters are causing the model to underperform. That's kind of reverse engineering other people's models in order to understand what their weaknesses are. So that's one thing we do. The other piece is something you have an inkling needs to be updated. An example of something we've tried to do recently is adding the notion of geolocation, which personalizes the results depending on your location in the country. Right, okay. And so you know that this needs to be taken into account, and you know that you're going to add that feature to the model, but the question is: what is your best data set, your best bet, to actually add that to the model? This is why we have statistical analysts trying to assess the quality of the different data sets that we have available. This is where our role actually gets interesting, because we get to touch lots of different data sets across the company and try to understand what data source we could use to actually improve these signals and make our search engine better. So do you have... this maybe goes back to our platform
discussion a moment ago, but is there a place that has a dashboard of all the models that are running at Walmart? I guess I'm wondering about the granularity at which you track this. Do you have a master view of all deployed models and their performance, where you can do trend analysis across them and see where logistic regression types of models work versus other things? Or are these things managed more on a product-by-product basis? No, we're definitely geared towards... at least for search, we're definitely moving towards a phase where we get to see a holistic view of all models in production at one time. Basically, if lots of your models are using the same base model, it's fairly easy to do; it gets more complicated if you have many different types of machine learning models in production. But we definitely believe that you should have a comprehensive view of everything, for the reason we mentioned earlier: you have some crosstalk happening across models. It's possible that one model underperforming is caused by another one overperforming, and so on. So we believe that you cannot keep things segmented and just keep track of one product at a time. I really strongly believe that having a comprehensive view, as much as possible, is really important. Getting to the level where we have a comprehensive view of all the models across the company is going to be very challenging. To what extent do you use machine learning models to manage these models, and how do you do that? That's definitely what we want to do. I sometimes tell my team that we could have machine learning models of machine learning models. Essentially, the way you would do that is by using the parameters of the other models as features for another model. And so basically, as you mentioned earlier, you want to
monitor things over time, so essentially trend analysis is something you could definitely use machine learning for in this type of management. And are you doing this at all today, or is it more directional? Getting started. Okay, interesting, interesting. Yeah, I can imagine if you have all of your model data, all of your parameter data, all of your performance data, then part of what your measurement team is able to do is, when someone brings you a model and some data, you can just run your meta-model against it and predict whether their model is going to work or not. It sounds like a great application. Yeah. Awesome. Well, is there anything else that your team is focused on that we haven't talked about so far? Well, I think we've covered most of it. The one thing, like I mentioned, is this effort to bring the store data together with the online data. This is an effort we started pretty recently. One of the challenges we're trying to tackle is the following. Search is actually an interesting problem because, as you can imagine, we're using lots of different data sources to rank the items we're showing to the customer. So essentially we're using data related to the content of the items: if somebody searches for a Samsung TV, you want to show the right brand, the right product, for sure. But then the question is, among Samsung TVs, which one do you want to show first? And the answer to that is you show the one that is the most popular. So we sometimes run into a problem, for example with types of items that you would usually buy in the store. Sometimes people connect to the walmart.com website, they enter a search, and they actually decide to go buy that item in the store. The reason why they searched for that item was to check the inventory in their local Walmart store. And so for us as the
search team, this is really a problem, because if you see someone click on the item but eventually they don't purchase it, we take that as a bad sign that we didn't show the right item. And that would cause us to demote that item over time, when it's very possible that the item we showed was actually the one that the customer meant to see, and very possible that eventually they bought it. So closing the loop on that, and actually attributing a specific store purchase to a specific online search, is something that we're trying to do now. I think people have heard of the new Google Attribution, where they actually get to track you when you shop in store as well as online. Essentially, we're trying to do that: mapping the gap between the stores and the online experience. Mm-hmm. And that's what the data lake enables you to do, by pulling all that information into one place and allowing folks to build models across it. Yeah. Interesting. And to what extent are you using external data sources in building your search models? So I don't know that we're using a lot of external data. The external data sources we use are essentially for monitoring purposes. For example, we're trying to catch instances where we have a stock problem: if something doesn't sell really well at Walmart when you know this is a very popular item on the marketplace, you would try to do something about it. But we don't necessarily use that to create and build new models; we're essentially focusing on our own data at this point. Got it. All right, well, this has been a really, really interesting conversation, and I appreciate you taking the time out to chat with us about what you're up to. I think folks can learn a ton about the machine learning lifecycle management challenge
and learn a ton from the way you've taken it on at Walmart. I really appreciate you taking the time to join us. Awesome, thanks so much, Jennifer. All right everyone, that's our show for today. For the notes for this episode, head on over to twimlai.com/talk/46. Whether this is your first or fiftieth show, I want to thank you so much for listening. I really want to hear from you, so please take a moment to comment on the show notes page or on Twitter with your feedback or questions, or just what you found most interesting and useful about this episode. Also, if you share your favorite quote via a comment or social media, we'll send you one of our fab laptop stickers. Another thanks to this week's sponsor, Cloudera. For more information on their Data Science Workbench, or to schedule your demo and get a free drone, visit twimlai.com/cloudera. If you subscribe to my newsletter, you already know that I've got a busy month ahead as far as events go. The week of September 18th I'll be in San Francisco for the O'Reilly Artificial Intelligence Conference. There's also a chance that on Saturday the 16th I'll make it to the Scaling Deep Learning conference in SF, which looks to be an interesting one. The following week I'll be at Strange Loop, a great technical conference held each year right here in St. Louis. Now, I love meeting up with listeners, so if you're planning to be at any of these events, please drop me a note via a comment, the contact form, or Twitter. For more info on any of these events, check out the show notes. Thanks again for listening, and catch you next time. [Music]
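In the interview, Jennifer says that for some Walmart search models an email already goes to the model's owner whenever an input leaves a band of plus or minus two standard deviations. Below is a minimal sketch of that kind of check in plain Python. It rests on several assumptions: each band is computed from the values an input took at training time, the input name `daily_add_to_carts` is hypothetical, and alerting is reduced to returning the list of offending inputs, with no real email or paging integration.

```python
import statistics
from typing import Dict, List, Tuple

Band = Tuple[float, float]


def fit_bands(training: Dict[str, List[float]], k: float = 2.0) -> Dict[str, Band]:
    """Compute a (mean - k*sigma, mean + k*sigma) band for each numeric input.

    `training` maps a (hypothetical) input name to the values seen when the
    model was trained; each list needs at least two values for stdev.
    """
    bands: Dict[str, Band] = {}
    for name, values in training.items():
        mu = statistics.fmean(values)
        sigma = statistics.stdev(values)  # sample standard deviation
        bands[name] = (mu - k * sigma, mu + k * sigma)
    return bands


def out_of_band(bands: Dict[str, Band], observed: Dict[str, float]) -> List[str]:
    """Return the names of inputs whose current value lies outside its band.

    Inputs with no fitted band are ignored. In a real system the returned list
    would feed an alert (email, pager); here it is simply returned.
    """
    flagged = []
    for name, value in observed.items():
        if name not in bands:
            continue
        low, high = bands[name]
        if not low <= value <= high:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Illustrative numbers only, not data from the interview.
    history = {"daily_add_to_carts": [980, 1010, 1005, 995, 1020, 990]}
    bands = fit_bands(history)
    print(out_of_band(bands, {"daily_add_to_carts": 1450}))  # ['daily_add_to_carts']
    print(out_of_band(bands, {"daily_add_to_carts": 1000}))  # []
```

A fixed 2-sigma band is a blunt instrument. Even for roughly normal inputs, about 5% of perfectly ordinary values fall outside it, so some alerts will be false positives. As Jennifer notes, this approach also fits numeric inputs far better than the text inputs of NLP-based models.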

Original Description

My guest this week is Jennifer Prendki. That name might sound familiar, as she was one of the great speakers from my Future of Data Summit back in May. At the time, Jennifer was senior data science manager and principal data scientist at Walmart Labs, but she's since moved on to become head of data science at Atlassian. Back at the summit, Jennifer gave an awesome talk on what she calls Data Mixology, the slides for which you can find on the show notes page. My conversation with Jennifer begins with a recap of that talk. After that, we shift our focus to some of the practices she helped develop and implement at Walmart around the measurement and management of machine learning models in production, and more generally, building agile processes and teams for machine learning. The notes for this show can be found at twimlai.com/talk/46.

Subscribe!
iTunes ➙ https://itunes.apple.com/us/podcast/this-week-in-machine-learning/id1116303051?mt=2
Soundcloud ➙ https://soundcloud.com/twiml
Google Play ➙ http://bit.ly/2lrWlJZ
Stitcher ➙ http://www.stitcher.com/s?fid=92079&refid=stpr
RSS ➙ https://twimlai.com/feed

Let's Connect!
Twimlai.com ➙ https://twimlai.com/contact
Twitter ➙ https://twitter.com/twimlai
Facebook ➙ https://Facebook.com/Twimlai
Medium ➙ https://medium.com/this-week-in-machine-learning-ai



Playlist

Uploads from The TWIML AI Podcast with Sam Charrington

Engineering Practical Machine Learning Systems with Xavier Amatriain - #3

How to Build Confidence as an ML Developer with Siraj Raval - #2

Open Source Data Science Masters, Hybrid AI, Algorithmic Ethics & More with Clare Corthell - #1

Interactive AI, Plus Improving ML Education with Charles Isbell - #4

Machine Learning for the Stars & Productizing AI with Joshua Bloom - #5

Generating Labeled Training Data for Your ML/AI Models with Angie Hugeback - #6

Explaining the Predictions of Machine Learning Models with Carlos Guestrin - #7

Deep Learning: Modular in Theory, Inflexible in Practice with Diogo Almeida - #8

Emotional AI: Teaching Computers Empathy with Pascale Fung - #9

Statistics vs Semantics for Natural Language Processing with Francisco Webber - #10

Building AI Products with Hilary Mason - #11

Reprogramming the Human Genome with AI, w/ Brendan Frey - #12

Understanding Deep Neural Networks with Dr. James McCaffery - #13

Scaling Deep Learning: Systems Challenges & More with Shubho Sengupta - #14

Domain Knowledge in Machine Learning Models for Sustainability with Stefano Ermon - #15

Machine Learning in Cybersecurity with Evan Wright - #16

Interactive Machine Learning Systems with Alekh Agarwal - #17

Location-Based Intelligence for Smarter Marketing with Klustera - #18

AI-Powered Customer Support with HelloVera - #18

Using AI to Simplify the Programming of Robots with Cambrian Intelligence - #18

Increasing Efficiency of Healthcare Insurance Billing with NLP, w/ Behold.ai - #18

Creating a Worldwide Financial Knowledge Graph with AlphaVertex - #18

From Particle Physics to Audio AI with Scott Stephenson - #19

Selling AI to the Enterprise with Kathryn Hume - #20

Engineering the Future of AI with Ruchir Puri - #21

Deep Neural Nets for Visual Recognition with Matt Zeiler - #22

Introducing Psycholinguistics into AI with Dominique Simmons- #23

Reinforcement Learning: The Next Frontier of Gaming with Danny Lange - #24

Offensive vs Defensive Data Science with Deep Varma - #25

Global AI Trends with Ben Lorica - #26

Intelligent Autonomous Robots with Ilia Baranov - #27

Reinforcement Learning Deep Dive with Pieter Abbeel - #28

Robotic Perception and Control with Chelsea Finn - #29

Natural Language Understanding for Amazon Alexa with Zornitsa Kozareva - #30

The Power of Probabilistic Programming with Ben Vigoda - #33

Intel Nervana Update + Productizing AI Research with Naveen Rao and Hanlin Tang - #31

Video Object Detection at Scale with Reza Zadeh - #34

Enhancing Customer Experiences with Emotional AI, w/ Rana el Kaliouby - #35

Expressive AI-Generated Music With Google's Performance RNN with Doug Eck - #32

Smart Buildings & IoT with Yodit Stanton - #36

Deep Robotic Learning with Sergey Levine - #37

Deep Learning for Warehouse Operations with Calvin Seward - #38

Cognitive Biases in Data Science with Drew Conway - #39

Data Pipelines at Zymergen with Airflow, w/ Erin Shellman - #41

Web Scale Engineering for Machine Learning with Sharath Rao - #40

Marrying Physics-Based and Data-Driven ML Models with Josh Bloom - #42

Machine Teaching for Better Machine Learning with Mark Hammond - #43

LSTMs, Plus a Deep Learning History Lesson with Jürgen Schmidhuber - #44

Learning From Simulated & Unsupervised Images through Adversarial Training - TWiML Online Meetup

Jennifer Prendki Interview - Agile Machine Learning - TWiML Talk #46

Evolutionary Algorithms in Machine Learning with Risto Miikkulainen - #47

Learning Long-Term Dependencies with Gradient Descent is Difficult - TWiML Online Meetup

Word2Vec & Friends with Bruno Gonçalves -#48

Symbolic and Subsymbolic Natural Language Processing with Jonathan Mugan - #49

Bayesian Optimization for Hyperparameter Tuning with Scott Clark - #50

Intel Nervana DevCloud with Naveen Rao & Scott Apeland - #51

AI-Powered Conversational Interfaces with Paul Tepper - #52

Topological Data Analysis with Gunnar Carlsson - #53

ML Use Cases at Think Big Analytics with Mo Patel & Laura Frølich - #54

Ray:A Distributed Computing Platform for Reinforcement Learning with Ion Stoica -#55

Jennifer Prendki discusses agile machine learning, covering topics such as machine learning lifecycle management, model performance tracking, and data quality, with a focus on practical applications and real-world examples.

Key Takeaways

Create specific measurement scoring systems for data quality and customer satisfaction
Require data scientists to provide a clear view of the accuracy and performance of their algorithms
Implement a machine learning lifecycle management checklist
Evaluate statistical performance of models and make proper checks before sending to production
Monitor model inputs and outputs to ensure model performance
Retrain the exact same model on progressively smaller amounts of data and compare accuracy against CPU cost to find how much data is actually needed
Work toward generalizable rules of thumb for optimizing model training (for now, the data-versus-accuracy curve is measured model by model)
Create a database of tools for data scientists to perform tasks
Move towards automation of model deployment and management
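The data-sizing practice from the interview has data scientists retrain the same model on smaller amounts of data, plot accuracy against CPU consumption, and avoid paying four times the compute for roughly one percent of accuracy. That practice can be sketched as a small selection rule: keep the cheapest run whose accuracy is within a chosen tolerance of the best run. This is a minimal illustration under assumptions. The `CurvePoint` fields, the one-point default tolerance, and the numbers in the usage example are all hypothetical, and in practice each point would come from an actual retraining run.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CurvePoint:
    data_fraction: float  # share of the full training set used for this run
    accuracy: float       # held-out accuracy of the retrained model
    cpu_hours: float      # compute consumed by that training run


def cheapest_adequate(points: List[CurvePoint], tolerance: float = 0.01) -> CurvePoint:
    """Return the cheapest run whose accuracy is within `tolerance` of the best run.

    `tolerance` is an absolute accuracy gap (0.01 = one percentage point); the
    right value is a business choice, not something this function can decide.
    """
    if not points:
        raise ValueError("need at least one measured point")
    best = max(p.accuracy for p in points)
    adequate = [p for p in points if p.accuracy >= best - tolerance]
    return min(adequate, key=lambda p: p.cpu_hours)


if __name__ == "__main__":
    # Hypothetical measurements: the same model retrained on 1/8, 1/4, 1/2 and all of the data.
    curve = [
        CurvePoint(0.125, 0.860, 1.0),
        CurvePoint(0.25, 0.892, 2.0),
        CurvePoint(0.5, 0.897, 4.0),
        CurvePoint(1.0, 0.900, 8.0),
    ]
    choice = cheapest_adequate(curve)
    print(choice.data_fraction, choice.cpu_hours)  # 0.25 2.0
```

With these made-up numbers the rule picks the quarter-size run. The full data set would cost four times the CPU for less than one point of extra accuracy, which mirrors the trade-off described in the interview.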

💡 Agile machine learning requires a focus on practical applications, real-world examples, and continuous improvement, with an emphasis on data quality, model performance, and team collaboration.


More on: ML Maths Basics

Coding the GARCH Model : Time Series Talk

Important Steps I Have Followed To Improve My Data Science Skills- Sharing My Experience

Learn Python FAST for Beginners 🚀#coding #conditionals #loops #functions
ChethanAIChronicles

“Hello, world” from scratch on a 6502 — Part 1

PCA (Principal Component Analysis) in Python - Machine Learning From Scratch 11 - Python Tutorial

ROC and AUC in R
StatQuest with Josh Starmer

AI and ERP: Hype vs. Reality
Digital Transformation with Eric Kimberling