Engineering Practical Machine Learning Systems with Xavier Amatriain - #3

The TWIML AI Podcast with Sam Charrington

Key Takeaways

Xavier Amatriain discusses his experience leading the machine learning recommendations team at Netflix and his current role as VP of engineering at Quora. He highlights the importance of practical machine learning and the need to weigh system complexity against innovation speed, using tools like RNNs and deep learning judiciously.

Full Transcript

[Music] Hello everyone, and welcome to TWIML Talk, the podcast where I interview interesting people doing interesting things in machine learning and artificial intelligence. I am very excited to share this interview with you. For this show my guest is Xavier Amatriain. Xavier is a former researcher who went on to lead the machine learning recommendations team at Netflix and is now the vice president of engineering at Quora, the Q&A site. Xavier and I spend quite a bit of time digging into each of these experiences in the interview. Here are just a few of the things you'll learn from our discussion: why Netflix invested $1 million in the Netflix Prize yet didn't use the winning solution; what goes into engineering practical machine learning systems anyway; the problem that Xavier has with the deep learning hype; and what the heck a multi-armed bandit is and how it can help us. Of course, I'll be linking to the resources we mentioned in the show notes, which you'll be able to find at twimlai.com/talk/3 (that's T-W-I-M-L-A-I dot com, slash T-A-L-K, slash the number 3). A quick note before the interview: you've got just a few days left to enter my drawing to win a free ticket to the O'Reilly AI Conference. I'll talk about how to enter after the interview and in the show notes. And now, on to the [Music] show.

Hey everyone, I'm here with Xavier Amatriain. Xavier, why don't we get started: you're at Quora now, so why don't we have you talk a little bit about what you do there?

Sure. So I'm at Quora and I'm the VP of engineering, so I lead the whole engineering organization right now. My background, though, is more in machine learning. Prior to Quora I was at Netflix, leading the machine learning recommendations team, and even before that I was doing research in academia. My background, again, is in recommendations, machine learning, and so on, and I've published papers in that space for some years. So it's kind of interesting that somebody with this kind of background
is now the VP of engineering of a growing company like Quora, where I need to deal with a lot of different concerns, not only machine learning. But it also tells you a little bit of the story of what is important for Quora as a company and as a product, and that also aligns with some of the trends that we're seeing in industry: more and more, the machine learning and AI people who used to be closed in a room in a corner, the weirdos in the lab, are now having a lot more influence on decisions about how to design products and how to run companies. In my case, that's probably one of the reasons that I'm in this position now, leading the whole engineering organization, because for us machine learning is a big part of our success and of how we're growing.

All right, so there's a ton in there, and we'd really like to get to know you a little bit better, so let's rewind a bit. You mentioned that you spent some time in academia. How did you learn machine learning? Where did you go to school, and where were you working in academia?

Yeah, that's a good question. I'm actually kind of old compared to what you see right now, and I have a long history behind me. I'm saying that because when I did my PhD, which by the way I did back in Spain (I'm originally from Barcelona, Spain), I was mostly interested in signal processing, and particularly in signal processing and systems design related to audio and music. That's actually what my PhD was based on. At that point in time, it was the age when multimedia and signal processing were kind of the hot thing and machine learning was not so much. So I did use some machine learning here and there for different aspects of my research, particularly for some of the initial recommendation systems that I worked on, which were related to music, but it wasn't my core area, so I was more
into signal processing and systems during my PhD. So I would say that I got into machine learning more on the job. After I did my PhD, I went and did some more multimedia-related research at the University of California, Santa Barbara (UCSB). There I was working on virtual reality and immersive environments, and that was also very cool. It's kind of coming back again now, but I was really interested in that space, combining signal processing and multimedia in this kind of immersive, virtual reality environment. After that I became more and more interested in the data side: how do we use the data, how do we infer information from the data, and in particular how do we understand users from the data? That's what led me to forget a little bit about the signals, which are also data, but cold data that come from systems, and to focus more on human-generated data and try to build intelligent systems that understand it. So I switched my research and worked for a few years on recommendations, using machine learning and other kinds of approaches, not only machine learning but also human-computer interaction approaches, to build these intelligent assistants, of sorts, that tell you what you like and what you don't like. That's what eventually led me to Netflix, to lead the recommendations team there.

Okay. Now, you dangled a big shiny object in front of my eyes, and that is signal processing. That was an area that I studied in grad school as well. I'm curious if you could explain wavelets to me, because that was one thing that always gave me a hard time. But actually, no, we're not going to talk about that. I'm wondering if you see any parallels, or whether there are any interesting things happening at the
intersection of signal processing and machine learning. Just out of curiosity, have you seen anything there?

There are actually a ton of those intersections. There's the intersection of the principles, but I would say the more interesting one now is the intersection on the application side of things. If you think about it, a lot of the systems that are now being built using machine learning approaches, particularly deep learning, for things like speech recognition or image recognition were considered signal processing applications in the past. For example, although I didn't professionally focus too much on speech recognition, I did study it quite a lot, and at that time we were using hidden Markov models and other such techniques. For us in the signal processing world, they were just tools and means to an end, so they weren't treated as the most important part of the system, although they were really the core of it. Now that's moved towards deep learning, RNNs, and so on. So there's always been an intersection between machine learning and signal processing, and there's always a lot to say about how to interpret signals wherever they come from. Those signals, again, could be audio, speech, music, video, or images, and you need to build systems that either understand those things or are even able to generate them in some way. At some point it became clear that this has evolved into having a layer of intelligence in the middle that is going to be learned, and that comes from a machine learning system that sits at the heart of any of those systems.

Great, great. So you made your way from academia and ended up at Netflix immediately prior to where you are now at Quora, and your focus there was on recommendation systems?

Yeah,
I started with a very specific focus on recommendation systems, which you could consider a natural continuation of the Netflix Prize, the famous $1 million Netflix Prize. By the way, that's what got me connected to Netflix, because I was dabbling with it and also using that data set for some of my research.

Okay.

So yeah, I started with what you could consider the continuation of that Netflix Prize, but already working for Netflix, and we eventually grew the team into more of a core machine learning algorithms team that was building not only recommendations but also algorithms for search and for different things related to images. It grew into a core machine learning/algorithms team that was serving different purposes beyond recommendations. But recommendations are very important for Netflix, so that was probably the core of the team at any given time.

Okay. You mentioned the Netflix Prize. Am I correct that the winning prize entry was never really implemented at Netflix?

I'm glad you asked this, because I get this question all the time, and I react to it by saying: it is correct that the final entry was not used, but that doesn't mean it was useless. I'm saying that because we wrote about it in a blog post when I was at Netflix, and even though it was very clearly explained, people still took away, "Oh, Netflix wasted a million dollars and didn't use the outcome." That's not true. Netflix actually got way more than $1 million back in research and in interesting stuff that is being used, and was used, in different parts of different systems. So there's a difference. Was the final entry used? The answer is no, it was not used. There were over 130 different machine learning models combined in an ensemble.
Most of the different models that were there added just a tiny increase in accuracy and a lot of complexity, and they were not worth it. The reality is that two of the models on their own gave enough accuracy that the other hundred-some were not needed, or were not worth the ROI. That said, it doesn't mean they were not useful for understanding what they were adding and how they were adding it. So again, the story is that the final prize-winning entry, a complex combination of all of those methods in an ensemble, was not used as it was, but the learnings were worth much more than what was invested in the prize, and the most important methods from the final winning entry were actually used directly in production.

Okay. I came across this recently in an interesting blog post by Josh Bloom over at Wise.io. He talked about the economics of machine learning, basically all of the various tradeoffs that come up when a real business is trying to figure out how to put machine learning in production, and that was one of the examples he used. I forget how many pages or so the final algorithm was, but 130 models, that's a huge model.

Yeah, and I know Josh very well; we're friends, and he knows a lot about this, so we've talked about it in person. The thing is, that story is so juicy that you can spin it in many different ways. This is pretty crazy, but I recently got in my Facebook feed an advertisement from MathWorks trying to sell me MATLAB that was using that story, saying something like, "Netflix did not use their final winning entry. We can help you with MATLAB." And I'm like, what does that have to do with MATLAB? I don't get where they're going at all with that. Well, I don't know, but that's the point: the real story is that yes, you do need to be
concerned, and I'll always say the same: you need to be concerned about system complexity and about making sure that whatever you do in research is actually deployable and is easy to build engineering around. But that's very different from saying that the Netflix Prize was a waste of time or money.

Sure. So can you maybe spend some time walking through some of the various factors? You mentioned engineering time, so there's obviously implementability from a complexity perspective, there are going to be data aspects, and there's obviously the computational side. When you think about practical machine learning (you're a VP of engineering now, not a VP of machine learning research or something), when you think about engineering these systems at large scale, what are the things that you need to think about?

Oh, there's a long list of things, and you mentioned a few of them. System complexity is one, which actually spans different sub-areas and different concerns that relate to the complexity of the system. One of them, which is often overlooked, is simply cost. I have this kind of infamous slide, which some people don't like very much when I show it, where I tell people that they can probably do almost everything they need to do in machine learning on a single machine. I have reasons to say that, but the point is that if you add unnecessary system complexity, first of all you're going to have a lot more cost: you're going to have a huge number of machines that you have to maintain in a cluster, or you pay Amazon for AWS. So that's one, and it's probably obvious, and it's probably not the most important. The most important one is that system complexity reduces your
speed of innovation. If you have a system that is really complex from the get-go, innovating on it becomes a huge pain, because then I'm trying to tweak something and it turns out that that something is just one of the 10,000 knobs in the system, and it's hard to know what it did and hard to understand whether it improved things. If you keep your system as simple as possible for as long as possible, your innovation and your innovation speed are going to improve, because you're going to be in a much better position to change things dramatically, improve them, and understand what you're doing and what is improving. At some point you need to add complexity; there's no way around it. Complexity might add enough improvement, either in accuracy or in whatever metric you care about, that it's worth adding. But the problem is that you don't want arbitrary complexity from the start, because in the mid and long term you're going to end up in a local optimum, so to speak, and you're never going to reach the global one that you would get to if you kept your options as simple as possible.

Interesting. To me, the thing that it brought up was the notion of technical debt, which is typically applied to code, right? Code debt. Have you come across anyone who has thought this through in terms of algorithmic debt?

Oh yeah. There's this interesting paper that was originally published in a workshop that I co-organized at NIPS. It's called "Machine Learning: The High-Interest Credit Card of Technical Debt," and it's a very good read. It's by a couple of authors from Google, by the way, so they know what they're talking about in terms of machine learning debt. So it's something that's been discussed, even in papers. It's something that any organization will face at some point, and it's something that's really important
and it's really important at many levels, not only at the level of the system itself. I would go further and say that it's part of the core of machine learning algorithmic design. It's the Occam's razor principle: if you can choose between two things, always choose the simplest one. Part of the reason is that you want to minimize your debt for as long as possible and only make things more complicated when they really need to be and when they add enough. That goes back to the lesson learned from the Netflix Prize. Sure, you can go for the more complex solution, but is the delta improvement that it adds worth the huge increase in complexity? Many times the answer is going to be no.

That's an interesting segue to one of the topics that I wanted to chat with you about. You recently tweeted about a natural language processing course, and the hashtag you used was "no deep learning." Across a number of your public appearances, you've maybe developed a little reputation as Mr. #nodeeplearning. Of course I'm being artificially controversial here; I understand that it's a tool in the toolbox. But some of our earlier discussion about system complexity, I think, is one of the issues that you have with deep learning. Maybe walk us through how you think of your position on deep learning and why you bring it up.

Interesting. When I talk about deep learning, I always start by having a few slides in my presentation that explain how deep learning works. I want to get that out of the way and say, hey, I know that deep learning works, and it's great for a few things. Particularly for natural language processing, I think it's getting to a point where it's the default tool for many things, and it's great. So the reason I was using the
hashtag was just to warn people that if they were looking for deep learning, it wasn't available in that course. I think it's very important for people to understand what the right tool is for the right task. For example, we use deep learning at Quora for several things. We have a lot of text, and going back to the NLP example, there are many things now in text processing for which RNNs are actually the simplest solution there is, because you can find readily available open source toolkits that have already been trained. You can even use the model as it is, without having your own data set, or you can retrain it. Basically, it becomes simple enough that it could be your default approach to an NLP task that you have at hand. But that's very different from saying that it's equally true for all machine learning applications, and you need to understand what complexity you're paying for by defaulting to deep learning for everything you have. I've seen a couple of examples recently where I think we're in a dangerous situation, where a lot of people, especially more junior researchers or engineers who have come into industry right at the cusp of the deep learning bubble or wave or whatever we want to call it, go straight to deep learning as the default solution for anything. I've had engineers at some companies tell me, "Hey, I'm using this TensorFlow architecture on a problem where I have 10,000 examples and 30 features, and I want to ask you a question," and my answer is, "Why are you doing this to yourself?" If you have 10,000 examples and 30 features, do you really think you need a deep learning model with a bunch of layers? Most of the time the answer is no. And even if the classifier that you're building with that deep learning
architecture is, let's say, in the best case 1% better than the one you could build with a simple logistic regression, you're still going to be better off going for the logistic regression, because, going back to what I was saying before, your ability to innovate on that initial model is going to be much greater than your ability to innovate on a very complex deep neural net where you don't really understand what's going on inside. So the point that I'm trying to make when I talk about, quote unquote, "no deep learning" is that deep learning should be one more of the tools we have in our toolkit, and there are a lot of other very interesting machine learning tools, and even research going on, that we should still pay attention to. There's also a problem in the research world right now with deep learning: because it's so new and there's so much low-hanging fruit, it feels like the easiest way to get a paper accepted is to make an incremental improvement, or a not so incremental one, on some deep learning approach. That's why we're seeing all the conferences now dominated by deep learning. Even when you go to a conference like KDD, or the ACM Recommender Systems conference that I'm going to be attending in September, you start seeing a bunch of deep learning papers, because it's new and it's easy to innovate using deep learning. But we run the risk of saying, "Oh yeah, this is the one thing that works for everything," and then we're going to try to find all the nails that apply to this hammer and think that they all look the same. I think there is a danger in that.

So you've touched a little bit on some of the things you're doing at Quora. Maybe tell us a bit about your experiences there and some of the interesting problems that you face.

Yeah, sure. That's
a great question. One of the things that I love about Quora, and one of the reasons, as I said before, that we have a VP of engineering with this kind of background in machine learning and algorithms, is that everywhere I look in our product and in the issues that we're dealing with, I see problems that are solvable, and should be solved, through machine learning.

Sorry for interrupting, but while it's likely that most of the people listening know what Quora is, maybe you can start with an explanation of the site and the mission.

Sure, that's a very good point, and it's a very good point because even people who know us and use us frequently have a misconception about what Quora is. On the surface, Quora is a question-and-answer site and application, but our mission goes beyond that. The mission of Quora is to grow and share the world's knowledge, and we think that the question-and-answer paradigm is really well suited for growing and sharing knowledge. To give a different example, the only other, quote unquote, company that has a similar mission would be Wikipedia. Wikipedia also believes in spreading and growing knowledge, but they believe in the encyclopedic format, and that leads to a bunch of different product decisions, of course. We believe in question answering and in a broader notion of what knowledge is. Wikipedia is about factual knowledge; we think that, for example, an expert opinion is also knowledge and should be included in any knowledge base. All of that defines our decisions. Using question and answer is working really well for now, and we think it's the ideal vehicle, but we are not closed to trying different things, and we actually do have different things in our product today that enable that knowledge growing and knowledge sharing. Another way to understand Quora is through the different networks that overlay in the
product. We obviously have a knowledge network, and another one that is a topical network, so we have entities of knowledge that are connected to each other and topics that are related to each other. On top of that we add the social aspect: we have people who are connected to other people, and people who are connected to topics and to knowledge entities. These overlays of different graphs at different levels, and the connections between them, are what make the whole data problem very exciting, because we have a lot of applications across the different networks in different directions. For example, we have algorithms that work purely in the content space and tell us how good the quality of a given piece of content is. We have other algorithms that tell us how likely a person is to answer a question on a given topic. We have different kinds of machine learning algorithms whose purpose is to understand and predict different aspects of this dynamic system and the relations between all these different entities. So again, examples of things that we do: we do a lot of recommendations. On your homepage you'll see a feed of different stories, including questions and answers, that we're optimizing for you to be interested in. That's kind of similar to the Facebook feed, but it has other implications and a different objective function. So there are recommendations like that and recommendations that you get through email, and we also use machine learning to optimize the notifications that you get on different devices. That's all on the personalization side of things. Then we have content approaches to infer the quality of a piece of content, to do things like ranking answers according to how good they are. We have a lot of things on the text side: automatic topic labeling, how to infer a topic out of a
given text, how to find similarities in questions and answers, and how to find duplicates. Then we also have the whole abuse set of things, which also uses machine learning. One of the things that Quora is known for is keeping high-quality content (that's the quality piece), but also keeping a very healthy, positive community. We do that with very aggressive norms and also with algorithms that detect any form of spam, harassment, bad actors, and so forth, and each one of them is a different machine learning algorithm. So it's really exciting in that sense, because we're covering a huge space of applications and of data types that go into these applications.

Interesting. Can you talk a little bit about the extent to which you use hybrid machine learning plus human approaches? Obviously there's a big component of the site that you could argue is hybrid, as users are ranking different answers, but are there ways that you're using hybrid approaches behind the scenes?

Yes, we are. One way to think about it is that initially all of this was manual. In the first beta version of Quora there were no algorithms in place, and all of it needed to be manual. So we do have a team of moderators and people who look at content, and there's always a point where algorithms are not going to be sufficient and you need somebody to look at the nuances: is this answer about this politician really violating our norms, yes or no? It's really nuanced, and we need to have a person look at it. So the way we think about it is this: in any content moderation issue, there's always going to be a high portion of the stuff that you have on your site that is going to be good, and good with no doubt, so you can have algorithms that say, "Hey, above this threshold I'm totally positive this is good stuff; we don't need to worry about it." There's
always going to be some part of your content, not a huge part, that is going to be really bad, and there's no doubt about it. So there's another threshold that tells you, "Below this threshold I'm just going to remove this stuff, because it's basically crap and you don't want it." That's how you keep the quality of the content on your site. Now, there's this gray area in between those two thresholds, and that's the tricky part. You have to do two things. One is that you have to have people look at this gray area and decide, "Yeah, this is not really that bad; we should be okay with it." At the same time, you need to improve your algorithms to get those two thresholds as close to each other as possible. That's very interesting, because it represents a research challenge for us: to improve our machine learning algorithms so that the gray area of things we're unsure about becomes as small as possible over time. We're doing that, and at the same time the gray area is still there, and when we have things in the gray area we need to use some humans in the loop to understand what's going on.

Uh-huh. So if Quora were to do a Quora Prize analogous to the Netflix Prize, what would it be about? What are some of the biggest challenges that you face?

Well, in each of those dimensions that I mentioned before there are challenges that are still not resolved. But thinking of the Netflix Prize and something that would be kind of similar, I think a very interesting and probably obvious direction we would go is this: for something like knowledge, there's also the problem, similar to the Netflix Prize, of how you get the right piece of content to the right person. Content is expressed in two ways. One is content that you can consume, so that's an answer that you can read, enjoy, and learn from, and
the other one is a question that you can answer. How to route both of those things to the right person, and how to optimize algorithms for those two things, are at the core of what we're doing, and they're very important for us. So, again drawing the analogy with the Netflix Prize, I think question and answer recommendation would be a very interesting topic; for us it's a super interesting challenge. It also connects many different dimensions across the different overlays that I was talking about, because it's not only about personalization: you also have to care about content quality, and about how those different aspects feed into what users are going to do and react to in the short term and, more importantly, what they're going to react to in the long term. I've talked about that in the past in some of my presentations, this tension between short-term metrics and long-term metrics. That's something where a lot of companies have done the wrong thing and have gone downhill because of it. It's really important to understand, for example in the context of content, how to avoid clickbait. If you're optimizing for some things you're going to get clicks, sure, but those clicks are going to turn into people not visiting your site ever again after a couple of weeks. All those things fit into this picture of content recommendation, or knowledge recommendation.

How do you address the short-term/long-term tradeoff, maybe even in the context of a clickbait type of application?

So there are different things that go into it. I would say that's one of the most interesting research areas, and I don't think it's really been solved, even in the research literature, because it's very hard to get enough good-quality data sets to even do something about it if you're a researcher
in academia. And in industry, I mean, as far as I know from the people that I talk to, there are obviously different things that we're all doing, but a holistic approach to it is hard. The one important thing is that you do need to make sure that you're running your A/B tests with the right sort of metrics, right? Because at the end of the day you can be optimizing whatever you want in the lab and say, "Oh, it's a ranking problem, I'm going to be optimizing NDCG," but the metric that you're optimizing in the lab with your algorithm might not really correlate perfectly with what you want to get in the product in that long-term metric. So first you need to make sure that, whatever you tune in the lab, you run the A/B test long enough, with the right metric, to understand what effects whatever you're doing has on the users. And then you work backward from that, right? Once you have the right metric on your A/B test, you know, "Oh, if I do this, my users end up not coming back after two weeks. What did I do?" Then you work backwards from that and try to understand what metrics in the lab you could have used to predict that kind of behavior and that kind of effect. So building regression models from your easy-to-compute metrics, which are all going to be related to some kind of error or some kind of information retrieval precision or recall, whatever you will, into the real world of usage: I think that's very important. And then there's a ton of other things that you can do once you understand those dynamics, like trying to define your training set in a way that actually defines the problem in the right way. And sometimes, I've talked about this also in the past, people make the mistake of thinking, "I need to use all the data that I have, and I need to use the raw data that I have." And sometimes that's not really
the answer. You might need to use some data and not other data, because some of the data that you might be feeding into your model might be teaching the model the wrong thing. Or you might need to weight your data in a way that some is more important than other, because it leads to longer-term effects that you're interested in, while other data might lead to a click but nothing else. So there are a lot of different details going into the recipe, but again, I don't think there is a very holistic approach to it, or not that I'm aware of.

Okay. One thing that came to mind for me, and this is maybe going back to our discussion around deep learning: there is some research happening around RNNs, where the reinforcement or the score comes later, and on how the RNN can optimize for this delayed gratification, so to speak. So maybe, if this gets sophisticated enough, this is where you get some benefit from introducing the complexity of RNNs where an otherwise simple model might come into play.

Yeah, that's definitely true. Models or approaches that have any sense of sequencing, or time, or evolution over time do have some benefits, and you can use them. It's not only about RNNs; another thing that comes to mind is some reinforcement learning approaches. I mean, one of the typical ways to deal with this is to use some form of multi-armed bandit approach to deal with the exploration-exploitation trade-off. It's more like, "Yeah, I know that you're clicking on this, but let me try to explore more things, let me try over time to have my model converge to something that is a global optimum rather than getting stuck on the local one where I am right now." So yes, you're right, and some of the sequential RNNs, with some form of memory and the ability to remember different
stages and end up converging over time to a better optimum, they're super interesting.

Before we get too far, can you explain multi-armed bandits simply?

Yeah, so the idea is pretty simple. Multi-armed bandit comes from this notion of, well, the typical image that people use is the slot machines in a casino. Imagine that you go to a casino and you have 10 slot machines in front of you, and you don't know which arm you should pull; that's where the "multi-armed bandit" comes from. You start trying one and say, "Oh, this one is giving me some interesting prizes, but should I try another one? Because maybe the one that I have next to me is actually better than this one." And how do you deal with this dilemma? Out of the multiple arms that you could be pulling, there are some that you have more information about, and you know with a degree of certainty how well they're doing, and there are others that you don't really know anything about. Should you take the risk and go to the ones you don't know anything about, or should you just stick to the one that kind of works but maybe is not the optimal one? So I think that's the whole point of the multi-armed bandit approaches: they try to define a way in which you can have an optimal policy for deciding whether you should continue pulling the same arm or go to a different one. There's a lot of literature on this, and you can read about it. I usually joke about it: there's a lot of literature about multi-armed bandits, but there's only one approach that actually works in practice. I don't know if I want to give that away, but it's pretty well known in industry that Thompson sampling is the easiest and most practical approach to multi-armed bandits, so I think I'm not giving too much away by saying that.

Great. So what are you finding most exciting about
machine learning right now? Obviously there's a ton of things going on: there's deep learning stuff, there's the work that's happening around bots, there's applying deep learning to NLP. Given everything that's going on, what's the most exciting, and do you get to apply that in your work? And what's the most exciting thing that you're actually working on?

So I think the most exciting thing for me is almost a non-technical thing. It's more this thing coming from society as a whole: that it's accepted as a given that machine learning and AI are inevitably part of making a better future, right? There are still people that will argue about dangers and about robots taking over and so on, but I think, generally speaking, society is convinced and pretty much all bought in. Take self-driving cars: a couple of years ago people thought we were crazy about self-driving cars, and now they're already being tested, right, with people riding in them. So I think of this change in society and in mindset, with people realizing that, "Oh, machine learning is not really evil; it's a tool, it can be used for my benefit, and it's something that I expect things to have." Not very long ago, seeing something that was an algorithm or machine learning was like, "Whoa, what's going on? I'm losing control, this is not something I like." And now it's shifting to the opposite: you expect applications, you expect gadgets to have intelligence and to have machine learning, otherwise you're disappointed. Like, "Oh my gosh, I need to tell this phone everything I want? The phone should know what I want," right? So I think that's a very, very interesting shift, and it connects a lot with some of the things we're doing at Quora. At Quora we are very user-focused, and we want to keep this warm feeling of: you're in a community, you're sharing knowledge, this
is very important for you, it's very important for the people, but you're going to be surrounded by all these different algorithms that make your life much better: they protect you from bad people, they protect you from horrible content that you don't want to read, and they help you get your content to the right people, who want to read it and are going to be helped by it. So this combination of the warmth of community, social aspects, and knowledge, but also surrounded by all these different algorithms in a seamless way, I think that's super exciting. It's something where you need to strike the right balance, but it's something that just a few years ago we wouldn't have thought about, because, again, algorithms were this cold, evil thing that you wanted to stay away from. So I think that's a very interesting trend and something that I'm excited about.

We're coming to the end of our time, but I've got a couple more quick questions for you. The first is: you go to a lot of conferences. What are your favorite conferences in the space?

I wouldn't say I go to a lot of conferences, unfortunately, especially now, since my time as a VP of engineering is pretty precious and I don't get that much time. There are some conferences that I have had ties to for a very long time, and I keep going to them because I'm very interested in the content, but I'm also interested in the community. One of them is a small conference, actually: the ACM Recommender Systems conference. That's a conference that is purely focused on personalization and recommendations. I helped start the whole thing, I was the general chair for it in 2010, back in Barcelona, and I keep in touch. One of the interesting things about this community, which I think is a little bit similar to, for example, KDD, is that it's a very diverse kind of audience, and you don't get the pure machine learning NIPS audience,
everyone focused on the algorithm and squeezing one percent, more or less, of RMSE or MAE out of their algorithm. There's a combination of algorithms, but also application and then user-oriented research, which I think connects to the vision that I was describing, right? This connection between user orientation and algorithms is very interesting. So yeah, the ACM Recommender Systems conference, which by the way is happening in Boston, if anyone is listening from Boston or wants to travel there. This year it's in the US, and it's going to be super interesting.

And when is it? It's coming up, right?

Yeah, it's on September 15th, so in a few weeks we're going to be there. Just to give an example, I'm giving a tutorial together with Deepak Agarwal from LinkedIn on all the latest research and the evolution of recommendation systems in industry. We're going to be giving a holistic perspective, with me coming from Netflix and now Quora, and him having been at Yahoo and now leading machine learning at LinkedIn. So it's going to be an overview of all these kinds of machine learning techniques for recommendations. So that's an example of a small, focused conference, but also with a very broad audience, which I enjoy. KDD, which just happened to be in San Francisco recently: I like the community a lot, and I can find a lot of very interesting approaches and applications. I'm very application-driven in my approach to machine learning, so although I will read all the papers, or not all, sorry, some papers from NIPS and ICML, I tend to go to more application-driven conferences. And there are also a lot of small conferences organized now that are local and focused on the industry side of machine learning. MLconf is one that comes to mind that I attend regularly, because I find the audience to be very
interesting and very engaging. It's a lot of practitioners from industry mixed together with a bunch of researchers, and that intersection, I think, is really interesting.

Mm-hmm. Great, great. And then one more question that you're in a particularly good place to answer for us: who are the people to follow, the machine learning folks to follow, on Quora?

Oh, that's a great question, but we have a lot of them. We've actually been doing a very strong push for this product feature that we have, which is Sessions, which is similar to an AMA, and we have brought in, I would say, all the top machine learning researchers to do a session. We've had most of the deep learning folks, like Yann LeCun and Yoshua Bengio, and we've had Andrew Ng, we've had Peter Norvig, we've had a lot of different researchers, and, I would say, most of the authors of the famous machine learning books, like Kevin Murphy from Google and so on. We also recently had Ian Goodfellow, the main author of the deep learning book. So there are, I would say, a good 50 people that you would follow. We've also had people that lead machine learning in different companies, like my friend Ralf Herbrich from Amazon, or [name unclear] from Facebook. So there's a huge machine learning community on Quora that is very active and very strong; it's one of our strongest areas right now. So I would recommend it to people who are interested in machine learning: there's a ton of knowledge there, and growing.

Great, great. Well, Xavier, thank you so much for spending the time with us. I learned a ton, and I'm sure the folks that listen will as well. Anything you'd like to leave us with?

No, I mean, thanks for having me. It was great to share a little bit of that knowledge in this different form, which is also a way of spreading knowledge, and I look forward to
interacting with people, especially on Quora. I myself write a lot of different answers on different topics, including machine learning.

That's a good point. Before we go, where can folks find you? How can folks engage with you?

I'm pretty public on Twitter; as you mentioned, you had seen a bunch of my tweets. So they can find me on Twitter at @xamat, X-A-M-A-T, or on Quora, where I'm also very active, so you can follow me on Quora and message me there. I usually keep a very active public profile, so it's not hard to find me. And I have a pretty weird name and last name, so it's hard to go in the wrong direction if you Google my name.

Yeah. All right, great. Thanks so much, Xavier.

Yeah, thank you, Sam.

All right, everyone, that's it for today's interview. Before we go, a reminder that This Week in Machine Learning & AI and O'Reilly have partnered to offer one lucky listener a free pass to the inaugural O'Reilly AI Conference, which will be held at the end of September in New York City. You can enter via Twitter or the twimlai.com website by doing one of the following three things. The preferred way of entering is via Twitter: just follow @twimlai (T-W-I-M-L-A-I) and retweet the contest tweet that I'll pin to the account and post in the show notes. Do those two things and you'll be entered. If you're not on Twitter, you can sign up for my newsletter at twimlai.com/newsletter and add a note, "Please enter me," in the additional comments field. Finally, if you're not on Twitter and you aren't interested in the newsletter, no problem: just go to the contact form on twimlai.com and send me a message with that form using "AI contest" as the subject. The drawing will be open to entries through September 1st, and I'll announce the winner on the September 2nd show. Good luck, and hope to see you in New York. Thanks again for listening. [Music]
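
To make the multi-armed bandit discussion in the interview concrete, here is a minimal sketch of Thompson sampling, the approach Xavier singles out as the most practical one. It assumes win/lose (Bernoulli) rewards and a uniform Beta(1, 1) prior for each arm. The function name `thompson_sampling`, the three simulated payout rates, and the number of rounds are illustrative assumptions and are not taken from the interview.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling on hypothetical slot machines.

    true_rates: hidden win probability of each arm (used only to simulate pulls).
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    wins = [0] * n_arms    # observed successes per arm
    losses = [0] * n_arms  # observed failures per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Sample one plausible win rate per arm from its Beta posterior.
        # Arms with little data have wide posteriors, so they still get explored.
        samples = [
            rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(n_arms)
        ]
        # Pull the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate the payout and update that arm's posterior counts.
        reward = 1 if rng.random() < true_rates[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


# Three simulated machines; the third one pays out most often.
pulls = thompson_sampling([0.05, 0.10, 0.30], n_rounds=5000)
print(pulls)
```

Exploration here comes from the randomness of the posterior draws, so no separate exploration rate has to be tuned. As evidence accumulates, the draws concentrate and most pulls go to the arm that appears best.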

Original Description

My guest this time is Xavier Amatriain. Xavier is a former researcher who went on to lead the machine learning recommendations team at Netflix, and is now the vice president of engineering at Quora, the Q&A site. We spend quite a bit of time digging into each of these experiences in the interview. Here are just a few of the things we cover in our discussion: Why Netflix invested $1 million in the Netflix Prize, but didn’t use the winning solution; What goes into engineering practical machine learning systems; The problem Xavier has with the deep learning hype; And, what the heck is a multi-arm bandit and how can it help us. The notes for this show can be found at https://twimlai.com/talk/3.

Subscribe!
iTunes ➙ https://itunes.apple.com/us/podcast/this-week-in-machine-learning/id1116303051?mt=2
Soundcloud ➙ https://soundcloud.com/twiml
Google Play ➙ http://bit.ly/2lrWlJZ
Stitcher ➙ http://www.stitcher.com/s?fid=92079&refid=stpr
RSS ➙ https://twimlai.com/feed

Let's Connect!
Twimlai.com ➙ https://twimlai.com/contact
Twitter ➙ https://twitter.com/twimlai
Facebook ➙ https://Facebook.com/Twimlai
Medium ➙ https://medium.com/this-week-in-machine-learning-ai

This video teaches the importance of balancing complexity with innovation speed in machine learning, using tools like RNNs and deep learning judiciously, and highlights the need for practical machine learning in real-world applications like recommendations and content moderation. The speaker shares his experiences at Netflix and Quora, and discusses the challenges of optimizing for long-term metrics and delayed gratification.

Key Takeaways
  1. Implementing a system with unnecessary complexity can lead to high costs
  2. Keep a system simple for as long as possible to improve innovation speed
  3. Use RNNs and deep learning judiciously, weighing the added complexity against the potential benefits
  4. Optimize for long-term metrics and delayed gratification using techniques like multi-armed bandits and Thompson sampling
  5. Use hybrid approaches with human moderation for nuanced content decisions
💡 Practical machine learning requires balancing complexity with innovation speed, and using tools like RNNs and deep learning judiciously, while also considering the importance of human moderation and nuanced content decisions.
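
The hybrid moderation idea from the interview (two quality thresholds with a gray area between them that goes to human reviewers) might be sketched as follows. This assumes a model that outputs a quality score between 0 and 1, and that content above the upper threshold is accepted automatically. The threshold values and the names `route_content` and `gray_area_fraction` are hypothetical; the interview does not describe Quora's actual implementation.

```python
def route_content(quality_score, remove_below=0.2, accept_above=0.8):
    """Route one item based on a model quality score in [0, 1].

    Scores under remove_below are removed automatically, scores at or
    over accept_above are accepted, and everything in between is the
    gray area that is sent to a human reviewer.
    """
    if not 0.0 <= remove_below <= accept_above <= 1.0:
        raise ValueError("need 0 <= remove_below <= accept_above <= 1")
    if quality_score < remove_below:
        return "remove"
    if quality_score >= accept_above:
        return "accept"
    return "human_review"


def gray_area_fraction(scores, remove_below=0.2, accept_above=0.8):
    """Share of items that still need a human decision."""
    if not scores:
        return 0.0
    decisions = [route_content(s, remove_below, accept_above) for s in scores]
    return decisions.count("human_review") / len(decisions)


# Hypothetical model scores for eight pieces of content.
scores = [0.05, 0.15, 0.35, 0.5, 0.65, 0.9, 0.95, 0.99]
print([route_content(s) for s in scores])

# A better model lets the thresholds move closer together,
# which shrinks the share of content that humans must review.
print(gray_area_fraction(scores))
print(gray_area_fraction(scores, remove_below=0.4, accept_above=0.6))
```

Tracking the gray-area share over time gives one simple way to measure the goal stated in the interview: bringing the two thresholds as close together as possible without automating decisions the model is unsure about.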
