The story of Metaflow | Effective Data Science Infrastructure | Book author interview
Key Takeaways
Discusses the story of Metaflow and its role in effective data science infrastructure with the book author Ville Tuulos
Full Transcript
Imagine that you're a data scientist, and you, as a data scientist, are asked to build some kind of system: maybe a recommendation system, or maybe to optimize a marketing budget, or whatever the company might need. These systems that you build, the machine learning systems, need to be very reliable, and they need to run on the engineering systems. This was exactly the situation that we had at Netflix as well, back in the day. The question is, how do you help data scientists actually use all these production systems? That's really what Metaflow was created for.

Hello everyone, welcome to another month of our book club chat. This month our book club is reading the book Effective Data Science Infrastructure, and we are very grateful to have the book's author with us today. Ville is the creator of Metaflow and the co-founder of Outerbounds. Welcome, Ville, we really appreciate you being here and are grateful to have you.

Yeah, thanks for having me.

For anyone who joined us today, please feel free to ask questions directly in Zoom or in the chat. Let's get started with a round of introductions. I'll start: my name is Sophia, I am a data scientist at Anaconda, and I'm also the organizer of this book club. Ville, do you want to introduce yourself?

Well, I think you just introduced me. I'm Ville Tuulos indeed. I used to work at Netflix before, leading the machine learning infrastructure team, where we started Metaflow, and I had been doing machine learning infrastructure way before that as well. Now I'm CEO and co-founder at Outerbounds, where we basically continue developing Metaflow and the infrastructure around it. So again, thanks for having me.

Thank you. Anyone else want to introduce yourself? If you don't feel comfortable, feel free to introduce yourself in the chat, or just show up, that'll be even better. Hi, Blake.

Hi, I'm Blake Moya. I'm a statistics PhD student at the University of Texas at Austin and a former intern of Sophia, who introduced me to the book club,
and I'm glad to be here.

Awesome, thank you. Anyone else want to introduce yourself? No worries if you don't feel comfortable. No? Okay. If you prefer to type in the chat and tell us who you are, that would be great as well. I guess let's get started with some questions; I have a list of questions that I want to ask. Let's start with the first one. Since the video will be online, for viewers who are not familiar with Metaflow, could you give us a quick introduction to Metaflow?

Yeah, that's a great question. Let me paint a picture to answer that, and this picture may be familiar to you. Imagine that you're a data scientist at any company today. This company probably has an engineering team, and the engineering team has been building all kinds of platforms and infrastructure, but typically the engineers are not the data scientists. Then you, as a data scientist, are asked to build some kind of system: maybe a recommendation system, or maybe to optimize a marketing budget, or whatever the company might need. And as you know, these days the hope is that these machine learning systems that you build are very reliable and run on the engineering systems. This was exactly the situation that we had at Netflix back in the day: we had data scientists who were asked to build all kinds of business-critical systems, and there were engineers who had been building Kubernetes, workflow orchestrators, data platforms, all kinds of things. But the question is, how do you help data scientists actually use all these production systems? That's really what Metaflow was created for. So basically it is, let's say, a user interface for the data scientist to access all these modern production systems, so that they don't have to know the nitty-gritties of Kubernetes and
nitty-gritties of, let's say, workflow orchestration, and they can use plain, simple Python, not far from what they have been doing in the notebook already, and do it in a way that the company is then comfortable deploying to production and running in production. So that's, I guess, the short version.

So what's the story behind Metaflow? How did you come up with the idea? I know you created it when you were at Netflix. How did Netflix give you the support to build this thing from scratch, and let you just do it and be so successful?

Right. Well, as always with these things, you never know in the beginning, so the other parts happen only afterwards. But here's the deal: I had been using other systems in the past. For instance, there's a workflow orchestrator called Luigi that was created at Spotify back in the day. That's something I had used, and I liked many parts of it. It was really nice when it was used in production, for instance. But the Achilles' heel of that system was that it was quite hard to test anything locally, and with machine learning and data science it's always really important that you can test things locally and experiment locally, before you even know if these ideas work at all. Now, going to Netflix, they were in this interesting situation: they had an almost infinite number of ideas for how they could be leveraging machine learning and data science, but they were definitely feeling the pain of having a missing piece. The infrastructure was there, but it really wasn't easy enough for data scientists to do any of these things. So there was a need. And some of you may remember that back in 2017, which is now, how many, five or six years ago, you didn't
have things like SageMaker or Kubeflow. It was really early days for any MLOps; I guess the whole term MLOps didn't even exist back in the day. So that was the big question: how do we help data scientists build these applications and experiment with these things? It was a real business need. What I really appreciate about Netflix is that they don't build anything just for the tech being cool or machine learning being cool. Everybody knows Netflix: they are in the line of entertaining people, and they do things that really matter for them, to drive towards that goal. That was a real business need. Of course, there had been many smart people building parts of the stack, but there wasn't really anything unifying, so we started building that. Of course it was an experiment in the beginning, testing some of those ideas: can we take the best parts of Luigi and make it easy to test things locally? Can we take the best parts of the other similar systems? Can we make it easy to scale to the cloud? All those things. Another really interesting thing about Netflix is that they don't do anything top-down, where some big CTO would just mandate that everybody must use this thing. We knew that in order for this thing to become successful, it really had to be so human-centric, so user-friendly, that people actually want to use it, much like what happens in open source as well. That's why we were really investing even in things not in the code, like documentation, from the get-go. And then when some projects actually felt that this is useful, that this actually helps them, they
of course told their friends, and they told their friends, and over the years we got into this positive feedback loop, because with every new project we also heard new requirements and new pain points, and then we were able to address those. I think that was really amazing. In that sense it was a massive group effort: of course many engineers contributed to the code base, but also many data scientists, just sharing what things are hard for them in their day-to-day life.

I love it. And this has been open source from day one, right? Because it's Netflix, not everything is open source. How did you convince them to let you do open source?

Right. Well, actually it wasn't open source from day one. We started developing it in 2017 and it was open-sourced in late 2019, so it had already been used in production at Netflix for at least one and a half to a couple of years. Now, if you go to github.com/Netflix, you will see that there are many, many projects there, some of which are quite popular, so the company definitely had a history of open-sourcing things and promoting open source. And you can imagine that Netflix is not exactly in the business of selling these things, so they don't necessarily have a business interest; in that sense it's not conflicting with anything. For any company, the big question is always about the time commitment. We know that open source doesn't happen by itself. The communities need nurturing: you need to provide documentation, and you constantly need to be responding to tickets, pull requests, and so forth. So that was the big question: how do we manage the time commitment? I guess we just managed to convince everybody that, actually, many of
the questions that data scientists outside Netflix would have are such that addressing them will then help data scientists at Netflix as well, which I think has proven to be true over time. I think things like that really help to make the case.

That's a great story. I love it. I love that Netflix is contributing to open source. I love open source.

Also, the talk about dealing, I guess, on the ground with other data scientists during development, both as open source and during early development at Netflix, brings up something that I had intended to ask for a friend who couldn't make it today, which was: was it developed in collaboration with actual users? Because with these kinds of tools I think it's really nice to find the pain points that people are actually struggling with, rather than just the developers trying to customize something to their own workflow. And being open source means being able not only to view the problems that everyone is having, but to get people to use the same framework, so that they're discovering very similar problems. That kind of uniform, community-centered workflow is really useful too.

Yeah, that's absolutely right, and that's always harder than it sounds. You read any book about product development and they always say, go talk to the users, and so forth. But that's actually a complex dynamic, because everybody is busy, and if you ask for feedback, it takes some energy and real thoughtfulness to give high-quality feedback. The nice thing about an environment like, let's say, Netflix, or it could happen in academia, or maybe at other companies as well, is that everybody is working towards the same goal. We all want to make these things happen, and hence we
are incentivized to work together to make these things happen. I think that was a really fruitful environment for developing these systems. And Blake, maybe another point about why it is so useful to work close to data scientists: oftentimes, especially engineers and maybe super-experienced machine learning people as well, when they start developing systems, there's always a bit of a tendency to overcomplicate things, or to go after the shiniest, newest, most exciting problem. Can we do this amazing automatic differentiation thing in Python, where you can throw anything at it? It's a super exciting problem. But when we work with people, it turns out that the problems people have on a day-to-day basis are super mundane, even so embarrassingly mundane that they don't even dare to talk about them, like dependency management. Everybody says, oh my gosh, how can we build the fanciest language model in PyTorch? But the first question is, how do you get PyTorch installed on your laptop? It's so embarrassing that you don't even want to talk about it. What I like about Anaconda and conda is that that's one of the foundational questions that we need to solve first. That's why, if you look at Metaflow today, we really started with these kinds of things. When you look at Metaflow you think, okay, this is a no-brainer, simple, there's nothing too fancy, and that's exactly by design, because those are the things that people struggle with on a day-to-day basis.

Yeah, I deal with a lot of those kinds of issues in the R community, because in academia most of the professors that I deal with use R pretty much exclusively, and in trying to
develop tools for R, I can either make them useful for me, since I'm going to be the one executing it in the end anyway, or I can make code to share. Then I have to reckon with the fact that the users I'm going for have probably never touched Python in their life. No hate to R, but they've probably never seen a beautiful or elegant interface for mathematical computing before. So what they would want to see is the same kind of user interaction at their level, one that reminds them of all these older R functions that may not be the best but are what they know, and then under the hood I change everything up to make it work more efficiently and at more volume. So it's definitely hard to balance, reckoning with who you're actually going for at the end of the day rather than thinking you know best.

That's right, that's exactly right.

Going back to environment management, I was so excited when I was reading about the conda decorator. I was like, yes, that's great. Speaking of R, does Metaflow work with R?

It actually does. We have the Metaflow R extension. Going back to what Blake was saying, of course we work with many people with a background in statistics, and it's maybe still true today that if you're doing pure, classical statistics, R might still be the best, or at least in the top three, environments for doing that. So we wanted to support R. The challenge there is really that there are very few people who really know the internals of R. There's a good number of users of R, but finding people who can really contribute, who really understand the internals of R, that's definitely a smaller community. But yeah, I think for certain jobs it's still a great
tool.

So, anyone else have questions before I keep asking mine? Don't be shy. Okay, I guess I'll just ask. You mentioned Luigi, and I use Airflow a lot, also Prefect. Do you see all those other tools as competitors or collaborators? For companies that already use those pipeline tools, what's your recommendation for introducing Metaflow to them and integrating with the existing systems?

Right, that's a great question. Again, going back to really thinking about the day-to-day needs of data scientists, of anyone who wants to build end-to-end ML applications: if you think about it, there are a bunch of different concerns, and that's why we have this concept of a stack, the infrastructure stack, which is in the book as well. First there's the very basic question about the data: how do you connect to data warehouses, and so forth. Then there has to be some way to execute these functions, to run compute, so you have to think about that. And since the compute is never just individual functions, since usually you need some kind of workflow of different functions, you need workflow orchestration. Now, back to your question about workflow orchestration specifically: there are of course excellent tools, like you mentioned, Airflow, Prefect, Argo Workflows, Dagster, and many others. But a challenge that I have seen at many companies is that you can pick, let's say, Airflow as an example, but you're still left with the question of, let's say, how do I do compute? Of course, you can run Airflow on Kubernetes and use the Kubernetes operator there, but now you've got two
problems: you have Airflow here and Kubernetes here, and how do they actually work together, and so forth. And then, let's say, talking about versioning, or keeping track of many workflows, that's something that's outside Airflow as well. I think a problem today with many companies and many of these tools is that you can pick a best-of-breed tool for every layer of the stack. You can use Snowflake for data warehousing, you can use, let's say, Spark for compute, then you can use Airflow for orchestration, you can use whatever, DVC, for versioning, and then you use modeling libraries, and so forth. You end up with nine different tools, and typically these tools don't know each other so well, so you as the user have to navigate between them, and that can be quite challenging. So our stance has been that while we don't want to reinvent the wheel, it is useful to have a unified interface, so that you don't have to think about every concern separately. That's why, first, Metaflow comes with its own orchestrator, so you can test things locally. But the idea is that once you want to go to production, it is incredibly useful to integrate with the existing infrastructure that your company has. Let's say you use Airflow: it will be possible, and we will be releasing support for that really soon, to take your Metaflow flow and run it on the Airflow orchestrator. Or let's say you use AWS Step Functions: you can use that. So that's our thinking about it: it makes sense to integrate with different tools, but not so that the user has to navigate these different layers by
themselves. Most likely we won't be integrating with everything; there are hundreds and hundreds of workflow orchestrators. But there are a few really popular ones, like Airflow; Argo Workflows, which we use on Kubernetes, is another one; and Step Functions on AWS. So that's how we're thinking about it: it's not so much a competitor as a component in a stack.

I love it. Yes, you're absolutely right, Airflow doesn't do versioning; you have to rely on other tools like DVC for versioning. And I feel like the Metaflow UI would be a great thing to have, to see different statistics for different versions. But unfortunately the Metaflow UI is not in the book. Why is it not in the book?

Yeah, good question. Actually, for the very practical reason that the Metaflow UI was developed at Netflix, and when I started writing the book it wasn't open source. It was only open-sourced, I guess, before the book was released, but by then those chapters had been written already. I wish I could have covered it in the book, but that's the simple reason: it just wasn't available at the time.

That makes sense. Yeah, when I was first reading... Go ahead.

I just said there's always time in the second edition.

That's right, that's right, in the second edition. Coming soon.

Yes, please. Because when I was reading the docs, I was like, oh, I wonder if there's a UI. It would be nice to see the UI in the beginning; it would be a lot easier for me to have a visual understanding of things.

Yeah. If I might just comment on that, I think it touches the more fundamental question of why tech books exist in the first place. That was the question I asked myself, because of course many of the libraries and systems, they
move so fast that, okay, there's some feature that wasn't available at the time of the writing of the book, and then some things maybe become obsolete, and so forth. So with this book, the idea I had in mind is that there are some principles, like thinking about the stack, that I think will hopefully survive the test of time, because I have seen that these things haven't changed so much over the last 10 years, or even longer. But the individual details may change. Today you may use AWS Batch, maybe tomorrow you'll use Kubernetes, maybe in three years' time it's something else, but that doesn't change the mental model. For anyone new to this field, I think it's really beneficial to have really strong mental models of what's going on. Then suddenly everything starts making more sense, because it's not random new tools here and there; you start seeing that these things actually have their place and their purpose, and it's not just a totally fragmented landscape.

Yes, I appreciate that the book has a good balance between theory and practical code, so I really enjoyed the book. Any other questions?

Yeah, if you don't mind, I'd like to hit more on the balance of theory and practice, or theory versus tutorial. The way you're talking about this model of work, the model of machine learning design, the model of model development, is one that persists. I think that sometimes when engineers or developers get very invested in the lower-level, more literal function or computational workflow that they're working on, they forget that there is this higher level of what the community as a whole, the data science or software development community as a whole, is using,
and their work doesn't fit into that model. So I think a lot of these problems, and what I think Metaflow solves well, come down to interfacing between different components of a larger model. You know about Airflow, you know about working with Kubernetes, with conda, and the problem is just that even if everyone knows how to use all of these individually, they're going to run into a lot of trouble trying to connect them together. So I think it's very wise to have isolated a need to interface between so many of these other interesting tools, and to get it all to fit in a model that most people already understand. In that sense, it feels as if there is little to learn.

Yeah, I think that's right, and that is also really the interesting decision that one has to make when starting to write any open source piece. If you go to GitHub, you can see that there are amazing, super interesting, super sophisticated passion projects that start with the premise that, okay, all the existing programming languages suck, let me start by creating my own programming language, and with this language I can create these amazing abstractions. And then, by the way, you notice that the operating systems suck, so let me create my own operating system. You go after that ideal situation, and in some cases you actually manage to create something that's super amazing and fixes everything that's wrong in the world. But the challenge is that it's incredibly impractical for anyone else to use. To your point exactly, in some sense you have to meet people where they are, but also help them take the next step, or maybe the next 10 steps, forward. If you look at our industry overall, that is the story even over decades: we have
some things that have stuck around for a really long time. They are not perfect, but at the same time migrating away would be incredibly hard, and it's amazing how much people can do by leveraging something that's old, fixing the worst parts, and adding something new. I think that's how we work together as an industry as well. It also goes back to this idea of human-centricity and empathy. From a selfish point of view as an engineer, it might be that, oh my gosh, everything sucks, let me reinvent everything. But then you are not exactly inviting everybody else to join the journey, because opinions differ, and what sucks for you maybe doesn't suck for someone else. So it's about finding that balance where you're actually trying to be empathetic: okay, maybe there's a reason why these people use tool X. R is a great example. You know that R is a divisive topic. Like you mentioned, your professors use R; you could say that's just old school and they don't know better, or maybe they have their reasons.

Yeah, exactly. I think it holds true with most industries and most human behaviors: people do what they're comfortable with, they'll follow the path of least resistance, and if you want them to take a new road, you've got to pave it for them. I find that's something really important when developing new tools like this. Even more important than whether it does anything new or interesting, or whether it does it well, is first and foremost whether it's usable. And usability is the entire premise of Metaflow: developing a better fit to an already accepted model of the machine learning development workflow.

Yeah, and if I
just add one point on that: it's not only that you meet the users where they are in the beginning, but also throughout the journey. Metaflow has been around internally since 2017 and in open source since 2019, and the code that you wrote for Metaflow in 2019 still works without any changes. This is really important, and it's something that applies to all open source projects. I think one of the biggest problems with many projects and many ecosystems is that they move so fast that they impose massive migration tasks on their users all the time, because they are asking users to follow them all the time. Especially as an engineer, this is a huge temptation: you notice that, oh my gosh, I wish I could change that API, and you change that API, and then everything breaks and you break all the users. This is the dilemma: from your point of view you might think, oh my gosh, there's technical debt and this is horrible, but at the same time I think the owners of these libraries often underestimate the tax on the users of asking them to migrate. One of the most famous, or infamous, examples of this is the Python 2 to Python 3 migration. They changed things in Python 3 for a good reason, but it took, what, 15 years for people to migrate. That's why we are also very adamant about the backwards compatibility of Metaflow itself: now that you are using Metaflow, we are not going to break you. It does cause pain, and I know that of course sometimes engineers complain to me that this is horrible and we have to keep doing this. But it's part of that empathy: look, I know that you as a data scientist, the last thing you probably want to do today is to start migrating to some new API of
Metaflow just because it broke.

Yeah, and that brings us back to how I think we began this spin-off conversation in the first place, which is that balance between theory and practice, where you know that these will stand the test of time because they are founded in a theoretical model that has already withstood the test.

Exactly, yeah, at least for the foreseeable future and definitely the visible past.

So that, I think, is a thing that everyone knows they should do, but not enough people stick to it well enough. And this is, yeah, everything we've been talking about. So thank you.

I love how easy Metaflow is for data scientists to use. Like the chapter with AWS: you just follow the instructions to set things up and you can run your things on AWS so easily. And then chapter 9, I think, the recommendation system, where you showed how to add a model simply by adding a file for the model, without needing to change anything else. It's just incredible, it's just so easy. I really appreciate that.

Yeah, that's great, and that's definitely the intention. The idea has always been that if you are, let's say, a working data scientist at a company, there are so many worries that you have. In some sense it's a chill job; in other ways it can be quite stressful. The data is changing all the time, the requirements are changing, and there can be so much accidental complexity, which the book talks about as well: we just overcomplicate things for no reason. Especially when there's so much inherent complexity in these things, and with anything that relates to data you always have a ton of things that are changing and moving, you have to understand that at least the things we can do simply, we should be doing simply. So, I mean, let's say, kind
of, if you add a new model, it shouldn't be harder than, let's say, adding a new file or something. So that's really the principle. Oftentimes I find that one of my biggest roles is actually just to convince people that it's okay to stay simple, it's okay to keep it simple, because people are often worried that, oh my gosh, isn't it almost illegal to make it so simple? Somehow it doesn't feel right. I think even in education, and I guess in other parts of the industry, there's a bit of a tendency that when things are complex, it also showcases how amazing you are. But of course the value that you're producing as a data scientist should be tied much more to the actual outcomes, and not to the number of lines of code, for instance.

Yeah, exactly. Sometimes I feel like, do I really need to learn all the DevOps stuff? There's just so much to learn. But then sometimes I feel like maybe with the right setup, with the right tools, I could just stick to what I know about data science and not really need to get into DevOps.

Yeah, and I think the philosophy is also that it's not a super high-level abstraction. That's why it's called, let's say, @batch or @kubernetes: it's not hiding the fact that these are the systems underneath. I know some data scientists who have become really interested in DevOps out of their own personal curiosity, and that's awesome. By all means, I won't prevent you from doing that. At the same time, if your interests lie elsewhere, you shouldn't have to learn to write Dockerfiles and learn to deal with Jenkins. I
mean, that's probably not relevant to you at all if that's not your thing.

Yeah, I mean, I am interested, I'm just not an expert in it, so I might not do as good a job as some DevOps experts. So I have another question: what is new in Metaflow that's not in the book? I know the book has been out for a while, and you probably started writing it a few years ago.

That's right. Well, you already mentioned the UI; that's definitely one of the things. Now there's official support for the other clouds. As you have seen, the book only talks about AWS; now it works on Azure and GCP as well. All the principles, everything works the same way, so it's not a big deal in that sense. One nice thing that I wish I could have covered in the book is Metaflow cards, the visualizations and reporting that you can embed. That's a super cool feature that would have been a nice addition, maybe in the second edition. And, let's see, what else? Tagging. That's something that we released about a year ago. You have your runs and you get the run IDs and so forth, but then you can attach a label, a tag, to these runs, and you can add and remove them. This is really useful if, after the fact, you want to record on the label that this run is the best experiment so far, or that this one corresponds to the production model, or something like that. So that's not covered there. And I guess there are a number of other features as well. Of course, now maybe it would be fun to touch on topics like these cool new foundation models and so forth, so all the GPU stuff. It's still kind of a deep topic.

So where's the second edition?

That's right, second edition,
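To make the tagging idea above concrete, here is a minimal plain-Python sketch of labelling runs after the fact. It is not Metaflow code: `RunRecord`, `promote_best`, `find_by_tag`, the `accuracy` field, and the tag name `best_so_far` are all hypothetical names invented for this illustration. Metaflow's own client exposes tag operations on real runs, and its documentation is the place to check for the exact calls.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Hypothetical stand-in for a finished run (not a Metaflow class)."""
    run_id: str
    accuracy: float
    tags: set = field(default_factory=set)

    def add_tag(self, tag):
        self.tags.add(tag)

    def remove_tag(self, tag):
        # discard() does nothing if the tag is not present
        self.tags.discard(tag)


def promote_best(runs, tag="best_so_far"):
    """Move `tag` so that only the highest-accuracy run carries it."""
    best = max(runs, key=lambda r: r.accuracy)
    for run in runs:
        run.remove_tag(tag)
    best.add_tag(tag)
    return best


def find_by_tag(runs, tag):
    """Return the IDs of all runs that currently carry `tag`."""
    return [r.run_id for r in runs if tag in r.tags]


# Three experiments finish; label the best one after the fact.
runs = [RunRecord("101", 0.81), RunRecord("102", 0.86), RunRecord("103", 0.84)]
promote_best(runs)

# A later experiment does better, so the label moves to it.
runs.append(RunRecord("104", 0.90))
promote_best(runs)

print(find_by_tag(runs, "best_so_far"))
```

The point of the sketch is that the label lives beside the run rather than inside it, so it can be moved without rerunning anything.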
third edition, yeah.

Nice. Anyone else have questions? I'll keep asking if nobody asks. Okay, I love all the stories and all the doodles in the book. Did you draw the doodles?

Yeah, I did. Actually, the secret story is that the data science infrastructure was just an excuse to draw a cartoon. I knew that nobody would have published the cartoon alone, so I had to write all the text around it.

I love it so much. This is the first tech book I've ever read that has fun stories and doodles. The doodles just add so much more character.

Yeah, that's fun. I'm always a bit concerned that some of these things can become a bit dry. The other thing is that if you actually follow the storyline, the story is really grounded in real-life experiences that I have either experienced personally or seen other people experience. So I really do hope that they ground it in reality because, as we discussed before about this balance between the abstract and the concrete, I think it's really important that there's this connection to real life as well.

Any questions from others?

Well, did we get the story yet on how Outerbounds began, or what the decision was to spin off and create a whole company behind Metaflow?

Yeah, that's a great question. The backstory there is that we open-sourced Metaflow in 2019, and it was really quite an experiment to see if this approach resonates with other companies. Since the beginning we really wanted to invest in the health of the community and in helping people. We have always had quite an active Slack where we
help other companies and just see whether this is something that really resonates. Then, I guess, in early 2021 it had got to the point that there was a good number of companies using the system, and they were reaching out to Netflix asking if Netflix could support them. At some point the justification of saying that we have to help all these other companies just doesn't make sense for Netflix specifically, because they have so many needs of their own. So there was a bit of a crossroads: we could have said to all these other companies that you are on your own, do whatever you want; or we really had to commit and say that, okay, other people can continue focusing on Netflix's needs and we focus on helping all the other companies in the world. That's the path that we chose with Outerbounds. We decided that this is exciting enough, and it seemed that there was enough good momentum and tailwind behind Metaflow. And of course it's also a super interesting time in this industry, with many companies becoming much more mature with their ML and data science, and we thought that maybe we have something to contribute, which then motivated the founding of the company.

Do you still code?

Unfortunately I don't. What I code is mainly to help users and customers, if there are some examples or patterns. But I think that's a conscious decision. Savin, our CTO, is of course very actively leading the technical development, and now we have many more people who are super experienced at different layers of the stack.

So is that the main difference between working at Netflix and being a CEO, that a CEO has a lot
more things to do?

Yeah, different things. What I really like about open source these days is that it's a whole package. It's always the code that people somehow fixate on, because it's easy to look at the code base and easy to look at the issues. But if you consider any healthy open source community today, let's say Jupyter notebooks, just a great example, I would argue that maybe 60 percent of it relates to things besides the code. There's documentation, there are examples, there are conferences, amazing individuals helping each other. It's really not about the code. That's why what we started investing in early on at Outerbounds is also all the other things besides the code, especially documentation and content. If you go to outerbounds.com, you can find all the how-to articles, because this was always the pain point: people always have amazingly interesting use cases and questions, and there wasn't a natural place to answer these questions, like how do you send an email if your model training fails, or how do I include my own dependencies in a Dockerfile, or whatever it might be. So now we are collecting that information in one place. That's what I really enjoy about doing this in a startup now: we can approach this question holistically, thinking about how we can support the community and the content side. Of course the code is one part, along with the events and whatever else there might be.

Yeah, sounds very important. I love the sandbox playground on Outerbounds.

Yeah, that's a great example. That's right.

Anything else you would like to share with us before we call it a meeting? Someone asked, okay, let me just read it so the viewers can know the questions.
The first question: when Metaflow came online in the beginning at Netflix, could you share some stories about how it saved MLE, DS, or DA day-to-day work? The second question: I have zero experience with MLOps; are there any prerequisites to leverage Metaflow?

Yeah, definitely. Maybe on the first one, let's see. Some of the most delightful experiences have been the ones where I was working with a data scientist who previously had worked either just in a notebook, or maybe they had Python scripts of their own, and it was just running on their laptop. Maybe they were getting data from some data lake by running SQL and so forth. It was small scale, it was slow to get the data and everything, and it kind of worked, but it was clear that it couldn't handle realistic amounts of data, it was uncomfortably slow, and of course it was nowhere near production-ready. Then I was able to sit down with this data scientist and more or less take the code that they had, take the models (nothing wrong with the models), and just use the Metaflow patterns to access the data and especially start running these things in the cloud. Suddenly everything was way faster, and it was the same code: all the work that they had already done, it's not that they had to rewrite anything. Some people even commented that it feels like they have superpowers now: they are not stuck inside their laptop, suddenly the world is their oyster, and they can start doing this at really large scale. The results became much more interesting, and you could even train multiple models in parallel. So, kind
of, seeing how many more capabilities people suddenly had, and they didn't have to learn much of anything new. People were really asking, can it actually be this easy? So I think that was a really great moment. And then, after that, when they asked, okay, now we actually have to start producing business value, it seems to work well, how do we actually get it to production? Then we showed that it's only a single command that you run, something like step-functions create or argo-workflows create, and it just goes to the production orchestrator. And then there are the organizational aspects: oftentimes data scientists are even a bit afraid that somebody will come and yell at them that they are doing it wrong, and we were able to show that if you do it this way, nobody's going to yell at you. And they were like, oh my gosh, previously the experience used to be that you did your Dockerfiles wrong or you didn't do the scheduling right or something, but if you do it this way, it will automatically just work. So that was great.

And then, I guess the other question was about MLOps and what you need to start leveraging Metaflow. I would definitely encourage you to go to the website: you can go to metaflow.org and start reading the examples and the tutorials. You can just do pip install metaflow on your laptop, so it should be rather easy to get started. You don't need to know anything about MLOps in the beginning. Actually, there wasn't MLOps when Metaflow was started. And the idea that we have is that these projects should anyway progress gradually over time, kind of in
iterations. So nobody, even in large companies, even if you were the most experienced MLOps person in the world, should start a new project with the idea that you will add all the bells and whistles from the get-go. Especially with something as experimental as data science and machine learning, you should always start with the simplest possible experiment that you can do to test your hypothesis. Let's say you have a new model and a hypothesis that it will improve something, and maybe you can A/B test that. You do the simplest possible thing: you deploy just enough to production that you can run the A/B test, and then you can see. Oftentimes it happens that the results are not better, and you realize that, oh my gosh, I didn't think about that, or maybe there's some cohort of users who don't see an improvement, and then you go back to the drawing board. But you didn't waste time on all the things that don't matter. Only after you have really proven that the value is there can you start adding all kinds of other things, like more model monitoring, more reporting, and so on, whatever it might be. So in that sense, yeah, even if you don't know MLOps today, just go there, read the documentation, get started, and don't worry about all the things that you have to do later. There's time for that.

Awesome, thank you so much. I think we're at time. Super grateful to you for chatting with our book club today. We're looking forward to version two of the book.

That's awesome. And yeah, as a parting thought: definitely go to outerbounds.com. There's the Slack button, so join the community. Amazing questions, and if you have any more questions, we have a very friendly group with over 2,000 people, so, I mean, just
join there and ask more.

Awesome, thank you. I will post the video on YouTube and I will send you the link soon.

Sounds great. Well, thanks a lot, Sophia, for doing this.

Thank you so much. Bye-bye.
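The conversation mentions training multiple models in parallel and then comparing the results. As a rough sketch of that fan-out and join shape, the following uses only the Python standard library and is not Metaflow code. The `train` function is a toy placeholder and the config fields `name` and `reg` are assumptions made for this example. In Metaflow the same shape is expressed with foreach branches that can run in the cloud instead of in local threads.

```python
from concurrent.futures import ThreadPoolExecutor


def train(config):
    """Toy placeholder for real training (an assumption for illustration).

    The "score" is higher the closer the regularization value is to 0.1.
    """
    score = 1.0 - abs(config["reg"] - 0.1)
    return {"name": config["name"], "score": score}


def train_all(configs):
    """Fan out one training task per config, then join and pick the best."""
    with ThreadPoolExecutor(max_workers=len(configs)) as pool:
        # map() preserves input order, so results line up with configs
        results = list(pool.map(train, configs))
    return max(results, key=lambda r: r["score"])


configs = [
    {"name": "a", "reg": 0.01},
    {"name": "b", "reg": 0.1},
    {"name": "c", "reg": 1.0},
]
best = train_all(configs)
print(best["name"])
```

Keeping `train` as an ordinary function of one config is what makes the fan-out easy: the same function can run once on a laptop or many times in parallel without being rewritten.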
Original Description
Our DS/ML book club is reading the Effective Data Science Infrastructure book this month and we are very fortunate to chat with the author Ville Tuulos on the story behind Metaflow, Netflix, and Outerbounds. (Apologies for the low resolution due to technical issues)
📚 Effective Data Science Infrastructure Book link 📚
- https://amzn.to/3Hr5P5A
🌼 About me 🌼
Sophia Yang is a Senior Data Scientist working at a tech company.
🔔 SUBSCRIBE to my channel: https://www.youtube.com/c/SophiaYangDS?sub_confirmation=1
⭐ Stay in touch ⭐
📚 DS/ML Book Club: http://dsbookclub.github.io/
▶ YouTube: https://youtube.com/SophiaYangDS
✍️ Medium: https://sophiamyang.medium.com
🐦 Twitter: https://twitter.com/sophiamyang
🤝 Linkedin: https://www.linkedin.com/in/sophiamyang/
💚 #datascience