Survivorship Bias in machine learning tutorials

MLOps.community · Intermediate · 📐 ML Fundamentals · 6y ago


Key Takeaways

The video discusses survivorship bias in machine learning tutorials, highlighting the lack of transparency in blog posts and the need to share stories of failed projects and challenges faced during deployment, with a focus on MLOps and machine learning fundamentals.

Full Transcript

When we talked earlier, I thought it was super funny how you said there's a bit of a lack of transparency when you look at different blog posts on MLOps right now, or at people explaining how they're putting things into production. Can you go into that a bit more in depth for us?

Yeah, exactly. I feel that most of what is being published right now about machine learning suffers from a very strong survivorship bias. We just see the shiny stories: "I implemented machine learning at my company and we earned two thousand dollars per minute," or something like that, or "we have the latest framework that solves all the problems of the universe." Very shiny cases, and so on.

But as long as we only highlight those aspects, something is missing for me: what's the story of the people who failed, or of the people who are right now in the trenches, suffering to put those systems in production? As we discussed in a previous webinar, big tech companies are getting burned. We have a very long trail of teams that did not survive: death-march machine learning projects, disillusionment with machine learning, teams that were completely fired, or machine learning systems that were replaced by rules at some point, and stuff like that.

My point is that it's super cool to see those posts on Hacker News, Medium, or personal blogs saying "we put the system in production," and so on. But one thing that bothers me, and it really connects with the high-stakes machine learning talk we are discussing right now, is this: let's talk a little more about the bad cases, the cases that failed, and where those machine learning projects are suffering most at the moment. How does your deployment look? How does your code review look? What about your data or code management, or your experiment tracking?

When we discuss these topics, everyone discloses only the very bright side of all these new technologies. Of course, this is part of the hype. But if we're talking about something we need to take a bit more seriously, putting things in production that can be reliable, then we should certainly discuss the bad things too. That's the way I see it.

Original Description

What are common problems when learning from blog posts about machine learning? In our 5th meetup, we spoke with the Brazilian ML Engineer Flavio Clesio. In this video he shares his views on survivorship bias in machine learning and why the community needs more transparency when sharing projects. This is taken from a longer conversation that can be found here: https://youtu.be/9g4deV1uNZo

Machine Learning Systems play a huge role in many businesses, from the banking industry to recommender systems in entertainment applications to health domains. The era of "A Data Scientist with a Script on a single machine" is officially over in high-stakes ML. We're entering an era of Machine Learning Operations (MLOps), where critical applications that impact society and businesses need to be aware of aspects like active failures and latent conditions. This talk will discuss risk assessment in ML Systems from the perspective of reliability, safety, and especially the causal aspects that can give rise to silent risks in those systems.

Slides for the talk can be found here: https://docs.google.com/presentation/d/1gP0A_EXLYafeYAak_vM8lA2nwT7EBaNv3QAWkzBC004/edit?usp=sharing

Bio: Flavio Clesio is a Machine Learning Engineer (NLP, CV, Marketplace RecSys) who currently works at MyHammer AG, where he helps build core Machine Learning applications to exploit revenue opportunities and automate decision making. Prior to MyHammer, Flavio was a Data Intelligence lead in the mobile industry and a business intelligence analyst in financial markets, specifically in Non-Performing Loans. He holds a master's degree in computational intelligence applied to financial markets (exotic credit derivatives).

This was a virtual fireside chat between Flavio Clesio, Demetrios Brinkmann and the MLOps community. Relevant links can be found below. Join our MLOps Slack community: https://bit.ly/3aOTwgR and register for the next meetup here. Connect with D




The video highlights the importance of transparency in machine learning tutorials, discussing the need to share stories of failed projects and challenges faced during deployment, and how this can help improve the reliability of ML systems. The speaker, Flavio Clesio, emphasizes the need to discuss the bad cases and challenges faced in ML projects. By watching this video, viewers can gain a better understanding of the challenges faced in ML deployment and the importance of transparency in AI.

Key Takeaways

Identify potential biases in ML tutorials
Analyze the challenges faced in ML deployment
Develop strategies to improve transparency in ML projects
Implement MLOps to ensure reliable ML systems
Share stories of failed projects and challenges faced during deployment

💡 Survivorship bias in machine learning tutorials can lead to unrealistic expectations and a lack of understanding of the challenges faced in ML deployment, highlighting the need for transparency and sharing of failed projects and challenges.
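To make this distortion concrete, here is a minimal Python sketch (not from the talk) of how a reader's picture of ML projects gets skewed when successful projects are written up far more often than failed ones. Everything in it is an illustrative assumption: the function and parameter names (`observed_success_rate`, `simulate`, `p_publish_if_success`, `p_publish_if_failure`) are hypothetical, and the rates used below are made-up numbers, not figures from the talk. The closed form applies Bayes' rule to P(success | published); the simulation is a Monte Carlo check of the same quantity.

```python
import random


def observed_success_rate(p_success: float, p_publish_if_success: float,
                          p_publish_if_failure: float) -> float:
    """Success rate a reader would infer from published write-ups only.

    Bayes' rule: P(success | published) = p*q_s / (p*q_s + (1 - p)*q_f),
    where p is the true success rate and q_s / q_f are the (hypothetical)
    probabilities that a successful / failed project gets written up.
    """
    published_successes = p_success * p_publish_if_success
    published_failures = (1.0 - p_success) * p_publish_if_failure
    total_published = published_successes + published_failures
    if total_published == 0.0:
        raise ValueError("no project is ever published under these rates")
    return published_successes / total_published


def simulate(n_projects: int, p_success: float, p_publish_if_success: float,
             p_publish_if_failure: float, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo check: returns (true success rate, rate seen in published posts)."""
    rng = random.Random(seed)  # seeded for reproducibility
    successes = 0
    published = 0
    published_successes = 0
    for _ in range(n_projects):
        succeeded = rng.random() < p_success
        successes += int(succeeded)
        # Failed projects are assumed to be written up less often.
        q = p_publish_if_success if succeeded else p_publish_if_failure
        if rng.random() < q:
            published += 1
            published_successes += int(succeeded)
    # Guard against division by zero if nothing was published.
    return successes / n_projects, published_successes / max(published, 1)


if __name__ == "__main__":
    # Hypothetical rates: 20% of projects succeed; successes are written up
    # 50% of the time, failures only 5% of the time.
    true_rate, seen_rate = simulate(100_000, 0.2, 0.5, 0.05)
    print(f"true success rate: {true_rate:.2f}, rate seen in posts: {seen_rate:.2f}")
    print(f"closed form: {observed_success_rate(0.2, 0.5, 0.05):.3f}")  # prints 0.714
```

With these made-up numbers, only 20% of projects succeed, yet about 71% of the published write-ups describe successes; if failures were never written up (`p_publish_if_failure=0`), every post a reader sees would be a success story. The gap depends entirely on how rarely failures get published, which is the missing information the talk asks practitioners to share.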
