Detecting Drift

Data Skeptic · Advanced ·📐 ML Fundamentals ·5y ago

Key Takeaways

Detecting data drift and outliers affecting machine learning model performance over time using techniques such as statistical process control and machine learning algorithms

Full Transcript

[Music] this is data skeptic time series the podcast about how to predict the future based on historical sequential data episode number [Music] when you train a model on time series data it's common to only use a recent window of time for your training data alternatively one might use a longer time window but weight recent examples more heavily this is done because of the reasonable assumption that more recent data is probably more representative of what's going to happen in the immediate future so once you've trained a model it can start making forecasts about that immediate future and just like a new car losing a large percentage of its value the moment it's driven off the car lot our production models will become less and less predictive over time due to a process called drift today on the show i speak with sam ackerman about how to detect drift and outliers that can affect our machine learning models hi i'm sam ackerman from ibm research labs in haifa israel i originally started out in economics i worked for four years at the federal reserve board in washington dc i studied statistics at temple university i graduated in 2018 and since then i've been living in haifa and i started working for ibm very shortly after in israel and primarily my work is about quality issues in machine learning algorithms and building various products to detect changes in performance of algorithms this work that we're going to talk about falls very neatly into that well if you don't mind before we get to it i'm a little curious what are the tools that a statistician brings to the table that are useful at the federal reserve primarily i was helping economists with forecasts it was more an introductory role but i started to get more interested in statistics because there was a unit there one of the few that really does statistical work that does a survey of consumer finances and this was during the uh crisis 2009-12 approximately so working on like an actual survey really exposed me
to day-to-day work in statistics but primarily the work there is more economics that's why i also change tracks how did you first get a taste for machine learning pretty much in my graduate studies while i was there at george washington university i took a few classes in data science through the stats department well the title of the paper i invited you on to talk about is detection of data drift and outliers affecting machine learning model performance over time a couple of good things to unpack maybe we can just start with drift what is drift for people who maybe know some ml but haven't encountered that concept yet there's very many kinds of drift and unfortunately the terminology is used differently by different people so a given term will mean something different depending on who's saying it very very confusing but drift in general is some kind of change in a thing that you're monitoring it can be in the underlying data it can be in the thing you're trying to predict relative to the data or for example uh the thing we're specifically interested here is changes in how a model performs when say like the underlying data changes but part of the assumption is that different scenarios have different assumptions for example maybe you don't know something about the underlying data or it's unobserved so everybody knows a model can be over fit most people also realize it can be underfit as well how does drift compare to those two concepts actually i'm not really sure i suppose if a model is over fit which generally means that it is too specific to the given data it was trained on then if there's any kind of shift or drift in the data the model might not be robust that's why usually a model that's generally more robust and is not very specific to the data will generally perform better if the data changes which is something that's very important right yeah i guess my question was sort of a complex way of asking about could we label drift as something that is under fit in 
the sense that my model wasn't good enough i didn't train something to recognize the signal that was there i think it can be but i think it's more general than that i mean you could have a model that's appropriately fit to the data that's expected and then something sudden might change that's not really possible to foresee during the initial training that's a possibility yeah the data generating process has changed yeah exactly and again depending on how the actual changes are performance may drift or not i could imagine where a lot of algorithms that run on the stock market if they haven't been retrained recently they might be making confusing predictions about gamestop for example and of course there's a lot of complexity in the markets it's not surprising there'd be drift there do we see drift in more banal places like i don't know customer lifetime value models and machine learning in factory settings for sure as an example customer preferences are always changing technology changes say there was a problem i was recently talking about in terms of building a model for phone specifications something having to do with how phones like technical specs and then things of course totally change when smartphones come in so that's a kind of model that the relationships might totally be changed due to some unexpected change in the underlying thing that you're modeling that could be like a structural paradigm that could not possibly be anticipated well if there's a company or organization using machine learning and they've got some good engineers on staff maybe they've spun up a system that retrains every night or even continuously and so they kind of have a retrained model all the time or with very little latency does that solve the problem of drift or could there be more to it that's an interesting question i think that the assumptions are that it may be very easy in some cases to simply retrain a model i think there are two things first of all in some scenarios it might 
be very expensive depending on the size of the data but also i think that uh you may want to perhaps retrain on some kind of moving window and not have it be something that's learned a new every day you might want to have some more kind of a learning from experience i think the general assumption these kinds of problems though is you might want to avoid retraining as much as possible otherwise it's not a very interesting problem well traditionally how if people approach this this is not a brand new idea even if there are some new techniques we can bring to the table what's been done in the past i just want to admit i'm not an expert in this particular field i know that it's very difficult because there's a lot of aspects to consider there's different types of data different dimensionality what's the domain different types of models so i can't speak very well to all the things that have been done before it's just one thing i know that in my search for this paper is that i was kind of frustrated because it was hard to find something out there that was simple to use it was very general a lot of the things that i found were very parametric and assuming a particular kind of distribution for example so it's a well studied problem by a lot of people in different fields i think finding something that controls the type of error that you want that is as general and easy to use is difficult for all those points you'd mentioned previously and we could probably make a laundry list more if we spent some time dreaming about it there are lots of reasons why companies can't just retrain their model every day but that doesn't mean we just give up hope it would be nice if we could provide the you know ops team that monitors that some sort of measurement or a blinking red light when something has drifted too far if we acknowledge our models could have drift and we'd like to get some indicator of if drift has occurred maybe it's worth investing the cost of retraining the model but i 
don't want to retrain it and incur that cost if i don't have to how can i begin to measure drift and know if it's applicable to my current situation well presumably you'd have to test the model under i guess i would call them laboratory conditions or basically under conditions where you assume that something's not changing under a stable condition and then see what happens that's kind of the approach that we took i think i think is very reasonable to measure something for a long time particularly collect various model metrics and then try to see what happens when you know that there is change happening yeah let's talk about the specific data set you guys used so i know mnist is in the paper could you remind listeners what mnist is and maybe talk about the clever holdout it's not exactly a holdout but what you guys did to make this possible yeah so mnist is a data set that is a very well used data set of handwritten digits originally comes from i believe the post office and the point is to decide what digit zero through nine is appears in the image based on a handwritten digit no matter how bad the handwriting yes so we need to pick a digit exactly and also accounting for different people's handwriting styles by now of course the models for predicting these things have gotten extremely good so good at predicting from this class of digits but the underlying data distribution there i presume doesn't change much haven't been introduced any new digits recently so how do we turn this into a problem and add drift to it so that's one of the things we did in the paper i'll just add the example we have with mnist is i'm not quite sure i would call it toy but it's simply an example to test if we can detect the changes it's maybe not like a more realistic case in the sense that the problem of modeling these images is easily solvable so that's not the thing we're trying to do what we tried to do going back to your question was to see if having only seen nine of the digits and 
we alternate which of the nine they are we basically we train a model to only have seen nine of the digits and then we introduce in various proportions and gradually over time in various ways we introduce the omitted digit so for example the model was trained only to know the existence of digits 0 through 8 and then suddenly we see a 9 which is something it hasn't seen before we want to kind of see does the model behave differently when it sees this digit obviously because it doesn't know what a 9 is it only knows what 0 through 8s are for example you can't say it's a nine but we can see by the fact that maybe it behaves differently in terms of its confidence that's one of the ways we try to make this very general gotcha i want to come back to confidence but just to maybe keep some empirical flavor to it what does happen in that scenario you've given the model an unfair test right you're now asking it to classify a digit it doesn't know about does it say this is an upside down six or do you get random answers what ends up happening it depends on which digit is sometimes uh threes and fives get confused i remember and sevens well i have personal experience of sevens and ones getting confused so i relate to that exactly yeah probably nines and eights things like that the thing we're tracking in this paper is the confidence of its prediction the distribution of that might depend on specifically which of the digits is the omitted one yeah so perhaps let's take a zero zero probably looks the most different of all the digits possibly if a zero was the one that was omitted it was not seen then we're trying to track his confidence so if it sees something that looks totally different the confidence might be a lot lower basically a low confidence means i have no idea what this digit is so it might be an easier problem to detect interesting so that's sort of a diagnostic we might use maybe that i mean you and i know kind of from the as the experimenter's point of view that 
that model wasn't trained on the zero so when we give it the zero you can inspect how that model is operating and learn something about does it have a giveaway signal that tells us that hey it's struggling here and it sounds like you're saying confidence is that signal we can look for as teachers in a way and see our students struggling or have i over anthropomorphized it maybe but i can see the analogy so what do you see in practice when you look at the confidence scores and compare the distributions between you know digits that are in the training set and these uh hold out ones you've kept to the side or i guess the new character really that's been introduced we should say so typically we see that the confidence decreases although uh sometimes we do actually see that it becomes very very confident wrongly because obviously if it doesn't know what the digit is the answer will be wrong but the point is that sometimes it's overly confident in its wrong answer it sounds counterintuitive but if that's a change then that can also be important yeah if i don't know that sevens exist i might be very confident that something that is a seven to me is a one right that would seem reasonable is that just a consequence of the model's you know quote-unquote ignorance or is that a signal to you as well does this high confidence give a hint that the model is struggling i think that that would have to be situation dependent in the case of the mnist digits that might be true but i think again this is just one example to illustrate this technique but i think it would really depend on whatever data you were looking at intuitively i really like this concept of check the confidence and something about the distribution will give us hints but of course we've got to balance that in some way the what's the parable of the boy who cried wolf that if you sound the alarm too many times people might stop coming how do we know when to flag something as you know the algorithm now believes there was a
change in the data versus just noise and background fluctuation so that's actually a feature of the particular detection algorithm that we use the algorithm is called a cpm or change point model and it specifically addresses that issue and in particular what we're trying to do is a sort of sequential detection and this is one of the reasons why it's particularly a difficult statistical problem because like you said you don't want to cry wolf very many times and the issue here is that basically you can only cry wolf once because we say in our setting at least you can only say that there was drift once and then that's your final decision and if you were before it happened then you're wrong you don't get to keep on observing data so the statistical problem is very difficult because you both want to basically have your cake and eat it too you want to be able to repeatedly test the data in terms of examining whether the confidences that i've seen recently differ from what i've seen historically but you also want to control your error and the problem is of course the more data you see and basically the longer you're observing the more times you test and basically you get more chances to screw up by making a mistake so and this is basically why i chose this algorithm after looking for a lot because of the ones i could see it had all the features i was looking for it was very general in terms of being non-parametric and not making assumptions and it allowed for this repeated kind of look back at the history while also controlling the type one error or false positive rate which is basically the probability if a positive decision is a decision that there is drift you want to control the probability of declaring that there is drift when it is false when it hasn't happened [Music] thanks to this week's sponsor vertica vertica's ml powered analytics is designed for pioneers it's a truly unified platform vertical lets you keep your data where it is vertica offers cloud variety 
giving you the flexibility to use multiple clouds on-prem storage or a hybrid or both with the ability to read from different formats across all of them with superior query speeds sophisticated sql analytics and greater accuracy vertica gives you the power and performance to take your business to new heights vertica is built to adapt and provides the tools and agility to evolve with the future of analytics vertica is analytics for pioneers head to vertica.com insights to learn more there you can read how other data pioneers use vertica to realize over 2.2 million in benefits through better business insights and less effort that's vertica.com insights well i'd love to unpack a little bit more about that change point model could you give us a rough high level definition and i know it's an algorithm so there's you know at the end of the day it's go look at the steps but in broad strokes what are the mechanics of that system so this is an algorithm that was developed by adams and ross and i think it's very intuitive and it does exactly what you would want basically what it does is at every point in time that you're observing say for example i have data every day okay and say i'm at the 50th day of observation what i want to do or what the algorithm basically does is it says i want to see if somewhere in this past 50 days there was a given day where before it the data look very different from the data after so maybe the 30th day or whatever it is so it basically it checks every possible split point so the second the third etc where there's a period you know some data before and some data after and it says let me calculate some measure of distance between the after data and the before data the measure of distance is non-parametric which is one of the great things and then it finds the most anomalous split point and it says okay say for example the data was most different if you split after the 30th day okay so it says the 30th day is my candidate for a split now the
threshold for declaring something an anomalous split point the threshold for how anomalous it has to be is actually increasing over time which is one of the key points here so basically the longer you observe the data for say the 100th day or the 300th day the higher kind of the change has to be and so it compares a statistic based on that split point to in this case if you observe 50 days it'll compare that to like the 50th critical value which is again increasing over time and then if that passes then it will say that there was a split before the 50th day and then it happened at that 30th day the most anomalous split point and if not then you just continue doing this uh for the next time and you consider all the splits again the way it does this it does it statistically so you're guaranteed that whenever you make the decision that there is a change at any point in time that has the say the five percent false positive rate alpha that you are looking for gotcha yeah tell me a little bit about that knob you have for alpha where i can fine tune that i guess to one direction or the other for a certain sensitivity do you have any guidelines on how a person might set that for their specific use cases obviously we know that many people know that there's uh there's kind of standard values five percent one percent i really think it depends on as in pretty much every application how you value the different risks so for example say it's very costly to retrain the data or that basically there's a high penalty if you get it wrong is another way of setting it then uh you'd want your alpha to be lower say maybe one percent as opposed to five percent let me just add something i forgot to mention this before but when you mentioned that the costs of retraining might be low i think that um well that is kind of the motivation here it's not the be-all and end-all like it may be in practice very easy to train and so maybe you might think your costs are low but you still might want
to detect drift or you assume i want to be able to detect it even if you're going to retrain anyways the motivation and the practicality can be disconnected a little bit that's an excellent point i could see a strong argument for an organization not wanting to automate that process because they need to understand the dynamics of that drift if the business is declining just because the models predict the decline very well is not going to be a good thing if no one's aware of the decline yeah and of course the model performance is only that happens to be the application for this particular paper but the cpm algorithm is it can be applied to any kind of in this case univariate data that's just our application can you tell us a little bit about the scoring function you guys developed uh what we did when we ran our experiments was we experimented with different kinds of scenarios under which drift which again drift here is a new digit being encountered various scenarios under which that can happen for example he introduced very gradually at a certain point suddenly 50 of the instances all of them can imagine various scenarios like this and of course a very more gradual change is more difficult to detect than a sudden change because initially when you observe it it looks initially like what you saw before so it's harder to differentiate it and what we introduced here in this paper is an algorithm that basically takes into account how long you took to detect the change and also the gradualness and the rate at which the change was introduced and basically says it penalizes you according to those aspects so a more sudden change that was detected and a greater delay would be basically say you had a larger delay on a more sudden change which is easier that might be scored the same as uh say a more gradual change that you took longer to detect so basically the easier the problem is to detect the less penalized you are for delay and i think that's pretty novel because otherwise 
you don't really have a good even basis on saying how well did i do on this problem in terms of applying this in practice does one have to work with a certain class of machine learning models or is this a generalized framework i think it's very generalized the cpm currently only takes univariate data in a very basic continuous data so as long as your algorithm can output some kind of metric that you want to track then i think that it works fine i mean i would for example we chose the confidence for specific reasons because it doesn't rely on knowing the underlying label so for example accuracy requires you to know what the actual label is which of course uh you might not know so it kind of defeats the purpose yeah i think any algorithm that has an output that you can track that's reasonable or it's the thing is that you want the distribution of that metric to change when there is some kind of underlying change that you're interested otherwise you can't really use it to detect anything i'm curious about the scalability and resource demands of change point models what if i'm thinking of a company like visa that surely has some machine learning going on for fraud detection and is processing what thousands of transactions a second probably i mean if they have the need to do that it's probably worthwhile investing in techniques like your research but how do those techniques what are the costs in production for in terms of computational time and resources and things along those lines so my impression is that the cpm is not very scalable that's fine for our application i suppose if you were doing something that was a lot more intense you might need some other kind of tool of course for example the cpm as we use it examines like i said before it examines every possible split point so if you have millions of data points that might not be even necessary you might be able to modify the procedure in some way to say consider only splitting on every day or some larger window as 
opposed to every individual point you know in our case we opted to examine the whole history that's basically what the cpm does every time you might also not need to do that you might only need to investigate say a constant moving window not the whole history so that would be something maybe per use case you would fine-tune is that your vision of it yeah that's possible you would have to modify the statistical algorithm to deal with that well in many ways the paper could be described as a theoretical work i mean it's a good amount of math and some proofs and some simulation and things like that but it's for a very practical problem that is to say many organizations in their production systems need tools like this do you have a vision for how research like this makes it out into the real world that's one of the hard parts the team that i work on for example we develop various uh tools for enterprise clients for example for monitoring uh changes in models i'm not sure like this particularly would be a product but i think it also it informs other research that we do so not everything's going to make it into a product but if it's useful in some way that's a product in its own right if it's delivering value somewhere i don't think any machine learning practitioner would read your paper and say this is unnecessary i mean i certainly want to hear why they would think that it doesn't seem like a wise response but maybe they'd be thinking oh we just have so much on our hands is this truly important at what scale or scope or maybe there's when some ethical concerns get involved do you have guidelines for when an organization needs to start doing serious consideration work like this on their machine learning models i think it depends on what's at stake obviously uh every large company that does serious kind of data analytics you know things will be at stake like uh some company's algorithm for predicting uh whatever it is say search results or customer preferences if that's the
basis of their business and there's some kind of change in the model accuracy even small that can greatly affect the bottom line that's from a pure financial perspective but you can also consider say a product for monitoring say mammograms or something else where a person's life is at stake the stakes are a lot higher not just financial i remember there was an example i think some kind of mammogram image recognition algorithm and it was only trained on people from the west and then they used it in china and the images there are different or the resolution it might be totally different on other kinds of mammograms so you really need to make sure that either if your algorithm can't deal with it then you were alerted to that fact well when we started talking you'd mention a lot of your research is about analyzing the quality of machine learning models certainly change point modeling and work like we talked about today is enough that i'm sure you could occupy all your time just on that is that what you're thinking about or do you look at a bigger umbrella of things under the quality portfolio the change modeling actually isn't really what we're doing at least on my team now at all mostly we're building tools to help diagnose a model particularly to give a user some useful information about what data regions the model accuracy is lower and basically provide this information in some kind of human readable and usable form to help make decisions we are doing some drift testing in that respect but uh less on the the sequential work that was involved in this particular paper well i'm curious just broadly speaking or we could get into some specifics if you want when you provide a service like that how do you derive the value is it just that the data scientists can iterate faster because you've helped them more readily understand their model or is it for i don't there could be regulatory reasons why you have to have a model that's a little bit self-explanatory what are some of 
the ways in which people benefit from that work i think it depends on the context like you said for sure if it's bank loan data or something with uh predicting arrest rates those have to be explanatory like you said there's regulations around those things but uh i think it doesn't have to be that kind of context it can just be helping data scientists develop their models i mean that's a good enough goal in its end yeah the human readable part is it's not for the benefit of it being necessarily fair or interpretable it's for the benefit of the user the user here being the data scientist in some way right yeah to identify ways in which maybe they need to get more data if they have some blind spot in their training but a lot of feedback like that i would close the loop and make me innovate a little faster very cool well sam where can people follow you online i have a blog that i write for ibm basically what i cover are some less discussed topics of statistics that can be applied to machine learning and data science in practical situations well awesome i encourage everybody to check that out a lot of good content well sam thanks again for coming on the show to share your research and expertise thank you very much kyle i really appreciate it that concludes another installment of data skeptic time series our guest this week was sam ackerman claudia armbruster is our associate producer vanessa bly does guest coordination and i've been your host kyle polich [Music]
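
The change point model described in the conversation (scan every split of the history, score how different the data before and after each split look, and raise a single alarm once the most anomalous split passes a threshold that grows over time) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the implementation from the paper: the transcript only says the distance measure is non-parametric, so a standardized Mann-Whitney rank-sum statistic is swapped in here as one possible choice; `placeholder_threshold` is a made-up, uncalibrated function that merely increases with time, whereas the real procedure calibrates its critical values to hold a chosen false positive rate alpha; `min_side` and `step` are hypothetical parameter names, with `step` standing in for the idea of only considering a subset of split points when the history is long.

```python
import math


def midranks(values):
    """Return 1-based ranks of `values`, averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for position in range(i, j + 1):
            ranks[order[position]] = average_rank
        i = j + 1
    return ranks


def most_anomalous_split(history, min_side=5, step=1):
    """Score every allowed split k (before = history[:k], after = history[k:]).

    The score is a standardized rank-sum statistic (no tie correction in the
    variance). Returns (best_split, best_score), or (None, 0.0) if the history
    is too short to leave `min_side` points on each side.
    """
    total = len(history)
    ranks = midranks(history)
    prefix = [0.0]
    for rank in ranks:
        prefix.append(prefix[-1] + rank)

    best_split, best_score = None, 0.0
    for k in range(min_side, total - min_side + 1, step):
        n_before, n_after = k, total - k
        expected = n_before * (total + 1) / 2
        variance = n_before * n_after * (total + 1) / 12
        score = abs(prefix[k] - expected) / math.sqrt(variance)
        if score > best_score:
            best_split, best_score = k, score
    return best_split, best_score


def placeholder_threshold(t):
    """ASSUMPTION: an uncalibrated, slowly increasing stand-in threshold.

    It does not guarantee any particular false positive rate.
    """
    return 3.5 + 0.25 * math.log(t)


def detect_change(stream, threshold, min_side=5, step=1):
    """Consume observations one at a time and stop at the first alarm.

    Returns (detection_time, estimated_change_point), both counted in
    observations seen so far, or None if no alarm is raised.
    """
    history = []
    for t, value in enumerate(stream, start=1):
        history.append(value)
        split, score = most_anomalous_split(history, min_side, step)
        if split is not None and score > threshold(t):
            return t, split
    return None


# Synthetic confidence scores: 40 stable values near 0.9, then a drop to about 0.4.
stable = [(90 + (i * 7) % 10) / 100 for i in range(40)]
drifted = [(40 + (i * 3) % 10) / 100 for i in range(20)]
result = detect_change(stable + drifted, placeholder_threshold)
print(result)
```

Because the detector returns as soon as it raises an alarm, it mirrors the "you can only cry wolf once" setting from the discussion. Raising `step` trades detection resolution for fewer split evaluations per time step.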

Original Description

Sam Ackerman, Research Data Scientist at IBM Research Labs in Haifa, Israel, joins us today to talk about his work Detection of Data Drift and Outliers Affecting Machine Learning Model Performance Over Time.

Detecting data drift and outliers is crucial for maintaining machine learning model performance over time. This talk discusses techniques for detection and mitigation.

Key Takeaways

Collect and analyze data
Apply statistical process control
Monitor model performance
Detect data drift
Identify outliers
Mitigate drift and outliers
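
For the "monitor model performance" step, the transcript singles out prediction confidence as a metric that can be tracked without knowing the true labels. A minimal sketch of extracting such a value from a classifier's raw scores follows. It assumes confidence means the largest softmax probability, which is a common convention but is not confirmed by the source; the function names and the example scores are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_confidence(logits):
    """Return (predicted_class_index, confidence) for one input.

    ASSUMPTION: confidence is taken to be the top softmax probability.
    """
    probabilities = softmax(logits)
    label = max(range(len(probabilities)), key=probabilities.__getitem__)
    return label, probabilities[label]


# Hypothetical raw scores: one input the model finds clear, one it finds ambiguous.
clear_label, clear_confidence = prediction_confidence([0.1, 6.0, 0.3])
ambiguous_label, ambiguous_confidence = prediction_confidence([1.0, 1.2, 0.9])
print(clear_label, round(clear_confidence, 3))
print(ambiguous_label, round(ambiguous_confidence, 3))
```

A stream of these confidence values, one per prediction or averaged per batch, is the kind of univariate series a drift detector could monitor. As noted in the conversation, an unfamiliar input can also produce unusually high confidence, so a shift in either direction may be informative.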

💡 Data drift and outliers can significantly impact machine learning model performance, and detecting them is crucial for maintaining model accuracy
