How does a Data Scientist Fight FRAUD?

CodeEmporium · Beginner · 📐 ML Fundamentals · 5y ago

Key Takeaways

Data scientists use machine learning fundamentals to detect and prevent fraud. They frame it as a concrete classification problem, build a labeled dataset from transactions and chargebacks, set up a model that copes with class imbalance, and evaluate its performance.

Full Transcript

There are a bunch of Kaggle notebooks and blogs online that take a credit card fraud detection dataset, run it through some standard machine learning process, and report performance with a generic metric. But there are many nuances to dealing with fraud, from thinking about potential features to how we report results. My goal here is to add some color to fraud detection and prevention with machine learning. Fraud is fun; you just need to know how to deal with it.

But before we continue: this video is sponsored partially by Kite. They provide a code completion service for machine learning code. It integrates super well with your editors and even Jupyter notebooks, so click the link in the description to try Kite for free. Now back to the video.

So let's start with a little base example. Think about Grandma. She runs a laptop repair line: people place a work order online and ship the laptop to a warehouse, her workers repair the laptop, and they send it back. We'll be using this business as an example throughout the video.

Let's first ask some basic questions. How do we know that fraud has occurred? It's through chargebacks. Let's see what that is. This is JJ. "Hey bank, I don't recognize this $500 transaction that I paid to Grandma Fixes." "Hello Grandma, it looks like JJ's transaction may have been fraudulent. We'll be taking the $500 back." "Oh, I see. It is possible. No dispute here." "Hello JJ, your money is now back in your account." "Oh, nice."

People can file for chargebacks if they don't recognize a transaction on their phone. This allows banks to forcibly reverse a transaction.

So, another question: why do we need fraud detection? In the best case, chargebacks do nothing to a merchant, but typically they incur losses. Merchants can dispute a chargeback if they are confident it wasn't fraud. So setting up a fraud detection system can prevent malicious users from making these transactions in the first place. Chargebacks can also be a pain to deal with. Sometimes people don't file
chargebacks until months after a fraudulent transaction occurs: they do their taxes and then realize that they don't recognize a $500 transaction from six months prior. That's a hassle for customers and a big hassle for merchants, so fraud detection can mitigate this.

Now the main question: how does fraud happen? Let's paint a few different scenarios.

The first is malicious actor Malcolm. Malcolm starts by creating an account on Grandma Fixes. "Hello Grandma, can I get a work order for a laptop?" "Why, sure thing, sweet pea. That's $500." "You can take it off my card. Wink wink." "Why, thank you, Malcolm. Here is your laptop, fixed and good as new." "Yay!" One week later, enter JJ. "Uh, hey bank, I don't recognize this $500 transaction that I paid to Grandma Fixes." "Hello Grandma, it looks like that $500 transaction may have been fraudulent. We will be taking the $500 back." "I see. It is possible. This Malcolm was winking a lot. No disputes here." "I see. Hello JJ, your money is now back in your account. Rejoice!" "Ah, rejoice I will. Thank you."

This kind of fraud is harmful. Malcolm created an account on Grandma's website and made a fraudulent transaction with malicious intent. JJ had to deal with the hassle, Grandma had to deal with the hassle and the loss, and Malcolm got a free work order. Ideally, we want a system that blocks Malcolm's transactions.

But not all fraud happens this way. Incoming: friendly fraud. "Uh, hey Grandma, can I get a work order for a laptop?" "Why, sure thing, sweet pea. That's $300."
"You can take it off my card." "Why, thank you. And here's your fixed laptop." "Oh, thank you so much!" Six months later: [Music] "Uh, hey bank, I don't recognize this $300 I paid to Grandma Fixes." "Hello Grandma, it looks like the $300 transaction may have been a fraudulent one. We will be taking the $300 back." "But I remember this young girl, JJ, though."

From here, Grandma could file a dispute claiming the transaction was legit, or just not deal with the hassle, and JJ gets her money back. In this scenario the transaction was legit, but it's being flagged as fraudulent because JJ forgot that she made it. Friendly fraud is harder to predict since there's no suspicious activity. The situation isn't good for anyone, though. Even though JJ walked away with a free work order, Grandma is going to be extra cautious about JJ in the future, especially since this third scenario could have occurred too.

Let's get to that third scenario: account takeover. Malcolm starts by logging into JJ's account. "Hello, I mean, hey Grandma, can I get a work order for a laptop? By the way, I'm JJ." "Oh, sure thing, JJ, what a sweet little girl. That's five hundred dollars." "Oh, oh, take it off my card. Wink wink." "Yes ma'am, and here's your fixed laptop, JJ." "Yay, thank you Grandma, I appreciate it." One week later: "Uh, hey bank, I don't recognize this $500 that I paid to Grandma Fixes." "Um, hello Grandma, it looks like that $500 transaction may have been fraudulent. We will be taking that $500 back." "I thought that it was JJ who made that purchase, though." "Uh, nope, I didn't make that purchase whatsoever." "I think I've seen enough. Hello JJ, you get your money back." "Oh, very nice. But who made that purchase from my account? Sounds kind of sus."

Account takeovers happen when malicious actors get hold of a person's credentials, such as their login credentials, and proceed to masquerade as that person. This requires another level of fraud detection. For the first two cases we were mostly concerned with fraud at the transaction level, but for this account takeover case we
need to be concerned with fraud at the login level too, and this can be difficult. For this video we will be looking only at transaction-level fraud, which mostly addresses the first two cases. Maybe we'll take on account takeovers and these more complex cases in another video.

Now, incoming: machine learning. I feel like this is where most blog posts and tutorials on fraud detection start, but fraud isn't just about machine learning. After all, you need to think like a fraudster and understand how they behave if you want to fight against them. I hope that intro helped paint the picture for fraud detection. Now we can think about the pieces of the machine learning pipeline with this fraud mindset.

The first step is defining the problem. Let's take the idea of fraud detection and turn it into a concrete problem. Like I mentioned before, we want to be able to catch bad actors when transactions are made. So the input is some features about the user and their account, and the output is a binary classification: fraudulent or not fraudulent.

Now we need to build the dataset in this way too, so let's start with building the features. A good exercise is to open a Google Sheet and create three columns: the first is the feature, the second is your hunch about how this feature relates to fraudsters, and the third is the actual relationship based on some exploratory data analysis (EDA). Let's walk through a few examples together. In our scenario, transactions are being made by a bad actor from their own account. One potential feature could be how long the account has been active; you would typically expect these accounts to be short-lived, existing solely to get lucky with fraud. Something else that may catch your eye is the number of successful purchases: more successful purchases could indicate slightly less fraudulent tendencies, though not necessarily. And what about the time between sessions on Grandma's platform? Shorter times
between login attempts could be a little suspicious, though again, not necessarily.

Once you have these ideas and hunches, verify whether they hold up with EDA. Of course, to do this you also need to know what the labels look like, so let's build the labels. The label for each transaction is either fraudulent or not fraudulent, and we only learn this label when someone files a chargeback for that transaction. Let's say 97% of chargebacks are filed within one month of a transaction occurring; you can verify this by querying the data. This means you can take all transactions up to a month ago (that's up to about 30 days ago) as your training dataset, since if they had been fraudulent you would very likely have seen a chargeback by now.

So overall, we need to: brainstorm potential features for the fraud model; verify whether these features are useful by querying the data; determine the time window within which you can comfortably say a chargeback will have been filed; query all transactions that occurred before that window (until 30 days ago, in our case); and get the corresponding labels for those transactions. Then your dataset is ready.

The next step is the model setup. A typical trait of fraud data is imbalance: we have far more non-fraudulent transactions than fraudulent ones. We could under-sample the non-fraudulent data and over-sample the fraudulent transactions so that the model learns something meaningful. Sometimes weighting the fraudulent examples more heavily in your model may also be useful. You may have to play around with this, though, since it really depends on your data and your objective.

The final step is evaluating the model. How good really is this fraud model? For fraud, false negatives are bad: we need to be able to call out fraud when it occurs. At the same time, we don't want to flag too many non-fraudulent transactions as fraudulent. We can typically look
at ROC curves for a balance. These are plots of the true positive rate versus the false positive rate, and ideally the curve should hug the top-left corner. In some cases, though, true positive rate and false positive rate may be a little too generic, and we would want to make plots of more company-specific metrics.

That's all I have for you now. I hope this video adds a little more color to dealing with fraudulent data out there. This is just the tip of the iceberg, and remember: fraud is fun once you know how to deal with it. Hope you enjoyed the video, and until next time... [Music] Bye!
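
The dataset recipe from the transcript (hunch-driven features, plus labels that are trusted only once the chargeback window has passed) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not code from the video: the `Transaction` record, the function names, the 97% coverage default (taken from the video's example figure), and the three features are hypothetical. "Prior purchases" stands in for "successful purchases" and "hours since the account's previous transaction" stands in for "time between sessions". Every chargeback is assumed to mean fraud, which, as the friendly-fraud example shows, is not always true.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Transaction:  # hypothetical record; real data would come from a database query
    txn_id: str
    account_id: str
    timestamp: datetime
    amount: float


def label_window_days(chargeback_delays_days, coverage=0.97):
    """Smallest whole number of days within which `coverage` of observed chargebacks were filed."""
    if not chargeback_delays_days:
        raise ValueError("need at least one observed chargeback delay")
    delays = sorted(chargeback_delays_days)
    # Number of chargebacks the window must cover; round() strips float noise before ceil().
    k = max(1, math.ceil(round(coverage * len(delays), 9)))
    return math.ceil(delays[k - 1])


def build_training_set(transactions, account_created_at, chargeback_txn_ids, now, window_days):
    """Feature rows with 0/1 labels, keeping only transactions older than the label window."""
    cutoff = now - timedelta(days=window_days)
    prior_count = {}  # account_id -> number of earlier transactions on that account
    last_seen = {}    # account_id -> timestamp of that account's previous transaction
    rows = []
    for txn in sorted(transactions, key=lambda t: t.timestamp):
        if txn.timestamp > cutoff:
            break  # too recent: a chargeback could still arrive, so the label is not final
        prev = last_seen.get(txn.account_id)
        age = txn.timestamp - account_created_at[txn.account_id]
        rows.append({
            "txn_id": txn.txn_id,
            # Hunch: fraudulent accounts tend to be short-lived.
            "account_age_days": age.total_seconds() / 86400,
            # Hunch: more past purchases, slightly less likely to be fraud. Counts all earlier
            # transactions, since whether they get charged back may not be known yet.
            "prior_purchases": prior_count.get(txn.account_id, 0),
            # Hunch: very short gaps between activity can be suspicious (None = first transaction).
            "hours_since_last_txn": None if prev is None else (txn.timestamp - prev).total_seconds() / 3600,
            # Assumption: every chargeback counts as fraud, friendly fraud included.
            "label": int(txn.txn_id in chargeback_txn_ids),
        })
        prior_count[txn.account_id] = prior_count.get(txn.account_id, 0) + 1
        last_seen[txn.account_id] = txn.timestamp
    return rows
```

The design choice worth noting is that every feature looks only backwards in time from the transaction it describes, and rows newer than the label window are dropped because a chargeback could still turn their label from "not fraudulent" into "fraudulent".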

Original Description

SPONSOR
Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite. Love it! Learn more: https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=codeemporium&utm_content=description-only

TIMESTAMPS
0:00 Introduction
0:53 Define Business
1:15 How do we know Fraud occurs?
2:14 Why Fraud Detection?
3:00 How does fraud Happen?
4:52 Friendly Fraud
6:26 Account Takeover
8:34 Define Problem from Machine Learning Standpoint
9:32 Building Dataset
11:56 Model Setup
12:26 Evaluation

RESOURCES
[1] Chargebacks: https://chargebacks911.com/chargebacks/
[2] Account Takeover: https://www.iovation.com/topics/account-takeover
[3] Friendly Fraud: https://www.ethoca.com/payments-101-what-is-friendly-fraud

This video teaches how data scientists use machine learning to fight fraud by understanding the problem, building datasets, setting up models, and evaluating their performance. It highlights the importance of fraud detection and prevention in business.

Key Takeaways

Define the business problem of fraud
Understand how fraud occurs
Build a dataset for fraud detection
Set up a machine learning model
Evaluate the model's performance
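
The model-setup and evaluation takeaways can likewise be sketched with the standard library alone. This is a minimal sketch under hypothetical assumptions: rows are dicts with a 0/1 `"label"` key, the 10% keep rate for legitimate rows and the 3x duplication of fraud rows are arbitrary, the `#legit / #fraud` weight is one common heuristic rather than the video's recommendation, and the scores passed to `roc_curve` come from whatever classifier you train. Resampling should be applied only to the training split, so the evaluation set keeps the real class ratio.

```python
import random


def rebalance(rows, keep_legit=0.1, fraud_copies=3, seed=0):
    """Random under-sampling of legitimate rows plus naive over-sampling (duplication) of fraud rows."""
    rng = random.Random(seed)
    fraud = [r for r in rows if r["label"] == 1]
    legit = [r for r in rows if r["label"] == 0]
    kept = rng.sample(legit, k=round(len(legit) * keep_legit))
    balanced = kept + fraud * fraud_copies
    rng.shuffle(balanced)
    return balanced


def fraud_class_weight(labels):
    """Per-example weight for fraud rows so both classes contribute equally to the loss."""
    n_fraud = sum(labels)
    if n_fraud == 0:
        raise ValueError("no fraud examples")
    return (len(labels) - n_fraud) / n_fraud


def roc_curve(labels, scores):
    """(FPR, TPR) points from sweeping the decision threshold from high to low scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both fraud and non-fraud examples")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores cross the threshold together, so consume them as one step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoid rule; 1.0 means the curve hugs the top-left corner."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, a perfect ranking such as `roc_curve([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])` passes through the point (0, 1) and gives an AUC of 1.0, while tied scores for one fraud and one legitimate example give 0.5, the same as random guessing.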

💡 Fraud detection is a critical application of machine learning that can help businesses prevent financial losses
