How does a Data Scientist Fight FRAUD?

CodeEmporium · Beginner ·📐 ML Fundamentals ·5y ago

Key Takeaways

Data scientists detect and prevent fraud with machine learning by understanding how fraud happens, building a labeled dataset, setting up a model, and evaluating its performance. The video is partially sponsored by Kite, a code completion service.

Full Transcript

There are a bunch of Kaggle notebooks and blogs online that take a credit card fraud detection dataset, probably run it through some standard machine learning process, and report performance with a generic metric. But there are many nuances to dealing with fraud, from thinking about potential features to how we report results. My goal here is to add some color to fraud detection and prevention with machine learning. Fraud is fun; you just need to know how to deal with it.

But before we continue: this video is sponsored partially by Kite. They provide a code completion service for machine learning code. It integrates super well with your editors and even Jupyter notebooks, so click the link in the description to try Kite for free. Now, back to the video.

So let's start with a little base example. Think about Grandma. She runs a laptop repair line where people place a work order online and ship the laptop to a warehouse; her workers repair the laptop and send it back. We'll be using this business as an example throughout the video.

Now let's first ask some basic questions. How do we know that fraud has occurred? It's through chargebacks. Let's see what that is. This is JJ.

"Hey bank, I don't recognize this $500 transaction that I paid to Grandma Fixes."

"Hello Grandma, it looks like JJ's transaction may have been fraudulent. We'll be taking the $500 back."

"Oh, I see. It is possible. No dispute here."

"Hello JJ, your money is now back in your account."

"Oh, nice."

People can file for chargebacks if they don't recognize a transaction on their phone. This allows banks to forcefully reverse a transaction.

So another question: why do we need fraud detection? Chargebacks do nothing to a merchant in the best case, but they typically incur losses. Merchants can dispute the chargeback if they are confident it wasn't fraud. So perhaps setting up a fraud detection system can prevent malicious users from making the transaction in the first place. Chargebacks can also be a pain to deal with. Sometimes people don't file chargebacks until months after a fraudulent transaction occurs: they do their taxes and then realize that they don't recognize a $500 transaction from six months prior. Hassle for customers, big hassle for merchants. So fraud detection can mitigate this.

Now the main question: how does fraud happen? Let's paint a few different scenarios, the first being malicious actor Malcolm. Malcolm starts by creating an account on Grandma Fixes.

"Hello Grandma, can I get a work order for a laptop?"

"Why, sure thing, sweet pea. That's $500."

"You can take it off my card, wink wink."

"Why, thank you, Malcolm. Here is your laptop, fixed and good as new."

"Yay."

One week later, enter JJ.

"Uh, hey bank, I don't recognize this $500 transaction that I paid to Grandma Fixes."

"Hello Grandma, it looks like that $500 transaction may have been fraudulent. We will be taking the $500 back."

"I see. It is possible. This Malcolm was winking a lot. No disputes here."

"I see. Hello JJ, your money is now back in your account. Rejoice."

"Ah, rejoice I will. Thank you."

This kind of fraud is harmful. Malcolm created an account on Grandma's website and made a fraudulent transaction with malicious intent. JJ had to deal with the hassle, Grandma had to deal with the hassle and the loss, and Malcolm got a free work order. We ideally want a system that blocks Malcolm's transactions. But not all fraud happens this way. Incoming: friendly fraud.

"Uh, hey Grandma, can I get a work order for a laptop?"

"Why, sure thing, sweet pea. That's $300."

"You can take it off my card."

"Why, thank you. And here's your fixed laptop."

"Oh, thank you so much."

Six months later. [Music]

"Uh, hey bank, I don't recognize this $300 I paid to Grandma Fixes."

"Hello Grandma, it looks like the $300 transaction may have been a fraudulent one. We will be taking the $300 back."

"But I remember this young girl JJ, though."

From here, Grandma could file a dispute claiming the transaction was legit, or just not deal with the hassle, and JJ gets her money back. In this scenario the transaction was legit, but it's being flagged as fraudulent because JJ forgot that she made the transaction. Friendly fraud is harder to predict since there's no suspicious activity. The situation isn't good for anyone, though. Even though JJ walked away with a free work order, Grandma is going to be extra cautious about JJ in the future, especially since this third scenario could have occurred too.

Let's get to that third scenario: account takeover. Malcolm starts by logging into JJ's account.

"Hello, I mean, hey Grandma, can I get a work order for a laptop? By the way, I'm JJ."

"Oh, sure thing, JJ. What a sweet little girl. That's five hundred dollars."

"Oh, oh, take it off my card, wink wink."

"Yes ma'am. And here's your fixed laptop, JJ."

"Yay, thank you, Grandma, I appreciate it."

One week later.

"Uh, hey bank, I don't recognize this $500 that I paid to Grandma Fixes."

"Um, hello Grandma, it looks like that $500 transaction may have been fraudulent. We will be taking that $500 back."

"I thought that it was JJ who indeed made that purchase, though."

"Uh, nope, I didn't make that purchase whatsoever."

"I think I've seen enough. Hello JJ, you get your money back."

"Oh, very nice. But who made that purchase from my account? Sounds kind of sus."

Account takeovers happen when malicious actors get hold of a person's credentials, like login credentials, and proceed to masquerade as that person. This requires another level of fraud detection. For the first two cases we were more concerned with fraud at the transaction level, but for this account takeover case we need to be concerned with fraud at the login level too, and this can be difficult. For this video we will be looking only at transaction-level fraud, so that's addressing mostly the first two cases, and we may take on account takeovers and these more complex cases in another video.

Now, incoming: machine learning. I feel like this is where most blog posts and tutorials for fraud detectors start, but fraud isn't just about machine learning. After all, you need to think like a fraudster and understand how they behave if you want to fight against them. I hope that intro helped paint the picture for fraud detection. Now we can think about the pieces of the machine learning pipeline with this fraud mindset.

So the first step here is defining the problem. Let's take the idea of fraud detection and define a concrete problem. Like I mentioned before, we want to be able to catch bad actors when transactions are made. So the input is some features about the user and their account, and the output is a binary classification: fraudulent or not fraudulent. Now we need to build the dataset in this way too.

So let's start with building the features. To build features, a good exercise is to open a Google Sheet and create three columns: the first column being the feature, the second being your hunch about how this feature relates to fraudsters, and the third being what the actual relationship is based on some exploratory data analysis. Let's walk through a few examples together, for the case where transactions are being made by a bad actor from their own account. One potential feature could be how long the account has been active. Typically you would expect these accounts to be short-lived, created for the sole purpose of getting lucky with fraud. Something else that may catch your eye is the number of successful purchases. A higher number of successful purchases could be indicative of slightly less fraudulent tendencies, though not necessarily. And what about the time between sessions on Grandma's platform? Shorter times between login attempts could be a little suspicious, again, although not necessarily. Once you have these ideas and hunches, verify whether your hunches are true with the EDA process.

Of course, to do this you would also need to know what the labels look like, so right now let's build the labels. The label for each transaction is either fraudulent or not fraudulent, and we only know this label if someone files a chargeback for that transaction. So let's say 97% of chargebacks are filed within one month of a transaction occurring; you can verify this by just querying the data. This means that you can take all the transactions up to a month ago, that's up to like 30 days ago, as your training dataset, since if they had been fraudulent you would most likely have already seen a chargeback by now.

So overall, the things that we need to do are: brainstorm the potential features for the fraud model; verify whether these features are useful by querying the data; determine the time window within which you can comfortably say a chargeback occurs; query all transactions that occurred up to that time window, which is until 30 days ago in our case; and then get the corresponding labels for these transactions. And your dataset is ready.

Now the next step is the model setup. A typical tendency of fraud data is imbalance: we have way too many non-fraudulent transactions compared with the actual fraudulent transactions. We could downsample the non-fraudulent data and oversample the fraudulent transactions so that the model learns something meaningful. Sometimes weighting the fraudulent examples higher for your model may be useful. You may have to play around with this, though, since it really depends on your data and your objective.

And the final step is evaluating the model. So how good really is this fraud model? For fraud, false negatives are bad: we need to be able to call out fraud when it occurs. But at the same time, we also don't want to call out too many non-fraudulent examples as being fraudulent. We can typically look at ROC curves for a balance. They are plots of true positive rate versus false positive rate. Ideally the graph should hug the top-left corner. In some cases, though, true positive rate and false positive rate may be a little too generic, and we would want to make plots of more company-specific metrics.

And that's all I have for you now. I hope this video adds a little more color to dealing with fraudulent data out there. This is just the tip of the iceberg. And remember, fraud is fun once you know how to deal with it. Hope you enjoyed the video, and until next time, [Music] bye.
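The labeling step described in the transcript (check how quickly chargebacks arrive, then train only on transactions old enough to have a trustworthy label) can be sketched in plain Python. This is a minimal illustration, not the video's own code: the field names `id` and `timestamp`, the helper names, and the sample data are all hypothetical, and the 30-day window is the transcript's example value.

```python
from datetime import datetime, timedelta

# Assumption: the transcript's example window of about one month.
MATURITY_DAYS = 30


def share_filed_within(delays_in_days, days):
    """Fraction of past chargebacks that were filed within `days` of the transaction."""
    return sum(d <= days for d in delays_in_days) / len(delays_in_days)


def build_labeled_dataset(transactions, chargeback_ids, now, maturity_days=MATURITY_DAYS):
    """Keep only transactions old enough for their label to be trusted.

    transactions: list of dicts with hypothetical keys "id" and "timestamp".
    chargeback_ids: set of transaction ids that received a chargeback.
    Returns a list of (transaction, label) pairs, where label 1 means fraudulent.
    """
    cutoff = now - timedelta(days=maturity_days)
    dataset = []
    for tx in transactions:
        if tx["timestamp"] > cutoff:
            # Too recent: a chargeback could still arrive, so the label is unknown.
            continue
        label = 1 if tx["id"] in chargeback_ids else 0
        dataset.append((tx, label))
    return dataset


# Made-up delays (in days) between past transactions and their chargebacks.
past_delays = [2, 5, 9, 30, 45]
share = share_filed_within(past_delays, MATURITY_DAYS)

now = datetime(2021, 6, 30)
transactions = [
    {"id": "t1", "timestamp": datetime(2021, 3, 1)},   # mature, charged back
    {"id": "t2", "timestamp": datetime(2021, 4, 15)},  # mature, no chargeback
    {"id": "t3", "timestamp": datetime(2021, 6, 20)},  # only 10 days old, excluded
]
chargeback_ids = {"t1"}

dataset = build_labeled_dataset(transactions, chargeback_ids, now)
labels = [(tx["id"], label) for tx, label in dataset]
print(share, labels)
```

A longer window gives cleaner labels but leaves out the most recent behavior, so the window is a trade-off to tune against your own chargeback delays.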

Original Description

SPONSOR
Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite. Love it! Learn more: https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=codeemporium&utm_content=description-only

TIMESTAMPS
0:00 Introduction
0:53 Define Business
1:15 How do we know Fraud occurs?
2:14 Why Fraud Detection?
3:00 How does fraud Happen?
4:52 Friendly Fraud
6:26 Account Takeover
8:34 Define Problem from Machine Learning Standpoint
9:32 Building Dataset
11:56 Model Setup
12:26 Evaluation

RESOURCES
[1] Chargebacks: https://chargebacks911.com/chargebacks/
[2] Account Takeover: https://www.iovation.com/topics/account-takeover
[3] Friendly Fraud: https://www.ethoca.com/payments-101-what-is-friendly-fraud

This video teaches how data scientists use machine learning to fight fraud by understanding the problem, building datasets, setting up models, and evaluating their performance. It highlights the importance of fraud detection and prevention in business.
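For the model-setup step, the video notes that fraud data is heavily imbalanced and suggests sampling down the non-fraudulent rows or weighting fraudulent examples higher. The sketch below shows both ideas in plain Python under stated assumptions: the weighting uses the common "balanced" heuristic (n_samples / (n_classes * class_count)), which the video does not name, and the function names, the 98-to-2 split, and the 5-to-1 ratio are made up for illustration.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rarer classes (here, fraud) receive proportionally larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def undersample_negatives(labels, negatives_per_positive, seed=0):
    """Return sorted row indices keeping all positives and a random subset of negatives."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(negatives), negatives_per_positive * len(positives))
    return sorted(positives + rng.sample(negatives, keep))


# Hypothetical dataset: 98 legitimate transactions and 2 fraudulent ones.
labels = [0] * 98 + [1] * 2

weights = balanced_class_weights(labels)
# One weight per row, in the shape many training APIs accept as sample weights.
sample_weights = [weights[y] for y in labels]

kept = undersample_negatives(labels, negatives_per_positive=5)
print(weights[1], len(kept))
```

With these weights each class contributes the same total weight to the loss. Whether weighting or resampling works better depends on the data and the objective, as the video says.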

Key Takeaways
  1. Define the business problem of fraud
  2. Understand how fraud occurs
  3. Build a dataset for fraud detection
  4. Set up a machine learning model
  5. Evaluate the model's performance
💡 Fraud detection is a critical application of machine learning that can help businesses prevent financial losses
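To make the evaluation step concrete, here is a small sketch of how an ROC curve is built: sort transactions by model score, sweep the threshold from strict to lenient, and record the false positive rate and true positive rate at each step. The labels and scores are invented, the function names are illustrative, and the code assumes both classes are present.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) points obtained by sweeping the decision threshold.

    labels: 1 = fraudulent, 0 = not fraudulent. scores: higher = more suspicious.
    Assumes at least one positive and one negative example.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item in a group of tied scores.
        is_last_of_tie = i == len(ranked) - 1 or ranked[i + 1][0] != score
        if is_last_of_tie:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical model scores for 3 fraudulent and 5 legitimate transactions.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

points = roc_points(labels, scores)
area = auc(points)
print(points)
print(round(area, 4))
```

A curve that rises quickly toward the top-left corner (high true positive rate at a low false positive rate) gives an area close to 1. A company-specific metric, such as dollars of fraud caught per legitimate order blocked, can be plotted with the same threshold sweep.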
