How does a Data Scientist Fight FRAUD?
Key Takeaways
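Data scientists fight fraud by first understanding how it happens (malicious actors, friendly fraud, and account takeover, all of which surface as chargebacks) and then applying machine learning fundamentals: framing transaction-level fraud as binary classification, building features from hunches verified with exploratory data analysis, deriving labels from chargebacks filed within a known time window, handling class imbalance, and evaluating with ROC curves or company-specific metrics.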
Full Transcript
There are a bunch of Kaggle notebooks and blogs online that take a credit card fraud detection dataset, run it through some standard machine learning process, and report performance with a generic metric. But there are many nuances to dealing with fraud, from thinking about potential features to how we report results. My goal here is to add some color to fraud detection and prevention with machine learning. Fraud is fun; you just need to know how to deal with it.

But before we continue, this video is sponsored partially by Kite. They provide a code completion service for machine learning code. It integrates super well with your editors and even Jupyter notebooks, so click the link in the description to try Kite for free. Now, back to the video.

Let's start with a little base example. Think about Grandma. She runs a laptop repair line where people place a work order online and ship the laptop to a warehouse; her workers repair the laptop and send it back. We'll be using this business as an example throughout the video.

Let's first ask some basic questions. How do we know that fraud has occurred? Through chargebacks. Let's see what that is. This is JJ. "Hey bank, I don't recognize this $500 transaction that I paid to Grandma Fixes." "Hello Grandma, it looks like JJ's transaction may have been fraudulent. We'll be taking the $500 back." "Oh, I see. It is possible. No dispute here." "Hello JJ, your money is now back in your account." "Oh, nice!" People can file for chargebacks if they see a transaction on their phone that they don't recognize, and this allows banks to forcefully reverse the transaction.

So another question: why do we need fraud detection? In the best case, a chargeback costs the merchant nothing, but chargebacks typically incur losses. Merchants can dispute a chargeback if they are confident it wasn't fraud, but a fraud detection system can prevent malicious users from making the transaction in the first place. Chargebacks can also be a pain to deal with. Sometimes people don't file chargebacks until months after a fraudulent transaction occurs: they do their taxes and only then realize that they don't recognize a $500 transaction from six months prior. That's a hassle for customers and a big hassle for merchants, and fraud detection can mitigate this.

Now, the main question: how does fraud happen? Let's paint a few different scenarios, the first being malicious actor Malcolm. Malcolm starts by creating an account on Grandma Fixes. "Hello Grandma, can I get a work order for a laptop?" "Why, sure thing, sweet pea. That's $500." "You can take it off my card. Wink wink." "Why, thank you, Malcolm. Here is your laptop, fixed and good as new." "Yay!" One week later, enter JJ. "Uh, hey bank, I don't recognize this $500 transaction that I paid to Grandma Fixes." "Hello Grandma, it looks like that $500 transaction may have been fraudulent. We will be taking the $500 back." "I see. It is possible. This Malcolm was winking a lot. No disputes here." "I see. Hello JJ, your money is now back in your account. Rejoice!" "Ah, rejoice I will. Thank you."

This kind of fraud is harmful. Malcolm created an account on Grandma's website and made a fraudulent transaction with malicious intent. JJ had to deal with the hassle, Grandma had to deal with the hassle and the loss, and Malcolm got a free work order. Ideally, we want a system that blocks Malcolm's transactions. But not all fraud happens this way.

Incoming: friendly fraud. "Uh, hey Grandma, can I get a work order for a laptop?" "Why, sure thing, sweet pea. That's $300."
"You can take it off my card." "Why, thank you. And here's your fixed laptop." "Oh, thank you so much!" Six months later. [Music] "Uh, hey bank, I don't recognize this $300 I paid to Grandma Fixes." "Hello Grandma, it looks like the $300 transaction may have been a fraudulent one. We will be taking the $300 back." "But I remember this young girl, JJ, though."

From here, Grandma could file a dispute claiming the transaction was legitimate, or just not deal with the hassle, in which case JJ gets her money back. In this scenario, the transaction was legitimate, but it's being flagged as fraudulent because JJ forgot that she made it. Friendly fraud is harder to predict since there's no suspicious activity. The situation isn't good for anyone, though. Even though JJ walked away with a free work order, Grandma is going to be extra cautious about JJ in the future, especially since a third scenario could have occurred too.

Let's get to that third scenario: account takeover. Malcolm starts by logging into JJ's account. "Hello, I mean, hey Grandma, can I get a work order for a laptop? By the way, I'm JJ." "Oh, sure thing, JJ, what a sweet little girl. That's $500." "Oh, oh, take it off my card. Wink wink." "Yes ma'am, and here's your fixed laptop, JJ." "Yay, thank you Grandma, I appreciate it." One week later: "Uh, hey bank, I don't recognize this $500 that I paid to Grandma Fixes." "Um, hello Grandma, it looks like that $500 transaction may have been fraudulent. We will be taking that $500 back." "I thought that it was JJ who indeed made that purchase, though." "Uh, nope. I didn't make that purchase whatsoever." "I think I've seen enough. Hello JJ, you get your money back." "Oh, very nice. But who made that purchase from my account? Sounds kind of sus."

Account takeovers happen when malicious actors get hold of a person's credentials, such as their login credentials, and proceed to masquerade as that person. This requires another level of fraud detection. For the first two cases, we were concerned with fraud at the transaction level, but for account takeover, we need to be concerned with fraud at the login level too, and this can be difficult. For this video, we will look only at transaction-level fraud, so we're mostly addressing the first two cases; maybe I'll take on account takeovers and other more complex cases in another video.

Now, incoming: machine learning. I feel like this is where most blog posts and tutorials on fraud detection start, but fraud isn't just about machine learning. After all, you need to think like a fraudster and understand how they behave if you want to fight them. I hope that intro helped paint the picture for fraud detection. Now we can think about the pieces of the machine learning pipeline with this fraud mindset.

The first step is defining the problem. Let's take the idea of fraud detection and define a concrete problem. Like I mentioned before, we want to catch bad actors when transactions are made. So the input is a set of features about the user and their account, and the output is a binary classification: fraudulent or not fraudulent.

We need to build the dataset this way too, so let's start with the features. A good exercise is to open a Google Sheet and create three columns: the first is the feature, the second is your hunch about how this feature relates to fraudsters, and the third is the actual relationship based on exploratory data analysis (EDA). Let's walk through a few examples together. Transactions are being made by a bad actor from their own account. One potential feature could be how long the account has been active: you would typically expect these accounts to be short-lived, existing solely to try their luck at fraud. Something else that may catch your eye is the number of successful purchases; a higher number of successful purchases could indicate slightly lower fraudulent tendencies, though not necessarily. And what about the time between sessions on Grandma's platform? Shorter times between login attempts could be a little suspicious, again, though not necessarily. Once you have these ideas and hunches, verify whether they hold with EDA.

Of course, to do this, you also need to know what the labels look like, so let's build the labels. The label for each transaction is either fraudulent or not fraudulent, and we only learn that a transaction is fraudulent if someone files a chargeback for it. Let's say 97% of chargebacks are filed within one month of the transaction; you can verify a number like this by querying the data. This means you can take all the transactions up to a month (30 days) ago as your training dataset, since if they had been fraudulent, you would most likely have seen a chargeback by now.

So overall, the steps are: brainstorm potential features for the fraud model; verify whether these features are useful by querying the data; determine the time window within which you can comfortably say a chargeback will have occurred; query all transactions older than that window (up to 30 days ago in our case); and then get the corresponding labels for these transactions. Your dataset is ready.

The next step is the model setup. A typical property of fraud data is class imbalance: we have far more non-fraudulent transactions than fraudulent ones. We could under-sample the non-fraudulent transactions and over-sample the fraudulent ones so that the model learns something meaningful. Sometimes giving the fraudulent examples a higher weight in your model may be useful. You may have to play around with this, though, since it really depends on your data and your objective.

The final step is evaluating the model: how good really is this fraud model? For fraud, false negatives are bad; we need to be able to call out fraud when it occurs. At the same time, we don't want to flag too many non-fraudulent transactions as fraudulent. We can typically look at ROC curves to find a balance. These plot the true positive rate against the false positive rate, and ideally, the curve should hug the top-left corner. In some cases, though, the true positive rate and false positive rate may be a little too generic, and we would want to make plots of more company-specific metrics.

And that's all I have for you now. I hope this video adds a little more color to dealing with fraudulent data out there. This is just the tip of the iceberg, and remember: fraud is fun once you know how to deal with it. I hope you enjoyed the video, and until next time. [Music] Bye!
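The chargeback-window step from the transcript (checking that, say, 97% of chargebacks arrive within a month) can be sketched in a few lines of Python. This is a minimal, stdlib-only sketch under assumed inputs: each record is a hypothetical `(transaction_date, chargeback_date_or_None)` pair, and the function names are illustrative, not from the video.

```python
from datetime import date

# Illustrative helpers (hypothetical names), not from the video.


def chargeback_delays(records):
    """Days from each transaction to its chargeback, skipping transactions with no chargeback."""
    return [(chargeback - txn).days for txn, chargeback in records if chargeback is not None]


def share_filed_within(delays, window_days):
    """Fraction of observed chargebacks filed at most `window_days` after the transaction."""
    if not delays:
        raise ValueError("need at least one observed chargeback")
    return sum(d <= window_days for d in delays) / len(delays)


def smallest_window(delays, target=0.97):
    """Smallest observed delay (in days) such that at least `target` of chargebacks fall within it."""
    for window in sorted(set(delays)):
        if share_filed_within(delays, window) >= target:
            return window
    return max(delays)  # only reached if target > 1.0


# Hypothetical history: one quick chargeback, one legitimate sale,
# one chargeback after about three weeks, and one filed six months later (the "tax season" case).
history = [
    (date(2020, 1, 1), date(2020, 1, 6)),
    (date(2020, 1, 3), None),
    (date(2020, 2, 1), date(2020, 2, 21)),
    (date(2020, 3, 1), date(2020, 8, 28)),
]
delays = chargeback_delays(history)
print(delays)
print(share_filed_within(delays, 30))
print(smallest_window(delays))
```

With only a handful of chargebacks, a single late filing dominates the window; on real data you would run the same check over many months of history.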
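The labeling rule described in the transcript (keep only transactions older than the chargeback window, and call a transaction fraudulent if it received a chargeback) might look like the sketch below. It assumes transactions arrive as hypothetical `(transaction_id, date)` pairs and chargebacks as a set of transaction IDs; in practice this would more likely be a database query.

```python
from datetime import date, timedelta

# Illustrative helper (hypothetical name), not from the video.


def build_labeled_set(transactions, chargeback_ids, today, window_days=30):
    """Label mature transactions: 1 if a chargeback was filed, else 0.

    Transactions newer than `window_days` are dropped, because a chargeback
    for them may simply not have arrived yet.
    """
    cutoff = today - timedelta(days=window_days)
    labeled = []
    for txn_id, txn_date in transactions:
        if txn_date <= cutoff:
            labeled.append((txn_id, 1 if txn_id in chargeback_ids else 0))
    return labeled


# Hypothetical data.
transactions = [
    ("t1", date(2020, 4, 10)),
    ("t2", date(2020, 5, 20)),
    ("t3", date(2020, 6, 25)),  # too recent: its label is not settled yet
]
chargebacks = {"t2"}
print(build_labeled_set(transactions, chargebacks, today=date(2020, 6, 30)))
```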
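The three-column exercise from the transcript (feature, hunch, actual relationship) can be backed by a little code. This sketch computes the three features mentioned in the video (account age, number of successful purchases, time between sessions) for one account, then runs the simplest possible EDA check: comparing a feature's average across fraudulent and legitimate rows. The input shapes, function names, and feature names are hypothetical.

```python
from datetime import datetime
from statistics import mean

# Illustrative helpers (hypothetical names), not from the video.


def account_features(created_at, successful_purchases, session_starts, as_of):
    """Hypothetical feature row for one account, computed at transaction time `as_of`."""
    starts = sorted(session_starts)
    # Hours between consecutive sessions; needs at least two sessions.
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(starts, starts[1:])]
    return {
        "account_age_days": (as_of - created_at).days,  # hunch: fraud accounts are young
        "successful_purchases": successful_purchases,  # hunch: more history, less fraud
        "mean_hours_between_sessions": mean(gaps) if gaps else None,  # hunch: short gaps are suspicious
    }


def mean_by_label(rows, labels, feature):
    """EDA check: average of `feature` for legitimate (0) vs. fraudulent (1) rows, ignoring missing values."""
    groups = {0: [], 1: []}
    for row, label in zip(rows, labels):
        if row[feature] is not None:
            groups[label].append(row[feature])
    return {label: (mean(values) if values else None) for label, values in groups.items()}


row = account_features(
    created_at=datetime(2020, 6, 1),
    successful_purchases=0,
    session_starts=[
        datetime(2020, 6, 3, 11, 30),
        datetime(2020, 6, 3, 10, 0),
        datetime(2020, 6, 3, 10, 30),
    ],
    as_of=datetime(2020, 6, 3, 12, 0),
)
print(row)
```

A gap between the two averages is only a starting point; the third column of the sheet records what the data actually shows, which may contradict the hunch.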
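For the model setup step, here is a minimal sketch of the imbalance tactics the video mentions: random under-sampling of legitimate transactions, over-sampling of fraudulent ones (duplicating rows with replacement), and class weights. The weight formula `n_samples / (n_classes * class_count)` is the same heuristic scikit-learn uses for `class_weight="balanced"`; the function names and target counts are illustrative assumptions, and the right mix depends on your data.

```python
import random
from collections import Counter

# Illustrative helpers (hypothetical names), not from the video.


def rebalance(rows, labels, n_legit, n_fraud, seed=0):
    """Return a shuffled list of (row, label) pairs with `n_legit` legitimate and `n_fraud` fraudulent rows."""
    rng = random.Random(seed)
    legit = [r for r, y in zip(rows, labels) if y == 0]
    fraud = [r for r, y in zip(rows, labels) if y == 1]
    if not fraud or n_legit > len(legit) or n_fraud < len(fraud):
        raise ValueError("need some fraud rows, n_legit <= #legit, and n_fraud >= #fraud")
    kept_legit = rng.sample(legit, n_legit)  # under-sample without replacement
    boosted_fraud = fraud + rng.choices(fraud, k=n_fraud - len(fraud))  # keep all fraud rows, add duplicates
    data = [(r, 0) for r in kept_legit] + [(r, 1) for r in boosted_fraud]
    rng.shuffle(data)
    return data


def balanced_class_weights(labels):
    """Per-class weights n_samples / (n_classes * class_count): rarer classes get larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Hypothetical data: 98 legitimate transactions (ids 0-97) and 2 fraudulent ones (ids 98-99).
rows = list(range(100))
labels = [0] * 98 + [1] * 2
train = rebalance(rows, labels, n_legit=20, n_fraud=10)
print(Counter(label for _, label in train))
print(balanced_class_weights(labels))
```

Resampling changes the class ratio the model sees, so its raw scores will not reflect the real-world fraud rate without recalibration.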
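To make the ROC discussion concrete, this stdlib-only sketch sweeps a decision threshold over model scores, records (false positive rate, true positive rate) pairs, and integrates them with the trapezoidal rule. A perfect ranker passes through (0, 1), the top-left corner. The `tpr_at_fpr` helper is a hypothetical example of a more business-flavored metric: the share of fraud caught while flagging at most a given fraction of legitimate transactions. All names and data here are illustrative; scikit-learn's `roc_curve` and `roc_auc_score` compute the same curve and area.

```python
# Illustrative helpers (hypothetical names), not from the video.


def roc_points(labels, scores):
    """(FPR, TPR) points from sweeping the threshold from the highest score to the lowest."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both fraudulent (1) and legitimate (0) examples")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tpr_at_fpr(points, max_fpr):
    """Highest TPR reachable while keeping the false positive rate at or below `max_fpr`."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


# Hypothetical model scores: 1 = fraudulent, 0 = legitimate.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.95, 0.80, 0.70, 0.40, 0.30, 0.10]
pts = roc_points(labels, scores)
print(pts)
print(auc(pts))
print(tpr_at_fpr(pts, 0.0))
```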
Original Description
SPONSOR
Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite. Love it!
Learn more: https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=codeemporium&utm_content=description-only
TIMESTAMPS
0:00 Introduction
0:53 Define Business
1:15 How do we know Fraud occurs?
2:14 Why Fraud Detection?
3:00 How does fraud Happen?
4:52 Friendly Fraud
6:26 Account Takeover
8:34 Define Problem from Machine Learning Standpoint
9:32 Building Dataset
11:56 Model Setup
12:26 Evaluation
Uploads from CodeEmporium