How to Simulate NBA Games in Python
Key Takeaways
The video demonstrates how to simulate NBA games using Python 3.6, specifically the NBA Finals of the 2017-2018 season between Golden State and Cleveland, using Monte Carlo simulation and NBA team game stats from a Kaggle data set. It relies on pandas for data handling, Python's random module for sampling, and matplotlib for histograms; numpy is imported but only used in the more advanced version of the code on GitHub.
Full Transcript
What's up, Ken here from Playing Numbers. Today I'm showing you how to simulate NBA game outcomes using Python 3.6. More specifically, I'll be using a Monte Carlo simulation for this analysis.

Now, I bet you're wondering what a simulation is. In the most basic terms, a simulation is randomly sampling from a distribution. So if we randomly sample, let's say, from team points enough times, we'll actually just recreate that distribution somewhere else. By itself that doesn't really tell us a whole lot, but when we compare one distribution with another distribution, it gives us something useful. Let's say we're randomly sampling from team points and opponent points: we can determine from our samples what percent of the time team points is actually higher than opponent points, and that tells us a little bit more information. If we add another distribution to that, it continues to give us more information, and the mathematical complexity of those problems increases. If we're using simulation, we don't have to worry about the mathematical complexity. We just have to run the simulation more times, and we get closer and closer to that limit. So simulation is a great way to simplify problems and get really, really good results regardless.

For this example I'm going to be using the NBA team game stats from 2014 to 2018 data set from Kaggle; the link will be in the description below. In our analysis here I'm going to simulate the NBA Finals from the 2017-2018 season, where Golden State defeated Cleveland.

Now let's just jump straight into the code. We import these modules; the most important ones here are going to be pandas and the random module from Python. I don't actually use numpy in this video, but it's used for the more advanced version of this code that I've written on my GitHub, Playing Numbers. That code is actually flexible enough to simulate any of the teams in the data set rather than just this one example. I don't use great programming paradigms here, but I'm writing this code in a certain way to illustrate a point. We also use matplotlib to visualize the histograms, which is very, very important in simulation.

So we're just going to read in the data there. Let's take a look at the columns that we're going to use. In this analysis we're only concerned with four columns. The first is the team, so I care about Golden State and Cleveland. The next is the date: we only want the 2017-2018 season, because that's going to be most representative of what happened in the Finals, or what we're trying to estimate happened in the Finals. We're also looking at team points and opponent points, so that's just the number of points a team scores as a distribution and the number of points that are scored against them as a distribution.

Right here we're going to break the data into two data frames, one for Golden State and one for Cleveland. In this line right here I just use a lambda function to filter out all games that are not from the 2017-2018 season.

Now let's take our first stab at looking at the actual total point distributions. In blue we can see Golden State's distribution of points, and in orange we can see Cleveland's distribution of points. It looks like Golden State has a slightly higher average point total per game than Cleveland does here, but both of these distributions appear to be normally distributed, which is exactly what we want in this type of simulation. We look at points against and we see that same almost normal distribution, and we also see that Cleveland, it appears, has slightly more points scored against them than Golden State.

So now we just tabulate those things into variables. We take the team point averages for Cleveland and Golden State and save those in two variables. We take the standard deviations, which are also very important. Whenever you make any type of normal distribution you really only need two things, the first being the mean and the second being the standard deviation, so those are kind of the magic components for us actually running the simulations here. We also look at the mean and standard deviations of the opponent points against. As you can see, it looks like our quick analysis from the histogram is right: Golden State in fact on average scores slightly more points than Cleveland.

Now, just as an example before we get into the simulation code, what we're going to be doing is randomly sampling, again, from a specific distribution. This Gaussian is a normal distribution with a mean of the average number of points Golden State scores and the standard deviation of that distribution. If we run it enough times, it should reach a limit of an average of 113, and that appears to be fairly close. It might be a little lower until we run it a certain number of times and we get a very realistic distribution.

With that thought in mind, let's actually look at the first real component of our simulation code. The game sim function simulates just one game, and the games sim function just runs game sim over and over again and tabulates the results of the simulations. It just keeps track of what happens in game sim.

For game sim we want to simulate, one, the Golden State score and, two, the Cleveland score, and then we compare them. When we simulate a score, we take a sample from the random distribution of Golden State's points and we average that with a sample from the random distribution of the number of points that Cleveland allows. In my opinion that's a fairly good estimator, because you're looking at how those teams specifically would match up in terms of how good Golden State's offense is and how good Cleveland's defense is. And we do the exact opposite for Cleveland: we take a random sample from their total points distribution and a random sample from Golden State's defensive distribution. Now we compare those two variables that we created. If Golden State wins this matchup we get a 1, if Cleveland wins the matchup we get minus 1, and if it's a tie, which I know can't happen, we get a 0. I just built ties in here because then we won't have any holes in the data. If you'd like to, you can go forward and create some tiebreaker criteria.

Now let's just run a couple of example games. In that scenario Golden State won; in that scenario Cleveland won. So you can see that it's not just going to be one outcome over and over again.

Now let's run a couple of actual game simulations. If we run this ten times, it appears that Golden State won seven of those times and Cleveland won three. Now let's run it a hundred times, and the more we run it the closer to the limit we actually get. So we're at 63 percent. At a thousand, we're at 55. At ten thousand, 55.91. So it looks like we're rounding out right around 55, 56 percent. If we were to do this analysis in perpetuity, if we ran it an infinite number of times, it looks like Golden State would win between 55 and 57 percent of the time.

That tells us a lot of information. If we were interested in sports betting, for example, we might be able to evaluate whether the line is good on a certain night based on our calculated winning percentages. You can also use this in fantasy sports, and we can use this for all other types of analysis. But again, specifically in sports, simulation can be really, really fun and interesting. If you're so inclined, you can definitely build on this model and add in more features. You can look at specific positions or even at the shot level if you're really interested in getting your hands dirty.

So hopefully this is a great starting place for a lot of people. If you have any questions or comments, please leave them in the section below, and if you'd like me to keep producing content, or if you like videos like this, please subscribe and I'll try to produce more interesting videos of this nature. Thank you so much again, and have a great one.
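The approach described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the code from the video: it uses only Python's standard library, the per-game point lists are made-up placeholder numbers rather than values from the Kaggle data set, and the names `summarize`, `game_sim`, and `games_sim` are hypothetical. Each simulated score averages one draw from a team's scoring distribution with one draw from the opponent's points-allowed distribution, as the transcript describes.

```python
import random
from statistics import mean, stdev

# Placeholder per-game totals (assumption: NOT real values from the Kaggle data).
gsw_points = [117, 109, 124, 112, 106, 121, 110, 115]
gsw_allowed = [108, 101, 113, 99, 111, 104, 107, 110]
cle_points = [112, 104, 118, 107, 101, 115, 109, 110]
cle_allowed = [110, 106, 117, 103, 114, 108, 112, 109]


def summarize(values):
    """Return (mean, standard deviation), the two numbers a normal distribution needs."""
    return mean(values), stdev(values)


def game_sim(rng, a_off, a_def, b_off, b_def):
    """Simulate one game. Returns 1 if team A wins, -1 if team B wins, 0 for a tie.

    Each argument after rng is a (mean, std) pair: points scored (off)
    and points allowed (def) for teams A and B.
    """
    # Team A's score blends A's offense with what B usually allows, and vice versa.
    a_score = round((rng.gauss(*a_off) + rng.gauss(*b_def)) / 2)
    b_score = round((rng.gauss(*b_off) + rng.gauss(*a_def)) / 2)
    if a_score > b_score:
        return 1
    if a_score < b_score:
        return -1
    return 0  # ties are kept so every simulated game is counted


def games_sim(n, rng, a_off, a_def, b_off, b_def):
    """Run game_sim n times and return the share of A wins, B wins, and ties."""
    results = [game_sim(rng, a_off, a_def, b_off, b_def) for _ in range(n)]
    return {
        "a_win_pct": results.count(1) / n,
        "b_win_pct": results.count(-1) / n,
        "tie_pct": results.count(0) / n,
    }


gsw_off, gsw_def = summarize(gsw_points), summarize(gsw_allowed)
cle_off, cle_def = summarize(cle_points), summarize(cle_allowed)

rng = random.Random(42)  # seeded only so repeated runs give the same numbers
for n in (10, 100, 1000, 10000):
    print(n, games_sim(n, rng, gsw_off, gsw_def, cle_off, cle_def))
```

With real data, the four lists would be replaced by the team points and opponent points columns for each team's 2017-2018 games. The printed win shares should settle down as `n` grows, which is the "closer to the limit" behavior mentioned in the transcript; the exact percentages depend entirely on the input numbers.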
Original Description
In this video I show you how to simulate NBA Games using Python 3.6.
As an example I simulate the NBA Finals from the 2017-2018 season where Golden State played Cleveland.
Data: https://www.kaggle.com/ionaskel/nba-games-stats-from-2014-to-2018
Github: https://github.com/PlayingNumbers/NBASimulator
#DataScience #SportsAnalytics #Basketball #Simulation
#KenJee