Pareto, Power Laws, and Fat Tails

Shaw Talebi · Beginner · 🔢 Mathematical Foundations · 2y ago

Key Takeaways

The video discusses power laws, Pareto's principle, and fat tails, highlighting the limitations of traditional statistical methods for analyzing these distributions. It covers the characteristics of power law distributions, the problems with relying on the mean and variance, and the importance of accounting for rare events and their impact on statistics.
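To make the summary concrete, here is a minimal Python sketch (standard library only) of two ideas from the video: the 80/20 rule as a property of a Pareto distribution, and how slowly the sample mean of Pareto data settles compared with Gaussian data. It assumes a classic Pareto distribution with minimum value 1, as produced by `random.paretovariate`. The names `ALPHA_80_20`, `top_share`, and `sample_means` are illustrative and are not taken from the video's repository. The top-share formula p^(1 - 1/α) follows from the Pareto Lorenz curve and only holds for α > 1.

```python
import math
import random

# Tail index at which a Pareto distribution (minimum value 1) reproduces the
# 80/20 rule: alpha = log(5) / log(4), roughly 1.16. Illustrative constant.
ALPHA_80_20 = math.log(5) / math.log(4)


def top_share(p, alpha):
    """Share of the total held by the top fraction p of a Pareto(alpha) population.

    Follows from the Pareto Lorenz curve: share = p ** (1 - 1/alpha), valid for alpha > 1.
    """
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 (the mean is infinite otherwise)")
    return p ** (1 - 1 / alpha)


def sample_means(draw, sizes, seed=0):
    """Running sample mean after n draws, for each n in an ascending list of sizes."""
    rng = random.Random(seed)
    total, count, means = 0.0, 0, []
    for n in sizes:
        # Keep drawing until we have n observations, reusing earlier draws.
        while count < n:
            total += draw(rng)
            count += 1
        means.append(total / n)
    return means


if __name__ == "__main__":
    alpha = ALPHA_80_20
    true_mean = alpha / (alpha - 1)  # mean of Pareto(x_min=1), about 7.21 here
    sizes = [10, 100, 1_000, 10_000, 100_000]

    # Same true mean for both; the Gaussian's standard deviation of 1 is arbitrary.
    gauss = sample_means(lambda rng: rng.gauss(true_mean, 1.0), sizes)
    pareto = sample_means(lambda rng: rng.paretovariate(alpha), sizes)

    print(f"top 20% share: {top_share(0.20, alpha):.2f}")  # 0.80
    print(f"top 4% share:  {top_share(0.04, alpha):.2f}")  # 0.64
    for n, g, p in zip(sizes, gauss, pareto):
        print(f"n={n:>7,}  gaussian={g:7.3f}  pareto={p:7.3f}  true={true_mean:.3f}")
```

With α ≈ 1.16 the top 20% hold 80% of the total and the top 4% hold 64%. This is the same 80/16/4 customer split and 20/16/64 revenue split used in the software-company example later in the transcript. Rerunning with different seeds should show the Pareto column still wandering at 100,000 draws, while the Gaussian column settles close to the true value almost immediately.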

Full Transcript

Statistics is the bedrock of science and data analysis. This is why we all learn about it in some form or fashion in school. However, many of our favorite statistical techniques are completely useless when applied to a certain type of data, namely data that follow what are called power laws. In this video, I'll give a beginner-friendly introduction to power laws and describe three problems that come up when we try to apply our standard statistical tools to analyze them. If you're new to the channel, I'm Shaw. I make content about data science and entrepreneurship, and if you enjoyed this video, please consider subscribing. That's a great no-cost way to support me in all the videos that I make. And with that, let's get into it.

The official title of this talk is "Pareto, Power Laws, and Fat Tails: What They Don't Teach You in Statistics." We'll start with background information: I'll talk about the Gaussian distribution and Pareto's 80/20 rule, introduce the class of power law distributions, and describe the difference between weight and wealth. Then I'll move on to three big problems that arise when we use traditional statistical approaches to analyze data following a power law distribution. Finally, I'll introduce the idea of fat tails, which generalizes a key property of these power law distributions.

Many quantities in nature tend to clump around a typical value. For example, if you go to a busy coffee shop and measure the weights of all the customers coming in and out, you would eventually observe a pattern like the one shown here. In other words, the weights tend to clump around some typical value and then decay rapidly toward the tails. This is a distribution that most people are familiar with. It's called a Gaussian distribution, also known as a bell curve. The great thing about data that follow a Gaussian distribution is that we can capture a lot of the essential information in the underlying data with just a single number, the mean. And you
can go even further and capture how spread out the distribution is via measures like the standard deviation. These concepts (the Gaussian, the mean, the standard deviation, the variance, and so on) are what people learn in an introductory statistics or business statistics course, and indeed they are powerful techniques for analyzing data, solving problems, and making decisions. However, not all the data we care about follow a distribution like a Gaussian.

A great example of this comes from the work of Vilfredo Pareto. Many people have probably heard of Pareto's principle, or the 80/20 rule, which is typically quoted as "80% of sales come from 20% of customers." However, this idea did not originate in the business world or in sales and marketing. It actually came from the Italian economist and mathematician Vilfredo Pareto, who found in his study of Italian land ownership that about 80% of the land in Italy was owned by about 20% of the citizens. This simple observation points to statistics that are very different from the Gaussian distribution we saw in the coffee shop. What the 80/20 rule, or Pareto principle, implies is that the underlying data follow a Pareto distribution, which looks like this. Even qualitatively, this looks very different from the Gaussian distribution on the previous slide, and the biggest difference is that there's no typical value around which the data clump. For a Gaussian, the mean is very representative of the overall distribution. For a Pareto distribution, however, the mean doesn't give you much information. In this case the mean is somewhere around here, which tells you little about the data living in the so-called tail over here. Putting it another way: while knowing the average weight of an Italian man gives you a good idea of what to expect on your next trip to Rome, knowing the average
population of an Italian city, which is about 7,500, is completely useless for grounding your expectations. The reason is that weight tends to follow a Gaussian distribution, while city populations tend to follow a Pareto distribution.

The Pareto distribution is actually part of a broader class of distributions called power laws. Here are a few different power laws, and in red we see one that matches the 80/20 rule from the previous slide. More generally, a power law is defined by the equation PDF_X(x) = L(x) · x^-(1+α). Here, PDF is the probability density function, capital X is a random variable, little x is a specific value of that random variable, L(x) is some slowly varying function, and alpha (α) is a number that defines the shape of the distribution. Another important note is that power laws are only defined above a minimum value. In these plots the minimum value is one, but it could be anything.

These two types of distributions, the Gaussian-like distributions and the Pareto-like power law distributions, give us two conceptual anchors for qualitatively categorizing the data we observe in the real world. In his book The Black Swan, author Nassim Nicholas Taleb names these two categories Mediocristan and Extremistan: Mediocristan holds the Gaussian-like data, while Extremistan holds the Pareto-like data. The key property of data from Mediocristan is that no single observation will significantly impact the aggregate statistics. To see an example of this, suppose that on your trip to Rome you visit the Colosseum. You have your scale with you again, and you decide to start weighing random strangers. Let's say you weigh 1,000 people at the Colosseum, compute the average, and it turns out to be 175 lb. Now suppose you add the heaviest Italian you can find to this 1,000-person sample. This will have very little impact on the mean. The average
might go from 175 lb to 175.2 lb. This is the key property of data from Mediocristan: no single observation will significantly impact the aggregate statistics. No person on Earth could be added to this sample who would dramatically change the mean of the weight distribution.

Data from Extremistan are different. Here, a single observation can, and often will, drive the aggregate statistics. Let's say that instead of weighing people at the Colosseum, you ask them what their net worth is. You take the same sample of 1,000 people, compute their mean net worth, and find it to be about $300,000. Then you add the richest Italian to the sample. The average net worth goes from about $300,000 to $7.5 million, roughly a 25x increase in the average from just a single observation. That's the key property of data from Extremistan, and of data following a Pareto-like distribution.

To build more intuition, here are some more examples from Mediocristan and Extremistan, respectively. Gaussian-like data include IQ, weight, height, calorie consumption, test scores, car accidents, mortality rates, and blood pressure. On the other side, data from Extremistan include wealth, as we saw at the Colosseum; sales, as people describe with the 80/20 rule in business; city populations, which we mentioned earlier; pandemics; deaths in wars and terrorist attacks; word occurrences in text, where a very small number of words are used most of the time; academic citations, where a very small number of researchers get the bulk of the citations; and company sizes, where very few companies employ most of the world's workforce. As you can see, the things that live in Extremistan aren't some trivial set. In fact, you could argue that most of the things we care about as a society and civilization are not Gaussian-like at all. While this may
seem like splitting hairs, a technical exercise in categorizing data as Gaussian-like or Pareto-like, it turns out that our standard statistical tools have major limitations when analyzing data from Extremistan. Here I'll highlight three such problems with using our so-called Stat 101 techniques to analyze the quantities we care about.

It all boils down to one thing: the law of large numbers. This law basically says that if we take n random samples, the sample mean will approach the true mean as n goes to infinity. Put another way, if we collect data generated from a Gaussian distribution, then as we gather more and more observations, the average computed from our sample will approach the true average of the underlying distribution. This also holds for the Pareto distribution, the uniform distribution, and the log-normal distribution; for any distribution with a finite mean, the law of large numbers holds. In practice, however, we never have infinite data. We only have a limited number of observations, which complicates the assumption behind the law of large numbers. If we take 10 observations from a Gaussian distribution, we'll get a pretty accurate sample mean. If we take 10 observations from a Pareto distribution, however, the sample mean is going to be biased. This is all because the law of large numbers works more slowly for power laws than for Gaussian distributions.

This brings us to our first problem: the mean is meaningless, and so are many other metrics. With finite samples of data that follow a power law distribution, it takes much longer for the mean to converge to the true value than it does for a Gaussian. We can see this in the plots shown here. On the left, we have the number of samples on the x-axis and the sample mean on the y-axis. The black line is the true mean, and the blue line is the
mean we compute when the data are generated from a Gaussian, while the orange line is the mean we compute when the data are generated from a Pareto distribution. As you can see, the Gaussian is never too far from the true value. At very small sample sizes the mean may be biased, but it quickly gets very close to the true value. For the power law, however, the sample mean is not only much more biased than the Gaussian's but also much more erratic. This is not limited to small samples like 100 observations. It holds at 1,000 and even 10,000 observations, where the Pareto sample mean is still much more erratic and much more biased than the Gaussian. It persists even when we increase the sample size another 10x, to 100,000 observations. At that point the Gaussian is right on the money, and its mean barely changes with additional observations. With the power law, however, the mean is still wiggling around and hasn't reached the true value. For the power law, we see bias at 100,000 observations similar to what we saw for the Gaussian at about 10 observations.

This isn't limited to the mean; we see it for many other standard statistical quantities, which is what's shown here. Down the left-hand side of these plots are the respective quantities: the median, the standard deviation, the variance, the mean, the max, the min, the 1st percentile, the 99th percentile, kurtosis, and entropy. Across the top are the 100-sample, 1,000-sample, and 10,000-sample cases. Some of these quantities are relatively stable. The median tends to level out once you reach a sufficient sample size, the minimum is pretty accurate even at small sample sizes, and the 1st percentile is accurate and stable even at small sample sizes. Other quantities can't seem to settle on a particular value, namely the standard deviation, the variance, the maximum, the 99th percentile,
to some extent kurtosis, and entropy, which seems to keep changing without end. The one quantity I want to highlight is the maximum. Because rare events drive the statistics of power law distributions, we see an order-of-magnitude increase in the maximum value when we go from 1,000 samples to 10,000 samples. The danger is that the maximum can seem stable at a relatively small sample size. Let's say you have 7,000 observations and the max value seems to have plateaued. But as you collect more data, the max value suddenly jumps. You can be in a period where things seem stable and predictable, and then all of a sudden the data you're observing change dramatically.

To connect this to the real world: if these data were, say, deaths from pandemics, the deadliest pandemic in a 100-year span might be an order of magnitude less severe than the deadliest pandemic in a 1,000-year span. The deadliest pandemic of the past 100 years was the Spanish flu, which killed about 50 million people. We might think, "Okay, that was the deadliest pandemic; it's not going to get any worse than that." But if the data follow a power law, we shouldn't be surprised if the deadliest pandemic over a 1,000-year period claims 500 million victims. This highlights the key property of data from Extremistan: rare events drive the underlying statistics.

The problem doesn't stop with the mean and the other standard statistical quantities shown here. It also undermines our ability to make predictions, which brings us to problem two: regression doesn't work. Regression boils down to predicting future events from past data. Intuitively, if your data are driven by rare events, you may simply not have enough past observations to make
good predictions about the future, and this problem is exacerbated when working with power law distributions. Let's look at a particular example. Suppose we want to do a linear regression between the variables X and Y, where y = mx + b + e. Here x is a normally distributed random variable, m and b are the parameters we're trying to learn, and e is a noise term that follows a power law distribution. Regression completely breaks down when this noise term has an alpha value (the tail index we saw when we defined power laws) less than or equal to two, because then the power law has infinite variance. If the variance of the noise term is infinite, then the variance of y is also infinite, which drives the R² value to zero. There's a quick derivation of this in reference [2], linked in the description below, in chapter 6.7. Of course, you can't observe infinite variance in practice, because your data are necessarily finite. In practice, regression behaves much like the max value we saw earlier: the results might seem stable at small sample sizes but break down as more data are collected.

We can see this through an example. Take our normally distributed random variable with the added power law noise term and run a linear regression on 100 samples. The results might look like this, which looks pretty good. There may be some outliers, but overall we get a good fit and the R² isn't bad. However, this is misleading, because the noise term has infinite variance, which means R² should actually be zero. Indeed, as we collect more and more data, the R² value quickly deteriorates as we go from 100 samples to 1,000, 10,000, 100,000, a million, 10 million, 100 million, and so on. This is the danger of
doing regression with data that follow a power law: your results might look deceptively good at small sample sizes, but as you collect more data, your model's performance quickly deteriorates.

At this point you might say, "Shaw, what's the big deal? So what if our model can't predict some super-rare events, like 1-in-1,000 or 1-in-10,000 events? The model predicts 99% of things pretty well. Why do we care about these super-rare events?" And I agree with you. When data are generated from a power law, it's not hard to be right most of the time, because most of the data do not live in the long tail of the power law. However, when solving problems and making decisions in the real world, probabilities are only half the story. The other half is payoffs, which brings us to problem number three: payoffs diverge from probabilities. In other words, it's not just about how often you are right or wrong, but also about what happens when you're right or wrong.

Let's see what this might look like in a business context. Consider a software company with three key offers. Offer one is free software with ads. Offer two is a premium version with no ads and a monthly subscription. Offer three is enterprise-level software with whatever customizations and add-ons those clients need. Let's say the 80/20 rule is in play, so 80% of sales come from 20% of customers. In this case, 80% of customers go with offer one and just use the free version, 16% use the premium version, and 4% are enterprise clients. For revenue, this means 20% comes from the free users, 16% from the premium users, and 64%, most of the revenue, from the enterprise customers. Now let's say the software company wants to optimize its core service so it runs 25% more efficiently. Like any good company, they're not
just going to roll this out blindly. They're going to ask their customers first: do you like this update, and is it something you need? So they run a survey and find that 95% of customers like the update, 4% don't really care, and 1% say the update is bad. Seeing that the overwhelming majority of customers like the update, the company decides to move forward with it. But fast forward six weeks, and the company notices a 50% drop in revenue. What happened? It turns out that the company's three biggest clients dropped the service, because the update broke some legacy data integrations that were critical to their business. While this is a made-up example, it illustrates the point that in Extremistan, being wrong once can erase the gains of being right 99 times, and then some. If 1% of your customers drive 50% of your revenue, you can do something that 99% of your customers love and 1% hate and still end up much worse off.

Now let's talk about fat tails. There has been some controversy in Extremistan, and wealth is one example. Going back to Pareto, the idea that 80% of the land is owned by 20% of the citizens has been applied throughout economics, and the prevailing view is that wealth follows a Pareto-like distribution. You may have heard claims like this in discussions of income inequality, such as that the top 1% holds about a third of the wealth. But there's some controversy about whether wealth truly follows a Pareto or power law distribution. The story goes something like this. One person says, "I'll summarize the wealth distribution via the mean and standard deviation." But of course, if wealth follows a power law, the mean and standard deviation are going to be useless, because they are parameters of a Gaussian distribution and not
so much of a power law distribution. So someone else says, "That's useless, because wealth follows a power law." Then another person says, "Actually, wealth fits a log-normal distribution better." And yet another says, "Well, a log-normal behaves like a power law distribution for high sigma." That roughly summarizes the controversy. To sidestep it entirely, instead of asking whether a particular dataset follows a particular distribution, we can focus on fat tails. We can define fat-tailedness as the degree to which rare events drive the aggregate statistics of a distribution. This maps directly onto our earlier discussion of Mediocristan and Extremistan: in Mediocristan, rare events do not drive the aggregate statistics, while in Extremistan they do.

To connect this to specific distributions, here is a sort of map of Mediocristan and Extremistan. On the far left is the Gaussian distribution that we all know and love, along with, more generally, distributions like Student's t. On the right-hand side, in Extremistan, are the power law distributions we've been discussing. In between lies what we can call the subexponential domain. One example of a subexponential distribution is the log-normal distribution, which looks like a Gaussian for low sigma but like a Pareto distribution for high sigma. We can also index different power law distributions by their alpha parameter. If alpha is greater than two, the distribution has a finite mean and variance, which lets us do some productive statistics with it. If alpha is greater than one but at most two, it has a finite mean but infinite variance. Regression now blows up, but at least we have a mean to work with. However, when alpha is one or below, the mean is infinite, and this is what author Nassim Taleb calls the
"forget about it" domain: you can't really do much when a power law has a tail this fat. As you can see, the space between Mediocristan and Extremistan, between Gaussian distributions and power law distributions, is really a spectrum. So instead of treating fat-tailedness as binary (fat-tailed or not), think of it as a quantity on a spectrum that runs from not very fat-tailed to very fat-tailed.

While there's no single true way to quantify fat-tailedness, there are a few heuristics we can use. The first is the power law tail index. We saw this on the right-hand side of the image on the previous slide: as the alpha parameter of the power law got smaller, the tail got fatter. So we can use the tail index to quantify how fat the tails are; the lower the alpha value, the fatter the tails, as this plot demonstrates. Alternatively, instead of thinking in terms of power laws, we can think in terms of non-Gaussianity. There are several measures of non-Gaussianity, the most popular being kurtosis. The problem with kurtosis, however, is that it breaks down when alpha is less than or equal to four, because the kurtosis is then infinite. Another idea is to use the variance of a log-normal distribution. This follows from what we saw on the previous slide: for low sigma a log-normal distribution looks Gaussian, but for high sigma it looks like a power law. So if you have a log-normal distribution, you can use its variance to quantify the fat-tailedness. Finally, Taleb defines a kappa metric that applies to any type of distribution. Lower values indicate thin tails, larger values indicate fat tails, and kappa has a maximum value of one. If you want to learn more about it, he covers it in reference [6], linked in the description below.

That was a ton of information, but to boil everything down: when it comes to data
that follow a power law distribution, or fat-tailed data more generally, the central problem in practice is insufficient sample size: we simply don't have enough data to capture the underlying statistics. To cope with this, I want to leave data practitioners with a few key takeaways that I like to think about when navigating these types of problems. First and foremost, plot distributions: plot histograms, PDFs, and CDFs to get a visual impression of how fat-tailed the data might be. Second, ask yourself whether the data are from Mediocristan, Extremistan, or somewhere in between, perhaps using the heuristics from the previous slide to quantify the fat-tailedness. Third, ask yourself what a correct prediction is worth and, just as importantly, what an incorrect prediction costs. Finally, if you're working with fat-tailed data, don't ignore rare events, and don't just chop off outliers. If 50% of your revenue comes from 1% of your clients, don't treat that as a nuisance for your analytics. Instead, figure out how to design efficient interventions for that 1% to drive even more business.

A couple of things I want to call out. If you enjoyed this video and want to learn more, check out the blog post published in Towards Data Science, linked in the description below, where I cover some details I didn't get to in the video. All the code to generate the plots I showed here is available in the GitHub repository linked here. And if you enjoyed this content, please consider subscribing to the channel. That's a great no-cost way to support me and the content I create. As always, thank you so much for your time, and thanks for watching.
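Problem two from the transcript, where R² slides toward zero when the noise has infinite variance, can be reproduced with a short simulation. This is a minimal sketch, assuming y = m·x + b + e with x ~ N(0, 1). Here e is a symmetric power law noise term, a Pareto draw with minimum 1 multiplied by a random sign, with α = 1.5, so its variance is infinite. The slope m = 5, the value of α, the symmetric-noise construction, and the helper names `ols_r2` and `simulate` are illustrative choices, not the code from the video's repository. The fit is ordinary least squares with an intercept, written out by hand so that only the standard library is needed.

```python
import random


def ols_r2(xs, ys):
    """Fit y ≈ m*x + b by ordinary least squares; return (m, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = my - m * mx
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
    ss_tot = sum((y - my) ** 2 for y in ys)  # total sum of squares
    return m, b, 1 - ss_res / ss_tot


def simulate(n, alpha=1.5, m=5.0, b=0.0, seed=0):
    """R² of an OLS fit to y = m*x + b + e with x ~ N(0, 1) and power law noise e.

    Assumption (illustrative): e is a Pareto(alpha) draw (minimum 1) with a random
    sign, so its variance is infinite whenever alpha <= 2.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    es = [rng.choice((-1.0, 1.0)) * rng.paretovariate(alpha) for _ in range(n)]
    ys = [m * x + b + e for x, e in zip(xs, es)]
    return ols_r2(xs, ys)[2]


if __name__ == "__main__":
    sizes = [100, 1_000, 10_000, 100_000]
    for n in sizes:
        print(f"n={n:>7,}  R^2={simulate(n):.3f}")
```

Any finite sample has a finite variance, so each printed R² is an ordinary number between 0 and 1. What matters is the trend. For α < 2 the noise's sum of squares grows faster than the signal's, so R² should drift downward as n grows, roughly on the order of n^(1 - 2/α), instead of settling on a stable value. Extending `sizes` into the millions makes the effect clearer but is slow in pure Python.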

Original Description

🤝 Work with me: https://aibuilder.academy/yt/Wcqt49dXtm8
🚀 Ship AI apps in weeks, not months: https://aibuilder.academy/courses/yt/Wcqt49dXtm8

In this video, I give a beginner-friendly guide to Power Laws and describe 3 problems with using traditional statistical methods to analyze them.

📹 Detecting Power Laws: https://youtu.be/x5-IW1m3zPo
📹 Fat Tails: https://youtu.be/15Kd9OPn7tw
📰 Read more: https://medium.com/towards-data-science/pareto-power-laws-and-fat-tails-0355a187ee6a?sk=2c4da32a8f5d6d90cf515f7ce5204933
💻 GitHub Repo: https://github.com/ShawhinT/YouTube-Blog/tree/main/power-laws

References
[1] Pareto principle. (2023, October 30). In Wikipedia. https://en.wikipedia.org/wiki/Pareto_principle
[2] arXiv:2001.10488 [stat.OT]
[3] Taleb, N. N. (2007). The Black Swan: The Impact of the Highly Improbable. New York: Random House.
[4] https://www.archives.gov/exhibits/influenza-epidemic/
[5] arXiv:0706.1062 [physics.data-an]
[6] Taleb, N. N. (2019). How much data do you need? An operational, pre-asymptotic metric for fat-tailedness. International Journal of Forecasting, 35(2), 677–686. https://doi.org/10.1016/j.ijforecast.2018.10.003

You can find many great lectures on this topic here: @nntalebproba

Intro - 0:00
Outline - 0:45
The Gaussian Distribution - 1:21
The Pareto Distribution - 2:42
Power Laws - 4:30
Mediocristan vs Extremistan - 5:31
3 Problems with STAT 101 - 8:44
Problem 1: The Mean is Meaningless - 10:07
Problem 2: Regression Doesn't Work - 14:30
Problem 3: Payoffs Diverge from Probabilities - 17:41
Controversy in Extremistan - 20:00
Fat Tails - 21:17
Takeaways - 24:38
Shaw Talebi
49 How I’d learn data analytics (if I had to start over in 2024) #dataanalytics
How I’d learn data analytics (if I had to start over in 2024) #dataanalytics
Shaw Talebi
50 4 Ways to Measure Fat Tails with Python (+ Example Code)
4 Ways to Measure Fat Tails with Python (+ Example Code)
Shaw Talebi
51 Fine-tuning EXPLAINED in 40 sec #generativeai
Fine-tuning EXPLAINED in 40 sec #generativeai
Shaw Talebi
52 How Much YouTube Paid Me in My First 6 Months of Monetization (as a Data Science Creator)
How Much YouTube Paid Me in My First 6 Months of Monetization (as a Data Science Creator)
Shaw Talebi
53 5 Questions Every Data Scientist Should Hardcode into Their Brain
5 Questions Every Data Scientist Should Hardcode into Their Brain
Shaw Talebi
54 AI for Business: A (non-technical) introduction
AI for Business: A (non-technical) introduction
Shaw Talebi
55 LLMs EXPLAINED in 60 seconds #ai
LLMs EXPLAINED in 60 seconds #ai
Shaw Talebi
56 3 Ways to Make a Custom AI Assistant | RAG, Tools, & Fine-tuning
3 Ways to Make a Custom AI Assistant | RAG, Tools, & Fine-tuning
Shaw Talebi
57 What is #ai? — Simply Explained
What is #ai? — Simply Explained
Shaw Talebi
58 QLoRA—How to Fine-tune an LLM on a Single GPU (w/ Python Code)
QLoRA—How to Fine-tune an LLM on a Single GPU (w/ Python Code)
Shaw Talebi
59 How to Improve LLMs with RAG (Overview + Python Code)
How to Improve LLMs with RAG (Overview + Python Code)
Shaw Talebi
60 Text Embeddings, Classification, and Semantic Search (w/ Python Code)
Text Embeddings, Classification, and Semantic Search (w/ Python Code)
Shaw Talebi

This video provides a beginner-friendly introduction to Power Laws, Pareto's principle (the 80/20 rule), and Fat Tails. It covers the characteristics of Power Law distributions, the limitations of traditional statistical methods (in particular, the problems with relying on the mean and variance), and why rare events deserve special attention. It also discusses the alpha parameter, the tail index, and the Kappa metric, and points to resources for further learning.
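Pareto's principle ties directly to the alpha parameter. As a minimal sketch, assuming an idealized Pareto (Type I) distribution with alpha > 1, the share of the total held by the top fraction p of the population is p ** (1 - 1/alpha), so an 80/20 split pins down a specific alpha. The function names `top_share` and `alpha_for_rule` are illustrative, not taken from the video.

```python
import math


def top_share(p, alpha):
    """Share of the total held by the top fraction p of a Pareto(alpha) population.

    Follows from the Pareto Lorenz curve; only meaningful for alpha > 1,
    because the mean (and therefore the total) is infinite otherwise.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    return p ** (1 - 1 / alpha)


def alpha_for_rule(p, share):
    """Solve p ** (1 - 1/alpha) == share for alpha, e.g. p=0.2, share=0.8 for 80/20."""
    exponent = math.log(share) / math.log(p)  # equals 1 - 1/alpha
    return 1 / (1 - exponent)


alpha_8020 = alpha_for_rule(0.2, 0.8)
print(f"alpha implied by 80/20: {alpha_8020:.3f}")  # 1.161, i.e. ln 5 / ln 4
print(f"share held by the top 1%: {top_share(0.01, alpha_8020):.2f}")  # 0.53
```

Applying the rule to the top 20% of the top 20% gives the familiar corollary that the top 4% hold 64% of the total, which `top_share(0.04, alpha_8020)` reproduces.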

Key Takeaways
  1. Understand the characteristics of Power Law distributions
  2. Recognize the limitations of traditional statistical methods
  3. Consider rare events and their impact on statistics
  4. Use the alpha parameter, tail index, and Kappa metric to characterize Power Law data
  5. Use resources such as GitHub and Towards Data Science for further learning
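Item 4 can be sketched in plain Python (standard library only), under stated assumptions. For alpha, the sketch assumes alpha is the tail exponent in P(X > x) = (x_min / x) ** alpha with a known cutoff `x_min`, and uses the standard maximum-likelihood estimate; conventions that describe the density as proportional to x ** -(alpha + 1) report that larger exponent instead. For kappa, it follows our reading of Taleb's definition, kappa(n0, n) = 2 - (ln n - ln n0) / ln(M(n) / M(n0)), where M(n) is the mean absolute deviation of a sum of n i.i.d. draws, estimated here by simulation. Function names, sample sizes, and seeds are hypothetical choices.

```python
import math
import random
import statistics


def estimate_alpha(samples, x_min):
    """Maximum-likelihood estimate of the tail index alpha for a Pareto tail.

    Assumes P(X > x) = (x_min / x) ** alpha for x >= x_min, which gives
    alpha_hat = n / sum(ln(x_i / x_min)) over the n samples at or above x_min.
    """
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("need samples strictly above x_min")
    return len(tail) / log_sum


def kappa_from_mad(m_n0, m_n, n0, n):
    """Taleb's kappa(n0, n) = 2 - (ln n - ln n0) / ln(M(n) / M(n0))."""
    return 2 - (math.log(n) - math.log(n0)) / math.log(m_n / m_n0)


def mad_of_sums(draw, n, trials, rng):
    """Monte Carlo estimate of M(n) = E|S_n - E[S_n]| for sums of n i.i.d. draws."""
    sums = [sum(draw(rng) for _ in range(n)) for _ in range(trials)]
    center = statistics.fmean(sums)  # empirical stand-in for E[S_n]
    return statistics.fmean(abs(s - center) for s in sums)


def kappa(draw, n0=1, n=30, trials=20_000, seed=0):
    """Simulated kappa(n0, n): 0 if M(n) grows like sqrt(n), 1 if it grows linearly."""
    rng = random.Random(seed)
    m_n0 = mad_of_sums(draw, n0, trials, rng)
    m_n = mad_of_sums(draw, n, trials, rng)
    return kappa_from_mad(m_n0, m_n, n0, n)


def gaussian_draw(rng):
    return rng.gauss(0.0, 1.0)


def pareto_draw(rng):
    # Pareto with x_min = 1 and alpha = 1.5: finite mean, infinite variance
    return rng.paretovariate(1.5)


rng = random.Random(1)
samples = [rng.paretovariate(2.5) for _ in range(100_000)]
alpha_hat = estimate_alpha(samples, x_min=1.0)
k_gauss = kappa(gaussian_draw)
k_pareto = kappa(pareto_draw)
print(f"alpha_hat = {alpha_hat:.3f} (true value 2.5)")
print(f"kappa(1, 30): Gaussian {k_gauss:.2f}, Pareto(1.5) {k_pareto:.2f}")
```

The Gaussian estimate should land near 0, while the Pareto(1.5) estimate sits well above it.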
💡 Power Law distributions have unique characteristics that require special consideration: the mean and variance can be unstable, or even infinite, for these distributions, so traditional statistical methods may fail to capture the underlying patterns and relationships.
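To see why rare events matter, a minimal simulation can compare how much of the total the single largest observation contributes, echoing the video's weight-versus-wealth contrast. The parameters are assumptions chosen for illustration, not data from the video: a Gaussian with mean 80 and standard deviation 15 stands in for weights, and a Pareto with alpha = 1.16 (roughly the 80/20 value) stands in for wealth, with 10,000 draws each.

```python
import random
import statistics


def largest_share(samples):
    """Fraction of the sample total contributed by the single largest observation."""
    return max(samples) / sum(samples)


rng = random.Random(42)
n = 10_000

# Hypothetical weight-like data: Gaussian, mean 80, standard deviation 15
weights = [rng.gauss(80, 15) for _ in range(n)]

# Hypothetical wealth-like data: Pareto with x_min = 1 and alpha = 1.16
wealth = [rng.paretovariate(1.16) for _ in range(n)]

for name, data in [("weights", weights), ("wealth", wealth)]:
    print(
        f"{name}: mean={statistics.fmean(data):.2f}, "
        f"stdev={statistics.stdev(data):.2f}, "
        f"largest value = {largest_share(data):.2%} of the total"
    )
```

In the Gaussian sample no single observation moves the total noticeably, while in the Pareto sample one observation can account for a visible fraction of it; rerunning with different seeds tends to shift the Pareto mean, and especially its standard deviation, far more than the Gaussian ones.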
