Favorite Stats Books: Seven Pillars of Statistical Wisdom

Jay Alammar · Beginner · 🧠 Large Language Models · 4y ago

Key Takeaways

The video discusses the book 'The Seven Pillars of Statistical Wisdom' by Stephen Stigler, which explores seven foundational statistical ideas that were revolutionary for their time and are heavily used in science, technology, and machine learning. The seven pillars are aggregation, information, likelihood, intercomparison, regression, design of experiments, and residuals.

Full Transcript

Hello everybody, welcome back to a new video. In this video we'll be talking about another one of my favorite books: The Seven Pillars of Statistical Wisdom. Why statistics? I studied a little bit of statistics when I was doing my computer science degree, but my real introduction to it has come in recent years: the more I went into machine learning and AI, the more I had to deal with statistical concepts, because statistics is one of the two or three foundations of machine learning, next to computer science and mathematics. When you learn about these statistical ideas on your journey to learn machine learning, a book like this is very interesting because it pulls out these threads of statistical ideas and puts them into historical context.

We'll go into exactly what the seven ideas are, but what the book does is say: these are seven major ideas that are foundational to statistics as we know it today. Built on top of these seven are a lot of other statistical ideas, and machine learning and AI also sit on top of this structure. I love this book because it's very accessible; it's easy to pick up and read. It doesn't explain the ideas in mathematical ways. It's a really smooth way of storytelling: the origins of these ideas, how they developed, the people around them, and the kinds of problems they were trying to solve when they came up with these, let's say, revolutionary methods.

I'm somebody who maybe didn't have the best time with statistics textbooks back in school, because they went right into "when you toss a coin 100 times or a thousand times, what happens?" That is a very important and rigorous way to understand statistics, but this book is a very human look at the history of those ideas, extremely important ideas that you can maybe take for granted now if you don't see them in the proper historical light that this book puts them in. So let's get into The Seven Pillars of Statistical Wisdom and see what those seven are.

So this is The Seven Pillars of Statistical Wisdom by Stephen Stigler. It's a very small book and it's very easy to get through. There are a bunch of visuals in there, and the book has a really good storytelling style. But then, what are the seven? The seven ideas, or the seven pillars, that a lot of modern statistics is built on top of are:

Aggregation: from tables and means to least squares
Information: its measurement and rate of change
Likelihood: calibration on a probability scale
Intercomparison: within-sample variation as a standard
Regression: multivariate analysis, Bayesian inference, and causal inference
Design: experimental planning and the role of randomization
Residual: scientific logic, model comparison, and diagnostic display

For me, I would say the three that I most enjoyed were aggregation, information, and likelihood (which I really want to spend a bunch more time on), but regression was also extremely important.

Aggregation is basically the idea that you can sometimes gain more information by throwing away information that you have. Take an average: if you have 100 measurements, you can average them into only one number, and that number tells you something that the hundred maybe don't tell you. The intro chapter explains these in a very good way. Aggregation is about combining observations: you can gain information by throwing information away, and that is sort of revolutionary.

The second pillar is information, or information measurement. That's the idea that if you have 20 measurements of, let's say, a phenomenon, and then you go out and take 20 more measurements, you're not doubling the information that you have. The first 20 actually gave you more information than the second 20: the precision of what you learn grows with the square root of the number n of observations, not with n itself.

Likelihood is the calibration of inference with the use of probability. Intercomparison, the fourth pillar, is the idea that you can gain some insight by comparing a data set to itself. The fifth is regression, and the idea here is to get from Galton's ideas about regression to the mean, and how that explains some of the questions raised by Darwin's theory of evolution, to concepts like regression in prediction; that work in fact introduces modern multivariate analysis. Design of experiments is the sixth pillar, and the seventh is residuals.

To paraphrase these seven ideas, the author gives this list:

The value of targeted reduction or compression of data; that's aggregation.
The diminishing value of an increased amount of data; so your first 20 measurements maybe have more information than your second 20.
How to put a probability measuring stick to what we do; that is likelihood.
How to use internal variation in the data to help in that.
How asking questions from different perspectives can lead to revealingly different answers; that would be regression.
The essential role of the planning of observations; so how you design experiments, and the importance of being careful in that.
How all these ideas can be used in exploring and comparing competing explanations in science.

So this has been a quick look at The Seven Pillars of Statistical Wisdom: a very readable intro to very interesting and important statistical ideas. Highly recommended. I've been going over it, and I plan to spend even more time with it. Curious to see what you think about it; let me know in the comments below. Thank you for watching.
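
The aggregation and information pillars described in the transcript can be checked with a small simulation. The following is a minimal Python sketch, not something taken from the video or the book: the helper name `spread_of_mean`, the Gaussian noise model, and the sample sizes are all assumptions chosen for illustration. It estimates how much the average of n noisy measurements varies from one sample to the next.

```python
import random
import statistics


def spread_of_mean(n, trials=2000, sigma=1.0, seed=0):
    """Estimate how much the mean of n noisy measurements varies.

    Hypothetical helper for illustration: draws `trials` samples of size n
    from a Gaussian with true value 0 and noise `sigma` (an assumed model),
    and returns the standard deviation of the resulting sample means.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.stdev(means)


# Aggregation: one averaged number is far more stable than a single measurement.
spread_single = spread_of_mean(1)
spread_100 = spread_of_mean(100)

# Information: doubling the data from 20 to 40 does not halve the spread;
# under this model it takes four times the data (80) to do that.
spread_20 = spread_of_mean(20)
spread_40 = spread_of_mean(40)
spread_80 = spread_of_mean(80)

print(f"spread of a single measurement: {spread_single:.3f}")
print(f"spread of the mean of 100:      {spread_100:.3f}")
print(f"ratio 20 vs 40: {spread_20 / spread_40:.2f} (theory: sqrt(2), about 1.41)")
print(f"ratio 20 vs 80: {spread_20 / spread_80:.2f} (theory: sqrt(4) = 2)")
```

Under these assumptions the spread of the mean shrinks in proportion to 1/sqrt(n), which is the sense in which the second 20 measurements add less than the first 20.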

Original Description

The Seven Pillars of Statistical Wisdom is a wonderful small book about seven foundational statistical ideas that were revolutionary for their time. These seven are heavily used in science, technology, and machine learning. Jay goes over the seven ideas and what makes this an accessible and enjoyable book.

Contents:
Introduction (0:00)
Looking Inside: Seven Pillars (2:39)
Looking Inside: Paraphrasing The Seven (6:00)
Closing (6:54)

------
Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Mailing List: https://jayalammar.substack.com/
------

More videos by Jay:
Language Processing with BERT: The 3 Minute Intro (Deep learning for NLP) https://youtu.be/ioGry-89gqE
Seeing Voices: 1 - Intro to Spectrograms https://www.youtube.com/watch?v=37zCgCdV468
The Narrated Transformer Language Model https://youtu.be/-QH8fRhqFHM
Jay's Visual Intro to AI https://www.youtube.com/watch?v=mSTCz...
How GPT-3 Works - Easily Explained with Animations https://www.youtube.com/watch?v=MQnJZ...

Playlist

Uploads from Jay Alammar · Jay Alammar · 16 of 38

Jay's Visual Intro to AI

Making Money from AI by Predicting Sales - Jay's Intro to AI Part 2

How GPT3 Works - Easily Explained with Animations

The Narrated Transformer Language Model

My Visualization Tools (my Apple Keynote setup for visualizations and animations)

Explainable AI Cheat Sheet - Five Key Categories

The Unreasonable Effectiveness of RNNs (Article and Visualization Commentary) [2015 article]

Neural Activations & Dataset Examples

Up and Down the Ladder of Abstraction [interactive article by Bret Victor, 2011]

Probing Classifiers: A Gentle Intro (Explainable AI for Deep Learning)

Inspecting Neural Networks with CCA - A Gentle Intro (Explainable AI for Deep Learning)

Language Processing with BERT: The 3 Minute Intro (Deep learning for NLP)

Behavioral Testing of ML Models (Unit tests for machine learning)

Favorite AI/ML Books: Intro to ML with Python (Book Review)

Favorite Python Books: Effective Python

Favorite Stats Books: Seven Pillars of Statistical Wisdom

Understanding Animal Languages - Seeing Voices 2

How digital assistants like Siri work #shorts

Writing Code in Jupyter Notebooks #shorts

Experience Grounds Language: Improving language models beyond the world of text

pandas for data science in python #shorts

The Illustrated Retrieval Transformer

AI Image Generation is MIND BLOWING! #shorts

A Generalist Agent (Gato) - DeepMind's single model learns 600 tasks

The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning

AI Art Explained: How AI Generates Images (Stable Diffusion, Midjourney, and DALLE)

What is Generative AI? 4 Important Things to Know (about ChatGPT, MidJourney, Cohere & future AIs)

AI is Eating The World - This is Where YOU Can Use it to Compete (AI Product Moats)

What is LangChain? Where does it fit with LLMs like ChatGPT and Cohere? #shorts

Are language models with more parameters better? #shorts #chatgpt

How to manage LLM prompts with tools like LangChain #languagemodels #chatgpt

What is Llama Index? how does it help in building LLM applications? #languagemodels #chatgpt

prompt chains are important for building large language model applications

ChatGPT has Never Seen a SINGLE Word (Despite Reading Most of The Internet). Meet LLM Tokenizers.

What makes LLM tokenizers different from each other? GPT4 vs. FlanT5 Vs. Starcoder Vs. BERT and more

Building LLM Agents with Tool Use

SWE-Bench authors reflect on the state of LLM agents at Neurips 2024

Learn how ChatGPT and DeepSeek models work: How Transformer LLMs Work [Free Course]

The video introduces the book 'The Seven Pillars of Statistical Wisdom' and explores the seven foundational statistical ideas that are essential for machine learning and data analysis. The book provides a historical context and explains the concepts in an accessible way.

Key Takeaways

Read the book 'The Seven Pillars of Statistical Wisdom'
Understand the seven pillars of statistical wisdom
Apply the statistical concepts to machine learning and data analysis
Explore the historical context of statistical ideas
Learn about aggregation, information, likelihood, intercomparison, regression, design of experiments, and residuals

💡 The seven pillars of statistical wisdom provide a foundation for understanding statistical concepts and applying them to machine learning and data analysis.
