Python Tutorial : Optimal parameters

DataCamp · Beginner · 🔢 Mathematical Foundations · 6y ago

Key Takeaways

This video tutorial introduces optimal parameters in statistical inference with Python. It uses NumPy and Matplotlib to compute and plot the theoretical cumulative distribution function (CDF) of a normal distribution, and shows that the mean and standard deviation computed from the data bring the theoretical CDF into closest agreement with the empirical CDF.

Full Transcript

After completing the prequel to this course, you are now beginning to think probabilistically. Outcomes of measurements follow probability distributions defined by the story of how the data came to be. When we looked at Michelson's speed of light in air measurements, we assumed that the results were Normally distributed. We verified that by looking at both the PDF and the CDF, the CDF being the more effective of the two because it has no binning bias.

To compute and plot the CDF, we needed our old friends NumPy and matplotlib.pyplot, so the first step was to import them with their traditional aliases. To compute the theoretical CDF by sampling, we passed two parameters into np.random.normal: the mean and the standard deviation. The values we chose for these parameters were in fact the mean and standard deviation we calculated directly from the data. The result was that the theoretical CDF overlaid beautifully with the empirical CDF.

How did we know that the mean and standard deviation calculated from the data were the appropriate values for the Normal parameters? We could have chosen others. What if the standard deviation differs by 50%? The CDFs no longer match. Or if the mean varies by just 0.01%? So, if we believe that the process that generates our data gives Normally distributed results, the set of parameters that brings the model, in this case the Normal distribution, into closest agreement with the data uses the mean and standard deviation computed directly from the data. These are the optimal parameters.

Remember, though, that the parameters are only optimal for the model you choose for your data. When your model is wrong, the optimal parameters are not really meaningful. Finding the optimal parameters is not always as easy as just computing the mean and standard deviation from the data. We will encounter this later in this chapter when we do linear regressions, and we rely on built-in NumPy functions to find the optimal parameters for us.

I pause here to note that there are great tools in the Python ecosystem for doing statistical inference, including by optimization, scipy.stats and statsmodels being two good examples. In this course, however, we focus on hacker statistics because the technique is like a Swiss Army knife: the same simple principle is applicable to a wide variety of statistical problems. Now it's time for you to do some exercises to demonstrate how choosing optimal parameters results in the best agreement between the theoretical model distribution and your data.
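The workflow described in the transcript can be sketched roughly as follows. This is a minimal illustration rather than the course's own code: the Michelson measurements are not reproduced here, so `data` is a synthetic stand-in with made-up location and spread, and the `ecdf` helper name, the seed, the sample sizes, and the output filename are all assumptions.

```python
import numpy as np
import matplotlib

# Non-interactive backend so the sketch runs without a display;
# remove this line and call plt.show() to view the figure interactively.
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def ecdf(data):
    """Return sorted values and cumulative fractions for an empirical CDF."""
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


np.random.seed(42)

# Synthetic stand-in for the measurements (values are made up for illustration).
data = np.random.normal(299_850.0, 80.0, size=100)

# Parameters computed directly from the data.
mean = np.mean(data)
std = np.std(data)

# Theoretical CDF by sampling: draw many values from a normal distribution
# with those parameters and take the ECDF of the samples.
samples = np.random.normal(mean, std, size=10_000)
x_theor, y_theor = ecdf(samples)
x, y = ecdf(data)

fig, ax = plt.subplots()
ax.plot(x_theor, y_theor, label="theoretical CDF (sampled)")
ax.plot(x, y, marker=".", linestyle="none", label="empirical CDF")
ax.set_xlabel("measured value")
ax.set_ylabel("CDF")
ax.legend()
fig.savefig("cdf_comparison.png")
```

Because the ECDF plots every data point, it involves no choice of bin edges, which is the "no binning bias" advantage mentioned in the transcript.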

Original Description

Want to learn more? Take the full course at https://learn.datacamp.com/courses/statistical-thinking-in-python-part-2 at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.

This video teaches how to find optimal parameters for a statistical model using Python, and how to compare the theoretical and empirical distributions to ensure the best agreement. The tutorial uses NumPy and Matplotlib to demonstrate the concept.

Key Takeaways

Import necessary libraries (NumPy and Matplotlib)
Compute the mean and standard deviation of the data
Use the mean and standard deviation to compute the theoretical CDF
Compare the theoretical CDF with the empirical CDF
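Vary the parameters (for example, change the standard deviation by 50%) to see that the values computed from the data give the closest agreement
Rely on built-in NumPy functions when the optimal parameters cannot be computed directly, as in linear regression; scipy.stats and statsmodels are other good tools for statistical inference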
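As an illustration of the last point, here is a small sketch of letting a built-in NumPy function find the optimal parameters of a linear model. The transcript does not name the function, so the use of `np.polyfit` is an assumption, and the data, the "true" slope and intercept, and the `rss` helper are hypothetical.

```python
import numpy as np

np.random.seed(0)

# Hypothetical data: a straight line plus Gaussian noise.
x = np.linspace(0.0, 10.0, 50)
true_slope, true_intercept = 2.0, 1.0
y = true_slope * x + true_intercept + np.random.normal(0.0, 0.5, size=x.size)

# Degree-1 polynomial fit: returns the least-squares slope and intercept.
slope, intercept = np.polyfit(x, y, 1)


def rss(a, b):
    """Residual sum of squares for the line y = a * x + b."""
    return float(np.sum((y - (a * x + b)) ** 2))


print("fitted slope:", slope, "fitted intercept:", intercept)
print("RSS at fitted parameters:", rss(slope, intercept))
print("RSS with slope off by 10%:", rss(slope * 1.1, intercept))
```

Here "optimal" means the slope and intercept that minimize the residual sum of squares, so any other pair of values gives a larger `rss`.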

💡 The optimal parameters for a statistical model are those that bring the model into closest agreement with the data, and can be found by comparing the theoretical and empirical distributions.
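One way to make "closest agreement" concrete is to measure the largest vertical gap between the empirical CDF and a normal CDF for several parameter choices. The sketch below makes some assumptions: the data are synthetic, the normal CDF is evaluated in closed form with `math.erf` instead of by sampling as in the video, and the helper names `normal_cdf` and `cdf_gap` are hypothetical.

```python
import math
import numpy as np


def ecdf(data):
    """Return sorted values and cumulative fractions for an empirical CDF."""
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def normal_cdf(x, mu, sigma):
    """Closed-form CDF of a normal distribution, evaluated elementwise."""
    z = (np.asarray(x, dtype=float) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))


def cdf_gap(data, mu, sigma):
    """Largest vertical gap between the ECDF and the normal CDF at the data points."""
    x, y = ecdf(data)
    return float(np.max(np.abs(y - normal_cdf(x, mu, sigma))))


np.random.seed(1)

# Synthetic stand-in for a set of measurements (values are made up).
data = np.random.normal(299_850.0, 80.0, size=1000)
mean, std = np.mean(data), np.std(data)

gap_best = cdf_gap(data, mean, std)             # parameters from the data
gap_wide = cdf_gap(data, mean, 1.5 * std)       # standard deviation off by 50%
gap_shift = cdf_gap(data, mean * 1.0001, std)   # mean off by 0.01%

print("gap with data parameters:", gap_best)
print("gap with std off by 50%:", gap_wide)
print("gap with mean off by 0.01%:", gap_shift)
```

A 0.01% change in the mean sounds tiny, but for data whose spread is small relative to its magnitude it is a sizeable fraction of a standard deviation, which is why the CDFs visibly separate.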
