How to effortlessly work with spreadsheets in Python using MITO

Data Professor · Beginner ·📐 ML Fundamentals ·5y ago

Key Takeaways

The video demonstrates MITO, a spreadsheet extension for Jupyter notebooks that generates the equivalent Python code for each edit made in the sheet. It covers importing data, merging datasets, pivoting, sorting, filtering, viewing simple graphs and summary statistics, and applying spreadsheet functions, all through a point-and-click interface.

Full Transcript

In this video, we're going to have Jake over from the Mito team to give us an updated tutorial on how to use Mito from the beginning. This includes how to import data, merge datasets, pivot data, sort data, filter data, create simple graphs, compute statistics, and perform various functions via the point-and-click feature of Mito. And so, without further ado, we're starting right now.

Hey, this is Jake from Mito. I'm going to show you how you can quickly generate Python and analyze your data using the Mito extension to Jupyter. Mito is a tool you can download into your Jupyter environment, so let's get started.

The first thing we're going to do is get data into the tool, and we can do that in multiple ways. One thing we can do is insert a dataframe as an argument, if you're already working with some data above. We can also delete this here. We can also pass in files directly from our file system, so I'm going to add these two CSVs, Amtrak 2010 and zip code 2010. We'll see our data gets put into the tool as dataframes, and below we generate this code. Everything we do in the Mito sheet will generate the equivalent Python, so this code here is turning these CSVs into dataframes so we can start working with them and generating scripts based on them.

Now that we have the data in, we can see these two files share this zip code column, so we can join them on the zip code, and we can do this within the tool without having to type any code. We'll click our merge button here. It automatically detects that this zip code column is shared between the two files, and we can join just by leaving these as the settings. We can also decide which columns we want to keep; we'll keep all of them for now. I'll close this, and now you see we've created this new dataframe, which is the dataset joined on the zip code column. If I scroll down below, I'll see we generated the equivalent pandas code for that join. So you can see how we can generate what is decently complex syntax really quickly with the tool.

Beyond that, if we want to investigate this joined dataset, let's say we wanted to look at how many zip codes are present for each state. A really great way to do this would be to use a pivot table. So I'm going to click our pivot table feature. As the row we'll put the state, and as the value we'll put the zip code. We can use all these different aggregation methods; we're going to use a count here. Now we get a pivot table of the number of zip codes per state present in this dataset, and below you'll see we get the equivalent code for that pivot table.

We can also sort it in ascending or descending order to get a better look at the data. I'll click this column filter button here. We can add filters, which I'll do in a second, but first I'm just going to show that we can put it in ascending or descending order to sort the data the way we want. When we do that, we get the equivalent code for that sort.

We can also add filters. I'll add a simple filter here: let's say we just want to look at the values that are greater than 60. Only one is present, so let's be a little more generous: greater than 40, let's say. We can see the dataset adjusts in real time as we change the number we're filtering on, and below we generate the code for that filter, which is really valuable.

Beyond that, we can also investigate the data a bit more visually. I'm going to take this filter off really quickly. We provide a distribution graph, or frequency chart, for any column, as well as summary statistics for the column, so it's a really good way to get some high-level information on the data you're dealing with. I'll close this.

We can also interact with the data in somewhat of a spreadsheet format using spreadsheet functions. We can add a column to any dataset and then do operations using normal spreadsheet functionality, with functions like CONCAT, for example. If you click on this documentation button (oops), you'll get a list of all the spreadsheet functions we support. It varies from numerical things to a lot of data cleaning functions: LEFT, RIGHT, TRIM, MID, functions like that. We can also do conditional things like IFs, ANDs and ORs, which are really valuable for manipulating your data, munging it, cleaning it, and getting it into the format that's best for you. I'll show you one quick example of how that works. Let's say you want to concatenate two pieces of data together. This will concatenate the columns. Here's our concatenated data, and below we get the equivalent code.

The last thing I want to point out, and I won't get into it too deeply, is that because we're generating a script here, we can save these analyses and then replay them on other datasets. All we do for that is hit the save button. It saves the logic here, the steps we've taken, and then we can replay those steps on a similarly structured dataset. So you can imagine, if you have a process where you're analyzing a similar set of data, maybe data that's being updated once a day, once a month, twice a day, once a week, whatever it is, you can sort of automate that process within the tool. It's sort of like a macro within the Python environment.

There's definitely more functionality we could go into, but I'll leave it here with a high-level look at the tool. Essentially, it's a great way to generate code really quickly, and in a lot of cases code that could be a little hard to write by hand; it might take some time to look up the exact right syntax, and we essentially give you that in this point-and-click format. It's also a great way to investigate your data a bit more visually. So I hope you sign up to use the tool, and I'd love to meet you at some point. Thanks.

So I hope that you found this video helpful. If you're finding value in this video, please give it a thumbs up, subscribe if you haven't already, and hit the notification bell in order to be notified of the next video. And as always, the best way to learn data science is to do data science, and please enjoy the journey.
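
For readers following along without the video, the merge, pivot, sort, and filter steps described in the transcript could look roughly like this in plain pandas. This is a sketch under stated assumptions, not the code Mito emitted in the demo: the two small dataframes, the column names (`Zip`, `State`, `Station`), and the filter threshold are all made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files in the demo.
# Column names and values are assumptions, not the real data.
stations = pd.DataFrame({
    "Zip": [10001, 10002, 60601, 60602, 94105],
    "Station": ["NYP", "NYG", "CHI", "CHW", "SFC"],
})
zip_codes = pd.DataFrame({
    "Zip": [10001, 10002, 60601, 60602, 94105],
    "State": ["NY", "NY", "IL", "IL", "CA"],
})

# Merge: join the two dataframes on the shared zip code column.
merged = stations.merge(zip_codes, on="Zip", how="left")

# Pivot: state as the row, zip code as the value, count as the aggregation.
pivot = merged.pivot_table(index="State", values="Zip", aggfunc="count").reset_index()

# Sort: order the pivot table by the count, descending.
pivot_sorted = pivot.sort_values(by="Zip", ascending=False)

# Filter: keep only states whose zip code count exceeds a threshold
# (the video uses 60 and then 40; 1 suits this tiny example).
filtered = pivot_sorted[pivot_sorted["Zip"] > 1]

print(filtered)
```

With real files, the two dataframes would instead come from `pd.read_csv`, which is the import step the transcript describes.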

Original Description

In this video, we have Jake from the MITO team to show us the new features of MITO for working with spreadsheets in a Jupyter notebook. Mito generates Python code that corresponds to the edits made in the spreadsheet (via point and click). Now you can effortlessly analyze tabular data in Python with a few clicks of the mouse.

⭕ Links for this video:
1. Sign up for Access: https://hubs.ly/H0HQjCn0
2. The Mito website: https://hubs.ly/H0HQjDw0

⭕ Time stamps:
0:00 Introduction
0:51 Importing Data
1:27 Merging Datasets
2:15 Pivoting Data
2:49 Sorting Data
3:12 Filtering Data
3:45 Graphing and Statistics
4:02 Functions

🌟 Subscribe to this YouTube channel https://www.youtube.com/dataprofessor?sub_confirmation=1
🌟 Join the Newsletter of Data Professor http://sendfox.com/dataprofessor
🌟 Buy me a coffee https://www.buymeacoffee.com/dataprofessor
🌟 Download Kite for FREE https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=dataprofessor&utm_content=description-only

⭕ Playlist: Check out our other videos in the following playlists.
✅ Data Science 101: https://bit.ly/dataprofessor-ds101
✅ Data Science YouTuber Podcast: https://bit.ly/datascience-youtuber-podcast
✅ Data Science Virtual Internship: https://bit.ly/dataprofessor-internship
✅ Bioinformatics: http://bit.ly/dataprofessor-bioinformatics
✅ Data Science Toolbox: https://bit.ly/dataprofessor-datasciencetoolbox
✅ Streamlit (Web App in Python): https://bit.ly/dataprofessor-streamlit
✅ Shiny (Web App in R): https://bit.ly/dataprofessor-shiny
✅ Google Colab Tips and Tricks: https://bit.ly/dataprofessor-google-colab
✅ Pandas Tips and Tricks: https://bit.ly/dataprofessor-pandas
✅ Python Data Science Project: https://bit.ly/dataprofessor-python-ds
✅ R Data Science Project: https://bit.ly/dataprofessor-r-ds
✅ Weka (No Code Machine Learning): http://bit.ly/dp-weka

⭕ Recommended Books:
✅ Hands-On Machine Learning with Scikit-Learn : https://amzn.to/3hTKuTt
✅ Data Science from Scratch : ht

The video shows how to use MITO to analyze tabular data in Python, covering importing data, merging datasets, pivoting, sorting, filtering, and more. It is a high-level overview of the tool; each point-and-click step in MITO generates the corresponding Python code.

Key Takeaways
  1. Import data into MITO
  2. Merge datasets
  3. Pivot data
  4. Sort and filter data
  5. Create simple graphs
  6. Compute statistics
  7. Save and replay analyses
💡 MITO provides a point-and-click feature to generate Python code for data analysis, making it easier to work with spreadsheets in Jupyter notebooks.
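
The "save and replay" idea in the list above can be pictured as wrapping the recorded steps in a function and calling it on each new, similarly structured dataset. The sketch below is only an illustration of that idea in plain pandas; it is not Mito's saved-analysis format, and the function name `replay_analysis`, the column names, and the sample data are all hypothetical.

```python
import pandas as pd


def replay_analysis(df: pd.DataFrame, min_count: int) -> pd.DataFrame:
    """Apply the same fixed steps to any dataframe with 'State' and 'Zip' columns."""
    out = df.copy()
    # Step 1: add a column by concatenating two columns as strings
    # (similar in spirit to a spreadsheet CONCAT).
    out["Label"] = out["State"].astype(str) + "-" + out["Zip"].astype(str)
    # Step 2: count rows per state.
    counts = out.groupby("State", as_index=False)["Label"].count()
    counts = counts.rename(columns={"Label": "Count"})
    # Step 3: sort by count (descending), breaking ties by state name.
    counts = counts.sort_values(by=["Count", "State"], ascending=[False, True])
    # Step 4: keep only states above the threshold.
    return counts[counts["Count"] > min_count].reset_index(drop=True)


# Two hypothetical snapshots of the same kind of data, e.g. from different days.
monday = pd.DataFrame({"State": ["NY", "NY", "IL"], "Zip": [10001, 10002, 60601]})
tuesday = pd.DataFrame({"State": ["NY", "IL", "IL", "IL"], "Zip": [10001, 60601, 60602, 60603]})

monday_result = replay_analysis(monday, min_count=1)
tuesday_result = replay_analysis(tuesday, min_count=1)

print(monday_result)
print(tuesday_result)
```

Keeping the steps in one function means a refreshed file only needs to be loaded and passed in, which is the macro-like workflow the video describes.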


Chapters (8)

0:00 Introduction
0:51 Importing Data
1:27 Merging Datasets
2:15 Pivoting Data
2:49 Sorting Data
3:12 Filtering Data
3:45 Graphing and Statistics
4:02 Functions