How to effortlessly work with spreadsheets in Python using MITO
Key Takeaways
The video demonstrates how to use MITO, a tool that generates Python code from spreadsheet-style edits in Jupyter notebooks, to analyze tabular data in Python. It covers importing data, merging datasets, pivoting, sorting, and filtering data, creating simple graphs, and computing summary statistics, all through a point-and-click interface.
Full Transcript
In this video, we're going to have Jake over from the Mito team to give us an updated tutorial on how to use Mito from the beginning. This includes how to import data, merge datasets, pivot data, sort and filter data, create simple graphs, compute statistics, and perform various functions via the point-and-click features of Mito. And so, without further ado, we're starting right now.

Hey, this is Jake from Mito. I'm going to show you how you can quickly generate Python and analyze your data using the Mito extension to Jupyter. Mito is a tool you can download into your Jupyter environment, so let's get started.

The first thing we're going to do is get data into the tool. We can do that multiple ways. One thing we can do is insert a dataframe as an argument, if you're already working with some data above. We can also delete this here. We can also pass in files directly from our file system, so I'm going to add these two CSVs, Amtrak 2010 and zip code 2010, and we'll see our data gets put into the tool as dataframes. Below, we generate this code. Everything we do in the Mito sheet will generate the equivalent Python, so this code here is turning these CSVs into dataframes so we can start working with them and generating scripts based on them.

Now that we have the data in, we can see these two files share this zip code column, so we can actually join them on the zip code, and we can do this within the tool without having to type any code. We'll click our merge button here. It'll automatically detect that this zip code column is shared between the two files, and then we can join just by leaving these as the settings. We can also decide what columns we want to keep; we'll keep all of them for now. So I'll just close this, and now you see we've created this new dataframe, which is the joined dataset on this zip code column. If I scroll down below, I'll see we generated the equivalent pandas code for that join. So you can see how we can generate what is decently complex syntax really quickly with the tool.

Beyond that, if we want to now investigate this joined dataset, let's say we wanted to look at how many zip codes for each state are present. A really great way to do this would be to use a pivot table. So I'm going to click our pivot table feature. As the row we will put the state, and as the value we'll put the zip code. We can use all these different aggregation methods; we're going to use a count here. Now we get a pivot table of the number of zip codes per state that are present in this dataset, and below you'll see we get the equivalent code for that pivot table.

We can also sort it in ascending or descending order to maybe get a better look at the data. So I'll click this column filter button here. We can add filters, which I'll do in a second, but first I'm just going to show we can put it in ascending or descending order to sort the data the way that we want. When we do that, we get the equivalent code for that sort. But we can also add filters as well, so I'll add a simple filter here. Let's say we just want to look at the values that are greater than 60.
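The import, merge, pivot, sort, and filter steps described up to this point can be sketched in plain pandas. This is a rough illustration only, not the code Mito generates: the column names (`Zip`, `Station`, `State`), the sample rows, the inner join, and the threshold of 1 (the video uses 60 on the real data) are all assumptions, since the transcript does not show the actual files.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSVs from the video. With real files
# these frames would come from pd.read_csv(...); the values are made up.
amtrak = pd.DataFrame({
    "Zip": [10001, 12207, 19104, 60606, 94607, 95814, 92101],
    "Station": ["New York", "Albany", "Philadelphia", "Chicago",
                "Oakland", "Sacramento", "San Diego"],
})
zipcodes = pd.DataFrame({
    "Zip": [10001, 12207, 19104, 60606, 94607, 95814, 92101, 73301],
    "State": ["NY", "NY", "PA", "IL", "CA", "CA", "CA", "TX"],
})

# Merge: join the two frames on the shared zip code column.
# The join type is an assumption; the transcript does not name one.
merged = amtrak.merge(zipcodes, on="Zip", how="inner")

# Pivot: one row per state, counting the zip codes in each.
pivot = merged.pivot_table(index="State", values="Zip", aggfunc="count").reset_index()
pivot = pivot.rename(columns={"Zip": "Zip count"})

# Sort: largest counts first.
pivot_sorted = pivot.sort_values(by="Zip count", ascending=False)

# Filter: keep only states whose count exceeds the threshold.
threshold = 1  # the video filters on 60; 1 suits this toy data
filtered = pivot_sorted[pivot_sorted["Zip count"] > threshold]
print(filtered)
```

Each step produces a new dataframe rather than modifying the previous one, which keeps the intermediate results available for inspection in later notebook cells.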
Only one is present, so let's put in something a little more generous, say greater than 40. We can see the dataset adjusts in real time as we change the number that we're filtering on, and below we generate the code for that filter, which is really valuable.

Beyond that, we can also investigate the data a bit more visually. I'm going to take this filter off really quickly. We provide a distribution graph, or frequency chart, for any column, as well as summary statistics for the column, so it's a really good way to get some high-level information on the data that you're dealing with. So I'll close this.

We can also interact with the data in somewhat of a spreadsheet format using spreadsheet functions. We can add a column to any dataset and then do operations using normal spreadsheet functionality, with functions like CONCAT, for example. If you click on this documentation button (oops), you will get a list of all the spreadsheet functions that we support. It varies from numerical things to a lot of data cleaning operations: LEFT, RIGHT, TRIM, MID, functions like that. We can also do conditional things like IFs, ANDs, and ORs. It's really valuable for manipulating your data, munging it, cleaning it, and getting it in the format that's best for you. I'll show you one quick example of how that works. Let's say you want to concatenate two pieces of data together. This will concatenate the columns. Here's our concatenated data, and below we get the equivalent code.

The last thing I want to point out, and I won't get into it too deeply, is that because we're generating a script here, we can save these analyses and then replay them on other datasets. All we do for that is hit the save button. It saves the logic here, the steps that we've taken, and then we can replay those steps on a similarly structured dataset. So you can imagine, if you have a process where you're analyzing a similar set of data, maybe data that's being updated once a day, once a month, twice a day, once a week, whatever it is, you can sort of automate that process within the tool. It's sort of like a macro within the Python environment.

There's definitely more functionality we could go into, but I'll leave it here with a high-level look into the tool. Essentially, it's a great way to generate code really quickly, and in a lot of cases code that could be a little bit hard to write by hand. It might take some time to go look up the exact right syntax; we essentially give you that in this point-and-click format. It's also a great way to investigate your data a bit more visually. So yeah, I hope you sign up to use the tool, and I would love to meet you at some point. Thanks.

So I hope that you found this video helpful. If you're finding value in this video, please give it a thumbs up, subscribe if you haven't already, and hit the notification bell in order to be notified of the next video. And as always, the best way to learn data science is to do data science, and please enjoy the journey.
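The spreadsheet-function and replay ideas from the transcript can be approximated in plain pandas by wrapping the steps in a function and calling it on each new dataset. This is a minimal sketch under stated assumptions, not Mito's saved-analysis mechanism: the function name `run_analysis`, the `State`, `Zip`, and `Label` columns, and the sample rows are all hypothetical.

```python
import pandas as pd


def run_analysis(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical replayable pipeline applied to any similarly structured frame."""
    out = df.copy()  # leave the caller's data untouched
    # Rough analogue of a spreadsheet TRIM: drop surrounding whitespace.
    out["State"] = out["State"].str.strip()
    # Rough analogue of a spreadsheet CONCAT: join two columns as text.
    out["Label"] = out["State"] + "-" + out["Zip"].astype(str)
    return out


# Two made-up batches with the same structure, e.g. successive data updates.
first_batch = pd.DataFrame({"State": [" NY", "CA "], "Zip": [10001, 94607]})
second_batch = pd.DataFrame({"State": ["IL"], "Zip": [60606]})

# "Replaying" the analysis is just calling the same function again.
print(run_analysis(first_batch))
print(run_analysis(second_batch))
```

Keeping the steps in one function is what makes the analysis macro-like: any frame with `State` and `Zip` columns can be passed through it unchanged.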
Original Description
In this video, we have Jake from the MITO team to show us the new features of MITO for working with spreadsheets in a Jupyter notebook. Mito generates Python code that corresponds to the edits made in the spreadsheet (via point and click). Now you can effortlessly analyze tabular data in Python with a few clicks of the mouse.
⭕ Links for this video:
1. Sign up for Access: https://hubs.ly/H0HQjCn0
2. The Mito website: https://hubs.ly/H0HQjDw0
⭕ Time stamps:
0:00 Introduction
0:51 Importing Data
1:27 Merging Datasets
2:15 Pivoting Data
2:49 Sorting Data
3:12 Filtering Data
3:45 Graphing and Statistics
4:02 Functions
🌟 Subscribe to this YouTube channel https://www.youtube.com/dataprofessor?sub_confirmation=1
🌟 Join the Newsletter of Data Professor http://sendfox.com/dataprofessor
🌟 Buy me a coffee https://www.buymeacoffee.com/dataprofessor
🌟 Download Kite for FREE https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=dataprofessor&utm_content=description-only
⭕ Playlist:
Check out our other videos in the following playlists.
✅ Data Science 101: https://bit.ly/dataprofessor-ds101
✅ Data Science YouTuber Podcast: https://bit.ly/datascience-youtuber-podcast
✅ Data Science Virtual Internship: https://bit.ly/dataprofessor-internship
✅ Bioinformatics: http://bit.ly/dataprofessor-bioinformatics
✅ Data Science Toolbox: https://bit.ly/dataprofessor-datasciencetoolbox
✅ Streamlit (Web App in Python): https://bit.ly/dataprofessor-streamlit
✅ Shiny (Web App in R): https://bit.ly/dataprofessor-shiny
✅ Google Colab Tips and Tricks: https://bit.ly/dataprofessor-google-colab
✅ Pandas Tips and Tricks: https://bit.ly/dataprofessor-pandas
✅ Python Data Science Project: https://bit.ly/dataprofessor-python-ds
✅ R Data Science Project: https://bit.ly/dataprofessor-r-ds
✅ Weka (No Code Machine Learning): http://bit.ly/dp-weka
⭕ Recommended Books:
✅ Hands-On Machine Learning with Scikit-Learn : https://amzn.to/3hTKuTt
✅ Data Science from Scratch : ht