Visualizing Correlation Table - Data Analysis with Python and Pandas p.4
Key Takeaways
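The video continues the Data Analysis with Python and Pandas series. It visualizes the correlation table from the previous part as a heat map with Matplotlib's matshow, then customizes the color map, tick labels, and tick positions. Along the way it pulls a table of state postal codes from the web with pandas.read_html (plus requests), saves and reloads CSVs without duplicating the index, and maps state names to two-letter codes with a dictionary. It uses pandas, NumPy, Matplotlib, and requests, and previews machine learning for a later part of the series.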
Full Transcript
What's going on everybody, and welcome to part four of the Data Science and Data Analysis with Python and Pandas tutorial series. In this video we're continuing from the last one, where we basically got our correlation table. What we want to do now is focus on visualizing that correlation table, plus some of the other things that come along with doing that.
To begin, let's just get back to where we were: import pandas as pd, import numpy as np, and, at least to start, pd.read_csv("datasets/minwage.csv"). Then we want act_min_wage and the loop from the previous tutorial. I think it would be this loop here; the rest was just confirming what the problem was, so really just this part, copied over. Then we set min_wage_corr here. It's not .head(), that was a typo really, and then min_wage_corr.head(). Okay, cool, so this is our correlation table.
Now we want to graph it. If you haven't already, open up a terminal or command prompt and pip install matplotlib. If you've been following the series you've already done that, and this would be a hard series to jump into randomly, so I'm going to assume everybody has the things we've installed so far. With Matplotlib, we do import matplotlib.pyplot as plt, then plt.matshow, and we're just going to matshow that DataFrame, min_wage_corr. That should be enough; we probably don't even have to call plt.show() in here. Yeah, okay.
That got us pretty far, pretty quickly, pretty simply. Unfortunately, as is the case with Matplotlib all the time, you're likely going to need to do a lot of customization. If you want, you can come to the data visualization section of my site (oops, I don't even know how to navigate my own site apparently; there we go) and learn all about customizing Matplotlib. I'm not going to spend too much time on that, but we are going to do at least some customization here, because one, the colors are wrong, and two, the labels being numbers means nothing to us. It's not a helpful visualization.
So how can we make this a little better? Right out of the gate, we could say labels = and use a list comprehension: c for c in min_wage_corr (I see the issue there: .columns), and then let's just take the first two letters, so [c[:2] for c in min_wage_corr.columns]. It's going to be a long day. So if it's Alabama, that's Al; Michigan, Mi; and so on. We could do something really basic like that.
Then let's fix some of the other things. fig = plt.figure(figsize=(12, 12)), so we'll just make this 12 by 12. If you want to start customizing Matplotlib, you have to pull things out even further. So far we passed the entire DataFrame, without .values or anything, and it works, so that's great. But if you want to start modifying things, you can't modify plt; you have to modify an axes. To have an axes you need a subplot, and to have a subplot you need a figure. We have our figure, so now we want our axes: ax = fig.add_subplot(111). The 111 means all the subplots on the figure are in a one-by-one grid and this is number one, so there's going to be just one graph.
Now we want ax.matshow(min_wage_corr), but we also want to change the cmap, the color map, and the one we want to use here is plt.cm.RdYlGn (red, yellow, green). Let's just run that real quick, and boom, now it's a heat map and looks more like what we would expect. But our ticks aren't really labeled yet, so that's the next thing: ax.set_yticklabels(labels), and the exact same thing for x with ax.set_xticklabels(labels). We probably want to do that before we show it; can I get away with that? Oh, do x and y. Okay, cool.
So we have labels now, but we don't have all of them. Matplotlib is truncating them, because it doesn't want to put so many labels on an axis that it becomes hard to read. If these were numbers, this would probably be descriptive enough for us, but they're not, so we need to tell Matplotlib to show all of them. The way we can do that is ax.set_xticks(np.arange(len(labels))), and the exact same thing for y with ax.set_yticks. Boom. Nope, why didn't that work? [Music] Why isn't that working? Hold on, everybody. Tick labels, labels, matshow... So maybe we'll call matshow first and then do the modifications. Yeah, I'm guessing that we set the ticks and then matshow reset them for whatever reason. So first do the matshow, then modify the ticks.
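The figure/axes steps above can be collected into one runnable sketch. It is a minimal version under stated assumptions: a small DataFrame of random numbers with invented state columns stands in for act_min_wage (it is not the minimum wage data), the Agg backend is selected so it runs as a plain script, and the colormap is passed by name, which selects the same colormap as plt.cm.RdYlGn. As far as I can tell from Matplotlib's source, matshow installs its own integer tick locator, which would explain both the truncated labels and why ticks set before matshow were reset in the video; that explanation is my reading, not the video's.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented stand-in for act_min_wage: one column per state, one row per year.
rng = np.random.default_rng(0)
states = ["Alaska", "Arizona", "Michigan", "Minnesota", "Nebraska",
          "Nevada", "New York", "Ohio", "Oregon", "Utah"]
act_min_wage = pd.DataFrame(rng.random((20, len(states))), columns=states)
min_wage_corr = act_min_wage.corr()

# First two letters of each column name, as in the video ("Mi" and "Ne" repeat).
labels = [c[:2] for c in min_wage_corr.columns]

fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(111)  # 1x1 grid of subplots, and this is subplot number 1
ax.matshow(min_wage_corr, cmap="RdYlGn")  # same colormap as plt.cm.RdYlGn

# Only after matshow: one tick per row/column, then attach the text labels.
ax.set_xticks(np.arange(len(labels)))
ax.set_yticks(np.arange(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticklabels(labels)

# plt.show()  # uncomment when running interactively
```

Setting the tick positions before the tick labels keeps every label tied to a fixed position, so none are dropped or shifted.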
I bet we can't call show now, right? That'll still be messed up... or would it? And what's up with the text? Fascinating. Okay, whatever, we'll do it this way. Great, so that's looking pretty awesome.
Now what? It's probably really hard to see here, but if I just print labels out, we can see we actually have quite a few that overlap with the same name, like Mi and Mi (Michigan and Minnesota), and Ne; we've got like four or five Ne's here. That's kind of a problem, so maybe we want to make these right. We could hard-code them ourselves, since we might actually know them, or we could bring in an outside dataset. In this case we only have about 39 states that actually had minimum wage data, so we could totally fill that in by hand. Later, though, you might have a much larger dataset with values you want to map onto it, and doing that by hand would be very cumbersome.
So how could we fix this? First we need some way of mapping the state name to a two-letter code. The first thing I would do is just go online and look for one, so search for state abbreviations. I started with the first result and eventually ended up at another site. We can't use pandas to pull from that first one, because the website actively declines robots. Then I found infoplease.com, and this one does not. Yes, we can overcome anything that tries to decline access to a robot; that's not a problem. The problem I have is making a tutorial on doing that to a specific website, since I think you could run into certain issues if you do that. So I'm going to avoid it, and we'll use this site instead because it does not block bots. That's the one we want to use.
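The overlapping-label problem described above can be checked programmatically instead of by printing labels and scanning them. Below is a small hypothetical helper, colliding_prefixes (not from the video), that groups names sharing the same two-letter prefix; the list of states is only an example.

```python
from collections import Counter, defaultdict


def colliding_prefixes(names, width=2):
    """Return {prefix: [names]} for every prefix shared by more than one name."""
    counts = Counter(name[:width] for name in names)
    groups = defaultdict(list)
    for name in names:
        prefix = name[:width]
        if counts[prefix] > 1:  # keep only prefixes that would produce duplicate labels
            groups[prefix].append(name)
    return dict(groups)


states = ["Alaska", "Michigan", "Minnesota", "Nebraska", "Nevada", "New York", "Ohio"]
print(colliding_prefixes(states))
# {'Mi': ['Michigan', 'Minnesota'], 'Ne': ['Nebraska', 'Nevada', 'New York']}
```

An empty result means truncated names are safe to use as labels; anything else is a sign a proper lookup table is needed, which is where the video goes next.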
To read in that data, the first thing we would do (you might have this in a separate script, so I'm just going to write everything out again) is import pandas as pd, especially because we might have issues with this in a little bit. Then dfs = pd.read_html() with the URL of that infoplease.com page we just found, and don't run this yet. I'll put either the link or the text-based tutorial in the description, and you can find the link there, or you could just Google exactly how I did and find it. Super simple.
To use read_html we need a few packages: lxml, html5lib, and Beautiful Soup 4. So come to a command prompt and pip install lxml html5lib bs4, in that order. If you are on a Mac, you will need to run an extra command; go to the text-based version of the tutorial to figure that out. If you're on some sort of company computer with a proxy system, you'll also want to pip install requests. You might as well have requests anyway, since it's a much smarter way of accessing the internet, so go ahead and grab it even if you're not behind a proxy.
Normal people should be able to just run this and get a result, but when I run it I get this nasty red error about an SSL certificate failure. To get around that, I'm going to import requests and say web = requests.get() with the URL pasted in, and then pd.read_html(web). Run that... oh, could not read requests.get? Really? I can't decide if I need to restart the kernel because I just installed this stuff, or what... whoa, whoa, okay, idiot. What I actually need to pass is web.text. Right, okay, great.
Again, if you aren't having issues, you probably don't need to do this. If you continue to have SSL certificate issues, you could also pass verify=False to requests.get. For some reason I was getting a certificate error with read_html but not with requests, even though requests verifies SSL certificates by default. I have no idea what's going on there, but anyway, cool, we have DataFrames.
What pd.read_html does is parse the website and return a list of DataFrames based on all of the tables it finds, so even if it only finds one table, it's still a list containing that one DataFrame. For example, for df in dfs: print(df.head()), and you can see we get two DataFrames here. One is clearly states; the other is territories, which we probably also have in our minimum wage data, probably Puerto Rico and Guam. For now we're going to focus on the main one, dfs[0]. So I'll say state_ab, AB for abbreviation, equals dfs[0], and then state_ab.head(). Okay, great, that's exactly what we want: we want State, and then we actually don't want the abbreviation, we want the postal code, the two-letter thing.
A lot of times, if I'm writing a script that gets information from the web, I really only want to run that part one time, just in case that website decides I've made too many requests or whatever. So while we're at it, we might as well save this: state_ab.to_csv("datasets/state_ab.csv").
Now, a couple of things to think about when you save to a CSV. Pandas believes, as it should, that your index is meaningful, but a CSV file doesn't understand anything about an index. So there's an option on save, index=False I think, in to_csv, and when you read it in, I want to say it's index_col. Yeah: index defaults to True, so when we save this, pandas assumes it should save the index. Let's say we do that and then read it back in with state_ab = pd.read_csv(), then state_ab.head(). Suddenly what we have is two of these index columns, and as time goes on it's only going to keep getting bigger, so if you're consistently saving and reloading some dataset, this will cause trouble.
Instead, you want to handle it on one side or the other. One way is to say index_col=0 when reading, so every time we read this in, 0, 1, 2, 3, 4 will always be your index column. Alternatively, we could say index=False when saving, and now the index won't even be saved. We could do that, and it looks, yet again, the same. Or we could go even further and also say index_col=0, and boom, now we just have the state abbreviation and postal code. So there are a million ways to do the thing we're trying to do.
The next thing I want us to do is convert this to a meaningful dictionary: ab_dict = state_ab[["Postal Code"]].to_dict(). The only column we're actually interested in here is Postal Code. Again, don't forget the double square brackets, otherwise it's going to treat it like a series. Now let's just look at ab_dict real quick. Boom. We have pretty much what we want, but it starts with Postal Code, so instead I'm going to say ab_dict = ab_dict["Postal Code"], and now print ab_dict. Okay, that's a dictionary we can easily map to a column.
Actually, we don't even have to map it to a column; we could just say labels = [ab_dict[c] for c in min_wage_corr.columns]. Try that out... okay, we don't have Federal (FLSA). So we need to set a new value in our ab_dict: ab_dict["Federal (FLSA)"] = "FLSA". That's fine. Run this again... right now we're missing Guam, and we'll be missing Puerto Rico, so let's fix both of those while we're at it: Guam is "GU" and Puerto Rico is "PR". That should hopefully be the only hard-coding we actually have to do. Cool, now we have labels.
Then we can come up here, copy the plotting code, come back down (probably should have used page down, but that's all right), paste it in with our new calculation for labels, and show our beautiful new graph. Now nothing overlaps, and everything has the proper name. That looks pretty good. We did it.
I think that's an okay stopping point, and we've actually covered quite a bit with read_html and fixing a bunch of stuff. In the next tutorial, now that we have minimum wage data, we can begin to compare datasets, looking at data across multiple datasets and trying to derive meaning in some way. We're probably not going to find anything too spectacular, but it'll give us a good opportunity to bring in more outside and different datasets that aren't even from the same source. Sometimes you'll find you've got one dataset from one provider and another from a totally different provider, and they may not even be organized the same way: a different index, a different type of index, a different granularity, and so on. So we're going to get a little messier in the next video, maybe contain that all in one, and then hopefully right after that we'll get into some machine learning, just a real basic example of machine learning with pandas.
That's it for now. Questions, comments, concerns, whatever, feel free to leave them below. As always, thanks for watching, thanks for the support, the subscriptions, the donations, the memberships, all the good stuff, and I will see you guys in another video.
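The save/reload and dictionary steps from the transcript can be sketched end to end as below. Assumptions: a tiny invented DataFrame stands in for dfs[0] (its State, Abbreviation, and Postal Code columns follow what the video describes), an in-memory io.StringIO buffer stands in for datasets/state_ab.csv, and a short hand-written list stands in for min_wage_corr.columns. One deliberate deviation from the video: the column is selected with single brackets, because a Series' to_dict() already returns the flat {state: postal code} mapping. The double-bracket version gives the same mapping nested under "Postal Code", which is why the video needed the extra ab_dict["Postal Code"] step.

```python
import io

import pandas as pd

# Invented stand-in for dfs[0] from pd.read_html; the real table has many more rows.
state_ab = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Michigan", "Minnesota"],
    "Abbreviation": ["Ala.", "Alaska", "Mich.", "Minn."],
    "Postal Code": ["AL", "AK", "MI", "MN"],
})

# Default to_csv writes the RangeIndex too; reading it back adds an unnamed column.
buf = io.StringIO()
state_ab.to_csv(buf)
reloaded = pd.read_csv(io.StringIO(buf.getvalue()))
print(reloaded.columns.tolist())  # ['Unnamed: 0', 'State', 'Abbreviation', 'Postal Code']

# Skip the index on save, and use the first column (State) as the index on load.
buf = io.StringIO()
state_ab.to_csv(buf, index=False)
state_ab = pd.read_csv(io.StringIO(buf.getvalue()), index_col=0)

# Single brackets -> Series, whose to_dict() is already {state: postal code}.
ab_dict = state_ab["Postal Code"].to_dict()

# Names missing from the abbreviation table, hard-coded as in the video.
ab_dict["Federal (FLSA)"] = "FLSA"
ab_dict["Guam"] = "GU"
ab_dict["Puerto Rico"] = "PR"

# Stand-in for min_wage_corr.columns.
columns = ["Alabama", "Federal (FLSA)", "Guam", "Michigan", "Minnesota", "Puerto Rico"]
labels = [ab_dict[c] for c in columns]
print(labels)  # ['AL', 'FLSA', 'GU', 'MI', 'MN', 'PR']
```

Separately, newer pandas releases (2.1 and later, as far as I know) warn when read_html is given a literal HTML string such as web.text; wrapping it as pd.read_html(io.StringIO(web.text)) avoids that warning.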
Original Description
Visualizing the correlation table with matshow in Matplotlib, among other things!
Text-based tutorial: https://pythonprogramming.net/correlation-table-python3-pandas-data-analysis/
Channel membership: https://www.youtube.com/channel/UCfzlCWGWYyIQ0aLC5w48gBQ/join
Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
G+: https://plus.google.com/+sentdex