Python Data Science Tutorial #16 - Pandas Merging Data Frames

NeuralNine · Beginner · 🛠️ AI Tools & Apps · 6y ago

Key Takeaways

The video teaches how to merge data frames like SQL tables using Pandas in Python, a crucial skill for data science tasks.

Full Transcript

What is going on, guys, welcome to this Python tutorial series for data science. In today's video we're going to learn how to merge data frames together, how to join them like SQL tables. You can join SQL tables together, so how do we do that in pandas, and what different types of joins or merges do we have? So let us get into the code.

As always, we start by importing pandas as pd. Now we're going to create two separate data frames that have the same column, the social security number, but we're going to create one data frame with the names and one data frame with the ages of the persons. So we're going to say names equals, and now we create a dictionary. Here we have the social security number, ssn, and we're going to choose, I don't know, 2, 5, 7 and 8. Then we have the name of the person, and the names are Anna, Bob, John and Mike. Now I'm going to create the ages with the same social security number column but different values, and this is the key point here: we're going to join them together into a new data frame that contains names and ages, but we have to have the same index column, or one column that's the same, so that we can join them together on this column. What I'm going to do here is I'm not going to have the exact same social security numbers. Some are going to overlap, some are not. I'm going to start with 1, then 2, which overlaps, then maybe 3, and then 5. So these are the social security numbers here, and now I'm going to say age, and let's define some ages: 28, 34, 45, 62.

These are now our two dictionaries, and we're just going to convert them into data frames real quick: df1 equals pd.DataFrame(names) and df2 equals pd.DataFrame(ages). Now we're going to create a new data frame df that contains both of these merged together in one data frame. To do that, of course, we just say df equals pd.merge, and that's the function that we use to merge two data frames into one, to join them together. Of course we have to specify the two data frames here, so we say df1 and df2, but besides that we also need to specify on which column we're going to merge them and also how we're going to merge them. The column is obvious, because the social security number is the column that both data frames have, so we're going to say on='ssn'.

Now it gets tricky, or not tricky, but now we can choose between four different ways to merge these data frames together: a left join, an inner join, an outer join and a right join. Basically what we're saying is that some of the social security numbers that we have in the names we don't have in the ages, and also the other way around. So which ones are we going to neglect? How many of them are we going to throw out? Are we going to take all of them? Are we only going to take those values that are contained in both data frames? Which ones are we going to look at?

If I say how='outer', for example, also known as the full join, what happens is I basically say: take all of them, take 1, 2, 3, 5, 7 and 8, and just display all the values. Of course I would have to call set_index here, so df.set_index('ssn', inplace=True), so that we have ssn as the index, because that's not automatically the case. Now we're going to print the data frame, and what you're going to see is that we have all the values. Of course they're not sorted, but we have all the individual values, and in the cases where a social security number occurs in both data frames, the values get linked together. So Anna has the social security number 2 and also the age 34, so it's one row. John, for example, who has the social security number 7, does not occur in the ages, so there we just get NaN, for not a number. An outer join basically does exactly that: we take all the values, and where values are missing we just fill up with NaNs. The same goes for the ages, of course: we have the age of 28, which belongs to the social security number 1, but we don't have a name for that. So this would be an outer join.

The opposite would be an inner join. An inner join would only give us the rows where we have all the information, so 2 and 5, because 2 and 5 are the social security numbers that occur in the first data frame and also in the second. So an inner join only gives us the rows that overlap.

Another way to do that would be a left or a right join. A left join would take everything from the first data frame and then fill it up with the values from the right one, and fill the empty values with NaN. So what we're doing here is we take all the names, so all four of these social security numbers, and it doesn't matter whether they occur in the ages. We take all of them, and if they occur there we fill them up with the values, and otherwise we just set them to NaN. The opposite would be the right join: just taking all of the ages and filling them up with the names. So here we would have all the ages but not all the names. As you can see, 1 and 3 have no names. So that's basically the right join, and that's how you merge data frames in pandas.

So that's it for today's video. I hope you learned something and I hope you enjoyed it. If so, hit the like button to support this channel and see future videos for free. Also feel free to ask questions and give feedback in the comment section down below, and of course subscribe to this channel if you want to see more in the future. Thank you very much for watching, see you in the next video, and bye. [Music]
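The code typed in the video is not part of this page, so the following is only a sketch of what the walkthrough describes. The column names ssn, name and age are assumptions reconstructed from the spoken transcript, and the helper merge_on_ssn is a hypothetical convenience that is not in the video. It also sorts the index explicitly, because the row order of an outer merge may differ between pandas versions.

```python
import pandas as pd

# Column names "ssn", "name" and "age" are assumed from the spoken transcript.
names = {"ssn": [2, 5, 7, 8], "name": ["Anna", "Bob", "John", "Mike"]}
ages = {"ssn": [1, 2, 3, 5], "age": [28, 34, 45, 62]}

df1 = pd.DataFrame(names)
df2 = pd.DataFrame(ages)


def merge_on_ssn(how):
    """Hypothetical helper: merge df1 and df2 on ssn, then use ssn as a sorted index."""
    merged = pd.merge(df1, df2, on="ssn", how=how)
    return merged.set_index("ssn").sort_index()


outer = merge_on_ssn("outer")  # every ssn from either frame, gaps filled with NaN
inner = merge_on_ssn("inner")  # only the ssn values present in both frames
left = merge_on_ssn("left")    # every ssn from df1 (the names)
right = merge_on_ssn("right")  # every ssn from df2 (the ages)

for label, frame in [("outer", outer), ("inner", inner), ("left", left), ("right", right)]:
    print(label)
    print(frame, end="\n\n")
```

Printing the four frames side by side makes the difference visible: only the set of social security numbers that survives changes, while the matching rule (equal ssn) stays the same.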

Original Description

In today's episode we learn how to merge data frames like SQL tables.
Website: https://www.neuralnine.com/
Instagram: https://www.instagram.com/neuralnine
Twitter: https://twitter.com/neuralnine
GitHub: https://github.com/NeuralNine
Programming Books: https://www.neuralnine.com/books/
Outro Music From: https://www.bensound.com/
Subscribe and Like for more free content!

Playlist

Uploads from NeuralNine · NeuralNine · 38 of 60

1 Visualizing Stock Data With Candlestick Charts in Python
2 Python Beginner Tutorial #1 - Installation and First Program
3 Python Beginner Tutorial #2 - Variables and Data Types
4 Python Beginner Tutorial #3 - Operators and User Input
5 Python Beginner Tutorial #4 - If Statements and Conditions
6 Python Beginner Tutorial #5 - Loops
7 Python Beginner Tutorial #6 - Sequences and Collections
8 Python Beginner Tutorial #7 - Functions
9 Python Beginner Tutorial #8 - Exception Handling
10 Python Beginner Tutorial #9 - File Operations
11 Python Beginner Tutorial #10 - String Functions
12 Python Intermediate Tutorial #1 - Classes and Objects
13 Python Intermediate Tutorial #2 - Inheritance
14 Python Intermediate Tutorial #3 - Multithreading
15 Python Intermediate Tutorial #4 - Synchronizing Threads
16 Python Intermediate Tutorial #5 - Events and Daemon Threads
17 Python Intermediate Tutorial #6 - Queues
18 Python Intermediate Tutorial #7 - Sockets and Network Programming
19 Python Intermediate Tutorial #8 - Database Programming
20 Python Intermediate Tutorial #9 - Recursion
21 Python Intermediate Tutorial #10 - XML Processing
22 Python Intermediate Tutorial #11 - Logging
23 Python Data Science Tutorial #1 - Anaconda and PyCharm Setup
24 Python Data Science Tutorial #2 - NumPy Arrays
25 Python Data Science Tutorial #3 - Numpy Functions
26 Python Data Science Tutorial #4 - Plotting Functions With Matplotlib
27 Python Data Science Tutorial #5 - Subplots and Multiple Windows
28 Python Data Science Tutorial #6 - Matplotlib Styling
29 Python Data Science Tutorial #7 - Bar Charts with Matplotlib
30 Python Data Science Tutorial #8 - Pie Charts with Matplotlib
31 Python Data Science Tutorial #9 - Plotting Histograms with Matplotlib
32 Python Data Science Tutorial #10 - Scatter Plots with Matplotlib
33 Python Data Science Tutorial #11 - 3D Plotting with Matplotlib
34 Python Data Science Tutorial #12 - Pandas Series
35 Python Data Science Tutorial #13 - Pandas Data Frames
36 Python Data Science Tutorial #14 - Pandas Statistics
37 Python Data Science Tutorial #15 - Pandas Sorting and Functions
38 Python Data Science Tutorial #16 - Pandas Merging Data Frames
39 Python Data Science Tutorial #17 - Pandas Queries
40 Python Machine Learning Tutorial #1 - What is Machine Learning?
41 Python Machine Learning Tutorial #2 - Linear Regression
42 Python Machine Learning Tutorial #3 - K-Nearest Neighbors Classification
43 Python Machine Learning #4 - Support Vector Machines
44 Python Machine Learning Tutorial #5 - Decision Trees and Random Forest Classification
45 Python Machine Learning Tutorial #6 - K-Means Clustering
46 Python Machine Learning Tutorial #7 - Neural Networks
47 Python Machine Learning Tutorial #8 - Handwritten Digit Recognition with Tensorflow
48 Generating Poetic Texts with Recurrent Neural Networks in Python
49 Stock Portfolio Visualization with Matplotlib in Python
50 Analyzing Coronavirus with Python (COVID-19)
51 Making Text Images Readable Again with Python and OpenCV
52 Neural Networks Simply Explained (Theory)
53 Motion Filtering with OpenCV in Python
54 Top 5 Programming Languages To Learn in 2020
55 Simple TCP Chat Room in Python
56 Image Classification with Neural Networks in Python
57 Edge Detection with OpenCV in Python
58 S&P 500 Web Scraping with Python
59 Simple Sentiment Text Analysis in Python
60 Introduction - Algorithms & Data Structures #1

This video tutorial by NeuralNine covers how to merge data frames using Pandas, mimicking SQL table operations, which is essential for data science tasks in Python. By watching this, viewers can learn to efficiently combine and manipulate data. The tutorial is beginner-friendly and focuses on practical application.

Key Takeaways
  1. Import necessary libraries like Pandas
  2. Create or load data frames
  3. Use Pandas merge function to combine data frames
  4. Specify merge type (inner, outer, left, right)
  5. Handle duplicate or missing data
  6. Verify merged data frame
💡 Merging data frames in Pandas is analogous to joining tables in SQL, allowing for powerful data manipulation and analysis in Python.
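
The last two takeaways, handling duplicate or missing data and verifying the merged frame, are not demonstrated in the video itself. One possible way to approach them, offered as a sketch rather than as the author's method, is to use the validate and indicator parameters of pd.merge. The sample data and the column names ssn, name and age are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df1 = pd.DataFrame({"ssn": [2, 5, 7, 8], "name": ["Anna", "Bob", "John", "Mike"]})
df2 = pd.DataFrame({"ssn": [1, 2, 3, 5], "age": [28, 34, 45, 62]})

# validate="one_to_one" raises pandas.errors.MergeError if a key is duplicated
# on either side; indicator=True adds a "_merge" column recording each row's origin.
merged = pd.merge(
    df1, df2, on="ssn", how="outer", validate="one_to_one", indicator=True
)

# Count rows per origin: "both", "left_only" or "right_only".
origin_counts = merged["_merge"].value_counts().to_dict()

# Keys whose name or age is missing after the merge.
missing_name = sorted(merged.loc[merged["name"].isna(), "ssn"])
missing_age = sorted(merged.loc[merged["age"].isna(), "ssn"])

print(origin_counts)
print("no name:", missing_name)
print("no age:", missing_age)
```

Checking the origin counts and the missing keys right after the merge is a cheap way to confirm that the chosen join type kept the rows you expected.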
