L1 - Data Analysis with Python and PySpark
About this lesson
Data Analysis with Python and PySpark
Full Transcript
Welcome to my channel. Today I want to start reviewing a book, Data Analysis with Python and PySpark. I really recommend it. The book is very rich, and we will look at some of its important concepts and review them gradually. The chapter we are looking at is chapter 4, and we are working through this particular code base, so I have opened Google Colaboratory.

The data used in chapter 4 is available online, and this is where I actually get it. To navigate there, go to the main repository, where you will see a link for getting the data. Open that link in another tab and you will see the broadcast logs. This is the broadcast data, and you can view it in raw form. Copy the URL of the raw data, and then you can use this command to download the data into Colaboratory. So this is the data now.

If PySpark is not installed, you can install it. With that done, we get into the code. The libraries we need are os and SparkSession, so we import those. Now we can use the Spark session to read the file.

We specify the directory, which is the same main directory, and the name of the file. If you check the file, you will see that it is separated with a vertical bar: each cell is separated by a vertical bar. That is why we set the separator parameter to a vertical bar. The first row is actually the header, so we set header to true. You can see this is the first row. It is one continuous line, even though it wraps over a couple of lines here. Then we have inferSchema. This is optional, but we want to infer the schema for each of the attributes based on the cell values. We are also passing the timestamp format as year, month, day. If you look here, you will see attributes that are dates or timestamps, like this log date, which is year, month and day. So we are telling PySpark to read any dates it sees in this year-month-day format.

If we print the schema you can see that, and if you show the first 20 rows you can see the duration and some of the other attributes. We will continue from here next time. So this is the log date. All right, I just wanted to explain a little about PySpark's functionality. Thank you very much, and see you in the next one.
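The read step described in the transcript can be sketched roughly as follows. This is a minimal sketch, not the exact code shown in the video: the file name `BroadcastLogs_2018_Q3_M8.CSV` and the helper `read_broadcast_logs` are assumptions for illustration, and the directory is assumed to be the current working directory, as it would be after a download in Colaboratory. The options mirror the ones mentioned above: a vertical-bar separator, a header row, schema inference, and a year-month-day timestamp format.

```python
import os

# Assumption: the file was downloaded into the working directory.
DIRECTORY = "."
# Assumption: placeholder file name; substitute the file you actually downloaded.
LOG_FILE = "BroadcastLogs_2018_Q3_M8.CSV"

# Reader options matching the ones described in the transcript.
READ_OPTIONS = {
    "sep": "|",                       # cells are separated by a vertical bar
    "header": True,                   # the first row holds the column names
    "inferSchema": True,              # optional: infer column types from values
    "timestampFormat": "yyyy-MM-dd",  # parse dates as year-month-day
}


def read_broadcast_logs(spark, directory=DIRECTORY, filename=LOG_FILE):
    """Read the pipe-delimited log file into a DataFrame (hypothetical helper)."""
    return spark.read.csv(os.path.join(directory, filename), **READ_OPTIONS)


# Usage (needs `pip install pyspark` and the downloaded file):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   logs = read_broadcast_logs(spark)
#   logs.printSchema()   # column names and inferred types
#   logs.show(20)        # first 20 rows
```

Keeping the options in one dictionary makes it easy to see which parameter corresponds to which remark in the walkthrough. Dropping `inferSchema` would still read the file, but every column would come back as a string.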
Original Description
Data Analysis with Python and PySpark