Spark SQL Concepts | DataFrame | PySpark | Apache Spark | Part 3
====================================================================
====================================================================
Create First PySpark App on Apache Spark 2.4.4 using PyCharm | PySpark 101 |Part 1| DM | DataMaking - https://youtu.be/PIa_-aMHYrg
End to End Project using Spark/Hadoop | Code Walkthrough | Architecture | Part 1 | DM | DataMaking - https://youtu.be/nmy8_Aeqd9Q
Spark Structured Streaming with Kafka using PySpark | Use Case 2 |Hands-On|Data Making|DM|DataMaking - https://youtu.be/fFAZi-3AJ7I
Running First PySpark Application in PyCharm IDE with Apache Spark 2.3.0 | DM | DataMaking - https://youtu.be/t-cL3cL7qew
Access Facebook API using Python in English | Hands-On | Part 3 | DM | DataMaking - https://youtu.be/gc6gsjI8Zts
Real-Time Spark Project |Real-Time Data Analysis|Architecture|Part 1| DM | DataMaking | Data Making - https://youtu.be/NFwNKkIkN6o
Web Scraping using Python and Selenium | Scrape Facebook | Part 5 | Data Making | DM | DataMaking - https://youtu.be/IqxohFQ0rGE
End to End Project using Spark/Hadoop | Code Walkthrough | Kafka Producer | Part 2 | DM | DataMaking - https://youtu.be/7ffhyoYZz9E
Apache Zeppelin | Step-by-Step Installation Guide | Python | Notebook |DM| DataMaking | Data Making - https://youtu.be/MpvXarBn1JE
Create First RDD(Resilient Distributed Dataset) in PySpark | PySpark 101 | Part 2 | DM | DataMaking - https://youtu.be/_KOiCxwrmog
====================================================================
====================================================================
Join this channel to get access to perks:
https://www.youtube.com/channel/UCFQucNX7WsUwaWGNTrn6bIQ/join
Full Transcript
I'm just giving a simple example for Spark SQL so you can run this. Here I'm trying to find the total number of records in my DataFrame. You can straight away call count on the DataFrame if you want; that expresses the operation using the DataFrame structure. Just a minute... hello, hello? Yeah. So basically, here I am expressing my logic using the SQL interface, a SQL query, and here I'm using the DataFrame functions, the DataFrame operations. At a very high level this is how we will do it, but internally there are different methods for specific purposes, and we will see those going forward. Any other question, Prakash? If you have any question we'll take it up; otherwise I'll go on with the regular flow.
A few things: this is the way you read your CSV file and create a DataFrame. Now I will start my Jupyter notebook. You can write your PySpark code in Python; as we discussed earlier, you can write in PySpark and also in a Jupyter notebook. I am using a specific Python virtual environment, so I'll activate it. I created a virtual environment for DataMaking, and that is where I installed all the packages.
Yesterday we discussed how to create a DataFrame from a JSON file, and there is one more example we left out: how to create a DataFrame from a nested JSON file. If the JSON has only one level, for example if you don't have this nested part, we can directly use the SparkSession object's read.json and give the path. If your input has only one level, with no nested JSON inside it, it will give you a DataFrame with one column holding this value, that is, a DataFrame with one row; this key is going to act as the column. But if you have a nested structure, we are going to see how the DataFrame will look and how you parse it and turn it into a flattened DataFrame.
If you want to run your code in a Jupyter notebook, the important thing is that you have the pyspark package installed. Nowadays PySpark comes as a Python package, so you can install it using the pip install command. First you need to import the SparkSession class from the pyspark.sql package. Then you create a SparkSession object using the builder and pass your properties, like the app name and the master, which is the cluster manager; if you have any configurations, you can pass those as well. getOrCreate will give you a SparkSession object.
Now, this is our JSON file, located here; the path is different here. Since we are reading from the local file system, which is the Linux file system, we need to use the prefix file:// followed by the path. When you read JSON using the JSON API, the SparkSession object's read.json, you have to give the path of your JSON file and say whether your JSON data is on a single line or spans multiple lines. If it is multi-line, you have to provide this property. Let's see how this DataFrame looks. First I will comment this out and run this. Next we will create a SparkSession object. Now I am going to run this to create a DataFrame from the JSON file, and if you look here, it actually has only one
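The JSON steps described in the transcript (build a SparkSession, read a multi-line JSON file from a file:// path, then flatten the nested part) could be sketched roughly as below. This is a minimal sketch, not the code shown in the video: the sample record, the file name `sample_nested.json`, the app name, the `local[*]` master, and the helper names `write_sample_json`, `to_file_uri`, and `read_and_flatten` are all assumptions made for illustration. It assumes the `pyspark` package is installed; the import sits inside `read_and_flatten` only so that the two file helpers can be tried without Spark.

```python
import json
import tempfile
from pathlib import Path


def write_sample_json(directory=None):
    """Write a small pretty-printed (multi-line) nested JSON file and return its path.

    The record below is hypothetical sample data, not the file used in the video.
    """
    directory = Path(directory or tempfile.mkdtemp())
    path = directory / "sample_nested.json"
    record = {"id": 1, "user": {"name": "alice", "city": "Pune"}}
    # indent=2 spreads the single JSON object over several lines.
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


def to_file_uri(path):
    """Turn a local path into a file:// URI for Spark to read from the local file system."""
    return Path(path).resolve().as_uri()


def read_and_flatten(json_uri):
    """Read a multi-line JSON file and pull the nested fields up to top-level columns."""
    # Imported here (assumption: pyspark is installed) so the helpers above work without it.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = (
        SparkSession.builder
        .appName("nested-json-demo")  # hypothetical app name
        .master("local[*]")           # assumption: run locally on all cores
        .getOrCreate()
    )

    # multiLine=True tells the reader that one JSON document spans several lines.
    df = spark.read.json(json_uri, multiLine=True)
    df.printSchema()  # "user" shows up as a struct column

    # Dot notation reaches into the struct; alias gives each field a flat column name.
    return df.select(
        col("id"),
        col("user.name").alias("user_name"),
        col("user.city").alias("user_city"),
    )
```

With Spark available, `read_and_flatten(to_file_uri(write_sample_json())).show()` would display one row with the three flat columns. Selecting struct fields with dot notation is only one way to flatten; deeper or array-valued structures may need other functions.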
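The record count mentioned at the start of the transcript, once through a SQL query and once through the DataFrame operation, might look like the sketch below. The function name `count_two_ways` and the temporary view name `records` are hypothetical; the function assumes it is handed an existing SparkSession and DataFrame, so it does not create either one itself.

```python
def count_two_ways(spark, df, view_name="records"):
    """Return the row count of df via Spark SQL and via the DataFrame API.

    spark is assumed to be an existing SparkSession and df a DataFrame created from it;
    view_name is a hypothetical name for the temporary view.
    """
    # Register the DataFrame under a name so it can be queried with SQL.
    df.createOrReplaceTempView(view_name)

    # SQL interface: the logic is written as a query string.
    sql_count = spark.sql(
        f"SELECT COUNT(*) AS total FROM {view_name}"
    ).collect()[0]["total"]

    # DataFrame API: the same result through a DataFrame operation.
    api_count = df.count()

    return sql_count, api_count
```

Given a session `spark` and a DataFrame `df`, `count_two_ways(spark, df)` should return the same number twice, since both forms describe the same computation.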