Introduction To Distributed Computing with Practicals | Getting started with Big Data

Krish Naik · Beginner ·☁️ DevOps & Cloud ·1y ago

Key Takeaways

Introduces Distributed Computing with practicals for getting started with Big Data

Full Transcript

hello all uh I think we are live can someone just ping me and let me know if I'm audible hello everyone so just uh message on the chat [Music] hello everyone are you able to hear me can I just get a yes or no in the chat okay see is there am I audible hello all uh welcome back to day two of our big data boot camp so cool uh thanks Rohan thanks Vin okay can anyone from YouTube also let me know seems like we are audible on YouTube I see there's some issue that strange though okay AUD hello on uh should it be audible can someone just confirm on YouTube I hope it is audible right so yeah cool okay great so uh welcome back everyone I see that we are live both on YouTube and the LinkedIn as well so welcome to day two of our big data boot cam like again the we are bringing up the course on which is going to start on December 21st and for that just to make sure that we are able to reach out to more and more students as well as making sure that your doubts or anything which is there regarding the course is clear we are just keeping these sessions so these sessions are of course free live Monday to Thursday today is the day two or the second session for attempting the first session right for attempting the first session you can just go to Chris Channel where you are right now see this and if you will go on live you will see the first session there as well okay so let me share my screen and bring it up yeah cool I hope the screen is also visible everyone so one minute yeah great I hope the screen is also visible uh V the what is not uh visible on uh work on YouTube can you let me know so this is the today's agenda which we are going to discover yeah cool everyone so quickly before I begin if anyone has any doubts anything just let me know uh I hope my screen is visible and you I'm audible as well if anyone has any doubt anything with respect to the setup or anything just let me know and we can solve it before we move forward just to uh make sure that all of you are able to 
further learn about the course please go to this link where you can learn about the course we also have the full very well detailed syllabus okay so this what all we will be covering in that boot camp that is also there let me just show you once so to enroll or learn about this everyone this is basically the course page as you can see complete Big Data boot camp it will be Tau by both me and chrisa so majorly it will be at20 now we will be covering each and everything here starting from the basics of big data to covering all the clouds and today we will also be seeing how we can set up a cluster on Hadoop on gcp so Google Cloud platform we will be using now these are all the details about the course okay big data your and AWS Cloud Mastery these are the mentors f is also there start date is December 21st as of now timing is 8:00 a.m. to 12: p.m. okay and proficiency level we are saying that it's for professional majorly because we will be heavily using all the clouds and everything so it's normally professionals have a credit card now you will not be charged or anything but as all of you might be knowing to get even the account you have to enroll or basically register on these platforms where they gave the free credits and we will try that we are able to learn each and everything there only as close to a professional environment as you will get in a job okay great uh I think yeah so you will not be able to get the reply like whatever I'm sending on LinkedIn so via like the software which we are using you cannot get that but yeah don't worry uh you can find this thing like it is also in the bottom so if anyone has any doubt regarding anything in general you can also call the counseling team the number must be there which is basically in the bottom you will be able to see that okay great so Dua yes uh we will be bringing the same on YouTube as well but uh as you can see the cabus it's a lot so again to make sure that we are able to provide you with the quality as we 
have done with other courses starting from I would say DSA and all the courses of machine learning and everything which Chris have on YouTube we are planning that yes everything should be a topnotch quality so that anyone can start from that okay cool so before I begin anyone has any doubt uh anything which they want to ask me regarding their professional experience regarding anything please just let me know also let me share my uh YouTube channel as well there you can see like what I told that day so in the last class that was on Monday we discussed about what exactly big data is is and for the same I have a class on my YouTube channel as well which you can see okay everyone great so this is Big Data boot camp with the Z and AWS this is what we going to basically discuss today now this is a little bit about me so I M Agarwal I am a graduate of nsit Delhi I have basically worked at companies like Goldman Sachs oo rooms and basically myle fortunately in all those companies I got a chance to work on projects where Big Data was a required M along with applying data science machine learning and AI to those you can just go to my LinkedIn profile and learn and see more about my projects I have updated each and everything okay so this is all now you can see on the projects as well like I have actually worked Hands-On on Big Data so it actually started with when I started my professional Journey so because I was in the data science or data analytics field like I have that idea so the projects was also in that M okay cool uh RAR you can just reach out to the counseling team I think for outside like for people who are staying outside uh us I think it is $100 okay I think it's very competitive concerning the PPP as well right so for that it will be the case uh Mohammad good evening sir please I'm a beginner in ml I want to know if your course will have placement assistance Mohammad we will be helping you with placement uh based on my experience as well so I have actually 
taken lots of mock interviews I was part of company like scaler and coding ninjas and others so function up where I was uh taking mock interviews and everything so have interacted with 500 plus students there also I have taken interviews because I like it very much like taking interviews and everything to make sure that I'm able to properly grill a candidate So based on that experience as well I will be helping you to make sure that you're able to make a resume then further uh make sure how you can also optimize all your profiles where you can uh make your profile and if you will get any job in let's say any of our uh Network then we will of course be referring you as well it's not a guarantee let me make that clear and no course sely should guarantee you I cannot guarantee you but yes uh we will be helping you a lot with placement so you don't have to worry on that okay we have lots of I would say Network many students are there X students who have uh basically completed other courses and they do reach out with the requirement so we will make sure that we are helping you with the placements there and it is also written there as well so you can see here dashboard EX for one and a half years everyone then Community chat forum for discussion live doubt clearing after the sessions then again the hackathons with all the rewards job referals if we will have any and resume discussion and mock interviews so these are all the things which we will be providing you along with this course uh the major aim which I have from this course or this boot camp is that you should not be someone who just know that okay yeah this is big data and everything no we will be discussing everything in complete depth that's why it will be from 5 to 7 months long and we will also make sure that you are industry ready once you complete this course and that's the reason everything we are doing on cloud all the projects everything will be done on cloud only okay cool uh so before I start anyone has 
any other doubt which they want me to clear regarding their professional Journey so the first is road map and Q&A then we will discuss these topics and then I will make sure that I'm able to answer your q&s more okay so if anyone has any doubt from YouTube or then please make sure that you are asking me that payment Mohammad I don't think Mohammad can you reach out to the team the number which you see I think he should be able to help you uh as of now we have not basically collaborated with any company or something which can help you make those payments but if you can reach out and everyone you will also get a discount of 10% you can use the code Chris 10 right you can also reach out to the counseling team and they can help you with the enrollment and everything Okay cool so let's first now start uh and I hope can anyone just tell me if the Vol setup is all good because this is some new uh like the software which I'm using to write that's a new one which I have actually purchased so I hope it is fine can anyone just confirm me that if all is good there so this is software I'm planning practicing for the first time so I hope this is cool great so let us now start with first a basic road map to make sure that everyone is up to the mark So in the last class I actually explain what big data is so you can just go and see that you will find that in the live of Kish sir or you can also go to my YouTube channel where you will see that what exactly uh I told or I basically explained in that class it is majorly on what exactly is Big Data in a very layen terms by using professional experience not the definition okay great so now I just want to make sure that in this class I'm also discussing a little bit about road map so you guys and everyone have an idea that hey what exactly is this big data what all are we going to study and everything okay cool uh good so now starting with the big data part what we have to or what we normally start with is understanding the problem 
something which we did in the last class. So, understanding the problem: that we have already written down, what exactly big data is. Okay, nice. That was the major thing which you have to understand first, to make sure that you understand what exactly we will be solving. I always make sure that I am focused a lot on the problem, because the solution is normally easy; the problem is where we majorly have to put our mind. Now, if that is clear, second, we normally start with Hadoop. Hadoop is one of the most sought-after and widely used technologies, or rather a group of technologies, for handling big data. Hadoop is actually an architecture, so there are many things which form part of it. In the Hadoop architecture we have the HDFS layer, where we store the data; it is for storing. Along with that we have something known as MapReduce, YARN and lots of other components, which we will see in the course as we continue our discussion. Now, in the last class I told you that there are two major things which we have to do with our data: first is storing, and second is processing, that is, when we work on it. Right, everyone? So HDFS is majorly helpful for storing, and MapReduce, you can easily think of it as being for processing. So we have handled these things, but now let us try to understand how Hadoop comes into the picture, or what exactly Hadoop is. Okay, great. The estimated cost for that should be very minimal; you don't have to worry about that. Majorly we will be using the free credits which are provided, so you don't have to worry about this. Great. Now, let us first clear the road map, what all we will be covering. You can see the full thing here as well, everyone. So let me make sure that I tell you each and everything which will be covered as part of the course. You can see that this is the whole thing; these are the learning
objectives. Okay, now once we move forward you can see one more thing: these are the prerequisites, Python, SQL and database basics. We will be providing you with recordings for this part of the course. If many students have doubts, then depending on the interest and the challenges which are faced, we can also take some of these topics in the class. But yes, we will make sure that each and every one is up to the level where these prerequisites are covered. Okay, great. Now let us move forward. We have to get the overall idea of what exactly big data is, something which we discussed in the last class. Then we have to deep dive and understand Hadoop, something which of course is used in the industry. In Hadoop we will be discussing its architecture: what is HDFS, what is YARN, and then what is the MapReduce framework as well. Now, in many courses or many places you will see that MapReduce is normally not taught, because it is not used that much. It is for processing, and for processing we now have Spark, Storm and other such things which we use. But still, I believe that for the basics, to make sure that you understand each and everything, it is very important, and that is the philosophy behind this course as well. We will not be teaching just the tools; we will make sure that your basics are clear, so that you are set up in your professional journey for the next five to seven years. It is not something which you will just cover and be like, hey, yes, I got a job. No, you will be getting a job, and along with that you will be comfortable with each and every concept which we have taught you. Then we will use the Hadoop ecosystem tools and we will see Hadoop use cases. Here we will also see how YARN handles all the architecture in the back end, what exactly happens there. Same for map
reduce: we will write a single program to see how exactly MapReduce works and what the exact issue was. So if tomorrow someone says, hey, we use MapReduce, you should not be like, hey, no, I don't know what MapReduce is, it is not used, everyone says that MapReduce is dead. No, I don't want any of you to go to a company and feel like that. We will make sure that you know what the challenges were which MapReduce was facing, and you will understand that by seeing the code as well. We will be running that code on the cloud, and you will understand each and everything there. Okay, great. Next we will start with Apache Spark. This road map, the way we have decided and worked on it, is exactly the one which anyone who has to work or start their journey in big data should be following. Okay, great. So, foundations of Apache Spark: you will understand why MapReduce fails, and then in Spark there are actually RDDs, DataFrames and something known as Spark SQL as well. Now, normally DataFrames and SQL are used a lot in today's world, but we will be discussing RDDs as well, so that your thought process is clear on why we are using these things and not the others. So Spark we will be discussing in depth, everyone. Then you can see DataFrames and structured data processing in Spark. We will be running our code on Spark, and we will go into a lot of depth on lots of things, for example how caching is done in Spark and how you can do optimization. This is actually something which I have struggled with and spent a lot of time on, because in one of my projects at Goldman Sachs, optimization of Spark was the major focus of the project. I started from Spark 2 and went to Spark 3, and understood in depth about caching and the different kinds of joins which can be made. On all these things I actually spent a lot of time, around
eight months. I was totally deep into Spark because the project required that kind of expertise, and that is what I will be sharing. I will tell you how I was able to do that project, how that helped me in promotion and everything. But yes, the major idea is that we have to understand these things in depth, so each and everything will be covered in depth, everyone. Now we can move forward: advanced data processing and optimization, as I said, and once you move forward, performance optimization and tuning in Spark as well. So you can see that there are many modules for Spark. Then the heart of handling unstructured data, that is the NoSQL databases. We will build an understanding of them and compare them with traditional DBs, that is, your SQL DBs, which you will have already covered through the recordings. In NoSQL we will cover MongoDB; many of you must have heard that in companies MongoDB or these kinds of things are used a lot. Right, everyone? Then Cassandra is the next one which we will be covering. Moving forward, we will understand Hive. Hive is kind of a data warehouse, which is a part of Hadoop only. So when I drew this overall architecture of Hadoop at a very high level, Hive is also a part of this architecture, and we will be learning about it in depth. We will be setting up Hive on our cloud to see how we can work on that. Moving forward, Hive also we will be doing in a lot of depth: advanced Hive features, partitioning, optimization and performance tuning on Hive as well. So in this course each and every one of you will be able to see that it is not that we are just teaching you a few things and saying, yes, these are the basics. No, we are covering each and everything in complete depth, and the reason is that it will be helpful in your professional journey once you clear the interview: how
exactly you have to work, all those things will also be covered. Because we are covering each and everything in depth, we will give you those cases which you will face in your job as well: hey, your Hive query is running slow, the Spark code is not optimized, how can you fix that? So we have to make sure that we are able to work on these technologies as well, everyone. That is going to be the second focus. Before I move on, if anyone has any doubt till now, just ask me in the chat from LinkedIn or YouTube and I will answer it. Anything which I send will not be visible on LinkedIn, but YouTube people will get all the messages. So I'm taking a pause here before we continue the road map. If anyone has any doubt, let me know quickly: anything about what I have covered, any topic which you want me to include, I will make sure that we clear it. If not, we will continue with our road map, and I will give you an idea that, if you're trying to become a big data engineer, this is all you have to understand. Meanwhile, let me share the links again as well, just give me a minute. This is from where you can enroll or learn more about the course. The complete syllabus you can download from here, everyone, and you have the counseling team number at the bottom, where you can reach out to the team. Again, as was said in the last live session as well, the counseling team will not be looking to sell you things; it will just try to understand and then say what the best course is or what you can do to go forward. We have made it very clear that we have to make sure that we are able to help each and every one of you; that is our major focus here. Okay, cool. So next we will be discussing Kafka, everyone. Kafka is used for
real-time processing of data. So when you use your credit card, you will see that you get an instant reminder or instant alert if anything is a problem. Or when you're talking to a copilot: if you just call this copilot, it is able to see your message in real time. This was again one kind of project which I did in my last job, where we created this copilot, and Kafka was used in that, because we had created a chatbot kind of thing which was able to do lots of things. For that we have to make sure that real-time processing is handled. Okay, everyone. So moving forward, once Kafka is introduced we will also see Kafka producers, consumers and everything in depth. Then we will also learn about Spark Streaming, so that you are able to handle that use case as well. Then, to make sure that you are also able to work as a data engineer and understand all the pipelines and stuff, we will be learning about Apache Airflow, a tool which is used a lot in the industry. For data warehousing, many students were asking for Snowflake as well, so after Hive we will make sure that we are covering that also, so that it is helpful to you. Apache Airflow and Snowflake are a few things which are getting used a lot in the industry now. Right, everyone? And once you have done this, we will start with cloud. In the cloud as well, all the clouds will be covered, and we will make sure that each and everything is clear to you. So let me now go back. I hope the full road map, what all we have discussed, is clear to you, everyone. So now let us move forward, and let me explain Hadoop to you a little bit, so that you have a very good idea and can explain it to someone else as well. If anyone has any doubt, please make sure that you ask it in the chat and I will be more than happy to answer. Great. Heather, if you're a non-tech guy who wants to transition into tech: for this particular course there are lots of prerequisites which you will
have. So if you are someone who is comfortable with databases, Python programming and SQL, you can do it. But if you are someone who has no idea whatsoever about technology, you have never coded in life, then I will not suggest it, because it will be a lot for you to digest; it will not be that easy. Again, if you have some idea, if you are, let's say, a data analytics guy, it is very good for you. It will be technical, yes, but it will be a very good addition to your overall resume or your career. If you are someone who has never coded, then I will suggest some other course, not this one, because it is going to be a little tech heavy. Okay, cool. So if anyone has any other doubt, let me know. Great. So now, before I start with the Hadoop architecture, everyone, there is just one more thing which I would like to discuss a little, and that is monolithic and distributed architecture. These are the two ways in which we are able to build our systems. Monolithic, in a very simple sense, is just a single system. I explained that in the last class as well, the way our DBs are. So let me just draw a line. Now, if you have, let's say, 512 GB of hard drive space in your laptop, you can increase that, but again there are limits. Let's say you have some slots, three slots, and you are just adding more and more space to that. Okay, let me quickly answer some doubts. Timing, JW, will be 8 to 12. This particular YouTube live will majorly be from 8:00 p.m. to 9:00 p.m., so we connect for one hour every day. For the course, the timings are 8:00 a.m. to 12 noon, basically 8:00 a.m.
to 12 noon. Okay. "Is this for a college fresher? I have done a bachelor's in computer science and am currently pursuing a master's in the same domain." Yes, again, the same thing: if you are interested in the data field, you want to understand these things, and considering you are doing a master's, this is a good course for you if you want to have a solid career. And again, believe me, data engineering or big data is something which is not going away any time soon. We are generating lots of data, and companies are looking to have people who are able to write code and understand everything with respect to that. "I have a master's in data science." Pram, that should be good; that should not be a problem. Same, it will be AWS and Azure both, and along with that GCP will also be covered, so you don't have to worry about that. Right, good. So yeah, let us actually start with that now, everyone. Mono, as the name suggests, means one. Now, before I move to the Hadoop architecture, and because you were also in the last session which I took, I want to explain where the problem is coming from: why are we focusing so much on, hey, I want to use Hadoop, I want to use multiple computers, multiple servers? I have to make sure that I build that mindset in you, and this is what I'm trying to do here. When we are handling lots of data in a simple mono kind of architecture, on a single computer, there will always be limits. Yes, you can expand, but there is going to be a limit; you cannot keep handling the data or the processing as it increases. So, to make sure that you understand it, I can use a very easy example. Let's say today you have to download a file which is 20 TB in size. Rishab, if you're able to see that, it should be fine. Okay, now there are two ways through which you can download this file, right, and that
is also from my childhood experience. I'm a very big fan of basketball, the NBA. There is an NBA game which I was a very big fan of; it used to be around, sorry, one TB or something in size, I think, sorry, 200 GB or something in size, and I'm talking about the latest one; not sure why they're making it so big now. To download that, it used to take a lot of time. Either you can use your own computer and download the full file, and it's going to take a lot of time, or what you can do is use four computers, each of them downloading 5 TB for you. Can anyone tell me which is going to be faster? I will just repeat what I have said: to download this 20 TB of data, the first approach is that we use a single computer; the second is that we use four PCs to download 5 TB each. Which approach is going to be faster? RIT, I think you meant to send the second one. We will discuss why it is distributed, all these things. And, my slate, everyone is on a different network, so that should not be a problem: each and every laptop has a separate internet connection, no throttling, nothing. Now, this is going to be faster, and that is what I did. I asked my friends, hey, can you please download this part? Let's say it was a RAR file which had 20 parts, so I asked, hey, can you download five, I will download five, like this. Again, a simple thing to do, pretty straightforward; I was able to get the game and play it as well. If we apply the same mindset, don't you think that using something like these four computers to do the work in parallel is going to be faster, everyone? Yes or no? And try to understand one more thing, on which many students have a doubt: there is still a limit here; there
is a limit here which will throttle you, as people were saying. But in this use case you can add more PCs. If, let's say, I had eight friends at that time, I would have gotten it more quickly, yes or no, everyone? If I had eight friends at that time and had been using eight PCs, I would have got this thing earlier. So now let me just clear everything and let us get back to understanding things. One minute, sorry, I am still getting used to the software. Yes, great. So now, in distributed, we do not just have one machine, but a group of machines or servers. I will try to keep it very easy, in layman terms, so that whether you are from non-tech or from tech, everyone will be able to understand by this example. Now, we have to focus on a few things as well, everyone. If I move forward and talk about the distributed system, and I will be telling you how we will be creating a distributed architecture in some time on GCP, just tell me one thing, everyone. In the example, I was the person who was making sure that, hey, if I have these four PCs, you will download parts 1 to 5, you will download 6 to 10, you will download 11 to 15 and you will download 16 to 20. That was my mindset, right, everyone? But I was the one who was handling this. Someone has to decide what exactly each computer, or each node, or each server has to download, yes or no? In this easy case I was the one who was handling all these things. And who was making sure that once this person has downloaded, I get it, once that person has downloaded, I get it, give me the results? I was telling each and every PC what to do. Then I was monitoring: let's say, for example, if my friend says that, hey, I am not available,
my PC is not available, then I must say, okay, I will ask some other person or some other friend to download this. If he says, hey, I was not able to download this 15th part, I will just say, okay, retry it once again, I think it will be done. Now, I hope each and every one of you is able to understand what exactly I'm telling you: there was this person, which was me at that time, who was handling all these things. Yes or no? Does anyone have any doubt in understanding this, anything which is not clear? So in a technical workspace as well, if you have a very big task, and let's say now I say that you have hundreds of computers, someone will have to make sure that they divide this task, that each and everything gets picked up, that we store the data by dividing it, that we process that data, and that we then collate the results and show the final results. Yes or no, everyone? Are you able to understand the problem in very layman terms, in a very easy to understand approach? I'm not going technical here. I know that students are there from every background, and normally when I teach I make sure that I'm teaching in a way that each and every one is able to understand, and then I move to the technical side, where it is easy to capture. Now, my case was easy, because it was a one-time job and I was able to handle it. But tomorrow, when the company says, hey, you have these PCs, lots of them, let's say, and you know what monolithic and distributed are, that is all good, you have these many nodes or servers or laptops, we will be needing a master PC, everyone, because this time I will not be standing
24 hours. So these are known as master and worker; it's a master-slave or master-worker architecture, where the master will tell everyone what to do, how to handle it, how to process, and then collate the result as well. Now, how to handle this whole thing? You could write lots of code yourself, but that is actually what Hadoop helps you to do. So what we do, in very easy terms again, is install Hadoop here, here, here and here, on all of these PCs, everyone. We will tell them: hey, you are a worker node, you are the master node, these are the workers. This particular worker has an IP address, let's say 192.168.3.1, something like this. We will specify each and everything and make the connections. This whole architecture, this whole thing which helps us to achieve these things, is in very easy terms what Hadoop, or a distributed architecture, is. Installing Hadoop helps us to achieve our goal of (a) storing the data, which was the biggest or the first problem, and (b) processing the data. So once you have the distributed architecture, everyone, on top of that you install Hadoop on the systems, on the nodes, and it helps you to work on big data. That's the easiest and most straightforward layman definition, the way I can explain what Hadoop is even to a 13-year-old. If anyone has any doubt till now about what exactly Hadoop is, just let me know. Now, why am I telling you like this? Because even I had a doubt: hey, what is this Hadoop, I have a distributed system already, why do I need all these things? You have to understand one thing: you can face failures in terms of storage, and you can face failures in terms of processing. Tomorrow it can happen that this PC goes down. Who is going to monitor all these things? Who is going to handle that? That is where we have this whole Hadoop software or architecture
which makes sure that it handles each and everything, so that you can focus on your work. Again, an easy example: when you buy a new PC, you install Windows on top of it. Now Windows handles everything: it has a mouse, a keyboard, the audio input, the speaker. It knows what hardware it has to connect, and if anything is not working, it tells you. In a very similar way, Hadoop helps us to make sure that our distributed architecture is good enough to handle the big data requirements. That is the major thing.

Hadoop is not a database, Prashant, not at all. Hadoop is an architecture. We have the NoSQL and SQL DBs; Hadoop is not a DB. I hope that is clear, everyone, and that you are able to follow the way I'm teaching, in very easy terms, without using any definitions. Okay, cool. I hope we are good to move forward; if anyone has any doubt till now, let me know.

And Prashant, one more thing: here I am talking with respect to the whole architecture. I'm not saying that we just have to store the data. Hadoop is in any case not a DB; it is, I would say, an overall software package, and multiple technologies are part of Hadoop. HDFS is for storing the data. MapReduce is a program which helps you to process that data. YARN is for resource management: it may happen that this PC is very small and has very little memory, and then YARN will have that knowledge. Similarly, to make sure that we can store or access the data just like we do in SQL, we have Hive. All these offerings form part of the Hadoop architecture, or Hadoop overall. I hope it is clear, everyone. So now let me just show you one more thing.
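The master/worker idea described above (divide the task, let each worker process its part, retry a part that fails, then collate the results) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Hadoop itself is implemented: threads on one machine stand in for worker nodes, a word count stands in for the "big task", and every function name here (`split_into_chunks`, `count_words`, `run_with_retry`, `master`) is made up for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, num_chunks):
    """Master step 1: divide the data into roughly equal parts."""
    size = max(1, -(-len(lines) // num_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def count_words(chunk):
    """Worker step: process only the chunk this worker was given."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def run_with_retry(task, chunk, max_attempts=3):
    """Re-run a failed task, like asking a friend to retry a download."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last attempt


def master(lines, num_workers=4):
    """Split the work, hand it to workers, then collate the results."""
    chunks = split_into_chunks(lines, num_workers)
    # Threads stand in for worker machines (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(
            pool.map(lambda c: run_with_retry(count_words, c), chunks)
        )
    total = Counter()
    for partial in partials:  # collate: merge every worker's partial result
        total.update(partial)
    return total


if __name__ == "__main__":
    data = [
        "big data needs many machines",
        "many machines need a master",
        "a master collates results",
    ]
    print(master(data, num_workers=3))
```

In a real cluster the chunks would live on different machines and the framework, not your own code, would handle the splitting, retrying and monitoring, which is exactly the work the talk says Hadoop takes off your hands.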
Okay, so let us go to Google and search for Google Cloud. So this is the whole of Google Cloud, everyone. I've used it a lot for AI work as well, to bring up servers. Again, we are going to discuss it in a lot of depth in the course. Z, can you check at your end? I have been using this whole setup for a long time. Everyone, I hope my voice is clear and everything is fine; can you just tell me whether it is clear or not, so that you don't face that issue? Normally it's a tested setup, because I use it every day. Yeah, cool, great, thanks a lot, everyone.

So, Dataproc. To make a setup like this, either you build it physically, and many companies do that; Goldman, for example, had that in-house, where they have these setups and you connect the machines manually. Again, a very easy thing to do: I have two laptops in front of me; if I had five, I could just connect them one by one, install Hadoop on each of them, make a network connection, and achieve everything I want to do. Not a problem at all. Mostly we use Linux, not Windows or Mac. The reason is that, for our use cases and the way we want our machines to behave, Linux is more than enough; Windows is normally not that good for this, and macOS is Unix-based, so it follows much the same principles as Linux.

So if I have, let's say, 10 of these machines... and now I will tell you one more term which you have to understand: this is known as resources. We say that our laptop, node, server or PC has resources. We will be using that term a lot in our course, and you will hear it a lot in general. It is your hard drive, then your RAM, and then your CPU. I hope each and every one of you will be
able to understand this. It is not like 10 years back, when we didn't have the idea; we look at these three things whenever we are planning to buy a workstation for ourselves as well. We want the RAM and the CPU to make sure that it is faster; RAM and CPU work together: the more RAM, the more tasks can be held in memory for the CPU to process. The hard drive is used just to store the data. So what I'm telling you here is that you can either create the cluster on your own or you can use a cloud service provider.

Now, what these cloud service providers do is a similar thing: they have lots of these, I would say, workstations. To show you that, let me pull up an image of a data center. This is a very common image used to show a data center. You will see that many of these workstations, basically servers, are kept there, and you can access them from anywhere. AWS, Azure, GCP and IBM Cloud are just giving you a way to connect to them. In the same way, Google Cloud has Dataproc: "a fully managed and highly scalable service for running Apache Hadoop, Apache Spark, Apache Flink, Presto, and 30+ open source tools and frameworks."

So let us go here, everyone. I just want to explain how easy it is, why we are going by this route, and why companies do that as well. I hope each and every one of you will be able to understand that it is much easier to use these tools than to build it yourself. How would you handle all these things? You would have to install the OS and everything else on every machine. That is why cloud computing has seen a lot of growth in the last 10 years: everyone is generating lots of data, and everyone needs systems which they can use to store their
data and to work on their data. I hope that is clear, everyone. Does anyone have any doubt till now, anything with respect to the course, your overall professional journey, or what I have explained? I know I'm keeping it very layman and easy; normally I teach like this only. Also, I'm not going into a lot of technical depth, because I want to show you the practical along the way. In the next class, or normally when I teach, we will see the full architecture of Hadoop: how a write happens, how things get stored there. MapReduce also we will see in proper depth. So you don't have to worry about that; I just want to give you the gist of things so that you know what exactly big data is. Otherwise, everything we have mentioned in this syllabus, we will be covering each and every one of these things. We will have dedicated classes to explain the whole technical part and the jargon behind it, which is going to be very helpful in your interviews as well.

Okay, great. So now you can see what it is: we have autopilot-cluster-1, and this is the region. These data centers are kept in different regions. Let me search "data centers Google region": you can see here, or I think this is a better image, where all these data centers are present for Google. So pay attention, everyone: we have Delhi, Bombay, all these countries have data centers; they are interconnected, and they simply have lots of machines there which you can use. So this is the region; I'm using us-central1. You can use any of them; normally the nearest one is better in terms of latency, but any should be fine. We are using the standard tier, because we are on the free trial. I'm also using the free trial only as of now, everyone; you can see free trial status: 24
credit and 21 days remaining, "activate your full account" and so on. When I start the course along with you, I will also be making a new account, so that I can face all the issues which you are facing. It won't be "it works on my machine"; no, it will be a complete setup from the start. Okay, so next is registration... let me actually go back. Does anyone have any doubt before I move forward? We are now going to create our cluster, and I'm showing you that directly. If anyone has any doubt before I move forward to create this cluster, let me know in the chat and I will make sure that I clear your doubt. I will be creating this quickly first and then we will see. Okay, no doubts. I think it is syncing; it is taking a little bit of time, no worries.

Okay, so I see that there are no doubts. One minute. Let me also give you a little idea about Google Cloud in general. Google Cloud gives you lots of solutions; you can see which industries it covers. Most companies are now moving more or less everything to the cloud, and you will see that lots of AI-related stuff is there. When you enroll in the course you will get a full idea of what exactly cloud is, and that is also going to be very helpful to you. Now, one thing we can do is search "Big Data" here and see all the offerings. One minute... these are the big data examples being taught. Let me search "Big Data GCP offerings" so that I can show you what all GCP gives you to handle big data. Let me just start from Dataproc only. Great, so Dataproc is basically the major thing which is
used here, everyone, to make your overall virtual cluster. This whole thing is known as a cluster: it's a cluster of PCs, or a cluster of machines or nodes, and everyone will be using that. What we can also do is go to Solutions and Industry, where we can see data analytics; there are so many of these things, let's say stream analytics, data lake modernization, databases. For all of these, Google has different offerings. We are using one of its offerings, which is Dataproc, to make our cluster. It also gives you the monthly cost and everything.

So if you are making a cluster... okay, let us not create a Kubernetes cluster, because we have to create a different kind of cluster. Just a second. Meanwhile, if anyone has any doubt, just let me know and I will help you clear it. Great, pretty happy to see that there are no doubts. So let us go back and see the cluster. Again, it's the free tier; we can use India as well, and you can see Americas, Europe, everything is present here. Then we see "run business-critical workloads safer, faster" and so on; we don't need anything here as of now, so we can go forward. Here there is the fleet, which "lets you logically group and normalize Kubernetes clusters", so this is making a Kubernetes cluster. Let me quickly see how we can add a normal Hadoop cluster here. Just a second, everyone; meanwhile, if anyone has any doubt, quickly ask. They have changed their settings a little bit; earlier the normal cluster creation used to take you straight to that. I think I have to enable some API. Okay, just create it... one minute. Yeah, this is actually the place. So everyone, see, this is the Cloud Dataproc
where we are now. Now we are going to try to create a cluster. G, we are discussing a few things about big data, giving you the idea, and creating our own cluster: a Hadoop managed cluster on Google Dataproc. So now you can see, everyone, this is the cluster name. Earlier it was creating another kind of cluster, a Kubernetes cluster actually; we'll discuss that as well. It should be fine now.

This is where I want all of you to focus. It is asking you for the cluster type, and we will discuss all these things as well, what this zone is and everything, so that you have an idea. In a job you don't have to do any of these things; these are the work of an admin. But if any of you is, let's say, on the DevOps side of big data, then you should also have this knowledge, so everything will be discussed there. Now, cluster type: you will see Standard (one master, N workers), Single Node (one master, zero workers), and High Availability. What exactly this high availability is, that also we will discuss, but I can give you an idea. Let's say tomorrow you create this cluster, but what if the main master computer dies? I was handling each and everything when I gave you the example; what if I went on vacation? Who is going to handle all these things? Now take a real company use case: Amazon, Google, all of these companies are using this. Amazon cannot say, "Hey, our master computer died, you will not be able to shop for five minutes." You can just think how problematic that is, how much revenue loss there will be.

Okay, so we are going to create a standard cluster. There is versioning as well, so we can select the Hadoop version here. This one is Debian; these are all the OSes. You have multiple laptops, multiple worker nodes, multiple stations, and there also we install an OS, right, everyone? And, you know, Kubernetes is something
else; we will discuss that. It kind of brings up different instances. Within the Hadoop architecture we are making sure that we have these physical machines given to us. So just like you install Windows, I can install Linux onto my system. These are all the Dataproc images; you can see the Hadoop and the Spark version here. You can give your custom image as well, that is fine. We are just selecting the normal one: Hadoop 3 and Spark 3 works. Autoscaling and everything else we will see later; it should not be an issue. Network configuration: there are lots of things, and again we will discuss each of them properly; right now it doesn't make sense. Now, what I'm doing: Enable component gateway. This is to make sure that we are able to see the web interfaces. We can also have optional components, so we can have Jupyter Notebook, Flink, ZooKeeper, all these things in our cluster. And this is exactly how we are able to say, "Hey, I want this particular cluster, where I will have one master and N nodes."

Now let us go back and see the nodes. We can go to this series and see what all is present. Here we can select N1, the first generation, and there are lots of these machines; you can see how big they can be. The biggest one has 48 cores and 624 GB of memory. I hope each and every one of you understands how big a RAM that is and how many cores are present. Even if I go to the latest or the most expensive laptop, I don't think it will have 624 GB of memory. But in Hadoop we handle big data, so we want to make sure that we are able to work on these things. For the time being, to keep the cost low, I'm just going to select the generic one, a very small one. The more memory you take, the more costly it is going to be. Let's keep it 32 GB because again
I don't want to have lots of this space; it will just be problematic for me, it will charge a lot. You can see this was the manager node and this is the worker node. For the worker node, similarly, I can select N1, I can select the basic one. number of worker node le
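The choices made in the console walkthrough above (cluster type, region, machine series, optional components, component gateway) can also be written down as data. The sketch below is a plain Python dictionary builder, not a call to Google Cloud. The field names are modeled on the Dataproc cluster resource as I understand it and should be checked against the official reference; the image version string, the `n1-standard-2` machine type, the worker count, and reading the "32 GB" in the talk as boot disk size are all assumptions made for illustration. Three masters for high availability is likewise my understanding of Dataproc, not something stated in the talk.

```python
# Masters per cluster type. The talk gives Standard (1 master, N workers)
# and Single Node (1 master, 0 workers); 3 masters for HA is an assumption.
MASTERS_BY_TYPE = {"standard": 1, "single_node": 1, "high_availability": 3}


def build_cluster_spec(name, cluster_type="standard", num_workers=2,
                       machine_type="n1-standard-2", boot_disk_gb=32,
                       region="us-central1", image_version="2.1-debian11"):
    """Return a dict describing a Hadoop cluster (hypothetical helper)."""
    if cluster_type not in MASTERS_BY_TYPE:
        raise ValueError(f"unknown cluster type: {cluster_type!r}")
    if cluster_type == "single_node":
        num_workers = 0  # one machine plays both master and worker

    def node_group(count):
        # Same machine shape for masters and workers, to keep cost low.
        return {
            "num_instances": count,
            "machine_type_uri": machine_type,
            "disk_config": {"boot_disk_size_gb": boot_disk_gb},
        }

    return {
        "cluster_name": name,
        "region": region,  # kept here for convenience in this sketch
        "config": {
            "master_config": node_group(MASTERS_BY_TYPE[cluster_type]),
            "worker_config": node_group(num_workers),
            "software_config": {
                "image_version": image_version,  # assumed placeholder value
                "optional_components": ["JUPYTER"],
            },
            # "Enable component gateway" in the console, for the web UIs.
            "endpoint_config": {"enable_http_port_access": True},
        },
    }


if __name__ == "__main__":
    spec = build_cluster_spec("demo-cluster")
    print(spec["config"]["master_config"]["num_instances"], "master(s),",
          spec["config"]["worker_config"]["num_instances"], "worker(s)")
```

Keeping the cluster description as data like this makes it easy to compare a standard, single-node and high-availability setup side by side before paying for any machines; actually creating the cluster would still go through the console, the `gcloud` CLI, or a Dataproc client library.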

Original Description

Hello All, Our first full fledged Big Data Bootcamp With Cloud Azure And AWS is live and will be starting from 21st December 2024. Counseling Team:- 9111533440 Course Link :- https://learn.krishnaikacademy.com/web/checkout/6746d8f5b7bc6c69007be95b Please find the course details below :- Big Data With Azure And AWS Cloud Mastery Program Mentor: Mayank Aggarwal & Krish Naik Start Date: December 21st 2024 Timing: 8am to 12pm IST(Saturday And Sunday) Proficiency Level - Professionals (2+years experience)

