Feature Engineering & R Script | Beginning Azure ML | Part 6
Key Takeaways
The video demonstrates feature engineering in Azure ML using the quantize module and R script module to bin continuous numerical values into categorical data and execute R scripts for data transformation and visualization.
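To make the custom-edge binning idea concrete, here is a minimal Python sketch. It is not the Quantize module itself and not the R code shown in the video; the function name `quantize`, the sample ages, and the choice to place a value exactly equal to an edge in the upper bin are all assumptions made for illustration.

```python
from bisect import bisect_right


def quantize(values, edges):
    """Return a 1-based bin index for each value, given sorted bin edges.

    Hypothetical helper for illustration only. With one edge at 8,
    values below 8 get bin 1 and values of 8 or more get bin 2.
    """
    # bisect_right counts how many edges are <= value, so a value that
    # equals an edge lands in the upper bin (an assumption of this sketch).
    return [bisect_right(edges, v) + 1 for v in values]


# Made-up sample ages, not taken from the Titanic data set.
ages = [2, 7.5, 8, 30, 64]
age_bins = quantize(ages, edges=[8])
print(age_bins)
```

Passing more edges, for example `edges=[8, 18, 65]`, would produce four bins in the same way.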
Full Transcript
Hello Internet, welcome back to the data science community. In the last two episodes we took the time to pre-process all of our data. Our data is now 100% ready to be mined, and it's just begging for a machine learning model. However, this video is not about machine learning; it's about feature engineering, where we extract more data from our data. So if you want to get your hands dirty and just build a model, go ahead and skip to the next video. It won't affect your ability to build models. For those of you who stay, today we'll be going over how to feature engineer in Azure ML, and I'll introduce you to the R Script module. I'm not going to be teaching you R, though; that's another series for another day. I'll simply be teaching you how to use the R module within Azure ML Studio.

So what is feature engineering? Well, feature engineering is when you derive more data from your data. Since feature engineering is a huge topic, I'm only going to cover one case today: specifically, how to convert numerical continuous data into binned categorical data. For example, if you have humidity and temperature, you might be able to quantify whether it's a good day or a bad day. From age, you might be able to quantify whether someone was a child or an adult. Maybe with age and gender you can create a column that says "is draftable" or "can go to war" as a true/false. Or maybe the number of years worked will equate to a senior or junior staff level.

Here's an example of a table that I've engineered. You'll notice that we have a brand new column called age group, which I derived from our pre-existing column, age. I quantified that if they were below eight they would be classified as a child, and if they were above eight they would be identified as an adult. From the newly engineered age group column I was able to create a pie chart that asks the question: did being a child affect whether or not you lived or died? From this graph, children survived on average 70% of the time.

All right, let's get to it. Go into Data Transformation, and under Scale and Reduce you'll find a module called the Quantize module. Simply drag your data into it. The Quantize module lets you bin your data. For the child and adult example, I simply used the custom edges binning mode with a cutoff of eight. This creates a two-class output: any person whose age is less than eight will be binned as one, and every person over the age of eight will be binned as two. Then I'm going to launch the column selector and specify that I want to begin with no columns and select by column name. The column I'm trying to bin is Age, so I'm just going to select that and hit OK. Then I'm going to run the model. Once that's done running, go in and visualize the quantized data, and you'll notice there's a brand new column added to our data set. Every time there's a one it means a child below eight, and any time there's a two it means an adult above eight. That's really all there is to it. You can rename the quantized age column to age group or something more meaningful if you want, but I'll let you do that by yourself; you already know how, since I showed you in a previous video.

Now what I'm going to show you is the R module. Simply go under R Language Modules and drag in the Execute R Script module. The R Script module can take in three inputs and produce two outputs. It can take up to two data sets, the second one being optional, and it can take in a script bundle. It can output an R Device as well as the results. Today we'll only be working with one data set, so go ahead and drag that into the R Script module. If you click on the R module, you'll note that there's a whole bunch of prepackaged code already there for you. What this code serves to do is open up the API a little bit so you can see what's inside of it. This line right here is actually going to output to this node right down here, and anything you plot or visualize, such as a histogram, is actually going to go into the R Device output. You'll notice that these two lines up here are reading the inputs: map input port 1 reads from the first node, and the second one reads from the second data set node. This line right here only serves to bind together the two data sets that you read in up there and send them to the bottom output. This module is great for copying and pasting code that you might already have on hand, or you can just start typing R code yourself into the module.

Now keep in mind that commenting is a little different from what you're probably used to in R. A comment has to be on its own line, and the pound symbol needs to be the first thing in the line. So, for example, if I had a space at the beginning of the comment, this would actually break the entire comment; the pound symbol needs to go first. You also can't comment inline like you do here, since the pound symbol needs to be the first thing in the line.

Now that you have a quick overview of what the R module can do, I'm going to pull up some quick R code that I've written before. This code is meant to be copied and pasted into RStudio, and it does the same thing that we did earlier with the Quantize module: it bins the numerical Age column into child and adult categorical bins, and then it creates a pie chart at the very end. I'm just going to show you an example of that real quick. I have RStudio opened up, and I'm going to copy and paste the code into it. It's going to reformat and transform the data and create a pie chart. But I'm going to show you how this code can quickly be adapted to the R module.

So let's go over this code really quick. Titanic 3 is basically our fully transformed data with the child and adult group added in. You'll notice that this code specifically reads from a CSV. However, since we already have the data feeding in as dataset1, we'll just reroute Titanic later. So I'm going to copy everything except for the top row and paste it into the R Script module. You'll notice that the module is reading in dataset1, and we don't have a dataset2, so I'm just going to go ahead and overwrite that as well. I'm going to leave the final output port, though, because we're going to have to output our data. Titanic 3 is what we want to output; that's our results data after all the transformations have taken place, so we're going to insert that into the map output port. We also notice that the Titanic data set is referred to as Titanic in this script, so instead of dataset1 I'm just going to rename it to Titanic.

Now our R script is successfully adapted to the R module, and I'll just run the script. When that's done, go ahead and visualize what happened in the results data. You'll notice that age group got successfully added to the end, and that every time someone's below eight they're labeled as a child and anyone else is labeled as an adult. If we visualize the R Device, you'll see that it plotted our pie chart. The pie chart got output to the R Device, and Titanic 3 got sent into the results data. You'll also notice that we rerouted dataset1 and renamed it to Titanic.

That concludes the tutorial on how to feature engineer in Azure ML, where I introduced the Quantize module as well as the R Script module. Join us next time, when we build our first machine learning model to predict whether or not a person would have survived the Titanic. Moving forward, I will not be using the data set that we engineered in this video. We will instead resume the tutorial where we left off last time, after we finished scrubbing all of our data of missing values. If you like what you just saw, subscribe to our channel or leave us a comment. Let us know if there's a topic you want us to cover, and be sure to check us out at datasciencedojo.com. Until next time.
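The adaptation walked through in the transcript (take the incoming data set, add an age group column, then summarize survival by group for the pie chart) can be sketched as follows. This is a Python illustration under stated assumptions, not the R script from the video: the function names, the `AgeGroup` column name, and the sample rows are hypothetical, while `Age` and `Survived` are column names from the Kaggle Titanic train.csv.

```python
from collections import defaultdict


def add_age_group(rows, cutoff=8):
    """Return copies of the rows with an added 'AgeGroup' column.

    Ages below the cutoff are labeled 'Child', everything else 'Adult'.
    The column name 'AgeGroup' is an assumption of this sketch.
    """
    labeled = []
    for row in rows:
        group = "Child" if row["Age"] < cutoff else "Adult"
        labeled.append({**row, "AgeGroup": group})
    return labeled


def survival_rate_by_group(rows):
    """Return the share of survivors within each age group."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for row in rows:
        totals[row["AgeGroup"]] += 1
        survived[row["AgeGroup"]] += row["Survived"]
    return {group: survived[group] / totals[group] for group in totals}


# Made-up passengers for illustration; these are not rows from train.csv.
passengers = [
    {"Age": 4, "Survived": 1},
    {"Age": 6, "Survived": 0},
    {"Age": 7, "Survived": 1},
    {"Age": 35, "Survived": 0},
    {"Age": 52, "Survived": 1},
    {"Age": 28, "Survived": 0},
]

labeled = add_age_group(passengers)
rates = survival_rate_by_group(labeled)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} survived")
```

Keeping the transformation in a function that receives the data, rather than reading a CSV inside the script, mirrors the change described in the video, where the file-reading line is dropped and the module's input is used instead.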
Original Description
Learn to bin continuous numerical values into two-class categorical values using the Quantize module. Also get introduced to the R Script module within ML Studio.
Watch the updated playlist: https://hubs.ly/H0hNXgq0
0:44 What is feature engineering?
1:22 Feature Engineering Example
1:56 Feature Engineer with the Quantize Module
3:04 R Script Module
4:13 Commenting in the R Module
4:40 Adapting existing R code to the Azure R Module
Titanic Data Set (train.csv):
https://www.kaggle.com/c/titanic/data
--
Learn more about Data Science Dojo here:
https://hubs.ly/H0hNXgB0
See what our past attendees are saying here:
https://hubs.ly/H0hNXDg0
--
At Data Science Dojo, we're extremely passionate about data science. Our in-person data science training has been attended by more than 4,000 employees from over 800 companies globally, including many leaders in tech like Microsoft, Apple, and Facebook.
--
Like Us: https://www.facebook.com/datascienced...
Follow Us: https://plus.google.com/+Datasciencedojo
Connect with Us: https://www.linkedin.com/company/data...
Also find us on:
Instagram: https://www.instagram.com/data_science_dojo/
Vimeo: https://vimeo.com/datasciencedojo