Feature Engineering & R Script | Beginning Azure ML | Part 6

Data Science Dojo · Beginner ·📰 AI News & Updates ·11y ago

Key Takeaways

The video demonstrates feature engineering in Azure ML using the quantize module and R script module to bin continuous numerical values into categorical data and execute R scripts for data transformation and visualization.

Full Transcript

Hello, Internet, and welcome back to the data science community. In the last two episodes we took the time to pre-process all of our data, so it is now 100% ready to be mined, and it's just begging for a machine learning model. However, this video is not about machine learning. It's about feature engineering, where we extract more data from our data. If you want to get your hands dirty and just build a model, go ahead and skip to the next video; it won't affect your ability to build models. For those of you who stay, today we'll be going over how to feature engineer in Azure ML, and I'll introduce you to the R Script module. I'm not going to be teaching you R, though. That's another series for another day. I'll simply be teaching you how to use the R module within Azure ML Studio.

So what is feature engineering? Feature engineering is when you derive more data from your data. Since feature engineering is a huge topic, I'm only going to cover one case today: how to convert continuous numerical data into binned categorical data. For example, if you have humidity and temperature, you might be able to quantify whether it's a good day or a bad day. From age, you might be able to quantify whether someone was a child or an adult. Maybe with age and gender you can create a column that says "is draftable" or "can go to war" as a true/false value. Or maybe the number of years worked will equate to a senior or junior staff level.

Here's an example of a table that I've engineered. You'll notice that we have a brand new column called Age Group, which I derived from our pre-existing Age column. I quantified that anyone below eight would be identified as a child and anyone above eight would be identified as an adult. From the newly engineered Age Group column, I was able to create a pie chart that asks the question: did being a child affect whether or not you lived or died? According to this graph, children survived about 70% of the time.

All right, let's get to it. Go into Data Transformation, and under Scale and Reduce you'll find the Quantize module. Simply drag your data into it. The Quantize module lets you bin your data. For the child and adult example, I simply used the Custom Edges binning mode with a cutoff of eight. This creates a two-class output: any person whose age is less than eight will be binned as 1, and every person over the age of eight will be binned as 2. I'm going to launch the column selector and specify that I want to begin with no columns and then include by column name. The column I'm trying to target to bin is Age, so I'm just going to select that and hit OK, and then I'm going to run the model. Once that's done running, go in and visualize the quantized data. You'll notice that a brand new column has been added to our data set: every time there's a 1, it means a child below eight, and any time there's a 2, it means an adult above eight. That's really all there is to it. You can rename the new quantized Age column to Age Group or something more meaningful, but I'll let you do that yourselves. You already know how; I showed you in a previous video.

Now what I'm going to show you is the R module. Go under R Language Modules and drag in the Execute R Script module. The R Script module can take in three inputs and produce two outputs. It can take up to two data sets (the second one being optional) and a script bundle, and it can output an R Device as well as the results data set. Today we'll only be working with one data set, so go ahead and drag that into the R Script module. If you click on the R module, you'll note that there's a whole bunch of prepackaged code already there for you. What this code does is open up the API a little so you can see what's inside of it. This line right here is going to output to this node right down here, and anything you plot or visualize, such as a histogram, is going to go into the R Device output. You'll notice that these two lines up here read the inputs: the one that sets dataset1 with maml.mapInputPort(1) reads from the first node, and the second reads from the second data set node. This line right here only serves to bind the two data sets you read up there into the bottom output data set.

This module is great for copying and pasting code that you might already have on hand, or you can just start typing R code yourself into the module. Keep in mind that commenting is a little different from what you're probably used to in R. A comment has to be on its own line, and the pound symbol needs to be the first thing on that line. For example, if I put a space at the beginning of the comment, that would break the comment; the pound symbol needs to go first. You also can't comment inline like you do here, since the pound symbol needs to be the first thing on the line.

Now that you have a quick overview of what the R module can do, I'm going to pull up some quick R code that I've written before. This code is meant to be copied and pasted into RStudio, and it does the same thing we did earlier with the Quantize module: it bins the numerical Age column into child and adult categorical bins, and then it creates a pie chart at the very end. I'll show you an example of that real quick. I have RStudio open, and I'm just going to copy and paste the code in. It reforms and transforms the data and creates a pie chart. But I'm going to show you how this code can quickly be adapted to the R module.

Let's go over this code really quickly. Titanic 3 is basically our fully transformed data with the child and adult group added in. You'll notice that this code reads from a CSV; however, since we already have the data feeding in as dataset1, we'll just reroute Titanic later. So I'm going to copy everything except for the top line and paste it into the R Script module. You'll notice that our Titanic data set is read in as dataset1, and we don't have a dataset2, so I'm going to overwrite that as well. I'm going to leave the final output port, though, because we're going to have to output our data. Titanic 3 is what we want to output, since that's our results data after all the transformations have taken place, so we're going to insert that into maml.mapOutputPort. We also notice that the Titanic data set is referred to as Titanic in this script, so instead of dataset1, I'm just going to rename it to Titanic.

Now our R script is successfully adapted to the R module, and I'll just run it. When that's done, go ahead and visualize the results data. You'll notice that Age Group got successfully added at the end: everyone below eight is labeled as a child, and everyone else is labeled as an adult. If we visualize the R Device, we see that it plotted our pie chart. So the pie chart was output to the R Device, and Titanic 3 was sent to the results data, and you'll also notice that we rerouted dataset1 and renamed it to Titanic.

That concludes the tutorial on how to feature engineer in Azure ML, where I introduced the Quantize module as well as the R Script module. Join us next time, when we build our first machine learning model to predict whether or not a person would have survived the Titanic. Moving forward, I will not be using the data set that we engineered in this video. We will instead resume the tutorial where we left off last time, after we finished scrubbing all of our data of missing values. If you like what you just saw, subscribe to our channel or leave us a comment. Let us know if there's a topic you want us to cover, and be sure to check us out at datasciencedojo.com. Until next time!
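
The custom-edges binning described in the transcript can also be sketched outside Azure ML. The snippet below is a minimal illustration in plain Python, standing in for the Quantize module rather than reproducing the video's R code. The function name `bin_by_edges` and the sample ages are hypothetical, and placing an age exactly equal to the edge in the upper bin is an assumption, since the video only discusses values below and above the cutoff.

```python
from bisect import bisect_right


def bin_by_edges(value, edges):
    """Return a 1-based bin index for value, given sorted bin edges."""
    # bisect_right counts how many edges are <= value, so a value equal
    # to an edge lands in the upper bin (an assumption, not from the video).
    return bisect_right(edges, value) + 1


ages = [2, 7.5, 8, 30, 54]  # hypothetical sample ages
bins = [bin_by_edges(age, [8]) for age in ages]
print(bins)  # [1, 1, 2, 2, 2]
```

With a single edge at 8 this gives the two-class output from the video (1 for children, 2 for adults); passing more edges, for example `[8, 18]`, would produce three bins in the same way.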

Original Description

Learn to bin continuous numerical values into two-class categorical values using the Quantize module. Also get introduced to the R Script module within ML Studio.

Watch the updated playlist: https://hubs.ly/H0hNXgq0

0:44 What is feature engineering?
1:22 Feature Engineering Example
1:56 Feature Engineer with the Quantize Module
3:04 R Script Module
4:13 Commenting in the R Module
4:40 Adapting existing R code to the Azure R Module

Titanic Data Set (train.csv): https://www.kaggle.com/c/titanic/data

Learn more about Data Science Dojo here: https://hubs.ly/H0hNXgB0
See what our past attendees are saying here: https://hubs.ly/H0hNXDg0

At Data Science Dojo, we're extremely passionate about data science. Our in-person data science training has been attended by more than 4,000 employees from over 800 companies globally, including many leaders in tech like Microsoft, Apple, and Facebook.

Like Us: https://www.facebook.com/datascienced...
Follow Us: https://plus.google.com/+Datasciencedojo
Connect with Us: https://www.linkedin.com/company/data...
Also find us on:
Instagram: https://www.instagram.com/data_science_dojo/
Vimeo: https://vimeo.com/datasciencedojo


This video teaches feature engineering in Azure ML using the quantize module and R script module, covering data transformation, binning numerical continuous data, and R script execution. It provides a hands-on introduction to Azure ML and R scripting for data science tasks.

Key Takeaways
  1. Drag data into the quantize module
  2. Select the custom edges binning mode and specify a cutoff value
  3. Launch the column selector and specify the column to target
  4. Run the model
  5. Visualize the quantized data
  6. Drag the data set into the Execute R Script module
  7. Note that the default template script binds the two input data sets together
  8. Adapt existing R code to the R module for data transformation and visualization
  9. Visualize the results data and view the plot through the R Device output
💡 The quantize module in Azure ML can be used to bin numerical continuous data into categorical data, and the R script module can be used to execute R scripts for data transformation and visualization, providing a powerful combination for feature engineering tasks.
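
The last two takeaways (labeling an age group in a script, then plotting the result) can be sketched as follows. This is a hedged illustration in plain Python rather than the R script shown in the video. The records, the field names `age` and `survived`, and both helper names are hypothetical; only the cutoff of 8 comes from the transcript. It computes the survival share per group, which is the kind of figure a pie chart like the one in the video would display.

```python
from collections import defaultdict


def age_group(age, cutoff=8):
    """Label a passenger as Child (below cutoff) or Adult (otherwise)."""
    return "Child" if age < cutoff else "Adult"


def survival_rate_by_group(records):
    """Map each age group to the fraction of its members who survived."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in records:
        group = age_group(row["age"])
        totals[group] += 1
        survivors[group] += row["survived"]
    return {group: survivors[group] / totals[group] for group in totals}


# Hypothetical toy records, not the Titanic data set.
passengers = [
    {"age": 4, "survived": 1},
    {"age": 6, "survived": 0},
    {"age": 35, "survived": 0},
    {"age": 52, "survived": 1},
    {"age": 29, "survived": 0},
    {"age": 41, "survived": 0},
]
rates = survival_rate_by_group(passengers)
print(rates)  # {'Child': 0.5, 'Adult': 0.25}
```

Keeping the labeling step in its own function mirrors the adaptation in the video: the transformation logic stays the same, and only the input source and output destination change.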

Chapters (6)

0:44 What is feature engineering?
1:22 Feature Engineering Example
1:56 Feature Engineer with the Quantize Module
3:04 R Script Module
4:13 Commenting in the R Module
4:40 Adapting existing R code to the Azure R Module