Casting Columns & Renaming Columns | Beginning Azure ML | Part 4

Data Science Dojo · Beginner ·📰 AI News & Updates ·11y ago

Key Takeaways

The video covers preprocessing in Azure ML, specifically casting columns into categorical types and renaming columns for clarity, using tools like Azure ML, Descriptive Statistics Module, and Metadata Editor. It also briefly covers intellisense and descriptive statistics.

Full Transcript

Hello Internet, welcome back to the data science community. Last episode we found out through data exploration and data visualization that our data set is still a little bit raw and will require some pre-processing before we can properly create machine learning models. So in this episode I will go over three things with you: the Descriptive Statistics module, how to cast your columns into the correct class type, and how to rename your columns.

Okay, before we do any pre-processing, I want to show you a quick module that you can also use to further explore your data. Expand your modules, select Statistical Functions, go to Descriptive Statistics, and just drag that module in. All you have to do now is feed your data directly into the Descriptive Statistics module. Keep in mind this module is still an empty module; we have to run our experiment again for it to fully go through. Once the cloud has finished processing your module, you can see that there's a check mark in the module itself, and if you click on Visualize, the module is no longer an empty shell of itself; there are things inside of it. But you can tell that something is not right: the mode looks very messed up when it comes to PassengerId, right? Everything else seems for the most part to be all right. Let's figure out why it broke.

If you go to the Titanic data set itself and go to Visualize, now that we're in the data you can see that there are 891 rows and PassengerId has 891 unique values. So the mode broke because PassengerId is currently listed as numeric and there's an 891-way tie for the highest-frequency value. That's a good indicator that PassengerId should probably have been a categorical value, so we should recast it from numerical into categorical. Survived should also be categorical, Pclass should also be categorical, and Sex should be converted from a string to a categorical, as well as Embarked.

Okay, let's start our first pre-processing of the data. First we're going to do some casting of the data, and to do that you go under Data Transformation, and under Manipulation you will find something called a Metadata Editor. This is a module that lets you basically edit your columns, right? Metadata simply means data about the data, so we're going to edit the data about the data. We're going to feed the Titanic data set into the Metadata Editor, and let's actually remove this routing that we have to Descriptive Statistics.

If you go into the Metadata Editor, there are a number of things you can do, right? You can change the data type, you can change a column from categorical to non-categorical, you can change the fields into a label or a feature, and you can also change the column names as well. You want to start by selecting the columns that you want to convert into categorical values, so launch the column selector over here, and if you click on this column box it will show all the columns that are available in your data set. We established that PassengerId should have been one, and Survived, Pclass, Sex, and Embarked should all be categorical values, so we want to check those. Those are the columns that are now selected, and now we want to change the categorical setting from unchanged to categorical. Now you want to feed the output of the Metadata Editor into Descriptive Statistics and then run your experiment again.

Once that's completed, you can go into the output of the Metadata Editor and visualize what's happening in there. PassengerId is now categorical, Survived is now categorical, Pclass is now categorical, as well as Sex and Embarked. Now that we've ensured that it works, let's label this for good practice: this is "categorical casting", so now we know exactly what that module does. Notice that if you go back into your original data set and visualize that, everything's still numerical. This is still your raw data; it's still untouched, right? It's only after it gets past this node that it gets changed. Now you can go into your Descriptive Statistics and visualize it, and look, it's not broken anymore, because PassengerId is now cast correctly.

Now, before we move on, I want to talk a little bit about the IntelliSense and how it works inside the Metadata Editor. If you noticed earlier, when we launched the column selector, all the columns that were present in the data were already fed into a pre-generated box. However, if you just had a free-floating Metadata Editor and tried to launch the column selector, you would notice that nothing shows up. It's only after you feed a data set into the Metadata Editor that the IntelliSense will activate, or a processed module that's outputting data as well. So you can see that if I feed in the rendered Metadata Editor that we had previously, the categorical casting one, the IntelliSense will also show up. However, you will notice that this Metadata Editor has an exclamation mark next to it, which means it's not processed yet, which means it's currently an empty module. So if I try to send it into another Metadata Editor, you will see that nothing shows up in the IntelliSense.

If you remember, last video I tried to do data visualization with the Survived column before it was cast correctly into categorical data, and it broke. Now if we go back and ask the same question again, did sex have anything to do with whether or not a person lived or died, we can properly answer that question, because Survived is properly cast as categorical. So let's do that: let's just compare Sex with Survived, and you can tell from this distribution that a disproportionate number of men died on the Titanic. And while we're here visualizing the data, we might as well go on to our next topic, which is renaming columns, and that can be done for a number of good reasons, right?
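As a rough illustration of the casting step and the broken mode described above, here is a minimal plain-Python sketch. It is not Azure ML code: the five-row sample, the `metadata` mapping, and the helper names `cast_to_categorical` and `summarize` are all assumptions made up for this example. It shows a column of unique IDs producing an all-way tie for the mode, and a "cast" that only changes the data about the data while the raw rows stay untouched.

```python
from collections import Counter
from statistics import mean, multimode

# Hypothetical five-row sample using the Titanic column names discussed above.
rows = [
    {"PassengerId": 1, "Survived": 0, "Pclass": 3, "Sex": "male"},
    {"PassengerId": 2, "Survived": 1, "Pclass": 1, "Sex": "female"},
    {"PassengerId": 3, "Survived": 1, "Pclass": 3, "Sex": "female"},
    {"PassengerId": 4, "Survived": 1, "Pclass": 1, "Sex": "female"},
    {"PassengerId": 5, "Survived": 0, "Pclass": 3, "Sex": "male"},
]

# Every PassengerId occurs exactly once, so every value ties for the mode.
ids = [row["PassengerId"] for row in rows]
tied_modes = multimode(ids)

# "Metadata" in this sketch is just a mapping from column name to declared kind.
metadata = {
    "PassengerId": "numeric",
    "Survived": "numeric",
    "Pclass": "numeric",
    "Sex": "string",
}


def cast_to_categorical(metadata, columns):
    """Return a new mapping with `columns` marked categorical.

    The input mapping and the data rows are left unmodified.
    """
    unknown = [name for name in columns if name not in metadata]
    if unknown:
        raise KeyError(f"unknown columns: {unknown}")
    return {
        name: ("categorical" if name in columns else kind)
        for name, kind in metadata.items()
    }


def summarize(rows, metadata, column):
    """Summarize a column according to its declared kind."""
    values = [row[column] for row in rows]
    if metadata[column] == "categorical":
        # Categorical columns are described by counts per level.
        return dict(Counter(values))
    # Anything else is assumed numeric here; a string column would fail in mean().
    return {"mean": mean(values), "min": min(values), "max": max(values)}


cast = cast_to_categorical(metadata, ["PassengerId", "Survived", "Pclass", "Sex"])
survived_counts = summarize(rows, cast, "Survived")

print(tied_modes)       # all five IDs tie for the mode
print(survived_counts)  # counts per level once Survived is categorical
```

Summarizing `Survived` with the original `metadata` instead of `cast` would return a mean, minimum, and maximum, which is the numeric treatment that made the earlier visualization unhelpful.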
Look at these column names, SibSp and Parch, right? SibSp stands for siblings and spouses that were with you, so how big and how wide was the family that traveled with you, versus Parch, which is parent and child, so how big was your immediate family: how many parents came with you and how many kids came with you. Pclass is also a good candidate for a rename too, right? Right now it represents first class, second class, and third class, but I think accommodation class would be a much better name for it.

So we've selected three columns as good candidates for renaming, and each Metadata Editor can actually only rename one column, so that means you need three Metadata Editors. Either drag in three or, even better, select one Metadata Editor, press Ctrl+C and then Ctrl+V, and you can have as many Metadata Editors as you want just by copying and pasting. Let's get one more out here, line them up, and have them flow into each other like a waterfall. This data set is going to feed into this module, it's going to render, and it's going to feed into the next one.

Let's launch the column selector. We're going to rename Pclass, and we're going to rename Pclass to accommodation class. For a bit of style, you want this to be a single string, so either use camel casing or use underscore (snake) casing, because if you ever try to export this back into a database, the space in the column name will actually break the database. So it's good to keep good style if you want to keep using your data.

Then let's move on to the next Metadata Editor. If you remember what I said earlier, the IntelliSense won't work here because it's being fed from a non-rendered Metadata Editor. I just happened to remember what the column was, and that was SibSp, and keep in mind that case does matter for column names. We want to rename SibSp to sibling spouse. Then for the bottom one I want to rename Parch, and I want to rename Parch to parent child.

Okay, now that that's all done, we can just run it. Now we have three Metadata Editors which are all doing different things, so let's label them to be considerate to other people who are going to work on the same experiment. This one renamed Pclass to accommodation class, this one renamed SibSp to sibling spouse, and likewise Parch got renamed to parent child. Let's expand all those, so now it's very apparent and clear what happened to all of the data. Once that's done loading, you can go to the very bottom node and visualize what's been done to your data, and you will notice that sibling spouse was successfully renamed, and parent child as well as accommodation class. So these are much clearer now when other people are working on this.

That concludes the video tutorial on how to use the Metadata Editor to rename your columns as well as to recast columns. Join us next time when we tackle this issue right here: you will notice that Age has 177 missing values, and that's actually going to cause problems, because it's a numerical value and our models expect continuous data to stay continuous. If you like what you just saw, subscribe to our channel or leave us a comment, let us know if there's a topic you want us to cover, and be sure to check us out at datasciencedojo.com. Until next time.
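The renaming waterfall above can be sketched in plain Python as three chained single-column renames. This is only an illustration under stated assumptions, not the Metadata Editor itself: `rename_column` is a hypothetical helper, the two sample rows are made up, and the snake-case names `accommodation_class`, `sibling_spouse`, and `parent_child` are one possible spelling, since the transcript does not show the exact strings typed in the video.

```python
# Hypothetical two-row sample with the three columns renamed in the video.
rows = [
    {"Pclass": 3, "SibSp": 1, "Parch": 0},
    {"Pclass": 1, "SibSp": 0, "Parch": 2},
]


def rename_column(rows, old, new):
    """Return new rows with column `old` renamed to `new`.

    Mirrors the one-rename-per-step setup described above. The lookup is
    case-sensitive, and names containing spaces are rejected so the result
    stays safe to export to a database.
    """
    if " " in new:
        raise ValueError(f"column name must be a single string: {new!r}")
    renamed = []
    for row in rows:
        if old not in row:
            raise KeyError(old)
        # Rebuild the row so column order is preserved and the input is untouched.
        renamed.append(
            {(new if key == old else key): value for key, value in row.items()}
        )
    return renamed


# Three steps feeding into each other like a waterfall.
step1 = rename_column(rows, "Pclass", "accommodation_class")
step2 = rename_column(step1, "SibSp", "sibling_spouse")
step3 = rename_column(step2, "Parch", "parent_child")

# Case matters: "sibsp" is not the same column name as "SibSp".
try:
    rename_column(rows, "sibsp", "sibling_spouse")
    case_mismatch_caught = False
except KeyError:
    case_mismatch_caught = True

print(list(step3[0].keys()))
print(case_mismatch_caught)
```

Raising on an unknown column name is a design choice for this sketch: without column suggestions to fall back on, a mistyped name is easier to catch if the step fails loudly.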

Original Description

Preprocessing part I with Azure ML. Cast your columns into categorical types, and rename your columns for clarity. We also briefly cover intellisense and descriptive statistics.

Watch updated playlist: https://hubs.ly/H0hNXvQ0

0:28 Briefly Cover Descriptive Statistics Module
1:58 Casting Columns, Metadata Editor
4:05 Intellisense (Auto-complete)
5:30 Renaming Columns

Titanic Data Set (train.csv): https://www.kaggle.com/c/titanic/data

--
Learn more about Data Science Dojo here: https://hubs.ly/H0hNX7m0
See what our past attendees are saying here: https://hubs.ly/H0hNXwb0
--
At Data Science Dojo, we're extremely passionate about data science. Our in-person data science training has been attended by more than 4000+ employees from over 800 companies globally, including many leaders in tech like Microsoft, Apple, and Facebook.
--
Like Us: https://www.facebook.com/datascienced...
Follow Us: https://plus.google.com/+Datasciencedojo
Connect with Us: https://www.linkedin.com/company/data...
Also find us on:
Instagram: https://www.instagram.com/data_science_dojo/
Vimeo: https://vimeo.com/datasciencedojo

This video teaches how to preprocess data in Azure ML by casting columns into categorical types and renaming columns for clarity, and briefly covers intellisense and descriptive statistics. It is essential for beginners in machine learning to understand data preprocessing and exploration. By following the steps in this video, viewers can learn how to prepare their data for machine learning models.

Key Takeaways
  1. Run the experiment again after feeding data into the Descriptive Statistics module
  2. Cast the PassengerId column from numerical to categorical
  3. Cast Survived and Pclass from numerical, and Sex and Embarked from string, to categorical
  4. Use the Metadata Editor to edit column data types and names
  5. Rename columns for good practice and style, using camel case or snake case instead of spaces
  6. Use one Metadata Editor per renamed column, chained so each feeds into the next
  7. Copy and paste Metadata Editors to create multiple instances
  8. Rename Pclass to accommodation class
  9. Rename SibSp to sibling spouse
  10. Rename Parch to parent child
💡 Casting columns into categorical types and renaming columns for clarity are crucial steps in data preprocessing, and using tools like Azure ML and Metadata Editor can simplify these processes.

Chapters (4)

0:28 Briefly Cover Descriptive Statistics Module
1:58 Casting Columns, Metadata Editor
4:05 Intellisense (Auto-complete)
5:30 Renaming Columns