The Essential Programming Concepts For Learning Data Science

Ken Jee · Beginner · 🧬 Deep Learning · 3y ago

Key Takeaways

The video discusses essential programming concepts for learning data science, including flow control structures, data structures, data types, functions, methods, and libraries, with a focus on Python and its applications in data science.

Full Transcript

Programming might be the single most important skill for data scientists and many other data practitioners. Coding allows you to construct predictive models and beautiful visualizations, and to build products that scale to millions of people. This skill is extremely important, but actually learning to code is a massive barrier for those looking to harness its power.

When I was starting out, I knew I wanted to leverage programming for data. Unfortunately, programming was such a large field that I didn't know where to start or what direction I should take. I knew I needed to get started, so I just pulled up Codecademy and went about learning aimlessly. At the time, I didn't know which languages were most practical, which concepts would help me in data analysis, or even which IDEs to use. I spent so much time fumbling through different tutorials that I got really frustrated; I spent hours making stupid shapes move around on a page when I could have been learning things that were more practical for my goals. So that you don't waste your time like I did, I'm going to break down the essential concepts that you need to learn and why they're relevant for the data domain specifically. I'll start with some of the most basic skills and work my way up to a framework for learning almost any new advanced tool or concept. If anything, you'll probably want to stick around to the end for that. Like always, the resources that I recommend will be linked in the description.

Quick: is this a dog or a pastry? As a human, it's pretty easy to tell which is a cute puppy and which is a delicious muffin. On the other hand, even some advanced machine learning algorithms struggle with this distinction. Without advanced machine learning techniques, computers really struggle with many tasks like this that are simple for humans. So why would we use computers for something like this? If we had to classify 10 images, a human would absolutely crush it. On the other hand, if we had to classify 5 million, we'd probably just throw
up our hands and start eating the muffins to forget. Computers are incredible at doing simple tasks over and over again.

This is illustrated by the first coding concept that I want to highlight: flow control structures, which are made up of loops and conditionals. A loop simply repeats a command over and over again, and there are two different types of loops: the for loop and the while loop. A for loop repeats an action a specified number of times. A while loop keeps repeating, potentially indefinitely, until a specific condition is met. For example, maybe we want to go through a transcript of one of my videos and count how many words there are. One approach would be to write a loop that iterates through each word and keeps a counter.

Loops are one of the most powerful tools in the data scientist's arsenal: they let us quickly iterate through data and manipulate it when necessary. In a normal day-to-day data science workflow, loops can often be abstracted away through libraries, which we'll touch on later. But first, it's good to understand what's going on under the hood, and second, it's often really helpful to write one quickly to aggregate, count, or do many other things.

Now, what if we wanted to count how many times I said "papaya" in any one of my video transcripts? This is where conditionals, another type of flow control structure, come into play. Conditionals in combination with loops allow us to make some very powerful changes to our data. In most cases, conditionals let us take a certain action only if a specific condition is met. In our papaya-counting algorithm, we would do something like this: if a word is equal to "papaya", add one to a counter; else, don't count it. This way we count only papayas and ignore other words. The cool thing here is that we can have multiple conditionals to make our loop do very diverse things when confronted with specific scenarios. For example, we could have multiple counters: one for
papaya and maybe others for fruits like strawberries. As you can imagine, flow control gives us a very powerful ability to manipulate and aggregate our data. I frequently use these concepts for data engineering and for creating new features for my models. The syntax of a loop in Python looks like this: for i in some range, we do some action. You can loop through a range of numbers or through any list in general (more on lists in a second). Conditionals generally take this form: if some condition holds, we do this; else, we do this other thing.

A second ago I mentioned lists. These fall into another very important concept for data science called data structures. As a data scientist, you know you're going to be working with a lot of data, so wouldn't it be better if it were organized in some logical way? Data structures allow us to organize vast amounts of data so that it's easier to analyze, search, or alter. Some common data structures are lists, dictionaries, arrays, sets, and tuples. These are all available in Python itself, but libraries that we'll touch on in a later section also introduce other data structures, like Series and DataFrames.

Each data structure is designed for a specific purpose. Let's take a list, for example. A list is a comma-separated collection of data points: we can have a list of numbers, words, objects, or a mix of these things. Lists are easy to iterate through, and they're also mutable, which means we can change the value at each position in the list. This can be great if we need to go through and make adjustments to our data. For example, I might put my data into a list if I wanted to clean it up a bit: I could convert something to text and then remove all the extra spaces or all of the punctuation.

Dictionaries serve another purpose: they allow us to categorize our data under specific headers. Just like a dictionary for words, we have a key, which is the equivalent of the word that
you're looking up in a dictionary, and we have a value, which is the equivalent of that word's meaning. Dictionaries are also mutable. Let's say I wanted to go through my podcast, Ken's Nearest Neighbors, and count each word that was said. I would loop through each word of the podcasts and use a dictionary to catalog how many times each word was said. In this case, the key would be the word, and the value paired with that word would increase by one each time our loop came across it in the transcript. The code would loop through the words: if we haven't seen a word before, it becomes a new key, and if we have seen it before, we add one to its count. With dictionaries we can start doing some very basic forms of analysis. We could make a pretty cool bar chart of word frequencies, or we could aggregate across some other metric.

Some other data structures are immutable, which means that they can't be edited after they're created; tuples are an example. Immutability can make some types of operations faster. Tuples, arrays, and sets are a bit outside the scope of this video, and this video is already going to be really long, so I left some links in the description to my favorite resources if you want to learn more about those.

It might have gone under the radar, but we've talked about a couple of different types of data already in this video. First we were talking about text data, with a transcript of my video, and then we were talking about numerical data, after we had counted my papayas. We also talked about lists and dictionaries. That takes us to data types, which happen to be very important for analyzing data. As a data scientist you'll run into many different types of data, and it's important to have a conceptual understanding of them before you analyze them and as you grow in your career. Here is a list of the different data types in Python. As you can see, some of the data structures that I described fall
into data types already. Let's go over a few of the most important ones for your career in the data domain.

First, we have strings. A string is text data, and you can see it represented with quotation marks: simply putting quotation marks around a word makes it a string. Strings have specific characteristics. For example, strings are indexable, so you can take segments of them and iterate through them like lists. You can't, however, change them, so they're known as immutable. You see strings in almost all text data and in categorical variables, and dealing with strings and lists of strings is particularly relevant for natural language processing, a branch of machine learning.

Next, we have integers. Integers are whole numbers, and we use them to express counts and sometimes also categorical variables or rank order. Floats, on the other hand, are numbers with decimals, and you can run into some issues when you convert between integer and float types. The last data type that I want to touch on is the bool, or boolean. These are essentially true and false values. In Python, booleans are written with the keywords True and False, and they behave like 1 and 0 in arithmetic. We can use them for binary variables and for classifications. When analyzing data in Python, we need to constantly be aware of the data types that we're using, because using the wrong types of data can cause problems for our analysis.

Over time, you'll find that a big part of coding is avoiding errors. If we have to write the same code over and over again, we increase the chance of making errors, and we also end up with long, gross-looking files. The next concept allows us to reuse code easily and make our work more scalable: functions. Probably unsurprisingly, I get asked the same questions about data science almost every day. Take the question "How should I learn to code for data science?" If I responded with a full answer to everyone who asked me this, I'd probably have absolutely no time for
anything else in my day. On the other hand, if I made a video about it and just shared the link over and over again, people would get the same in-depth answer without me spending massive amounts of my time typing. The same concept goes for functions: with functions, you can encapsulate some of your code and reuse it over and over again. Take the dictionary example from earlier, where I wanted a count of all the words I used in my podcast. I could write this code as a function and then use it on any podcast episode that I've ever recorded. Functions work for many things, because we can not only encapsulate code but also pass different parameters into them. For example, I could write my word counter like this: we define the word counter, pass in the text files that I want to parse, count all the words, and then return the dictionary with the word counts. You generally want functions to return something, but you don't necessarily have to pass a parameter in.

Very closely related to functions are methods. Methods allow us to apply a function, or function-like behavior, to specific objects. For example, lists have a method called sort; for a list of numbers, this would put the list in order. Methods are tied to objects and object-oriented programming, which is important but, in my opinion, not completely essential to understand when getting started in data. After you get familiar with the other basic concepts, I do recommend brushing up on object-oriented programming, which I discuss in a bit more depth later in the video. As you can imagine, functions and methods are really important for scaling our work, especially with training and test sets: if we want to do the same thing to both sets, like scaling the data, functions come in very handy.

In some sense, the majority of actual coding work is just reusing other people's code. It makes no sense for me to write my own
linear regression from scratch every time. Instead, I can import code from someone else who's probably way smarter than I am and apply it to my specific problem. This is where the concept of libraries comes in. Essentially, a library is a whole bunch of functions that someone else has put together that I can bring in and use for my own work. An example would be scikit-learn, which has most of the machine learning algorithms that data scientists use on a daily basis. I can simply import the algorithm that I want to use and apply it in my code. Rather than being just functions, these libraries are often made up of a group of objects whose methods give us our desired functionality. To use these libraries, I just install the package into my Python environment using pip or Anaconda and then import the library into my workbook.

These are some of the main libraries used for data science. We have pandas and NumPy for manipulating data and scientific computing. pandas has a few data structures, like Series and DataFrames, that I described before, and these are integral to a lot of the descriptive analysis we do in data science. scikit-learn is the main one for machine learning algorithms, train-test splitting, and model evaluation. Matplotlib, Plotly, and Streamlit are very commonly used for data visualization, and things like TensorFlow, PyTorch, and Keras are used for deep learning. I've left the documentation for all of these libraries, and a few videos where I use these tools, in the description below. Obviously there are many more than these, but I think they're good ones to get you started.

Full transparency: most people get stuck at this part. New libraries can be unbelievably confusing, but luckily YouCode can help you with that. YouCode is the single best completely free search engine for coders. On your next project, you'll save time debugging because YouCode's sophisticated ranking system is designed to find the most relevant coding results from
different sources. You can click to directly copy code from developer-focused apps like Stack Overflow, and you can even use YouCode's suite of AI-powered apps to generate code for languages and tools including SQL, Python, FedEx, Spark, k8s, and Hugging Face. My favorite part is the ability to easily search for and find relevant documentation and peer-reviewed papers.

With those concepts, in my opinion, you have a lot of the basics for coding for data science. For those looking to go to the next level, which you should be, these are some other relevant concepts that I think can help get you there.

So you ran your code and it isn't working. What do you do? Should you just throw your hands up in the air? Should you ask your friend next to you? Should you eat some of those delicious muffins from before? Or should you just give up? The answer is to look at the error messages and begin to understand them. This is one of the most useful skills that you can develop. Generally, you start just by Googling the errors or going to Stack Overflow, but over time you can learn to debug with just the error messages you're given. You can also start writing unit tests and using other process tools to help you discover inconsistencies in your data and your outputs.

As I mentioned before, you can also start exploring object-oriented programming to expand your knowledge. Most libraries are structured as a collection of objects rather than a collection of functions. Object-oriented programming provides increased modularity, reusability, and scalability of code; it allows us to build things around objects rather than individual, piecemeal functions. I've left a few of my favorite articles on object-oriented programming in Python in the description below. Once you understand how objects work, you can start understanding the logic in the libraries that you use.

So, I promised a framework for learning any new library or concept that comes your way. This is how I personally approach learning any new tool that I want to work
with. Doing this over and over has given me confidence that I can pick up most things and get them working within a few hours.

First, look at the documentation. Almost every good library out there has solid documentation, and it's all structured very similarly, so once you get good at reading some documentation, you get good at reading most documentation. Next, look at the examples to get a feel for how the library is used. You can often find examples in the documentation itself, on Kaggle, on GitHub, or on Stack Overflow. After that, I think you should apply the tool to toy problems: just get it working and tinker around with it. Finally, apply the library to real data. And there you have it: a simple and functional framework.

I should note that I used Python as the base language here. These ideas work for many other languages too, although there might be some small differences. Thank you so much for watching, and good luck on your data science journey. Thank you.
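The papaya-counting idea from the flow-control section can be sketched in a few lines of plain Python. This is a minimal illustration, not the video's on-screen code. A short made-up sentence stands in for a real transcript, and the variable names are assumptions. The sketch shows a for loop over a range, a for loop over words with if/elif/else branches feeding several counters, and a while loop that stops once its condition fails.

```python
# A made-up stand-in for a video transcript
transcript = "I like papaya and strawberries but papaya is my favorite papaya"
words = transcript.split()  # split on whitespace into a list of words

# "for i in some range": repeat an action a fixed number of times
squares = []
for i in range(4):
    squares.append(i * i)  # builds [0, 1, 4, 9]

# Loop + conditionals: one pass over the words, several counters
word_count = 0
papaya_count = 0
strawberry_count = 0
for word in words:
    word_count += 1
    if word == "papaya":
        papaya_count += 1
    elif word == "strawberries":
        strawberry_count += 1
    else:
        pass  # any other word is ignored

# While loop: keep going until the condition stops being true
index = 0
while index < len(words) and words[index] != "papaya":
    index += 1  # stops at the first "papaya" (or at the end of the list)

print(word_count, papaya_count, strawberry_count, index)  # 11 3 1 2
```

Adding another fruit is just one more `elif` branch and one more counter. The loop still makes only a single pass over the words.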
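The dictionary-based word counter described in the data-structures and functions sections might look roughly like this. It is a sketch under stated assumptions. The name `word_counter` echoes the transcript's description, but the body is illustrative. It takes a string rather than a text file, so a file's contents would first need to be read, for example with `open(path).read()`. Lowercasing and stripping punctuation are extra choices made here, not something the video specifies.

```python
import string
from collections import Counter

def word_counter(text):
    """Return a dict mapping each lowercase word in `text` to its number of occurrences."""
    # Lowercase and strip punctuation so "Papaya," and "papaya" count as the same word
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    counts = {}
    for word in cleaned.split():
        if word not in counts:
            counts[word] = 1   # first time we see the word: it becomes a new key
        else:
            counts[word] += 1  # seen before: add one to its count
    return counts

episode = "Papaya, papaya, and more papaya. Strawberries too!"  # made-up text
counts = word_counter(episode)
print(counts)  # {'papaya': 3, 'and': 1, 'more': 1, 'strawberries': 1, 'too': 1}

# The standard library's collections.Counter does the same bookkeeping in one call
cleaned_words = episode.lower().translate(str.maketrans("", "", string.punctuation)).split()
print(dict(Counter(cleaned_words)) == counts)  # True
```

Because the logic lives in a function, the same code can be called on every episode's text with no copy-pasting.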
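The mutable-versus-immutable distinction from the data-structures section can be checked directly. In this sketch, a list of made-up messy strings is cleaned in place, the "remove extra spaces and punctuation" step the transcript mentions. A tuple refuses the same kind of edit, and a dictionary is updated in place. The data and names are illustrative assumptions.

```python
import string

# A list is mutable: we can overwrite values position by position
raw = ["  Papaya! ", "strawberry,", " mango "]  # made-up messy text
for i in range(len(raw)):
    # strip surrounding spaces, then surrounding punctuation, then lowercase
    raw[i] = raw[i].strip().strip(string.punctuation).lower()
print(raw)  # ['papaya', 'strawberry', 'mango']

# A tuple is immutable: trying to change an element raises TypeError
point = (1, 2)
try:
    point[0] = 5
    tuple_changed = True
except TypeError:
    tuple_changed = False
print(tuple_changed)  # False

# A dictionary maps keys to values and is mutable, like a list
fruit_counts = {"papaya": 3}
fruit_counts["strawberry"] = 1  # add a new key
fruit_counts["papaya"] += 1     # update an existing value
print(fruit_counts)  # {'papaya': 4, 'strawberry': 1}
```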
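The string, integer, float, and boolean behaviors described in the data-types section can be observed with a few made-up values. The last part shows the kind of `TypeError` that mixing types produces. This is the sort of error message the debugging section recommends learning to read.

```python
# Strings: indexable and iterable like lists, but immutable
fruit = "papaya"
prefix = fruit[:3]              # slicing works: 'pap'
letters = [ch for ch in fruit]  # iterating works, one character at a time
try:
    fruit[0] = "P"              # item assignment is not allowed on str
    string_changed = True
except TypeError:
    string_changed = False

# Integers vs. floats: conversions can silently lose information
whole = int(3.9)    # int() truncates toward zero: 3, not 4
half = 7 / 2        # true division always returns a float: 3.5
floored = 7 // 2    # floor division of two ints returns an int: 3

# Booleans: written True/False, but they behave like 1 and 0 in arithmetic
is_papaya = [w == "papaya" for w in ["papaya", "mango", "papaya"]]
papaya_total = sum(is_papaya)   # True + False + True == 2

# Mixing types without converting is a common source of errors
try:
    label = "count: " + papaya_total   # str + int raises TypeError
except TypeError:
    label = "count: " + str(papaya_total)

print(prefix, string_changed, whole, half, floored, papaya_total, label)
# pap False 3 3.5 3 2 count: 2
```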
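The functions-and-methods section mentions doing "the same thing to both sets, like scaling the data." Here is a minimal sketch of that idea. The transcript does not name a scaling method, so min-max scaling is an assumption, and `fit_min_max` and `apply_min_max` are hypothetical helpers. The scaling parameters are learned from the training data only and then reused on the test data. The sketch ends with a method call (`list.sort`) next to the built-in `sorted` function to show the difference.

```python
def fit_min_max(values):
    """Learn the minimum and maximum from the training data only."""
    return min(values), max(values)

def apply_min_max(values, low, high):
    """Rescale with the training min/max; test values may land outside [0, 1]."""
    span = high - low if high != low else 1  # avoid dividing by zero when all training values are equal
    return [(v - low) / span for v in values]

train = [10, 20, 30, 40]  # made-up training feature
test = [25, 50]           # made-up test feature

low, high = fit_min_max(train)                  # learn parameters once, from train
train_scaled = apply_min_max(train, low, high)
test_scaled = apply_min_max(test, low, high)    # same function, same parameters

# Methods: functions attached to an object and called with a dot
numbers = [5, 2, 9, 1]
result = numbers.sort()               # list.sort() reorders the list in place and returns None
ordered_copy = sorted([5, 2, 9, 1])   # the built-in sorted() returns a new list instead

print(train_scaled, test_scaled, numbers, result, ordered_copy)
```

Fitting on the training set and reusing those numbers on the test set keeps information from the test data out of the preprocessing step.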
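A small class illustrates the object-oriented idea from the transcript: building things around objects rather than piecemeal functions. The data (word counts) and the methods that update and query it travel together, which is also how many libraries are organized. `WordFrequency`, `add_text`, and `most_common` are hypothetical names chosen for this sketch, and the input text is made up.

```python
class WordFrequency:
    """Hypothetical class that bundles word-count data with the methods that use it."""

    def __init__(self):
        self.counts = {}  # state lives on the object, not in loose variables

    def add_text(self, text):
        """Update the stored counts with every whitespace-separated word in `text`."""
        for word in text.lower().split():
            self.counts[word] = self.counts.get(word, 0) + 1
        return self  # returning self lets calls be chained

    def most_common(self, n):
        """Return the n most frequent (word, count) pairs, highest count first."""
        return sorted(self.counts.items(), key=lambda pair: pair[1], reverse=True)[:n]

# One object can accumulate counts across several made-up "episodes"
podcast = WordFrequency()
podcast.add_text("papaya mango papaya").add_text("papaya kiwi mango")
print(podcast.most_common(2))  # [('papaya', 3), ('mango', 2)]
```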

Original Description

Programming / Coding is one of the most important skills for a data scientist. Having a great foundation can increase your value over the course of your career. In this video I talk about the 5 essential programming concepts that will give you the biggest boost for a data career.

Check out YouCode here! https://you.com/kenjee

0:00 Intro
1:10 Control Flow Structures
3:47 Data Structures
5:56 Data Types
7:45 Functions & Methods
9:40 Libraries
11:49 Debugging
12:20 Object Oriented Programming
12:52 Learning Framework

Tuples Vs Arrays Vs Lists: https://www.geeksforgeeks.org/python-list-vs-array-vs-tuple/
Python Sets: https://www.w3schools.com/python/python_sets.asp

Documentation for libraries
- Pandas Docs: https://pandas.pydata.org/docs/
- Numpy Docs: https://numpy.org/doc/
- Scikit Learn Docs: https://scikit-learn.org/0.21/documentation.html
- Matplotlib Docs: https://matplotlib.org/stable/index.html
- Seaborn Docs: https://seaborn.pydata.org/
- Plotly Docs: https://plotly.com/python/
- Tensorflow Docs: https://www.tensorflow.org/api_docs
- Pytorch Docs: https://pytorch.org/docs/stable/index.html
- Keras Docs: https://keras.io/

Other Resources/ projects:
- Project Example Using Many of These Tools: https://www.youtube.com/watch?v=I3FBJdiExcg&ab_channel=KenJee

#datascience #KenJee #coding

⭕ Subscribe: https://www.youtube.com/c/kenjee1?sub_confirmation=1
🎙 Listen to My Podcast: https://www.youtube.com/c/KensNearestNeighborsPodcast
🕸 Check out My Website - https://kennethjee.com/
✍️ Sign up for My Newsletter - https://www.kennethjee.com/newsletter
📚 Books and Products I use - https://www.amazon.com/shop/kenjee (affiliate link)

Partners & Affiliates
🌟 365 Data Science - Courses (57% Annual Discount): https://365datascience.pxf.io/P0jbBY
🌟 Interview Query - https://www.interviewquery.com/?ref=kenjee

MORE DATA SCIENCE CONTENT HERE:
🐤 My Twitter - https://twitter.com/KenJee_DS
👔 LinkedIn - https://www.linkedin.com/in/kenjee/
📈 Kaggle - https://www

This video teaches the essential programming concepts for learning data science, including flow control structures, data structures, data types, functions, methods, and libraries, with a focus on Python and its applications in data science. By learning these concepts, viewers can build a strong foundation in data science and improve their programming skills. The video provides practical examples and a step-by-step framework for applying these concepts to real-world data science problems.

Key Takeaways
  1. Use if-else conditionals to count papayas and ignore other words
  2. Use a loop to iterate through a list of numbers or any list in general
  3. Use lists to organize data in a logical way
  4. Use dictionaries to categorize data into specific headers
  5. Use data structures to make data easier to analyze, search, or alter
  6. Define a function to encapsulate code
  7. Pass parameters into a function to customize its behavior
  8. Use methods to apply functions to specific objects
  9. Look at error messages to understand and debug code
  10. Use unit tests and process tools to discover inconsistencies
💡 The video highlights the importance of learning programming concepts, such as flow control structures, data types, functions, methods, and libraries, to build a strong foundation in data science.
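Takeaways 9 and 10 pair naturally. A unit test turns "does my code work?" into a check that either passes or fails with a readable error message showing the expected and actual values. This sketch uses Python's built-in `unittest` module on a hypothetical `count_word` helper. The test cases are made up for illustration.

```python
import unittest

def count_word(text, target):
    """Hypothetical helper: count case-insensitive matches of `target` among whitespace-separated words."""
    target = target.lower()
    return sum(1 for word in text.lower().split() if word == target)

class TestCountWord(unittest.TestCase):
    def test_counts_every_match(self):
        self.assertEqual(count_word("papaya mango Papaya", "papaya"), 2)

    def test_ignores_other_words(self):
        self.assertEqual(count_word("mango kiwi", "papaya"), 0)

    def test_empty_text(self):
        self.assertEqual(count_word("", "papaya"), 0)

if __name__ == "__main__":
    # exit=False keeps the interpreter running after the tests report their results
    unittest.main(argv=["count_word_tests"], exit=False)
```

If `count_word` were changed to match case-sensitively, the first test would fail. Its error output would show 1 where 2 was expected, which points straight at the bug.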

Chapters (8)

0:00 Intro
1:10 Control Flow Structures
3:47 Data Structures
5:56 Data Types
7:45 Functions & Methods
9:40 Libraries
12:20 Object Oriented Programming
12:52 Learning Framework