The Essential Programming Concepts For Learning Data Science
Key Takeaways
The video discusses essential programming concepts for learning data science, including flow control structures, data types, functions, methods, and libraries, with a focus on Python and its applications in data science.
Full Transcript
Programming might be the single most important skill for data scientists and many other data practitioners. Coding allows you to construct predictive models, create beautiful visualizations, and build products that scale to millions of people. This skill is extremely important, but actually learning to code is a massive barrier for those looking to harness its power.

When I was starting out, I knew I wanted to leverage programming for data. Unfortunately, programming was such a large field that I didn't know where to start or what direction I should take. I knew I needed to get started, so I just pulled up Codecademy and went about learning aimlessly. At the time, I didn't know what languages were most practical, what concepts would help me in data analysis, or even what IDEs to use. I spent so much time just fumbling through different tutorials that I was really frustrated. I spent so many hours making stupid shapes move around on a page when I could have been doing and learning things that were more practical for my goals.

So that you don't waste your time like I did, I'm going to break down the essential concepts that you need to learn and why they're relevant for the data domain specifically. I start with some of the most basic skills and work my way up to a framework for learning almost any new advanced tool or concept. If anything, you're probably going to want to stick around to the end for that. Like always, the resources that I recommend will be linked in the description.

Quick: is this a dog or a pastry? As a human, it's pretty easy to tell which is a cute puppy and which is a delicious muffin. On the other hand, even some advanced machine learning algorithms struggle with this distinction. Without advanced machine learning techniques, computers really struggle to do many tasks like this that are simple for humans. So why would we use computers for something like this? If we had to classify 10 images, a human would absolutely crush it. On the other hand, if we had to classify 5 million, we'd probably just throw up our hands and start eating the muffins to forget.

Computers are incredible at doing simple tasks over and over again. This is illustrated by the first coding concept that I want to highlight. This concept is called flow control structures, and they're made up of loops and conditionals. A loop simply repeats a command over and over again, and we have two different types of loops: first we have a for loop, and next we have a while loop. A for loop repeats an action a specified number of times, while a while loop repeats indefinitely, often only stopping when a specific condition is met.

For example, maybe we wanted to go through a transcript of one of my videos and count how many words there were. One approach would be to write a loop to iterate through each word and just keep a counter. Loops are one of the most powerful tools in the data scientist's arsenal. They let us quickly iterate through data and manipulate it when necessary. Loops in a normal day-to-day data science workflow can be abstracted away through libraries, which we'll touch on later, but one, it's good to have an understanding of what's going on under the hood, and two, it's often really helpful to write one of these really quickly to aggregate, count, or do many other actions.

Now, what if we wanted to count how many times I said "papaya" in any one of my video transcripts? This is where conditionals, another type of flow control structure, come into play. Conditionals in combination with loops allow us to make some very powerful changes to our data. In most cases, conditionals allow us to do a certain action if a specific condition is met. In our papaya-counting algorithm, we would do something like this: if a word is equal to "papaya", add to a counter; else, we don't count it. This way we could count only papayas and ignore other words. The cool thing here is that we can have multiple conditionals to make our loop do very diverse things when confronted with specific scenarios. For example, we could have multiple counters: one for papaya and maybe some others for fruits like strawberries.

As you can imagine, flow control gives us a very powerful ability to manipulate and aggregate our data. I frequently use these concepts for data engineering and creating new features in my models. The syntax of a loop in Python looks like this, where we have "for i in" some range and then we do some action. You can loop through a list of numbers or any list in general; more on lists in a second. Conditionals generally take this form: if we have some condition, we do this; else, we do this other thing.

A second ago I mentioned lists. These fall into another very important concept for data science called data structures. You know as a data scientist that you're going to be working with a lot of data. Wouldn't it be better if it was organized in some logical way? Data structures allow us to organize the vast amounts of data so that it's easier to analyze, search, or alter. Some common data structures are lists, dictionaries, arrays, sets, and tuples. These are all native to Python, but different libraries that we touch on in a later section also introduce different data structures, like series and data frames.

Data structures are designed for a specific purpose. Let's take a list, for example. A list is a comma-separated collection of different data points. We can have a list of different numbers, words, objects, or a mix of these things. Lists are easy to iterate through, and they're also mutable, which means that we can change the values at each point in the list. This can be great if we need to go through and make adjustments to our data at any point. For example, I might put my data into a list if I wanted to clean it up a bit: I could convert something to text and then maybe remove all the extra spaces or all of the punctuation.

Dictionaries serve another purpose. Dictionaries allow us to categorize our data under specific headers. Just like a dictionary for words, we have a key, which is the equivalent of the word that you're looking up in a dictionary, and we have a value, which is the equivalent of the meaning of the word in the dictionary. These are also mutable. Let's say I wanted to go through my podcast, Ken's Nearest Neighbors, and do a count of each word that was said. I would loop through each word of the podcast and use a dictionary to catalog how many times each word was said. In this case, the key would be the word, and the value paired with the word would increase by one each time our loop came across that word in the transcript. The code would look something like this, where we're looping through, and if we haven't seen a word before it becomes a key, and if we have seen the word before we add to its counter. With dictionaries we can start doing some very basic forms of analysis. We could actually make a pretty cool bar chart with word frequencies, or we could aggregate across some other metric.

Some other data structures are immutable, which means that they can't be edited. This is generally faster for some different types of operations, and tuples and arrays are like that. Tuples, arrays, and sets are a bit outside the scope of this video, and this video is already going to be really long, so I left some links in the description for my favorite resources if you want to learn more about those.

It might have gone under the radar, but we've talked about a couple of different types of data already in this video. First we were talking about text data with a transcript of my video, and then we were talking about dealing with numerical data after we had counted my papayas. We also talked about lists and dictionaries. That takes us to data types, which happen to be very important for analyzing data. As a data scientist you'll run into many different types of data, and it's important to have a conceptual understanding of them before you analyze them and as you grow in your career. This is a list of the different data types in Python. As you can see, some of the data structures that I described fall into data types already. Let's go over a few of the most important ones for your career in the data domain.

First we have strings. A string is text data, and you can see it represented with quotation marks. Simply putting quotation marks around a word makes it a string. Strings have specific characteristics. For example, strings are indexable, so you can take segments of them and iterate through them like lists. You can't, however, change them, so they're known as immutable. You see strings in almost all text data and categorical variables. Dealing with strings and lists of strings is particularly relevant for the field of natural language processing, a branch of machine learning techniques.

Next we have integers. Integers are whole numbers, and we use these to express counts and sometimes also to express categorical variables or rank order. On the other hand, floats are numbers with decimals, so you can run into some issues when you switch between integer and float types.

The last data type that I want to touch on is called bools, or booleans. These are essentially true and false values. In Python, booleans can be represented as a zero or one, or as the False and True keywords. We can use these for binary variables and for classifications. When analyzing data in Python, we need to constantly be aware of the data types that we're using. Using the wrong types of data can present problems for our analysis over time.

Actually, a big part of coding is avoiding making errors. If we have to write the same code over and over again, we increase the chance that we make errors, and we also have long and gross-looking files. The next concept that we focus on allows us to reuse code easily and make our work more scalable. This concept is called functions.

Probably unsurprisingly, I get asked the same questions about data science almost every day. Let's take the question, "How should I learn to code for data science?" If I responded with a full answer to everyone that asked me this, I'd probably have absolutely no time for anything else in my day. On the other hand, if I made a video about it and just shared the link over and over again, people would get the same in-depth answer to this question without me having to spend massive amounts of my time typing. The same concept goes for functions. With functions you can encapsulate some of your code and reuse it over and over again.

Let's take the dictionary example from earlier, where I wanted to get a word count from all the words I used in my podcast. I could write this code in a function and then make it so that I could use that on any podcast that I've ever recorded. Functions work for many things because we can encapsulate code but also pass different parameters into them. So, for example, I could write my code for the word counter like this: we define the word counter, we pass in the text files that I want to parse, we do the counting for all the words, and then we return the dictionary with the word counts. For functions, you always want them to return something; you don't necessarily have to pass a parameter in.

Very closely related to functions are methods. Methods allow us to apply a function or function-like behavior to specific objects. For example, lists have a method called sort. For a list of numbers, this would put the list in order. These methods are related to objects and object-oriented programming, which is important but, in my opinion, not completely essential to understand for getting started in data. After you get familiar with the other basic concepts, I do recommend brushing up on object-oriented programming, which I discuss just a bit more in depth later in the video. As you can imagine, functions and methods are really important for scaling our work, especially with training and test sets. If we want to do the same things to both sets, like data scaling, functions can come in very handy there.

In some sense, the majority of actual coding work is just reusing other people's code. It makes no sense for me to write my own linear regression from scratch every time. Instead, I can import the code from someone else, who's probably way smarter than I am, and apply it to my specific problem. This is where the concept of libraries comes in. Essentially, a library is a whole bunch of functions that someone else has put together that I can bring in and use for my own work. An example of this would be scikit-learn, which has most of the machine learning algorithms that data scientists use on a daily basis. I can simply import the algorithm that I want to use and apply it in my code. Rather than being functions, these libraries are often made up of a group of objects that have methods that give us our desired functionality. To use these libraries, I just download the package to my Python environment using pip or Anaconda and then import the library into my workbook.

These are some of the main libraries that are used for data science. We have pandas and NumPy for manipulating data and scientific computing. Pandas has a few data structures, like series and data frames, that I described before. These are integral for a lot of the descriptive analysis that we do in data science. Scikit-learn is the main one for machine learning algorithms, train-test splitting, and model evaluation. Matplotlib, Plotly, and Streamlit are very commonly used in data visualization, and things like TensorFlow, PyTorch, and Keras are used for deep learning. I've left all the documentation for all of these libraries, and a few videos where I use these tools, in the description below. Obviously there are so many more than just these, but I think those are good ones to get you started.

Full transparency: most people usually get stuck at this part. New libraries can be unbelievably confusing, but luckily YouCode can help you with that. YouCode is the single best completely free search engine for coders. On your next project you'll save time debugging, because YouCode's sophisticated ranking system is designed to find the most relevant coding results from different sources. You can click to directly copy code from developer-focused apps like Stack Overflow, and you can even use YouCode's suite of AI-powered apps to generate code for languages including SQL, Python, FedEx, Spark, k8s, and Hugging Face. My favorite part is the ability to easily search and find relevant documentation and peer-reviewed papers.

With those concepts, in my opinion, you have a lot of the basics of coding for data science. For those looking to go to the next level, which you should be, these are some of the other relevant concepts that I think can help get you there.

So you ran your code and it isn't working. What do you do? Should you just throw your hands up in the air? Should you ask your friend next to you? Should you eat some of those delicious muffins from before? Or should you just give up? The answer is to look at the error messages and begin to understand them. This is one of the most useful skills that you can develop. Generally, you start just by Googling the errors or going to Stack Overflow, but over time you can learn to debug with just the error messages that you're given. Over time you can also start writing unit tests and using some other process tools to help you discover inconsistencies in your data and your outputs.

As I mentioned before, you can also start exploring object-oriented programming to expand your knowledge. Most libraries are structured as a series of objects rather than a series of functions. Object-oriented programming provides increased modularity, reusability, and scalability of code. It allows us to build things around objects rather than individual piecemeal functions. I've left a few of my favorite articles on object-oriented programming for Python in the description below. Once you understand how objects work, you can start understanding all the logic in the libraries that you use.

So, I promised a framework for learning any new library or concept that comes your way. This is how I personally approach learning any new tool that I want to work with. Doing this over and over has given me confidence that I can pick up most things and get them working within a few hours. First, the number one thing is to look at the documentation. Almost every good library out there has solid documentation, and they're all structured very similarly, so once you get good at reading some documentation, you get good at reading most documentation. Next, look at the examples. Get a feel for how the libraries are used. You can often find examples in the documentation itself, on Kaggle, on GitHub, or on Stack Overflow. After that, I think you should apply the tools on toy problems, so just get them working and tinker around with them. And then finally, apply these libraries on real data.

And there you have it: a simple and functional framework. I should note that I use Python as a base language here. This does work for many other languages, although there might be some small differences. Thank you so much for watching, and good luck on your data science journey. Thank you.
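The transcript refers several times to code shown on screen that did not survive in the text. As a rough stand-in, here is a minimal Python sketch of the word-count example it describes: a for loop over the words, a conditional that checks whether a word has been seen, a dictionary of counts, all wrapped in a function, plus the list sort method. The function name `count_words` and the sample sentence are made up for illustration and are not taken from the video.

```python
import string


def count_words(text):
    """Return a dictionary mapping each word in `text` to how often it appears."""
    counts = {}
    # for loop: visit every whitespace-separated token exactly once
    for raw_word in text.split():
        # Normalize so that "Papaya," and "papaya" count as the same word
        word = raw_word.strip(string.punctuation).lower()
        if not word:
            continue  # skip tokens that were only punctuation
        # conditional: a new word becomes a key, a seen word gets its count bumped
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


# Hypothetical sample text, not a real transcript
sample = "Papaya is great. I like papaya, and I like strawberries."
word_counts = count_words(sample)

# Look up a single word; .get returns 0 if the word never appeared
papaya_count = word_counts.get("papaya", 0)

# Method example: list.sort orders the list in place
frequencies = list(word_counts.values())
frequencies.sort(reverse=True)
```

Because the counting lives in a function, the same code can be reused on any other transcript by passing in a different string. In everyday work, `collections.Counter` from the standard library does the same job in one call, which is an instance of the "loops abstracted away by libraries" point made in the video.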
Original Description
Programming / Coding is one of the most important skills for a data scientist. Having a great foundation can increase your value over the course of your career. In this video I talk about the 5 essential programming concepts that will give you the biggest boost for a data career.
Check out YouCode here! https://you.com/kenjee
0:00 Intro
1:10 Control Flow Structures
3:47 Data Structures
5:56 Data Types
7:45 Functions & Methods
9:40 Libraries
11:49 Debugging
12:20 Object Oriented Programming
12:52 Learning Framework
Tuples Vs Arrays Vs Lists: https://www.geeksforgeeks.org/python-list-vs-array-vs-tuple/
Python Sets: https://www.w3schools.com/python/python_sets.asp
Documentation for libraries
- Pandas Docs: https://pandas.pydata.org/docs/
- Numpy Docs: https://numpy.org/doc/
- Scikit Learn Docs: https://scikit-learn.org/0.21/documentation.html
- Matplotlib Docs: https://matplotlib.org/stable/index.html
- Seaborn Docs: https://seaborn.pydata.org/
- Plotly Docs: https://plotly.com/python/
- Tensorflow Docs: https://www.tensorflow.org/api_docs
- Pytorch Docs: https://pytorch.org/docs/stable/index.html
- Keras Docs: https://keras.io/
Other Resources/ projects:
- Project Example Using Many of These Tools: https://www.youtube.com/watch?v=I3FBJdiExcg&ab_channel=KenJee
#datascience #KenJee #coding
⭕ Subscribe: https://www.youtube.com/c/kenjee1?sub_confirmation=1
🎙 Listen to My Podcast: https://www.youtube.com/c/KensNearestNeighborsPodcast
🕸 Check out My Website - https://kennethjee.com/
✍️Sign up for My Newsletter - https://www.kennethjee.com/newsletter
📚 Books and Products I use - https://www.amazon.com/shop/kenjee (affiliate link)
Partners & Affiliates
🌟 365 Data Science - Courses ( 57% Annual Discount): https://365datascience.pxf.io/P0jbBY
🌟 Interview Query - https://www.interviewquery.com/?ref=kenjee
MORE DATA SCIENCE CONTENT HERE:
🐤My Twitter - https://twitter.com/KenJee_DS
👔 LinkedIn - https://www.linkedin.com/in/kenjee/
📈 Kaggle - https://www