How to get started with Data Science (Career tracks and advice)
Skills:
Research Methods 80%, Reading ML Papers 70%, ML Maths Basics 60%, Supervised Learning 50%, Unsupervised Learning 50%
Key Takeaways
The video discusses three career tracks for hands-on data science: research, applied research, and industry, and provides advice on getting started with data science, including learning basic theory and using tools like TensorFlow, PyTorch, and Python.
Full Transcript
Hello and welcome to this quick guide on how to get started with data science. I created this video because a lot of people ask me how they should get started and which direction they should take. Of course, all of the following is based only on my personal opinion and represents what I would suggest to a good friend.
The position of data scientist has many flavors. From what I've seen so far, there are mainly three career tracks for hands-on data science. By data scientists I refer only to data or machine learning scientists, not to related roles like data analyst or data engineer. The first one is the research track, which means you work at a university or a research institute. Here you typically do state-of-the-art research and find new ways to build machine learning models or use data. The people on this track are typically working on a PhD or already have one. The second track is what I would call applied research. Most of the time you are in the R&D department of a company and do research on open problems in the company. For example, in the automotive area you might work on autonomous driving, or in the pharmaceutical industry you might do research on drug discovery. The third track is the industry track. You are hired in any department of a company and use your skills to build models and deploy them to production. Here you are typically more concerned with building high-quality applications and writing clean code. Most of the time proven off-the-shelf models are used, and you work together with a lot of other people to provide a dashboard, an API, or generally a digital product. People commonly switch between these tracks. For example, after finishing the PhD a lot of people switch to applied research. Of course, you don't know what you like before you've tested it, and therefore I want to give you some guidance on what you should try out and which skills are a must in each of these tracks.
Before we talk about the tech stack of a data scientist and which tools I recommend for doing data science, the most important part is to learn the basic theory first. Without a basic understanding, it really doesn't make sense to start with the implementation. To that end, there are great YouTube tutorials and lectures you could watch, but also a lot of free books and other resources to get familiar with the theory. At the present time we are really lucky to have access to so many great resources all over the internet, which makes it quite easy to get started. Of course, you can also check out courses at Coursera, which are typically very well prepared. Give yourself a few weeks to invest in building a solid basis. Many tutorials will also give you a first glimpse of how you can implement different algorithms. All I want to say here is: don't start coding without knowing what you're doing. On the right are a few things that I would cover first. For example, try to understand the inner workings of linear regression, as this will teach you some intuition for the more advanced algorithms. Tree-based ensemble models are another very important class of algorithms and an absolute must in my opinion. Besides that, you should have heard of terms like overfitting and cross-validation and should know some algorithms for supervised and unsupervised learning. Depending on what you will do in the future, it can also help a lot to get familiar with mathematical optimization, for example understanding gradient descent. This will come in handy when you take a look at neural networks and deep learning. Whenever you touch data you will be confronted with statistics, even if it's only a simple histogram. Therefore it is very important to also have some knowledge in this discipline. I know it can be overwhelming at the beginning, but try not to get frustrated and conquer one thing after the other. We all need to learn new stuff every day, and the important part is that we keep going.
Let's now have a closer look at the different data science tracks. Having a clearer picture of what they do helps you make better career decisions. Let's begin with the research track. What does a day in a research scientist's life look like? These guys often read and write a lot of papers and usually work with state-of-the-art machine learning. Often the problems they try to solve have a longer time horizon and fundamental research is conducted. Therefore a lot of math and statistics is involved, and you should enjoy diving into the mathematical details, which sometimes look like this. Typically the experiments are done with smaller open-source data sets, so you don't need very large computers. However, most of these positions are currently in the deep learning field, and therefore you need to know how to train on a GPU. So what are the typical tech skills required here? Of course, you build new things from scratch, so good programming skills are necessary. Most of the time you will also work on a Linux machine, because a lot of things are usually easier compared to Windows, at least in my experience. Finally, if you work on deep learning research you should be familiar with one of the deep learning libraries like TensorFlow or PyTorch. Of course, there can be a lot of other skills that are required, but this really depends on what you're working on.
When it comes to applied research, there is quite some overlap with the research track. The previous skills are all skills that you will need here as well. It might be, however, that you need a less in-depth understanding of the deep learning models or that you base your experiments on existing implementations. The goal is to apply the latest research to industry problems, and that's why you will also read a lot of papers. This track is focused on innovation and aims to build prototypes, but you're quite decoupled from the actual business. Typically the data sets in industry are much bigger, and therefore you will often work with cloud computing instances and need to build applications at scale. That's why this position is also quite computer-science heavy. An important skill in this R&D track is also the ability to work with databases. This means using SQL or other technologies to read and write data. That's because you typically work with real-world data and therefore need to access this data somehow. As you will work together with other researchers, it's also important to have solid code versioning skills using tools like Git. Finally, I would say that a PhD is not a must in this intersection between industry and research, but a lot of people have one.
The industry data science track is the one most connected with the actual business units. You work together with them to create a solution for a business problem. For example, they require an anomaly detection model in their production line, or the sales department wants a tool for estimating future revenues. There are really lots of possibilities here. You typically don't have to come up with new machine learning models for this, as basically everything is already available. The main task is therefore to clean the data, put it in a proper shape, and fit the model with it. For this, good visualization and modeling skills are required. As you build digital solutions, you typically work in teams with other roles like front-end and back-end developers, data engineers, and UX designers. That's why a big part of the daily work is communication, and maybe also Scrum. In early phases of the project it can be very helpful to be able to build simple dashboards to communicate the results to the business. Finally, your model or application needs to run somewhere, and therefore you should have basic knowledge of how to put things into production. This includes being familiar with HTTP requests and container tools like Docker.
To wrap this up, I think you can say that the research roles are more focused on the algorithms and the industry data scientist is more focused on the data. Also, the industry roles mostly work with tabular data, from what I've seen so far, and deep learning models are more often handled by the applied research scientists. But of course this can differ from department to department.
Next we will get a bit more precise and talk about which skills I recommend having and which tools I use, because the fact is that you will only become good at data science if you start doing a lot of things and gathering experience. First of all, you could do data science in pretty much any programming language, but it has become commonplace to mostly use these three: Python, R, and Julia. With those you can easily manipulate and visualize data, and they also have a broad community with lots of extensions. My personal choice is Python. Some years ago I also gave R a shot, but I didn't find it very intuitive for my purposes. This is just personal preference; it certainly also has its advantages. R is mainly used for statistics-heavy tasks, Python is an all-rounder, and Julia was built for efficient data processing in scientific computing. You can safely say that Python is the most commonly used language, and therefore I suggest that any beginner use it. All of the following is therefore also based on Python. Of course, it's really important to bring some basic knowledge of programming, such as understanding concepts like object-oriented programming, or generally the ability to solve problems with Stack Overflow.
Where do you program and what tools should you use? Most data scientists nowadays work in Jupyter notebooks. That's because you can split the code into cells and execute each cell independently. That allows you to perform heavy computations only once and play around with the results. I recommend that everyone go with JupyterLab instead of Jupyter Notebook, because in my experience notebooks tend to become quite messy after some time, and it therefore makes sense to refactor the code into separate files. Personally, most of the time I work in an IDE, which stands for integrated development environment. My personal preference here is Visual Studio Code. IDEs also support Jupyter notebooks but in addition provide many other advantages, such as great debuggers, easy setup for remote work, and a lot of other things. Finally, I want to point out that Google Colab is a great place for your personal projects. You can get a lot of computing power for free and most of the required libraries are already set up. Generally, cloud-based notebooks are a very good environment for quickly getting started.
I thought it makes sense to share what I use as a tech stack for doing data science. You can use this overview to see which areas you already have covered and what additional things might be helpful. For data wrangling I mainly use pandas, NumPy, and sometimes Spark. With these I can efficiently modify tabular data but also other data types like images or graphs. To build plots I prefer seaborn and also work with Plotly every now and then. The seaborn gallery usually also gives me ideas about which visualization technique might be most suitable for my data. For machine learning I mostly use scikit-learn for general algorithms and PyTorch for deep learning. Those two cover pretty much everything I did in all my projects so far. When it comes to reading and writing data, I sometimes need SQL or NoSQL frameworks like MongoDB. As I'm mostly doing deep learning, my data is usually stored on a hard drive of a remote Linux machine. For this purpose I commonly use the cloud, for example AWS or Microsoft Azure. That's why it's also important to know some basic Linux commands, for example how to connect to a remote machine or how to copy files. For building quick prototype dashboards I usually use Streamlit or Dash. Finally, to be able to track my experiments and deploy the models as API endpoints, I use MLflow or Weights & Biases. Besides that, I version my code with Git and use additional tools depending on the project.
Well, now you have an overview, but how exactly should you get started? After learning the theory from books, my absolute recommendation is to participate in a Kaggle competition. This is the perfect playground to try things out and get support from the community. Whenever you get stuck you can quickly find help and inspiration from other Kagglers. Generally it's a great mix of theory and practice and also a great place to learn from others. After that I suggest doing a personal project. For example, teach a car to drive on its own using reinforcement learning, build a vehicle detection system, or analyze NFL player data and see if you can find interesting patterns, whatever you find interesting. Once you feel comfortable with the basics, you will be able to learn anything new that comes your way, and this will also help you get a feeling for which of these three tracks might be most suitable for you. In any case, data science means lifelong learning, because this field is constantly evolving and new ideas are published daily. But that's also what makes it exciting for me: it never gets boring.
With that we come to the end of this video. I hope that I gave you a good overview and some guidance on how to get started. Feel free to contact me in case of any questions and simply leave a comment if you found it helpful. Have a great day and see you soon in a future video.
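Editor's sketch (not part of the video): the transcript recommends understanding the inner workings of linear regression and gradient descent before moving on to more advanced algorithms. The following minimal Python example shows how the two fit together, assuming a single input feature and a mean squared error loss. The function name `fit_linear_regression`, the default hyperparameters, and the toy data are all hypothetical choices made for illustration, not something shown in the video.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error of the current line on every data point.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Move a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for this illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

On this toy data the fitted slope and intercept should end up very close to 2 and 1. In practice one would more likely reach for a library such as scikit-learn, which the transcript mentions for general algorithms; writing the loop by hand is only meant to build intuition for what the optimizer does.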
Original Description
▬▬ Used Music ▬▬▬▬▬▬▬▬▬▬▬
Music from Uppbeat (free for Creators!):
https://uppbeat.io/t/ra/glowing
License code: VCV7HTCWOOON7WAS
▬▬ Timestamps ▬▬▬▬▬▬▬▬▬▬▬
00:00 Introduction
00:26 Data Science Tracks
02:09 Learning the Theory
04:19 Research Data Scientist
05:38 Applied Research Data Scientist
06:53 Industry Data Scientist
08:12 Wrap-Up Career Tracks
08:50 Programming Languages
09:59 Development Environment
11:06 My Data Science Tech Stack
12:45 Roadmap to get started
▬▬ Used Icons ▬▬▬▬▬▬▬▬▬▬▬
All icons are from Freepik (Flaticon)
▬▬ Support me if you like 🌟
►Coursera: https://imp.i384100.net/b31QyP
►Link to this channel: https://bit.ly/3zEqL1W
►Support me on Patreon: https://bit.ly/2Wed242
►Buy me a coffee on Ko-Fi: https://bit.ly/3kJYEdl
►E-Mail: deepfindr@gmail.com