How to get started with Data Science (Career tracks and advice)

DeepFindr · Beginner · 📄 Research Papers Explained · 4y ago

Skills: Research Methods 80% · Reading ML Papers 70% · ML Maths Basics 60% · Supervised Learning 50% · Unsupervised Learning 50%

Key Takeaways

The video discusses three career tracks for hands-on data science: research, applied research, and industry, and provides advice on getting started with data science, including learning basic theory and using tools like TensorFlow, PyTorch, and Python.

Full Transcript

Hello and welcome to this quick guidance on how to get started with data science. I created this video because a lot of people ask me how they should get started and which direction they should take. Of course, all of the following is based only on my personal opinion and represents what I would suggest to a good friend.

The position of data scientist has many flavors. From what I've seen so far, there are mainly three career tracks for hands-on data science. By data scientists I refer only to data or machine learning scientists, not to related fields like data analyst or data engineer.

The first one is the research track, which means you work at a university or a research institute. Here you typically do state-of-the-art research and find new ways to build machine learning models or use data. People on this track are typically working on a PhD or already have one.

The second track is what I would call applied research. Most of the time you are in the R&D department of a company and do research on open problems in the company. For example, in the automotive area you might work on autonomous driving, or in the pharmaceutical industry you might do research on drug discovery.

The third track is the industry track. You are hired in any department of a company and use your skills to build models and deploy them to production. Here you are typically more confronted with building high-quality applications and writing clean code. Most of the time, proven off-the-shelf models are used, and you work together with a lot of other people to provide a dashboard, an API, or generally a digital product.

People commonly switch between these tracks. For example, after finishing the PhD, a lot of people switch to applied research. Of course, you don't know what you like before you've tested it, and therefore I want to give you some guidance on what you should try out and which skills are a must in each of these tracks.

Before we talk about the tech stack of a data scientist and which tools I
recommend for doing data science, the most important part is to learn the basic theory first. Without a basic understanding, it really doesn't make sense to start with the implementation. To that end, there are great YouTube tutorials and lectures you could watch, but also a lot of free books and other resources to get familiar with the theory. At the present time we are really lucky to have access to so many great resources all over the internet, which makes it quite easy to get started. Of course, you can also check out courses at Coursera, which are typically very well prepared. Give yourself a few weeks to invest in building a solid basis. Many tutorials will also give you a first glimpse of how you can implement different algorithms. All I want to say here is: don't start coding without knowing what you're doing.

On the right are a few things that I would cover first. For example, try to understand the inner workings of a linear regression, as this will teach you some intuition for the more advanced algorithms. Tree-based ensemble models are another very important class of algorithms and an absolute must, in my opinion. Besides that, you should have heard of terms like overfitting and cross-validation and should know some algorithms for supervised and unsupervised learning. Depending on what you will do in the future, it can also help a lot to get familiar with mathematical optimization, for example understanding gradient descent. This will come in handy when you take a look at neural networks and deep learning. Whenever you touch data you will be confronted with statistics, even if it's only a simple histogram. Therefore it is very important to also have some knowledge of this discipline.

I know it can be overwhelming at the beginning, but try not to get frustrated and conquer one thing after the other. We all need to learn new stuff every day, and the important part is that we keep going.

Let's now have a closer look at the different data science tracks. Having a clearer picture
of what they do helps you make better career decisions.

Let's begin with the research track. What does a day in a research scientist's life look like? These guys often read and write a lot of papers and usually work with state-of-the-art machine learning. Often the problems they try to solve have a longer time horizon, and fundamental research is conducted. Therefore a lot of math and statistics is involved, and you should have fun diving into the mathematical details, which sometimes look like this. Typically the experiments are conducted with smaller open-source data sets, so you don't need very big computers. However, most of these positions are currently in the deep learning field, and therefore you need to know how to train on a GPU.

So what are the typical tech skills required here? Of course, you build new things from scratch, so good programming skills are necessary. Most of the time you will also work on a Linux machine, because a lot of things are usually easier compared to Windows, at least in my experience. Finally, if you work on deep learning research, you should be familiar with one of the deep learning libraries like TensorFlow or PyTorch. Of course, there can be a lot of other skills that are required, but this really depends on what you're working on.

When it comes to applied research, there is quite some overlap with the research track. The previous skills are all skills that you will need here as well. It might be, however, that you need a less in-depth understanding of the deep learning models or base your experiments on existing implementations. The goal is to apply the latest research to industry problems, and that's why you will also read a lot of papers. This track is focused on innovation and aims to build prototypes, but you're quite decoupled from the actual business. Typically the data sets in industry are much bigger, and therefore you will often work with cloud computing instances and need to build applications at scale. That's why this position is
also quite computer-science heavy. An important skill in this R&D track is also the ability to work with databases. This means using SQL or other technologies to read and write data. That's because you typically work with real-world data and therefore need to access this data somehow. As you will work together with other researchers, it's also important to have solid code versioning skills using tools like Git. Finally, I would say that a PhD is not a must at this intersection between industry and research, but a lot of people have one.

The industry data science track is the most connected with the actual business units. You work together with them to create a solution for a business problem. For example, they require an anomaly detection model in their production line, or the sales department wants a tool for estimating future revenues. There are really lots of possibilities here. You typically don't have to come up with new machine learning models for this, as basically everything is already available. The main task is therefore to clean the data, put it into a proper shape, and fit the model to it. For this, good visualization and modeling skills are required.

As you build digital solutions, you typically work in teams with other roles like front-end and back-end developers, data engineers, and UX designers. That's why a big part of the daily work is communication, and maybe also Scrum. In early phases of the project, it can be very helpful to be able to build simple dashboards to communicate the results to the business. Finally, your model or application needs to run somewhere, and therefore you should have basic knowledge of how to put things into production. This includes being familiar with HTTP requests and container tools like Docker.

To wrap this up, I think you can say that the research roles are more focused on the algorithms and the industry data scientist is more focused on the data. Also, the industry roles mostly work with tabular data, from what I've seen so
far, and deep learning models are more often handled by the applied research scientists. But of course this can differ from department to department.

Next we will get a bit more precise and talk about which skills I recommend having and which tools I use, because the fact is that you will only become good at data science if you start doing a lot of things and gathering experience.

First of all, you could do data science in pretty much any programming language, but it has become commonplace to mostly use these three: Python, R, and Julia. With those you can easily manipulate and visualize data, and they also have a broad community with lots of extensions. My personal choice is Python. Some years ago I also gave R a shot, but I didn't find it very intuitive for my purposes. This is just personal preference, though; it certainly also has its advantages. R is mainly used for statistics-heavy tasks, Python is an all-rounder, and Julia was built for efficient data processing in scientific computing. You can safely say that Python is the most commonly used language, and therefore I suggest that any beginner use it. All of the following is therefore also based on Python. Of course, it's really important to bring some basic programming knowledge, such as understanding concepts like object-oriented programming, or generally the ability to solve problems with Stack Overflow.

Where do you program, and what tools should you use? Most data scientists nowadays work in Jupyter notebooks. That's because you can split the code into cells and execute each cell independently, which allows you to perform heavy computations only once and play around with the results. I recommend everyone to go with JupyterLab instead of Jupyter Notebook because, from my experience, notebooks tend to become quite messy after some time, and therefore it makes sense to refactor the code into separate files. Personally, most of the time I work in an IDE, which stands for integrated development environment. My personal preference here is
Visual Studio Code. IDEs also support Jupyter notebooks but in addition provide many other advantages, such as great debuggers, easy setup for remote work, and a lot of other things. Finally, I wanted to point out that Google Colab is a great place for your personal projects. You can get a lot of computing power for free, and most of the required libraries are already set up. Generally, cloud-based notebooks are a very good environment for quickly getting started.

I thought it makes sense to share what I use as a tech stack for doing data science. You can use this overview to see which areas you already have covered and what additional things might be helpful. For data wrangling I mainly use pandas, NumPy, and sometimes Spark. With those I can efficiently modify tabular data but also other data types like images or graphs. To build plots I prefer seaborn and also work with Plotly every now and then. The seaborn gallery usually also inspires me as to which visualization technique might be most suitable for my data. For machine learning I mostly use scikit-learn for general algorithms and PyTorch for deep learning. Those two cover pretty much everything I did in all my projects so far.

When it comes to reading and writing data, I sometimes need SQL or NoSQL frameworks like MongoDB. As I'm mostly doing deep learning, my data is usually stored on a hard drive of a remote Linux machine. For this purpose I commonly use the cloud, for example AWS or Microsoft Azure. That's why it's also important to know some basic Linux commands, for example how to connect to a remote machine or how to copy files. For building quick prototype dashboards I usually use Streamlit or Dash. Finally, to be able to track my experiments and deploy the models as API endpoints, I use MLflow or Weights & Biases. Besides that, I version my code with Git and use additional tools depending on the project.

Well, now you have an overview, but how exactly should you get started? After learning the theory from books, my absolute
recommendation is to participate in a Kaggle competition. This is the perfect playground to try things out and get supported by the community. Whenever you get stuck you can quickly find help and inspiration from other Kagglers. Generally, it's a great mix of theory and practice and also a great place to learn from others. After that, I suggest doing a personal project. For example, teach a car to drive on its own using reinforcement learning, build a vehicle detection system, or analyze NFL player data and see if you can find interesting patterns: whatever you find interesting. Once you feel comfortable with the basics, you will be able to learn anything new that comes your way, and this will also help you get a feeling for which of these three tracks might be most suitable for you.

In any case, data science means lifelong learning, because this field is constantly improving and new ideas are published daily. But that's also what makes it exciting for me: it never gets boring.

With that we come to the end of this video. I hope that I gave you a good overview and some guidance on how to get started. Feel free to contact me in case of any questions, and simply leave a comment if you found it helpful. Have a great day and see you soon in a future video.
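
Editor's sketch (not from the video): the transcript recommends understanding the inner workings of linear regression and gradient descent before coding. As one possible illustration, the snippet below fits a line by gradient descent on the mean squared error using only the Python standard library. The function name, toy data, learning rate, and step count are all assumptions chosen for this example.

```python
# Minimal sketch: fit y = w*x + b by batch gradient descent on mean squared error.
# The data, learning rate, and step count are illustrative assumptions.

def fit_linear_regression(xs, ys, learning_rate=0.05, steps=2000):
    """Return (w, b) that approximately minimize the mean squared error."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Residuals of the current model on every sample.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 without noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

On this noise-free toy data the fitted values should end up close to w = 2 and b = 1. In practice a library such as scikit-learn, which the video mentions, would normally be used instead; writing the loop by hand is only meant to build intuition.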

Original Description

▬▬ Used Music ▬▬▬▬▬▬▬▬▬▬▬
Music from Uppbeat (free for Creators!): https://uppbeat.io/t/ra/glowing
License code: VCV7HTCWOOON7WAS

▬▬ Timestamps ▬▬▬▬▬▬▬▬▬▬▬
00:00 Introduction
00:26 Data Science Tracks
02:09 Learning the Theory
04:19 Research Data Scientist
05:38 Applied Research Data Scientist
06:53 Industry Data Scientist
08:12 Wrap-Up Career Tracks
08:50 Programming Languages
09:59 Development Environment
11:06 My Data Science Tech Stack
12:45 Roadmap to get started

▬▬ Used Icons ▬▬▬▬▬▬▬▬▬▬▬
All Icons are from Freepic (flaticon)

▬▬ Support me if you like 🌟
►Coursera: https://imp.i384100.net/b31QyP
►Link to this channel: https://bit.ly/3zEqL1W
►Support me on Patreon: https://bit.ly/2Wed242
►Buy me a coffee on Ko-Fi: https://bit.ly/3kJYEdl
►E-Mail: deepfindr@gmail.com

▬▬ My equipment 💻
- Microphone: https://amzn.to/3DVqB8H
- Microphone mount: https://amzn.to/3BWUcOJ
- Monitors: https://amzn.to/3G2Jjgr
- Monitor mount: https://amzn.to/3AWGIAY
- Height-adjustable table: https://amzn.to/3aUysXC
- Ergonomic chair: https://amzn.to/3phQg7r
- PC case: https://amzn.to/3jdlI2Y
- GPU: https://amzn.to/3AWyzwy
- Keyboard: https://amzn.to/2XskWHP
- Bluelight filter glasses: https://amzn.to/3pj0fK2




Key Takeaways

Learn basic theory before implementation
Choose a career track: research, applied research, or industry
Develop programming skills, including knowledge of deep learning libraries like TensorFlow or PyTorch
Familiarize yourself with tools like Jupyter notebooks, Visual Studio Code, and cloud computing instances
Practice reading and writing research papers
Apply machine learning to industry problems
Build prototypes and work with real-world data
Communicate with business units and create digital solutions

💡 Learning basic theory before implementation is crucial for data science, and choosing the right career track can help you focus your skills and efforts.
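
Editor's sketch (not from the video): cross-validation is one of the theory terms the transcript says a beginner should have heard of. The snippet below shows the basic mechanics of k-fold cross-validation in plain Python, under the assumption of a deliberately trivial baseline "model" that predicts the mean of the training targets. The helper names and the toy data are hypothetical.

```python
# Minimal sketch of k-fold cross-validation with a hypothetical baseline model.

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(ys, k=5):
    """Average held-out MSE of a baseline that predicts the training mean."""
    scores = []
    for held_out in k_fold_indices(len(ys), k):
        held = set(held_out)
        # "Fit" on the training folds only, never on the held-out fold.
        train = [y for i, y in enumerate(ys) if i not in held]
        prediction = sum(train) / len(train)
        # Evaluate on the fold the model has not seen.
        mse = sum((ys[i] - prediction) ** 2 for i in held_out) / len(held_out)
        scores.append(mse)
    return sum(scores) / len(scores)


ys = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 6.0, 8.0, 7.0, 9.0]
print(cross_validate(ys, k=5))
```

The key idea is that every sample is scored by a model that was fitted without it, which gives a more honest estimate than the training error and helps reveal overfitting. Real projects usually shuffle the data first and rely on a library implementation such as the one in scikit-learn.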


Chapters (11)

0:00 Introduction

0:26 Data Science Tracks

2:09 Learning the Theory

4:19 Research Data Scientist

5:38 Applied Research Data Scientist

6:53 Industry Data Scientist

8:12 Wrap-Up Career Tracks

8:50 Programming Languages

9:59 Development Environment

11:06 My Data Science Tech Stack

12:45 Roadmap to get started
