What topics to prepare for Data Science Interviews in 2020?
Key Takeaways
Data Science interview preparation: projects, programming, math and statistics, machine learning, domain knowledge, and common interview questions in 2020
Full Transcript
Hey everyone, in this video we will see how you can prepare for your data science interviews. If you've never interviewed for a data science role before, or if you've been giving interviews and haven't been able to prepare well, this video is for you. We will dive deep into the different areas of data science you can prepare for and the topics that are most commonly asked during interviews. I hope it helps you understand what to expect and improve at your interviews. Let's dive right in.
Let's talk about projects first, even before the topics themselves. If you are not already working in the field, building projects is a way to show that you can actually build data science projects. A lot of interviewers will ask you about them. Say you talk about building a recommendation engine: How did you go about building it? What were the data sources, and why did you choose them? How did you handle missing values? Was the dataset balanced? Which statistical techniques did you evaluate for building it, and did you even need a statistical technique? All of these kinds of questions will come up in the interview. So if you have done projects, be completely prepared to go in depth on them, regardless of whether you are a generalist or a specialist, because that is work you have done and you are expected to know it well. If you also know the why, that is, why you chose X over something else, that's fantastic; understanding your work deeply helps you stand out. If you've worked in a team on a project, talk about your role and about other people's roles as well: how they helped you improve in your role, or how you coordinated and collaborated with them.
Okay, now a little bit on the different topics you can work on. As we saw earlier, data science can be broadly defined as programming/hacking plus math/statistics plus domain knowledge; it's a combination of all three, so be prepared to answer questions on each of these subsections. A lot of the questions will vary depending on your background: if you come from a math background, the interview is likely to be math-heavy; if you come from a programming background, maybe programming-heavy; if you come from a domain background, domain-heavy. The background of the interviewer, the company, and the role also matter. It's the combination of all these things that decides what gets asked, but from your side you have to be prepared on all of these subsections. You're not expected to be as good a programmer as a typical programmer, or as good a statistician or mathematician as a typical mathematician, but you are supposed to understand what is happening and have a basic intuition for everything. You might not remember all the equations for a given algorithm at any given point, but you should understand intuitively how it works, so that given time you can derive or write down the equations. Prepare with that in mind.
Now let's look at the different categories, starting with the programming or hacking skills we already mentioned. You should know how to program. I think this is a hard requirement nowadays, because a lot of companies are building digital products and you won't be able to do anything if you can't program; you won't even be able to write SQL. If you can't write SQL, you won't be able to query data or get data out of your database, so how are you going to do any analysis? Knowing how to program is the bare minimum. Right now the market and the industry work a lot in Python and R, so one or both of these languages is what you should concentrate on if you want to be in data.
An understanding of computer science fundamentals, such as data structures, algorithms, and time and space complexity, is helpful because it really helps you improve your code, but it isn't required; if you are a total beginner in programming, you don't even have to think about it yet. The third point is understanding different programming paradigms: object-oriented, procedural, imperative, and functional. You don't have to worry about all of that initially, but it matters as you go further and write more code. For example, scikit-learn is a machine learning library in Python, and if you look at its code, it is written in an object-oriented way. It's good to learn this; it's not mandatory, and it's not something you'll be asked about directly in interviews. But say you're given a take-home task: if you demonstrate that you know when to turn multiple lines into a function, or how to write a class, create objects, and pass parameters, you show that you understand these concepts and are likely to write clean code once you are hired. That's a big plus.
That's it on programming knowledge. As for how you'll be tested on it, I can tell you how I was tested. One company gave me a HackerRank test. Another format is the take-home test, where you are given a dataset and a question and asked to find certain inferences; you're then expected to write a Jupyter notebook or a script and send it back. In one interview I had to do both: I had to write a Jupyter notebook and also show how I would productionize it, so I wrote functions, created an API, and sent it back. Another experience was pair programming, which is similar to whiteboarding: the other person listens as you think out loud and solve the problem. These are the kinds of programming problems you can expect when applying for data science jobs.
Next is math knowledge: understanding the basic concepts in probability, statistics, linear algebra, and calculus. Concepts in probability include events, random variables, Bayes' theorem, conditional probability, joint probability, and so on. I've been asked questions on most of these, namely random variables, Bayes' theorem, and events, though not on joint probability; this is the basic probability that gets asked. Concepts in statistics include measures of central tendency like the mean, median, and mode; measures of spread like standard deviation and variance; and statistical tests you could use to check the correlation between two variables, where the right test depends on the kind of variables, for example whether a variable is categorical (such as nominal). Hypothesis testing is another one, used in experimentation: how do you do a hypothesis test, and where is it used? Hypothesis testing is essentially what A/B testing is, and marketers use it a lot. How do you determine the sample size? When do you stop your test? What is the metric? What is a z-score, what is a p-value, and how do you calculate a p-value? That is basic statistics.
As for linear algebra, I haven't been asked a lot of linear algebra questions, but I thought it would be good to include so that you can revise it too: matrices and their operations, what the rank of a matrix is, and how you do dimensionality reduction with matrices. PCA, principal component analysis, came up in one of my interviews, where I was asked to explain mathematically how PCA works. Concepts in calculus include differentiation.
I have kept math knowledge separate from statistical and machine learning knowledge, because the latter is an entire topic in itself. Math knowledge is geared towards probability, statistics, linear algebra, and calculus. One branch of statistics is statistical learning, or machine learning, and it is huge in itself. As with math, you don't need to know everything that exists in probability or statistics, but you do need the fundamentals: Bayes' theorem, mean, median, mode, standard deviation, variance, hypothesis testing, and so on.
For statistical and machine learning knowledge, you need to understand the different concepts in statistical learning, such as supervised, unsupervised, and reinforcement learning. You should be able to explain conceptually what you mean by supervised learning, the kinds of datasets it works on, and the kinds of problems it solves, such as classification and regression, along with the different algorithms you can use for each. I won't go into the details of the techniques here, but they include linear regression, logistic regression, SVMs, random forests, decision trees, and deep learning with neural networks. K-means clustering, hierarchical clustering, and so on are unsupervised learning techniques.
There's a framework I use to think about these different methods: representation, optimization, and evaluation. Representation is the equation used to represent a particular algorithm; optimization is the technique used to improve the performance of a model; and evaluation is the metric you use to measure the performance of your models. Optimization techniques include gradient descent, stochastic gradient descent, or any other technique you use to optimize your algorithms. Evaluation metrics include accuracy, precision, recall, the ROC curve and its AUC, and false positive and true positive rates. Then there are concepts like cross-validation and k-fold cross-validation: how can you improve the performance of your model? How can you split your datasets into train, test, and validation splits, and what are the different ways to split them? How was your dataset collected, and which distributions exist in it? This goes a little back to math: what is a normal distribution or a binomial distribution, and what kinds of events can be modeled using them? And when you split your dataset, is it a stratified split or a random split? All of these are important for you to know and to explain to the interviewer. So are questions like: If there are missing values, which techniques would you use to handle them? If there are outliers in your dataset, how would you handle them? If you have a lot of dimensions, how can you reduce them without losing a lot of information? Everything I'm describing here consists of questions I have been asked in one interview or another, so these things are required.
The biggest thing about data science interviews is that you never know what kind of question you might be asked. Like I said, it's a combination of all of these things: math, computer science, statistical learning, your background and the interviewer's, and the projects you've done. So prepare across all these topics.
Finally, domain knowledge. Not many questions are asked on it, but a few might be. Especially if you're just getting started, you can't be expected to have a lot of domain knowledge, but having it gives you a clear advantage. That means understanding the domain you're working in or looking to work in, and this knowledge differs across domains like pharma, telecom, education, HR, media, and so on; each domain has its own trade secrets, so to say. It counts if you know them and have used them to build a project. Let's say you have some domain knowledge in pharma and you're applying to a pharmaceutical company, and you can say: I know how clinical trials work, I know how to extract patient information from clinical trials using natural language processing, and I've done this in a project. That shows a lot: that you know how clinical trials work, which is a very domain-specific thing; that you know pharmaceutical companies are looking for this kind of knowledge; and that you know how NLP techniques work and how to use a computer to extract information. You are using your entire skill set in that project, and it's the kind of project that is very useful for a pharmaceutical company.
But it might not be useful for a media company; you can't go to a media company and say, look, I did something with clinical trials. Unless it's something like WikiLeaks, which is also something you could work on and then show. For example, the WikiLeaks dataset is huge; you could find relationships in it, take that to a media house, and maybe they'll hire you. That's it for domain knowledge: a broad understanding of the use cases where data can be applied to solve domain problems is very useful. Typically not many questions are asked on domain knowledge; most questions are on whether you can program, understand the math, and understand the different techniques to work with data. But knowing the domain really well is always an added advantage. Thank you so much for watching. If you liked this video, please give it a thumbs up, and don't forget to subscribe to the channel.
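The transcript calls writing SQL a bare minimum, since without it you can't get data out of a database. Below is a minimal sketch of the kind of aggregate query a screening test might ask for, using Python's built-in sqlite3 module and an in-memory database. The orders table, its columns, and the revenue_per_customer helper are hypothetical examples, not taken from the video.

```python
import sqlite3


def revenue_per_customer(rows):
    """Load hypothetical (customer, amount) rows into SQLite and aggregate them."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        # GROUP BY with aggregates is a staple of SQL screening questions
        cursor = conn.execute(
            """
            SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
            FROM orders
            GROUP BY customer
            ORDER BY total DESC
            """
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 5.0)]
    for customer, n_orders, total in revenue_per_customer(sample):
        print(customer, n_orders, total)
```

Wrapping the query in a function, rather than leaving loose lines in a notebook, is the kind of habit the transcript suggests showing in a take-home task.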
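Bayes' theorem comes up among the probability questions mentioned in the transcript. As a worked sketch, assume a hypothetical screening test with 1% prevalence, 99% sensitivity, and a 5% false positive rate (numbers invented for illustration). The posterior P(H | E) = P(E | H) P(H) / P(E) is computed with the law of total probability in the denominator; bayes_posterior is a hypothetical helper name.

```python
def bayes_posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | E) via Bayes' theorem: P(E | H) * P(H) / P(E)."""
    # Law of total probability for the denominator P(E)
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence


if __name__ == "__main__":
    # Hypothetical test: 1% prevalence, 99% sensitivity, 5% false positive rate.
    # How likely is the condition given a positive result?
    posterior = bayes_posterior(prior=0.01, p_evidence_given_h=0.99, p_evidence_given_not_h=0.05)
    print(f"P(condition | positive) = {posterior:.3f}")
```

With these numbers the posterior is 1/6, about 0.167: most positive results are false positives because the condition is rare, which is the intuition such questions usually probe.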
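For the A/B testing questions in the transcript (z-score, p-value, sample size), one common textbook approach is a two-sided two-proportion z-test with pooled variance, plus the normal-approximation sample-size formula. The sketch below assumes that approach (other tests and sequential stopping rules exist), relies on statistics.NormalDist from Python 3.8+, and uses made-up conversion counts; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def sample_size_per_group(p_base, p_target, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect a change from p_base to p_target."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_base - p_target) ** 2)


if __name__ == "__main__":
    # Hypothetical experiment: 200/2000 conversions in A, 260/2000 in B
    z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
    print(f"z = {z:.2f}, p-value = {p:.4f}")
    print("users per group for 10% -> 12%:", sample_size_per_group(0.10, 0.12))
```

The sample size is fixed before the experiment starts; stopping as soon as the p-value dips below 0.05 inflates the false positive rate, which is one reason "when do you stop your test?" is a common follow-up.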
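The representation, optimization, and evaluation framework from the transcript can be made concrete with linear regression. This is a minimal sketch, not the author's code: the representation is y = w * x + b, the optimization is plain batch gradient descent on mean squared error, and the evaluation is the MSE itself. The learning rate, epoch count, and toy data are assumptions chosen for illustration.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    n = len(xs)
    w, b = 0.0, 0.0  # representation: a line with slope w and intercept b
    for _ in range(epochs):
        # optimization: step against the gradient of MSE with respect to w and b
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mean_squared_error(xs, ys, w, b):
    """Evaluation: average squared gap between predictions and targets."""
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2 * x + 1 for x in xs]  # hypothetical noiseless data from y = 2x + 1
    w, b = fit_linear_regression(xs, ys)
    print(f"w = {w:.3f}, b = {b:.3f}, mse = {mean_squared_error(xs, ys, w, b):.6f}")
```

Stochastic gradient descent would update w and b after each example (or mini-batch) instead of after a full pass over the data; too large a learning rate makes either variant diverge, so it usually has to be tuned to the scale of the features.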
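The evaluation metrics listed in the transcript (accuracy, precision, recall, true and false positive rates, ROC AUC) can be computed from scratch in a few lines, which is a useful way to check that you can explain them. This sketch assumes binary 0/1 labels and computes AUC through its pairwise-ranking interpretation rather than by tracing the ROC curve; the labels, scores, and 0.5 threshold are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Threshold-dependent metrics; a zero denominator yields 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,  # true positive rate
        "fpr": fp / (fp + tn) if fp + tn else 0.0,  # false positive rate
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]  # hypothetical model scores
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(classification_metrics(y_true, y_pred))
    print("AUC:", roc_auc(y_true, scores))
```

Here the 0.5 threshold gives accuracy, precision, and recall of 0.75 and a false positive rate of 0.25, while the AUC of 0.9375 summarizes ranking quality across all thresholds.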
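On the stratified versus random split question from the transcript: a stratified split keeps each class's proportion roughly the same in every fold, which matters for imbalanced data. Below is a minimal pure-Python sketch of stratified k-fold assignment, assuming hashable, sortable labels. In practice you would more likely use scikit-learn's StratifiedKFold; stratified_k_fold and train_validation_splits are hypothetical names for this illustration.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Assign each index to one of k folds so every fold keeps roughly the same class mix."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random order within each class
        # deal this class's indices round-robin across the folds
        for position, idx in enumerate(indices):
            folds[(offset + position) % k].append(idx)
        offset += len(indices)  # continue dealing where the previous class stopped
    return folds


def train_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, validation_indices) pairs, one per fold."""
    folds = stratified_k_fold(labels, k, seed)
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    labels = [0] * 80 + [1] * 20  # hypothetical imbalanced labels
    for train, validation in train_validation_splits(labels, k=5, seed=42):
        positives = sum(labels[i] for i in validation)
        print(len(train), len(validation), positives)
```

With 80 negatives, 20 positives, and k = 5, every fold gets 16 negatives and 4 positives, whereas a purely random split could leave some folds with noticeably fewer positives.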
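For the missing-value and outlier questions in the transcript, one simple and common combination is median imputation followed by Tukey's IQR fences. The sketch below assumes missing entries are encoded as None and uses the standard statistics module (statistics.quantiles needs Python 3.8+). It is one option among many: dropping rows, model-based imputation, or capping values are alternatives. The data and helper names are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, factor=1.5):
    """Flag values outside [Q1 - factor * IQR, Q3 + factor * IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v < low or v > high for v in values]


if __name__ == "__main__":
    raw = [10, 12, 11, None, 13, 12, 100]  # hypothetical column with a gap and a spike
    filled = impute_median(raw)
    flags = iqr_outliers(filled)
    print(filled)
    print([v for v, is_outlier in zip(filled, flags) if is_outlier])
```

The median is used instead of the mean because the spike at 100 would pull the mean upward; the 1.5 factor is the conventional default, not a rule.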
Original Description
Data Science interviews are known to be unpredictable given how much there is to know in the field. But there are certain areas interviewers tend to concentrate on. I have recently been through a Data Scientist job hunt myself and have used it as a way to help you understand what you need to know! In this video, I also share a few questions that I was asked during my interviews, in the hope that it helps you answer them if you encounter them in your own interviews.
Please do give it a thumbs up if you find it useful, and please subscribe to the channel!
Uploads from Imaad Mohamed Khan · 5 of 34