Data Science SQL Interview Question Walkthrough | SQL Sundays #5

Tina Huang · Advanced ·📄 Research Papers Explained ·5y ago

Key Takeaways

The video provides a walkthrough of a data science SQL interview question, demonstrating how to approach and solve SQL problems in a real interview style, utilizing tools such as StrataScratch and 365 Data Science.

Full Transcript

hello my friends it is time for another sql sunday all right let's jump straight into it alright so today's question is from microsoft uh it is called download fax find the total number of downloads for paying and non-paying users by date include all records where non-paying customers have more downloads than paying customers the output should be sorted by earliest date first and contain 3 column three columns date non-paying downloads and paying downloads okay so sorry about the noise i am clearing the space on this table all right so for ms user dimension we have user id account id and we have ms account dimensions account id paying customers and we have ms download fax which is date user id and downloads let us look at these tables so ms user dimensions okay it has user id indeed and their account id okay so i'm curious if the user id one user only has one account or it is not the case that's a question that i'm going to be keeping in the back of my head um and another one see let's see ms account dimensions account id and paying customer so i'm assuming paying customer will show me some sort of data um either identifying that it's a paying customer or not or it's a table that only has paying customers so let us see okay cool so this is a table that um has account id and it tells you if it's a paying customer or not so i assume that it's either no or yes and for ms download fax this is a fact table that tells you date user id and the downloads okay it is look at the expected output um so it's going to be date it's going to be non-paying and it's going to be paying and it's sorted by date okay cool this question has quite a few steps to it so let's try to break this down um so let's see find a total number of downloads from paying and non-paying users by date so we only use we only want to include records where non-paying customers have more downloads than paying customers so that's going to be a filter over there and then the output should be sorted by earlier state first and contain three columns okay so some of the assumptions i'm going to make here as i was saying earlier the um [Music] actually okay so for it for use i was talking about how like user each user would only have one account id um i don't know if that actually matters here because even if they did have multiple account ids and then they used it on every single like multiple times a day we're like you know using different accounts since we're just summing up the different downloads they're doing i don't think it actually matters in this case all right let us tackle this question okay let's see so first we want to do some joins so we're going to be joining together these tables ms user dimension ms account dimension and ms download fax okay and then after that we will be we'll be grouping by date and for each date we want to use a case one so we want to sum all downloads for non-paying customers and also for paying customers and finally uh oh yeah we want to sort by earliest date so we want to order by date and then finally include only records where non-paying customers have more downloads than paying customers so this is going to be another filter so okay so non-paying customers downloads is greater than paying customers cool let's get started on the query okay let us see so from ms user dimension join ms account dimension join ms download fax so for this joint oh no okay for this drawing we are joining on uh let's give this name so user account dimensions call that a and download fast call that f so on you dot what are we joining on um account id is equal to a dot account id and for this one we are joining on you don't user id is equal to f dot user id oops cool um group by date so i'm going to write the select name in first select so we want the date and then so we want to sum all downloads from non-paying customers so here we're going to need to use the case one statement so we can do case one what's it called paying customer paying customer is equal to no then downloads and as uh is it called non-paying so this will for each time that we have a paying customer uh not paying customer then we're gonna get the number of downloads and what we want to do here is that we actually want to sum this together and that will give us all the downloads for that day for the non-paying customer and we want to do the same for the paying customer so in case when paying customers equals yes then downloads and as paying okay so we are going to be grouping by the date here and then we want to order by date as well um we want it ascending right yeah so we can just leave that and finally um so we need to do this filter so there's like a few ways of doing this i generally prefer using like a a cte um because i think it's just a little bit clearer to read for the interviewer and and for myself so let's do that um it's kind of like with temp as not the best name but whatever it is okay we attempt as that and then we will say from temp we will select all and where what is it again non-paying customer is greater than paying customer so non-paying is greater than pay this is a longer query it's also a little bit harder so let's just make sure that we actually got everything right um let's see from ms user dimension okay we're joining these together that looks fine okay group by order by temp as okay we're finding a sum of it and then we're also finding finding some case when paying customers equal to yes then downloads cool um okay i think this should be correct shall we check the solution all right okay yay you got it right hey okay um i want to actually see here because this is like um i wonder what they did in terms of their solution um let's see okay so okay so this is their inner query so let's also look at that some case when okay so this looks about the same um group by date order by date okay so and then select here yeah so they over here they did like a group by statement and then a having statement um yeah i think that's yeah that's that's the same here i don't know if you need to order it again over here um this is already in order i don't know i i don't think you actually need to order again but i'm not sure um yeah i mean i guess like having it in there is is better so better safe than sorry i can definitely see like there's some uh servers in which if you don't explicitly order explicitly order it again then it would not it would be out of order but i think it's like a pretty minor detail so it's okay if you didn't catch that i mean i i didn't catch that um so i think that's correct oh they did a left joint here so okay they did a left join on both of these why did they do a left join um ms user dimension okay so i would actually argue that maybe you shouldn't do a left join yeah so i think i would actually argue that maybe you shouldn't do a left joining here my reasoning is because if you're joining these together and say like you're joining together the user dimensions and then the account dimensions and it doesn't exist on account dimensions then you could get um that you wouldn't know if they're a paying customer or not you might just you would just get like a bunch of nulls in that case and the same over here so i think um i guess like it doesn't really matter um that much in this case because it seems like all the um what is it called the dummy data as well as like the checks that they have are really just not really looking into that case um and also because you are explicitly saying like when paying customers you go like yes but when paying customers equal to no um so i guess like it's okay and you're also summing it yeah so i i don't think it actually makes that much of a difference um in this case okay cool enough of my ramblings um yeah in terms of like what we can do to optimize or improve it you know this is like this is this is a good question i like this question i would love to see your solution for this one because i feel like there's a lot of ways of approaching this one um and i will not doubt that you guys come up with much more optimal ways of doing things than i have um even yeah so i would love to see that leave it as a comment below um but currently like off the top of my head i can't really think of anything that will make him more optimized maybe i don't know if there's a way of doing it so that you can only have one pass through but if you guys i can figure out a way of doing that please do let me know that's it for me and that's it for this episode of sequel sunday if you guys want to check out strata scratch check it out in the link below alright see you guys in the next video

Original Description

Hi friends! Welcome back to SQL Sundays! A weekly SQL data science interview questions and answers walkthrough. This is another data science SQL interview question and answer done from beginning to end in a real interview style. 🔗Affiliates ======================== My SQL for data science interviews course (10 full interviews): https://365datascience.com/learn-sql-for-data-science-interviews/ 365 Data Science: https://365datascience.pxf.io/WD0za3 (link for 57% discount for their complete data science training) Check out StrataScratch for data science interview prep: https://stratascratch.com/?via=tina 📲Socials ======================== instagram: https://www.instagram.com/hellotinah/ linkedin: https://www.linkedin.com/in/tinaw-h/ discord: https://discord.gg/5mMAtprshX 🤯Study with Tina ======================== Study with Tina channel: https://www.youtube.com/channel/UCI8JpGrDmtggrryhml8kFGw How to make a studying scoreboard: https://www.youtube.com/watch?v=KAVw910mIrI Scoreboard website: scoreboardswithtina.com livestreaming google calendar: https://bit.ly/3wvPzHB 🎥Other videos you might be interested in ======================== SQL Sundays Playlist: https://www.youtube.com/playlist?list=PLVD3APpfd1tuXrXBWAntLx4tNaONro5dA How I learned SQL from Scratch in 11 Days to Pass my FANNG SQL Interview: https://www.youtube.com/watch?v=vaD3ZFFNwhM How I consistently study with a full-time job: https://www.youtube.com/watch?v=INymz5VwLmk How I would learn to code (if I could start over): https://www.youtube.com/watch?v=MHPGeQD8TvI&t=84s 🐈‍⬛🐈‍⬛About me ======================== Hi, my name is Tina and I'm a data scientist at a FAANG company. I was pre-med studying pharmacology at the University of Toronto until I finally accepted that I would make a terrible doctor. I didn't know what to do with myself so I worked for a year as a research assistant for a bioinformatics lab where I learned how to code and became interested in data science. I then di
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from Tina Huang · Tina Huang · 18 of 60

1 How to choose between software engineering and data science | 5 Key Considerations
How to choose between software engineering and data science | 5 Key Considerations
Tina Huang
2 How I got Software Engineering and Data Science Internships | Computer Science Job Search Part 1
How I got Software Engineering and Data Science Internships | Computer Science Job Search Part 1
Tina Huang
3 How I Became a Data Scientist | Computer Science Job Search Part 2
How I Became a Data Scientist | Computer Science Job Search Part 2
Tina Huang
4 3rd Year Statistics,  Data Science, Computer Science Resume | Reviewing Your Resumes Ep. 1
3rd Year Statistics, Data Science, Computer Science Resume | Reviewing Your Resumes Ep. 1
Tina Huang
5 How to learn SQL for data science interview (the minimize effort maximize outcome way)
How to learn SQL for data science interview (the minimize effort maximize outcome way)
Tina Huang
6 3rd Year CS Resume (and asian drink) Review | Reviewing Your Resumes Ep. 2
3rd Year CS Resume (and asian drink) Review | Reviewing Your Resumes Ep. 2
Tina Huang
7 Are you a student? If yes, this is the best data science project for you!
Are you a student? If yes, this is the best data science project for you!
Tina Huang
8 The Ultimate Guide to Land a Data Science Internship | schedule and time management survival guide
The Ultimate Guide to Land a Data Science Internship | schedule and time management survival guide
Tina Huang
9 Upenn MCIT Program Details and Real Student Experiences - Dr. Arvind Bhusnurmath
Upenn MCIT Program Details and Real Student Experiences - Dr. Arvind Bhusnurmath
Tina Huang
10 Real Data Science SQL Interview Questions and Answers # 1 | Data Science Interview Questions
Real Data Science SQL Interview Questions and Answers # 1 | Data Science Interview Questions
Tina Huang
11 3 More Unique and Impactful Projects to get a Data Science Job
3 More Unique and Impactful Projects to get a Data Science Job
Tina Huang
12 Real Data Science SQL Interview Questions and Answers # 2 | Data Science Interview Questions
Real Data Science SQL Interview Questions and Answers # 2 | Data Science Interview Questions
Tina Huang
13 THANK YOU FOR 1000! | Proper intro | Random facts about myself
THANK YOU FOR 1000! | Proper intro | Random facts about myself
Tina Huang
14 A day in the life of a data scientist (FAANG data scientist remote)
A day in the life of a data scientist (FAANG data scientist remote)
Tina Huang
15 SQL Data Science Interview Questions and Answers (interview style walk-through) | Question 3
SQL Data Science Interview Questions and Answers (interview style walk-through) | Question 3
Tina Huang
16 Biology to Data Science (data professor's tips on how to get a data science research position)
Biology to Data Science (data professor's tips on how to get a data science research position)
Tina Huang
17 SQL Data Science Interview Questions and Answers (interview style walk-through) | SQL Sundays #4
SQL Data Science Interview Questions and Answers (interview style walk-through) | SQL Sundays #4
Tina Huang
Data Science SQL Interview Question Walkthrough | SQL Sundays #5
Data Science SQL Interview Question Walkthrough | SQL Sundays #5
Tina Huang
19 Data Science Resume Round-Up With @KenJee_ds  - Episode 2
Data Science Resume Round-Up With @KenJee_ds - Episode 2
Tina Huang
20 SQL Data Science Interview Question Walkthrough | SQL Sundays #6
SQL Data Science Interview Question Walkthrough | SQL Sundays #6
Tina Huang
21 Data Science vs Software Engineering Interview | 3 Key Differences
Data Science vs Software Engineering Interview | 3 Key Differences
Tina Huang
22 Data Science SQL Interview Question Walkthrough (real interview style) | SQL Sundays #7
Data Science SQL Interview Question Walkthrough (real interview style) | SQL Sundays #7
Tina Huang
23 The data science resume that got me my FAANG (MANGA?) job (entry level data scientist)
The data science resume that got me my FAANG (MANGA?) job (entry level data scientist)
Tina Huang
24 Data Science SQL Interview Question Walkthrough (real interview style) | SQL Sundays #8
Data Science SQL Interview Question Walkthrough (real interview style) | SQL Sundays #8
Tina Huang
25 Interview with a quant trader
Interview with a quant trader
Tina Huang
26 How I chose my masters degree (as an international student)
How I chose my masters degree (as an international student)
Tina Huang
27 The software engineering resume that got me into FAANG and Goldman Sachs (internship)
The software engineering resume that got me into FAANG and Goldman Sachs (internship)
Tina Huang
28 3 tips to avoid debt for a masters #SHORTS
3 tips to avoid debt for a masters #SHORTS
Tina Huang
29 A hard work day (ft. new NLP project) | FAANG data science isn't chill | vlog 1
A hard work day (ft. new NLP project) | FAANG data science isn't chill | vlog 1
Tina Huang
30 The comments sections are WILD | YouTube sentiment analysis - Data science project for beginners
The comments sections are WILD | YouTube sentiment analysis - Data science project for beginners
Tina Huang
31 Do you have what it takes to be a great data scientist?
Do you have what it takes to be a great data scientist?
Tina Huang
32 How to learn data science in 2022 (the minimize effort maximize outcome way)
How to learn data science in 2022 (the minimize effort maximize outcome way)
Tina Huang
33 A productive day as a data scientist | day in the life of a data scientist vlog #2
A productive day as a data scientist | day in the life of a data scientist vlog #2
Tina Huang
34 How to learn math for data science (the minimize effort maximize outcome way)
How to learn math for data science (the minimize effort maximize outcome way)
Tina Huang
35 Internship that made me rethink my career...(technology summer analyst at Goldman Sachs)
Internship that made me rethink my career...(technology summer analyst at Goldman Sachs)
Tina Huang
36 How to get a data science job
How to get a data science job
Tina Huang
37 cake and big sister advice 🖤
cake and big sister advice 🖤
Tina Huang
38 the most underrated data job in 2021
the most underrated data job in 2021
Tina Huang
39 My career changing computer science masters degree in 15 minutes (Upenn MCIT)
My career changing computer science masters degree in 15 minutes (Upenn MCIT)
Tina Huang
40 Data science interview tips (product and technical interviews)
Data science interview tips (product and technical interviews)
Tina Huang
41 Needed to learn javascript in 3 hours - would not recommend
Needed to learn javascript in 3 hours - would not recommend
Tina Huang
42 from management consultant to software engineer | Humans of MCIT
from management consultant to software engineer | Humans of MCIT
Tina Huang
43 Overview, Review and Study Tips - Google Data Analytics Professional Certificate
Overview, Review and Study Tips - Google Data Analytics Professional Certificate
Tina Huang
44 Overview, Review and Study Tips - Google Data Analytics Professional Certificate (condensed version)
Overview, Review and Study Tips - Google Data Analytics Professional Certificate (condensed version)
Tina Huang
45 Watch this video before applying to Georgia Tech OMSCS
Watch this video before applying to Georgia Tech OMSCS
Tina Huang
46 How to self study technical things
How to self study technical things
Tina Huang
47 What's the best certificate for data analysts? Google vs IBM Data Analyst Certificate
What's the best certificate for data analysts? Google vs IBM Data Analyst Certificate
Tina Huang
48 FAANG Data scientist reviews: Datacamp, Dataquest, 365 Data Science
FAANG Data scientist reviews: Datacamp, Dataquest, 365 Data Science
Tina Huang
49 How I would learn to code (if I could start over)
How I would learn to code (if I could start over)
Tina Huang
50 The quant trader interview guide
The quant trader interview guide
Tina Huang
51 We code a trading bot live! @jacobamaral
We code a trading bot live! @jacobamaral
Tina Huang
52 Why you should have a portfolio website
Why you should have a portfolio website
Tina Huang
53 60k cooking and Q&A (ft. Uncle Roger fried rice)
60k cooking and Q&A (ft. Uncle Roger fried rice)
Tina Huang
54 How to self study using MOOCS (Udemy, Coursera, and other online courses) | self study online
How to self study using MOOCS (Udemy, Coursera, and other online courses) | self study online
Tina Huang
55 Data Science SQL Interview Question Walkthrough | SQL Sundays #9
Data Science SQL Interview Question Walkthrough | SQL Sundays #9
Tina Huang
56 Watch me build my portfolio! | How to create a portfolio website
Watch me build my portfolio! | How to create a portfolio website
Tina Huang
57 How I stop myself from burning out
How I stop myself from burning out
Tina Huang
58 How I take notes - Tips for efficient note taking that speeds up learning
How I take notes - Tips for efficient note taking that speeds up learning
Tina Huang
59 How I design effective study plans for ANY SUBJECT (and stick with them) | trading, coding etc.
How I design effective study plans for ANY SUBJECT (and stick with them) | trading, coding etc.
Tina Huang
60 How I'm learning to trade (forex swing trading)
How I'm learning to trade (forex swing trading)
Tina Huang

This video teaches how to approach and solve SQL problems in a real interview style, providing valuable insights and preparation for data science interviews. By following along, viewers can improve their SQL skills and learn how to think critically about data analysis problems. The video also provides resources for further learning, such as StrataScratch and 365 Data Science.

Key Takeaways
  1. Watch the video to understand the SQL problem
  2. Follow along with the solution
  3. Practice solving similar SQL problems
  4. Use resources like StrataScratch and 365 Data Science to improve SQL skills
  5. Prepare for data science interviews by practicing common SQL problems
💡 Practicing SQL problems in a real interview style can help improve critical thinking and problem-solving skills, making it easier to prepare for data science interviews.

Related Reads

📰
On July 1, 2026, arXiv will spin out from Cornell University, its home for the past 25 years, to become an independent nonprofit organization. Major funding support from Simons Foundation and Schmidt Sciences. Ditching the red for their website. [N]
arXiv is becoming an independent nonprofit organization after 25 years at Cornell University, backed by major funding, which will impact the future of research and academia
Reddit r/MachineLearning
📰
CS-NRRM™ Official Publications: Paper 1 and Paper 2 Are Now Available
Learn about the CS-NRRM's official publications on a 12-year longitudinal human observation archive and its significance in research and development
Medium · Data Science
📰
Found a potential mistake in an ICLR 2026 blogpost [D]
Verify a potential mistake in an ICLR 2026 blog post and learn how to effectively report errors in academic publications
Reddit r/MachineLearning
📰
Rebuttals Move Peer-Review Scores, but Initial-Review Structure Bounds the Movement
Learn how author rebuttals impact peer-review scores and the factors that influence their effectiveness in ICLR 2024-2025, using LLMs for measurement
ArXiv cs.AI
Up next
How to get started With Drug Discovery using BioAI: Computational Biology ( 4K UHD Med Masterclass )
Sudarshan's Multiverse
Watch →