Data Science SQL Interview Question Walkthrough | SQL Sundays #5
Skills:
SQL Analytics90%
Key Takeaways
The video provides a walkthrough of a data science SQL interview question, demonstrating how to approach and solve SQL problems in a real interview style, utilizing tools such as StrataScratch and 365 Data Science.
Full Transcript
hello my friends it is time for another sql sunday all right let's jump straight into it alright so today's question is from microsoft uh it is called download fax find the total number of downloads for paying and non-paying users by date include all records where non-paying customers have more downloads than paying customers the output should be sorted by earliest date first and contain 3 column three columns date non-paying downloads and paying downloads okay so sorry about the noise i am clearing the space on this table all right so for ms user dimension we have user id account id and we have ms account dimensions account id paying customers and we have ms download fax which is date user id and downloads let us look at these tables so ms user dimensions okay it has user id indeed and their account id okay so i'm curious if the user id one user only has one account or it is not the case that's a question that i'm going to be keeping in the back of my head um and another one see let's see ms account dimensions account id and paying customer so i'm assuming paying customer will show me some sort of data um either identifying that it's a paying customer or not or it's a table that only has paying customers so let us see okay cool so this is a table that um has account id and it tells you if it's a paying customer or not so i assume that it's either no or yes and for ms download fax this is a fact table that tells you date user id and the downloads okay it is look at the expected output um so it's going to be date it's going to be non-paying and it's going to be paying and it's sorted by date okay cool this question has quite a few steps to it so let's try to break this down um so let's see find a total number of downloads from paying and non-paying users by date so we only use we only want to include records where non-paying customers have more downloads than paying customers so that's going to be a filter over there and then the output should be sorted by earlier state first and contain three columns okay so some of the assumptions i'm going to make here as i was saying earlier the um [Music] actually okay so for it for use i was talking about how like user each user would only have one account id um i don't know if that actually matters here because even if they did have multiple account ids and then they used it on every single like multiple times a day we're like you know using different accounts since we're just summing up the different downloads they're doing i don't think it actually matters in this case all right let us tackle this question okay let's see so first we want to do some joins so we're going to be joining together these tables ms user dimension ms account dimension and ms download fax okay and then after that we will be we'll be grouping by date and for each date we want to use a case one so we want to sum all downloads for non-paying customers and also for paying customers and finally uh oh yeah we want to sort by earliest date so we want to order by date and then finally include only records where non-paying customers have more downloads than paying customers so this is going to be another filter so okay so non-paying customers downloads is greater than paying customers cool let's get started on the query okay let us see so from ms user dimension join ms account dimension join ms download fax so for this joint oh no okay for this drawing we are joining on uh let's give this name so user account dimensions call that a and download fast call that f so on you dot what are we joining on um account id is equal to a dot account id and for this one we are joining on you don't user id is equal to f dot user id oops cool um group by date so i'm going to write the select name in first select so we want the date and then so we want to sum all downloads from non-paying customers so here we're going to need to use the case one statement so we can do case one what's it called paying customer paying customer is equal to no then downloads and as uh is it called non-paying so this will for each time that we have a paying customer uh not paying customer then we're gonna get the number of downloads and what we want to do here is that we actually want to sum this together and that will give us all the downloads for that day for the non-paying customer and we want to do the same for the paying customer so in case when paying customers equals yes then downloads and as paying okay so we are going to be grouping by the date here and then we want to order by date as well um we want it ascending right yeah so we can just leave that and finally um so we need to do this filter so there's like a few ways of doing this i generally prefer using like a a cte um because i think it's just a little bit clearer to read for the interviewer and and for myself so let's do that um it's kind of like with temp as not the best name but whatever it is okay we attempt as that and then we will say from temp we will select all and where what is it again non-paying customer is greater than paying customer so non-paying is greater than pay this is a longer query it's also a little bit harder so let's just make sure that we actually got everything right um let's see from ms user dimension okay we're joining these together that looks fine okay group by order by temp as okay we're finding a sum of it and then we're also finding finding some case when paying customers equal to yes then downloads cool um okay i think this should be correct shall we check the solution all right okay yay you got it right hey okay um i want to actually see here because this is like um i wonder what they did in terms of their solution um let's see okay so okay so this is their inner query so let's also look at that some case when okay so this looks about the same um group by date order by date okay so and then select here yeah so they over here they did like a group by statement and then a having statement um yeah i think that's yeah that's that's the same here i don't know if you need to order it again over here um this is already in order i don't know i i don't think you actually need to order again but i'm not sure um yeah i mean i guess like having it in there is is better so better safe than sorry i can definitely see like there's some uh servers in which if you don't explicitly order explicitly order it again then it would not it would be out of order but i think it's like a pretty minor detail so it's okay if you didn't catch that i mean i i didn't catch that um so i think that's correct oh they did a left joint here so okay they did a left join on both of these why did they do a left join um ms user dimension okay so i would actually argue that maybe you shouldn't do a left join yeah so i think i would actually argue that maybe you shouldn't do a left joining here my reasoning is because if you're joining these together and say like you're joining together the user dimensions and then the account dimensions and it doesn't exist on account dimensions then you could get um that you wouldn't know if they're a paying customer or not you might just you would just get like a bunch of nulls in that case and the same over here so i think um i guess like it doesn't really matter um that much in this case because it seems like all the um what is it called the dummy data as well as like the checks that they have are really just not really looking into that case um and also because you are explicitly saying like when paying customers you go like yes but when paying customers equal to no um so i guess like it's okay and you're also summing it yeah so i i don't think it actually makes that much of a difference um in this case okay cool enough of my ramblings um yeah in terms of like what we can do to optimize or improve it you know this is like this is this is a good question i like this question i would love to see your solution for this one because i feel like there's a lot of ways of approaching this one um and i will not doubt that you guys come up with much more optimal ways of doing things than i have um even yeah so i would love to see that leave it as a comment below um but currently like off the top of my head i can't really think of anything that will make him more optimized maybe i don't know if there's a way of doing it so that you can only have one pass through but if you guys i can figure out a way of doing that please do let me know that's it for me and that's it for this episode of sequel sunday if you guys want to check out strata scratch check it out in the link below alright see you guys in the next video
Original Description
Hi friends! Welcome back to SQL Sundays! A weekly SQL data science interview questions and answers walkthrough.
This is another data science SQL interview question and answer done from beginning to end in a real interview style.
🔗Affiliates
========================
My SQL for data science interviews course (10 full interviews):
https://365datascience.com/learn-sql-for-data-science-interviews/
365 Data Science:
https://365datascience.pxf.io/WD0za3 (link for 57% discount for their complete data science training)
Check out StrataScratch for data science interview prep:
https://stratascratch.com/?via=tina
📲Socials
========================
instagram: https://www.instagram.com/hellotinah/
linkedin: https://www.linkedin.com/in/tinaw-h/
discord: https://discord.gg/5mMAtprshX
🤯Study with Tina
========================
Study with Tina channel:
https://www.youtube.com/channel/UCI8JpGrDmtggrryhml8kFGw
How to make a studying scoreboard:
https://www.youtube.com/watch?v=KAVw910mIrI
Scoreboard website:
scoreboardswithtina.com
livestreaming google calendar:
https://bit.ly/3wvPzHB
🎥Other videos you might be interested in
========================
SQL Sundays Playlist:
https://www.youtube.com/playlist?list=PLVD3APpfd1tuXrXBWAntLx4tNaONro5dA
How I learned SQL from Scratch in 11 Days to Pass my FANNG SQL Interview: https://www.youtube.com/watch?v=vaD3ZFFNwhM
How I consistently study with a full-time job:
https://www.youtube.com/watch?v=INymz5VwLmk
How I would learn to code (if I could start over):
https://www.youtube.com/watch?v=MHPGeQD8TvI&t=84s
🐈⬛🐈⬛About me
========================
Hi, my name is Tina and I'm a data scientist at a FAANG company. I was pre-med studying pharmacology at the University of Toronto until I finally accepted that I would make a terrible doctor. I didn't know what to do with myself so I worked for a year as a research assistant for a bioinformatics lab where I learned how to code and became interested in data science. I then di
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from Tina Huang · Tina Huang · 18 of 60
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
▶
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
How to choose between software engineering and data science | 5 Key Considerations
Tina Huang
How I got Software Engineering and Data Science Internships | Computer Science Job Search Part 1
Tina Huang
How I Became a Data Scientist | Computer Science Job Search Part 2
Tina Huang
3rd Year Statistics, Data Science, Computer Science Resume | Reviewing Your Resumes Ep. 1
Tina Huang
How to learn SQL for data science interview (the minimize effort maximize outcome way)
Tina Huang
3rd Year CS Resume (and asian drink) Review | Reviewing Your Resumes Ep. 2
Tina Huang
Are you a student? If yes, this is the best data science project for you!
Tina Huang
The Ultimate Guide to Land a Data Science Internship | schedule and time management survival guide
Tina Huang
Upenn MCIT Program Details and Real Student Experiences - Dr. Arvind Bhusnurmath
Tina Huang
Real Data Science SQL Interview Questions and Answers # 1 | Data Science Interview Questions
Tina Huang
3 More Unique and Impactful Projects to get a Data Science Job
Tina Huang
Real Data Science SQL Interview Questions and Answers # 2 | Data Science Interview Questions
Tina Huang
THANK YOU FOR 1000! | Proper intro | Random facts about myself
Tina Huang
A day in the life of a data scientist (FAANG data scientist remote)
Tina Huang
SQL Data Science Interview Questions and Answers (interview style walk-through) | Question 3
Tina Huang
Biology to Data Science (data professor's tips on how to get a data science research position)
Tina Huang
SQL Data Science Interview Questions and Answers (interview style walk-through) | SQL Sundays #4
Tina Huang
Data Science SQL Interview Question Walkthrough | SQL Sundays #5
Tina Huang
Data Science Resume Round-Up With @KenJee_ds - Episode 2
Tina Huang
SQL Data Science Interview Question Walkthrough | SQL Sundays #6
Tina Huang
Data Science vs Software Engineering Interview | 3 Key Differences
Tina Huang
Data Science SQL Interview Question Walkthrough (real interview style) | SQL Sundays #7
Tina Huang
The data science resume that got me my FAANG (MANGA?) job (entry level data scientist)
Tina Huang
Data Science SQL Interview Question Walkthrough (real interview style) | SQL Sundays #8
Tina Huang
Interview with a quant trader
Tina Huang
How I chose my masters degree (as an international student)
Tina Huang
The software engineering resume that got me into FAANG and Goldman Sachs (internship)
Tina Huang
3 tips to avoid debt for a masters #SHORTS
Tina Huang
A hard work day (ft. new NLP project) | FAANG data science isn't chill | vlog 1
Tina Huang
The comments sections are WILD | YouTube sentiment analysis - Data science project for beginners
Tina Huang
Do you have what it takes to be a great data scientist?
Tina Huang
How to learn data science in 2022 (the minimize effort maximize outcome way)
Tina Huang
A productive day as a data scientist | day in the life of a data scientist vlog #2
Tina Huang
How to learn math for data science (the minimize effort maximize outcome way)
Tina Huang
Internship that made me rethink my career...(technology summer analyst at Goldman Sachs)
Tina Huang
How to get a data science job
Tina Huang
cake and big sister advice 🖤
Tina Huang
the most underrated data job in 2021
Tina Huang
My career changing computer science masters degree in 15 minutes (Upenn MCIT)
Tina Huang
Data science interview tips (product and technical interviews)
Tina Huang
Needed to learn javascript in 3 hours - would not recommend
Tina Huang
from management consultant to software engineer | Humans of MCIT
Tina Huang
Overview, Review and Study Tips - Google Data Analytics Professional Certificate
Tina Huang
Overview, Review and Study Tips - Google Data Analytics Professional Certificate (condensed version)
Tina Huang
Watch this video before applying to Georgia Tech OMSCS
Tina Huang
How to self study technical things
Tina Huang
What's the best certificate for data analysts? Google vs IBM Data Analyst Certificate
Tina Huang
FAANG Data scientist reviews: Datacamp, Dataquest, 365 Data Science
Tina Huang
How I would learn to code (if I could start over)
Tina Huang
The quant trader interview guide
Tina Huang
We code a trading bot live! @jacobamaral
Tina Huang
Why you should have a portfolio website
Tina Huang
60k cooking and Q&A (ft. Uncle Roger fried rice)
Tina Huang
How to self study using MOOCS (Udemy, Coursera, and other online courses) | self study online
Tina Huang
Data Science SQL Interview Question Walkthrough | SQL Sundays #9
Tina Huang
Watch me build my portfolio! | How to create a portfolio website
Tina Huang
How I stop myself from burning out
Tina Huang
How I take notes - Tips for efficient note taking that speeds up learning
Tina Huang
How I design effective study plans for ANY SUBJECT (and stick with them) | trading, coding etc.
Tina Huang
How I'm learning to trade (forex swing trading)
Tina Huang
More on: SQL Analytics
View skill →Related Reads
📰
📰
📰
📰
On July 1, 2026, arXiv will spin out from Cornell University, its home for the past 25 years, to become an independent nonprofit organization. Major funding support from Simons Foundation and Schmidt Sciences. Ditching the red for their website. [N]
Reddit r/MachineLearning
CS-NRRM™ Official Publications: Paper 1 and Paper 2 Are Now Available
Medium · Data Science
Found a potential mistake in an ICLR 2026 blogpost [D]
Reddit r/MachineLearning
Rebuttals Move Peer-Review Scores, but Initial-Review Structure Bounds the Movement
ArXiv cs.AI
🎓
Tutor Explanation
DeepCamp AI