The Ultimate Coding Setup for Data Science
In this video I go through how I set up my operating system to code for data science. We talk about operating systems, terminals, IDEs and more.
Tmux settings repo: https://github.com/gpakosz/.tmux
My tmux and vim settings: https://github.com/RobMulla/vim_settings
Timeline:
00:00 Intro
00:22 Operating System
03:53 Terminal Stuff
09:19 Virtual Environments
12:40 IDEs
Follow me on twitch for live coding streams: https://www.twitch.tv/medallionstallion_
My other videos:
Speed Up Your Pandas Code: https://www.youtube.com/watch?v=SAFmrTnEHLg
Intro to Pandas video: https://www.youtube.com/watch?v=_Eb0utIRdkw
Exploratory Data Analysis Video: https://www.youtube.com/watch?v=xi0vhXFPegw
Working with Audio data in Python: https://www.youtube.com/watch?v=ZqpSb5p1xQo
Efficient Pandas Dataframes: https://www.youtube.com/watch?v=u4_c2LDi4b8
* Youtube: https://youtube.com/@robmulla?sub_confirmation=1
* Discord: https://discord.gg/HZszek7DQc
* Twitch: https://www.twitch.tv/medallionstallion_
* Twitter: https://twitter.com/Rob_Mulla
* Kaggle: https://www.kaggle.com/robikscube
#datascience #python #coding
What You'll Learn
The video demonstrates how to set up an operating system for data science coding, covering tools such as Ubuntu, tmux, and Vim, as well as environment management with Anaconda and conda, and code editing with Jupyter Lab and VS Code.
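As a small illustration of the environment-management topic, the sketch below reports which Python interpreter and conda environment a session is using, roughly the information `which python` gives in a terminal. The helper name `describe_environment` is made up for this example, and it assumes conda exports the `CONDA_DEFAULT_ENV` variable when an environment is activated (the value is simply `None` outside conda).

```python
import os
import sys


def describe_environment():
    """Return basic facts about the interpreter running this code."""
    return {
        # Full path of the running interpreter; in an activated environment
        # this is typically what `which python` resolves to.
        "executable": sys.executable,
        # Root directory of the active environment.
        "prefix": sys.prefix,
        # Assumption: conda sets this on `conda activate`; None otherwise.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "python_version": ".".join(str(n) for n in sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this inside and outside an activated environment is a quick way to confirm that a notebook or script is using the interpreter you expect.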
Full Transcript
What does the perfect data science setup look like? I'm talking something that'll speed up your productivity and let you code faster and more efficiently. Well, there might not be a perfect setup, but I'm going to walk you through today what I use on a daily basis for doing data science projects. I'm going to break this down into a few different parts, so let's go at it, first by talking about operating system.

Now, the operating system I have set up on my machine, and I use this about 90 percent of the time on my personal machines, is Linux, and my personal favorite flavor of Linux is Ubuntu. The main reason why I use Ubuntu is because it's the most popular version of Linux out there. Because of that, things are usually up to date and I don't have to worry about bugs being introduced into my operating system. I would say my second favorite operating system is just macOS. I use Macs for work a lot of the time, and it's Unix-based, so it's going to be very similar to Linux.

That being said, I do have dual boot set up on my computer, so I sometimes will load it up in Windows. The reason why is because it used to be that for any gaming that I'd want to do, Windows was sort of the only way to do it; in Linux the games wouldn't load. Now recently, especially with games like Elden Ring, they've been working better in Linux than Windows, so I actually haven't booted up Windows in years. I also don't game as much as I used to.

The reason why I like running Linux for data science is because most of the work you're going to be doing is on a server, whether a cloud server or a home server you set up yourself. Linux is the main operating system that everything runs on, and if your main PC is running on it, then it's just going to be super easy to transition over to whatever other stuff you're going to work on. Also, it boots up super fast, it's very clean, and there's none of that bloatware there. I remember when Windows added to their start menu all the news feed and stuff that you don't really necessarily
always want. You don't have any of that junk.

Another reason why I really like using Linux is you can do so much just in the command line. When you install packages, you can just sudo apt update and it'll go look for the updates to any packages you have on your machine, any drivers, and then you can sudo apt upgrade to upgrade them. When you're working with code and data science, the fact that you can just reproduce things in your operating system by running lines of code that you might find on Stack Overflow or in other solutions is so much more powerful than clicking and moving things around.

Also, sort of on that note, a big thing about having a Linux back end is that you're running bash. Let's say I wanted to go to a directory and move files around. There's no reason to open up an explorer to select things and actually drag them into folders. If you get good at it, you can move files around using the command line, like mv for move or cp for copying files. It sounds like a little thing, but being comfortable using bash and being able to move files around like that is such a time saver and something you should definitely look into mastering. Maybe I'll make another video on that.

Another reason why I love using Linux is because if I have my laptop or a second desktop machine, I can easily SSH into that. SSHing lets you connect to your other computer and basically interact with it in the command line as if you were on that other machine. You can also forward ports, so you could, say, run JupyterLab on your desktop computer and SSH into it from a laptop, and voilà, you're running Python on your other computer but you're able to interact with it through a web browser on your laptop.

The next thing I'd like to talk about is how I interact with the terminal. [Music]

So I am using MATE Terminal. I also have a Solarized Dark theme; that's just my preference here. You can go into profiles and set up what the colors look like in your terminal. But the main and most powerful thing
that I use when I am interacting with the terminal is something called tmux. I actually have an alias called robmux, and when I run that it'll start a new session of tmux. Now, tmux is just something that's running on your machine that keeps the state of your terminal, so any of the windows or the processes that you have running. If you're not in tmux and you're running, let's say, a Python script and you X out, it will shut down that session or whatever you're running. However, if you're in tmux and you're running something like list everything in this directory, and I close out of this, the next time that I go into tmux I'm right back where I left off. So that's one of the main reasons why running tmux is great.

You can also see that tmux has some stuff down here on the bottom: how long my machine has been up for, what I am actually running here, which is bash, the clock, and what I've named this machine, so if I've SSHed into a different machine that will change. I have all of this set up in my tmux config file, so this is where you can configure tmux to look really fancy and beautiful. I have a few custom things set up, but most of my tmux settings I've gotten from other people out there who have really perfected this art. I actually copied my base configuration file from this GitHub repo, which I'll also link in the description. It's what I use to get this pretty bar at the bottom. I even have my exact tmux settings in my own GitHub repo, so that if I go onto another server and I want to run tmux exactly like I'm used to on my home PC, I can clone that repo and copy in the tmux configuration file.

Now, some other things that make tmux awesome: with some simple command line shortcuts you can actually split panes and be running multiple different terminal sessions at the same time. The reason why this is great is because you can, let's say, run a Python script in the top right here, and then on the left side you can tail maybe a log file, if there's a log file, in the exact same
window. And like I said, if this closes down and you load it back up, you're in the exact same spot. You can also add tabs, so Control-B and T will make a second tab down here. You see how there are two numbers. To switch between them I use Shift and the arrow keys. Everything in tmux has a ton of keyboard shortcuts that you just need to learn. And even if you're, say, running on a Windows machine, chances are in data science you're going to be SSHing into a server where you're going to want to run tmux, especially then, because if you lose connection to that machine because of a bad internet connection, you want tmux to keep that session up. Did I mention I like tmux? I think I've mentioned that enough.

Now, a few other things about my terminal I'll just mention. When you're running bash, you can set up aliases that can be really helpful. I have a lot of aliases that I use. One of them you'll notice that I use a lot is lc, and that's just listing all the files. It's actually ls --color, but I don't want to run that every time, so the alias just lets me set up a short command that I can run and then it'll run the long command. Aliases are really good; make sure you set those up.

Now, two other quick things that I use all the time when I'm in my terminal. One is htop. This basically lets you see all the processes running on your computer, so this is like a system profiler of everything running on your machine: all my cores, the amount of memory that's being used. As a data scientist you might be pushing the limits of your machine, so you might have too much data for memory and you need to keep an eye on the memory. htop is great for that. There's also btop, which has a little bit of a different look and feel to it. I haven't really bought into it, but I know some people swear by it. And of course, for all you old-school people out there, you can just run the straight old-school top and avoid any of the colors and stuff. But htop's my fave.

Another thing similar to htop is nvtop. nvtop lets you
see the processes being run on your GPU. So if you have a machine that has a GPU and you're going to be doing stuff like deep learning, this is great because you can see how much memory and how much CPU is being used on each of your devices or on your GPU. I usually have a tab open with nvtop and a tab open with htop in my tmux just so I can jump to this if I ever need to.

All right, next let's talk about package managers. I know there are a lot of different options out there for keeping different environments separate, in Python or whatever project you're working on, because maybe you want a version of pandas that's older for one project but you're going to use a newer version for a different project. I personally use Anaconda and conda to manage my environments and I use pip to install my Python packages. I don't use conda to install Python packages anymore. It has that ability, but I mainly just use it to containerize my environments.

Usually when people are teaching how to get started with Python on your computer, they recommend installing Anaconda. All Anaconda is is a pre-packaged version of conda with all of the main packages used for data science. So I too would recommend using Anaconda as the main thing that you install if you're trying to run Python and data science on your computer. Miniconda is just more of a lightweight version of Anaconda where you can just get the essential packages needed to run some Python stuff on your local machine.

You can tell I have conda installed on this machine because all the way to the left of my terminal here it says base. That means I'm in my base conda environment. I can just run conda with the conda command, and env list will list all the environments that I have. Over the years I've kind of gotten a few of these different conda environments for different projects that I've been working on. You can create these using conda create, and you can tell it a version of Python that you want. It's just great because it'll keep
everything isolated. So if I wanted to, say, activate kaggle2, then if I type in which python, we can see that my main Python is now directed to my kaggle2, which is the second Kaggle environment that I made on my machine, because the first one broke. There's really great documentation about how to manage your environments in conda: activating them, adding new ones, deleting them. I've spent a lot of time on this page learning how to set up my environments, and I'd recommend you do too.

Now, in my environment I can do conda list. This will list all the packages that I have installed. conda will let you install more than just Python packages. pip is what I use to install Python packages once I'm in my environment. Usually I've found pip to be less buggy than conda, and it does a really good job of managing dependencies and all the stuff that goes along with that. Of course, you could go the Docker route, but I've managed to avoid that for a lot of reasons, except for productionized code that you have to deploy, and mainly that's in my work environment.

And of course, installing things with pip is pretty easy in the command line. You just do pip install, let's say, pandas. It says I already have it installed. I could do something like upgrade, and it's going to download the latest version, uninstall my old version, and take care of all the dependencies. You must learn how to use pip if you're working in Python.

Now let's talk about IDEs. An IDE is basically the software that you use to write your code. Now, I'm not very strict about only using one IDE for one thing. I actually like to use many different IDEs for different things. The main one that I would definitely recommend getting at least useful with, opening up a file and being able to do some simple edits, because it's the least intuitive, is Vim or vi. If you have Vim installed in Linux, you can just load it up by typing in vim. It'll bring you into this editor. I use Vim a lot when I'm logging into a remote machine
and I want to edit a file, do some quick changes, or I'm working on code that's already established and I'm just doing debugging. Learning the key bindings in Vim can make your life a lot easier, really just things like being able to jump around and do quick edits. If I was a real hardcore coder making production code all day, I might even say use Vim as your main IDE, because it really is that powerful.

I also really like using Sublime Text. I mainly use Sublime Text for code snippets or making to-do lists, honestly, and I can load up Sublime Text here. I find myself using Sublime Text to write a lot of SQL queries, which I'll then execute in other code. There is an extension that I use a lot called SQL Beautifier that will just take nasty-looking SQL code, and you can just run it here and it makes it a lot easier to read.

But I'm not gonna lie, most of the code that I write as a data scientist is in JupyterLab. I have a whole video talking about Jupyter, Jupyter notebooks, and JupyterLab, but I'm just going to show you basically why I use JupyterLab. It's because I feel like the environment's really easy to navigate, code looks nice, and most of the time the code that I'm writing isn't production code. It's more exploratory. I'm looking at data for the first time, I don't know exactly what I'm going to write, and jumping from cell to cell and being able to be efficient is really important. That's why I'd suggest checking out the tips and tricks section of my JupyterLab video if you want to learn more about that.

There comes a time, though, when a Jupyter notebook is just not the right place to be writing code, and when you want to abstract that code out and work on it, usually the editor that I use is VS Code. I'll show you just an example here. Here's my VS Code, and I've written a few functions. Let's say I still want to use these functions that I've written in a notebook to interact with some data. The nice thing is I usually keep these in a separate folder, a source code folder, right next to
my notebooks, and then over here in my notebook you can see that I'm actually importing from this scripts file into my notebook. So a lot of the code that's abstracted out I will then edit in VS Code, just because it's a great IDE in general, it's lightweight, and it is the most popular one out there. Now, I know some people might say you can run notebooks in VS Code, and I know that's the case, but I've tried it and I don't like it as much as the JupyterLab environment. I find things to be smoother, especially the keyboard commands that I use so frequently. I haven't been able to switch over to VS Code and have the same feel as I do working with JupyterLab, but that's just me.

I like using git in the command line. I find that's the easiest way to track and commit all my changes. I also like to write a lot of code in Kaggle notebooks, mainly because I can share it with you all and you can easily spin up your own environment by copying these notebooks. It's also kind of like a public portfolio where you can share all of the code that you've written.

And then, most of the things that I've mentioned I can also run when I'm working on a remote machine: VS Code, JupyterLab. I have conda installed, have it all set up the way I want, and then I'm just forwarding everything over through SSH to my local machine. You don't necessarily need to have a beefy machine to do data science, because you can always connect remotely to larger machines and spin them up as you need.

There you go. I hope you found this video helpful. Maybe it showed you a few things that you didn't know about that you can either look into or start using in your data science workflow. I often get asked about how I set things up, so this is the way I currently do it. It might change down the road; as new things come out, I find myself changing and adapting to them. So if there's anything specific that I've mentioned in this video that you want me to do a deeper dive into, let
me know in the comments below. Give me a like and subscribe. That's the best way you can support me, and it's completely free, so hopefully you will do that, and I'll see you in the next video.
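The notebook-plus-source-folder workflow described near the end of the transcript can be sketched roughly as follows. The folder name `src`, the module `ds_helpers.py`, and the function `add_one` are hypothetical stand-ins, not names from the video, and a temporary directory plays the role of the project so the example runs anywhere.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway project: <project>/src/ds_helpers.py sits next to where
# the notebooks would live. All names here are hypothetical.
project = Path(tempfile.mkdtemp())
src_dir = project / "src"
src_dir.mkdir()
(src_dir / "ds_helpers.py").write_text(
    "def add_one(values):\n"
    "    return [v + 1 for v in values]\n"
)

# In a notebook, this is the step that makes the source folder importable.
sys.path.insert(0, str(src_dir))
importlib.invalidate_caches()

# Import the module edited in a separate editor and call it from the notebook.
helpers = importlib.import_module("ds_helpers")
result = helpers.add_one([1, 2, 3])
print(result)  # [2, 3, 4]
```

In JupyterLab, running `%load_ext autoreload` followed by `%autoreload 2` in an early cell is one way to pick up edits to such a module without restarting the kernel.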