The Ultimate Coding Setup for Data Science

Rob Mulla · Beginner ·🎨 Image & Video AI ·3y ago
In this video I go through how I set up my operating system to code for data science. We talk about operating system, terminal, IDEs and more.

Tmux settings repo: https://github.com/gpakosz/.tmux
My tmux and vim settings: https://github.com/RobMulla/vim_settings

Timeline:
00:00 Intro
00:22 Operating System
03:53 Terminal Stuff
09:19 Virtual Environments
12:40 IDEs

Follow me on twitch for live coding streams: https://www.twitch.tv/medallionstallion_

My other videos:
Speed Up Your Pandas Code: https://www.youtube.com/watch?v=SAFmrTnEHLg
Intro to Pandas video: https://www.youtube.com/watch?v=_Eb0utIRdkw
Exploratory Data Analysis Video: https://www.youtube.com/watch?v=xi0vhXFPegw
Working with Audio data in Python: https://www.youtube.com/watch?v=ZqpSb5p1xQo
Efficient Pandas Dataframes: https://www.youtube.com/watch?v=u4_c2LDi4b8

* Youtube: https://youtube.com/@robmulla?sub_confirmation=1
* Discord: https://discord.gg/HZszek7DQc
* Twitch: https://www.twitch.tv/medallionstallion_
* Twitter: https://twitter.com/Rob_Mulla
* Kaggle: https://www.kaggle.com/robikscube

#datascience #python #coding

What You'll Learn

The video demonstrates how to set up an operating system for data science coding, covering tools such as Ubuntu, tmux, and Vim, as well as environment management with Anaconda and conda, and code editing with Jupyter Lab and VS Code.

Full Transcript

What does the perfect data science setup look like? I'm talking something that'll speed up your productivity and let you code faster and more efficiently. Well, there might not be a perfect setup, but I'm going to walk you through what I use on a daily basis for data science projects. I'm going to break this down into a few different parts, so let's go at it, first by talking about the operating system.

The operating system I have set up on my machine, and I use this about 90 percent of the time on my personal machines, is Linux, and my personal favorite flavor of Linux is Ubuntu. The main reason I use Ubuntu is that it's the most popular version of Linux out there. Because of that, things are usually up to date and I don't have to worry about bugs being introduced into my operating system. I would say my second favorite operating system is macOS. I use Macs for work a lot of the time, and it's Unix-based, so it's going to be very similar to Linux.

That being said, I do have dual boot set up on my computer, so I sometimes will load it up in Windows. The reason is that it used to be that for any gaming I wanted to do, Windows was sort of the only way to do it; in Linux the games wouldn't load. Recently, especially with games like Elden Ring, they've been working better in Linux than Windows, so I actually haven't booted up Windows in years. I also don't game as much as I used to.

The reason I like running Linux for data science is that most of the work you're going to be doing is on a server, whether that's a cloud server or a home server you set up yourself. Linux is the main operating system that everything runs on, and if your main PC is running it, then it's going to be super easy to transition over to whatever other machines you're going to work on. Also, it boots up super fast, it's very clean, and there's none of that bloatware. I remember when Windows added to their start menu all the news feed and stuff that you don't really necessarily
always want. You don't have any of that junk.

Another reason I really like using Linux is that you can do so much just in the command line. When you install packages, you can just sudo apt update and it'll go look for updates to any packages you have on your machine and any drivers, and then you can sudo apt upgrade to upgrade them. When you're working with code and data science, the fact that you can reproduce things in your operating system by running lines of code that you might find on Stack Overflow or in other solutions is so much more powerful than clicking and moving things around.

Also, sort of on that note, a big thing about having a Linux backend is that you're running bash. Let's say I wanted to go to a directory and move files around. There's no reason to open up an explorer to select things and drag them into folders. If you get good at it, you can move files around using the command line, like mv for move or cp for copying files. It sounds like a little thing, but being comfortable using bash and being able to move files around like that is such a time saver and something you should definitely look into mastering. Maybe I'll make another video on that.

Another reason I love using Linux is that with my laptop or a second desktop machine that I have, I can easily SSH into that. SSHing lets you connect to your other computer and basically interact with it in the command line as if you were on that other machine. You can also forward ports, so you could, say, run Jupyter Lab on your desktop computer and SSH into it from a laptop, and voila: you're running Python on your other computer, but you're able to interact with it through a web browser on your laptop.

The next thing I'd like to talk about is how I interact with the terminal. I am using MATE Terminal. I also have a Solarized Dark theme; that's just my preference. You can go into profiles and set up what the colors look like in your terminal. But the main and most powerful thing
that I use when I am interacting with the terminal is something called tmux. I actually have an alias called robmux, and when I run that it'll start a new session of tmux. Now, tmux is something running on your machine that keeps the state of your terminal, so any of the windows or processes that you have running. If you're not in tmux and you're running, let's say, a Python script and you X out, it will shut down that session or whatever you're running. However, if you're in tmux and you're running something like listing everything in this directory, and I close out of this, the next time I go into tmux I'm right back where I left off. That's one of the main reasons why running tmux is great.

You can also see that tmux has some stuff down here on the bottom: how long my machine has been up, what I am actually running here, which is bash, the clock, and what I've named this machine, so if I've SSHed into a different machine that will change. I have all of this set up in my tmux config file. This is where you can configure tmux to look really fancy and beautiful. I have a few custom things set up, but most of my tmux settings I've gotten from other people out there who have really perfected this art. I actually copied my base configuration file from this GitHub repo, which I'll also link in the description. It's what I use to get this pretty bar at the bottom. I even have my exact tmux settings in my own GitHub repo, so that if I go onto another server and I want to run tmux exactly like I'm used to on my home PC, I can clone that repo and copy in the tmux configuration file.

Another thing that makes tmux awesome is that with some simple command line shortcuts you can split panes and run multiple terminal sessions at the same time. The reason this is great is that you can, let's say, run a Python script in the top right here, and then on the left side you can tail a log file, if there's a log file, in the exact same
window. And like I said, if this closes down and you load it back up, you're in the exact same spot. You can also add tabs: Ctrl+B and T will make a second tab down here. You see how there are two numbers; to switch between them I use Shift and the arrow keys. Everything in tmux has a ton of keyboard shortcuts that you just need to learn. Even if you're, say, running on a Windows machine, chances are in data science you're going to be SSHing into a server where you're going to want to run tmux, especially then, because if you lose connection to that machine because of a bad internet connection, you want tmux to keep that session up. Did I mention I like tmux? I think I've mentioned that enough.

A few other things about my terminal. When you're running bash, you can set up aliases, which can be really helpful. I have a lot of aliases that I use. One you'll notice I use a lot is lc, and that's just listing all the files. It's actually ls -lash with color, but I don't want to type that every time, so the alias lets me set up a short command that runs the long command. Aliases are really good; make sure you set those up.

Two other quick things that I use all the time in my terminal. One is htop. This basically lets you see all the processes running on your computer, so it's like a system profiler of everything running on your machine: all my cores and the amount of memory being used. As a data scientist you might be pushing the limits of your machine, so you might have too much data for memory and need to keep an eye on it. htop is great for that. There's also btop, which has a bit of a different look and feel. I haven't really bought into it, but I know some people swear by it. And of course, for all you old-school people out there, you can just run plain old top and avoid any of the colors and stuff. But htop's my fave.

Another thing similar to htop is nvtop. nvtop lets you
see the processes being run on your GPU. So if you have a machine that has a GPU and you're going to be doing stuff like deep learning, this is great because you can see how much memory and how much CPU is being used on each of your devices or on your GPU. I usually have a tab open with nvtop and a tab open with htop in my tmux just so I can jump to them if I ever need to.

All right, next let's talk about package managers. I know there are a lot of different options out there for keeping environments separate in Python, or whatever project you're working on, because maybe you want an older version of pandas for one project but a newer version for a different project. I personally use Anaconda and conda to manage my environments, and I use pip to install my Python packages. I don't use conda to install Python packages anymore. It has that ability, but I mainly just use it to containerize my environments.

Usually when people are teaching how to get started with Python on your computer, they recommend installing Anaconda. All Anaconda is is a pre-packaged version of conda with all of the main packages used for data science. So I too would recommend Anaconda as the main thing you install if you're trying to run Python and data science on your computer. Miniconda is just a more lightweight version of Anaconda where you get only the essential packages needed to run some Python stuff on your local machine.

You can tell I have conda installed on this machine because all the way to the left of my terminal here it says base. That means I'm in my base conda environment. I can run conda with the conda command, and env list will list all the environments that I have. Over the years I've gotten a few of these different conda environments for different projects that I've been working on. You can create these using conda create, and you can tell it the version of Python that you want. It's just great because it'll keep
everything isolated. So if I wanted to, say, activate kaggle 2, then if I type in which python we can see that my main python now is directed to my kaggle 2, which is the second Kaggle environment that I made on my machine because the first one broke. There's really great documentation about how to manage your environments in conda: activating them, adding new ones, deleting them. I've spent a lot of time on this page learning how to set up my environments, and I'd recommend you do too.

Now, in my environment I can do conda list, and this will list all the packages that I have installed. conda will let you install more than just Python packages. pip is what I use to install Python packages once I'm in my environment. Usually I've found pip to be less buggy than conda, and it does a really good job of managing dependencies and all the stuff that goes along with that. Of course, you could go the Docker route, but I've managed to avoid that for a lot of reasons, except for productionized code that you have to deploy, and that's mainly in my work environment. And of course, installing things with pip is pretty easy. In the command line you just do pip install, let's say, pandas. It says I already have it installed. I could do something like upgrade, and it's going to download the latest version, uninstall my old version, and take care of all the dependencies. You must learn how to use pip if you're working in Python.

Now let's talk about IDEs. An IDE is basically the software that you use to write your code. I'm not very strict about only using one IDE for one thing; I actually like to use many different IDEs for different things. The main one that I would definitely recommend getting at least useful with, opening up a file and being able to do some simple edits, because it's the least intuitive, is Vim or vi. If you have Vim installed in Linux you can just load it up by typing in vim, and it'll bring you into this editor. I use Vim a lot when I'm logging into a remote machine
and I want to edit a file, do some quick changes, or I'm working on code that's already established and I'm just doing debugging. Learning the key bindings in Vim can make your life a lot easier, really just things like being able to jump around and do quick edits. If I were a real hardcore coder making production code all day, I might even say use Vim as your main IDE, because it really is that powerful.

I also really like using Sublime Text. I mainly use Sublime Text for code snippets or making to-do lists, honestly, and I can load up Sublime Text here. I find myself using Sublime Text to write a lot of SQL queries, which I'll then execute in other code. There is an extension that I use a lot called SQL Beautifier that will take nasty-looking SQL code, and you can just run it here and it makes it a lot easier to read.

But I'm not gonna lie, most of the code that I write as a data scientist is in Jupyter Lab. I have a whole video talking about Jupyter, Jupyter notebooks, and Jupyter Lab, but I'm just going to show you basically why I use Jupyter Lab. It's because I feel like the environment's really easy to navigate, code looks nice, and most of the time the code that I'm writing isn't production code. It's more exploratory: I'm looking at data for the first time, I don't know exactly what I'm going to write, and jumping from cell to cell and being able to be efficient is really important. That's why I'd suggest checking out the tips and tricks section of my Jupyter Lab video if you want to learn more about that.

There comes a time, though, when a Jupyter notebook is just not the right place to be writing code and you want to abstract that code out and work on it. Usually the editor that I use for that is VS Code. I'll show you just an example here. Here's my VS Code, and I've written a few functions. Let's say I still want to use these in a notebook to interact with some data. The nice thing is I usually keep these in a separate folder, a source code folder, right next to
my notebooks, and then over here in my notebook you can see that I'm actually importing from this scripts file into my notebook. So a lot of the code that's abstracted out I will then edit in VS Code, just because it's a great IDE in general, it's lightweight, and it's the most popular one out there. I know some people might say you can run notebooks in VS Code, and I know that's the case, but I've tried it and I don't like it as much as the Jupyter Lab environment. I find things to be smoother there, and especially with the keyboard commands that I use so frequently, I haven't been able to switch over to VS Code and have the same feel as I do working with Jupyter Lab. But that's just me.

I like using git in the command line. I find that's the easiest way to track and commit all my changes. I also like to write a lot of code in Kaggle notebooks, mainly because I can share it with you all and you can easily spin up your own environment by copying these notebooks. It's also kind of like a public portfolio where you can share all the code that you've written.

And most of the things that I've mentioned, when I'm working on a remote machine I can run all of them: VS Code, Jupyter Lab. I have conda installed, have it all set up the way I want, and then I'm just forwarding everything over through SSH to my local machine. You don't necessarily need a beefy machine to do data science, because you can always connect remotely to larger machines and spin them up as you need.

There you go. I hope you found this video helpful and that it maybe showed you a few things you didn't know about that you can look into or start using in your data science workflow. I often get asked about how I set things up, so this is the way I currently do it. It might change down the road; as new things come out I find myself changing and adapting to them. So if there's anything specific that I've mentioned in this video that you want me to do a deeper dive into, let
me know in the comments below. Give me a like and subscribe; that's the best way you can support me, and it's completely free. So hopefully you will do that, and I'll see you in the next video.
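
The SSH port forwarding mentioned in the transcript (Jupyter Lab running on a remote machine, opened in a browser on a laptop) could look roughly like the sketch below. This is an illustration, not the exact commands from the video: the host name user@desktop, the port 8888, and the helper function tunnel_cmd are all assumptions made for the example. The function only prints the ssh command so it can be inspected before it is run.

```shell
#!/usr/bin/env bash
# Sketch: reach a Jupyter Lab server on a remote machine from a laptop.
# "user@desktop" and port 8888 are placeholders, not values from the video.

# Step 1 (on the remote machine, ideally inside tmux so the server keeps
# running if the connection drops):
#   jupyter lab --no-browser --port 8888

# Step 2 (on the laptop): build the ssh command that forwards a local port
# to the same port on the remote machine.
#   -N  do not run a remote command, only forward ports
#   -L  local_port:destination_host:destination_port
tunnel_cmd() {
  local host="$1"
  local port="${2:-8888}"   # hypothetical default port
  printf 'ssh -N -L %s:localhost:%s %s\n' "$port" "$port" "$host"
}

# Print the command for inspection, then copy and run it on the laptop.
tunnel_cmd user@desktop 8888
```

With the tunnel running, opening http://localhost:8888 in the laptop's browser should reach the remote Jupyter Lab, assuming the server is listening on that port.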


This video teaches how to set up a data science coding environment, covering operating system selection, terminal session management, environment management, and code editing. It provides a comprehensive overview of the tools and techniques needed to get started with data science coding.

Key Takeaways
  1. Install Ubuntu as the primary operating system
  2. Configure tmux for terminal session management
  3. Use Anaconda and conda for environment management
  4. Use pip to install Python packages inside each environment
  5. Familiarize yourself with Vim for code editing
  6. Use Jupyter Lab for exploratory data science tasks
  7. Edit abstracted code in VS Code
💡 Using a Linux-based operating system and managing environments with Anaconda and conda can streamline data science coding workflows.
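
The conda and pip takeaways above can be sketched as a short script. It is a minimal illustration under stated assumptions: the environment name ds-demo, the Python version 3.11, and the helper functions run and make_env are hypothetical and do not come from the video. By default the script is a dry run that only prints each command; setting DRY_RUN=0 would execute them and requires conda to be installed.

```shell
#!/usr/bin/env bash
# Sketch: conda manages the environment, pip installs the Python packages.
# "ds-demo" and Python 3.11 are hypothetical example values.
# DRY_RUN defaults to 1 (print only); set DRY_RUN=0 to execute the commands.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it only when not in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

make_env() {
  local name="$1"
  local pyver="$2"
  # Create an isolated environment with a chosen Python version.
  run conda create --yes --name "$name" "python=$pyver"
  # List all environments to confirm the new one exists.
  run conda env list
  # In an interactive shell you would "conda activate <name>" first; in a
  # script, "conda run" calls pip inside the environment without activation.
  run conda run --name "$name" python -m pip install pandas
  run conda run --name "$name" python -m pip install --upgrade pandas
  # Show the packages installed in the environment.
  run conda list --name "$name"
}

make_env ds-demo 3.11
```

Keeping one environment per project means an older pandas in one project does not conflict with a newer pandas in another, which is the isolation the transcript describes.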


Chapters (5)

0:00 Intro
0:22 Operating System
3:53 Terminal Stuff
9:19 Virtual Environments
12:40 IDEs