Tips To Build A Good Data Science / Machine Learning Project (For Your Portfolio)

Abhishek Thakur · Intermediate ·📐 ML Fundamentals ·5y ago

Key Takeaways

The video provides tips on building a good data science and machine learning project for a portfolio, covering project selection, experimentation, code quality, documentation, packaging, and presentation. Tools like flake8, pylint, and black are recommended for writing better code, with Jupyter notebooks reserved for demos that import from the project, while Streamlit, Dash, Flask, and Jinja2 are suggested for creating web applications.

Full Transcript

Hello everyone, and welcome to my new video. This video is very different from what I usually do. I usually code and talk about machine learning projects, but in this video I'm going to give you some tips on how to build a better machine learning or data science project for your portfolio. If you haven't subscribed to my channel yet, please do it now.

In the last Talks episode, which is a series on my YouTube channel, we had Vladimir Iglovikov, and he talked about a project that you may have seen demos or short videos of floating around on LinkedIn or Twitter, shared by multiple people. The project was about detecting masked faces in the pandemic world. If you have seen the videos, you would think this is a simple project, but it's not. He shared it in a way that was super interesting, and I haven't seen a project being shared that way before, so in case you missed it, take a look at the video. He also shared a bunch of tips on how to get started with building your data science portfolio. Inspired by that video, and adding my own goodness on top of it, I'm sharing a bunch of tips that I think will be useful for you to build a better, good-looking data science portfolio or machine learning project.

The first thing is choosing the right project. Choose something that excites you. Choose something that encourages you to work hard. Do something that you think would be helpful for others, and choose something that you think you will be able to do in a much better manner than others. It doesn't have to be an entirely new project or something very innovative; it can be something very simple. A lot of people, when they start with machine learning or data science projects, start with the Titanic dataset. So think about that data: is there something you can do there to make it useful? It should not be like using a random forest to achieve 1.0 accuracy by
overfitting the test data. No, not like that. Is there something you can build on top of the Titanic dataset that you think the community will spend time on? If you think you can, then go ahead. In the end, it's all about execution. It's not about how difficult or how easy your project is; it's about how you have executed it, and that's very important.

The next thing is experiments: experiment, experiment, and experiment. Yes, you need to perform a lot of different kinds of experiments. Would you like to publish something that has an accuracy of 75 percent, or would you like to publish the same thing with an accuracy of 80 or 85 percent? If I'm looking at your code, and what you have done gives an accuracy of 75 percent, and someone else has done something similar with a much higher accuracy, whom am I going to choose if I start building something on top of it? You have to figure out the cost associated with taking the accuracy from 75 percent to 85 percent, and that can only be done by performing a lot of experiments, keeping track of these experiments, and not rushing. So do not rush. Your experiments are going to take some time, but that's how it is. Try to understand the data and the problem itself, and think about what you can do so that it doesn't fail in many situations, so that it's kind of foolproof. It won't be, but you have to think like that. You also have to present it in a better way than others, and all of that can only be done by doing hundreds of experiments.

So now you have been performing a lot of experiments. While performing experiments, it is also necessary to write good code. Since you would like the world to use your data science or machine learning project, it's very important that the code is readable and well documented. This will enable others to fork your project and build something on top of it. But if you're very lazy and you're writing very bad
code, I don't think anybody is going to take anything from there unless it's something very revolutionary. So you should follow a coding standard. You can use tools like flake8 and pylint to help you write better code, and you can also use a code formatter like black. I introduced black in one of my Tips N Tricks videos, so you can take a look. It's super amazing and I love it. Remember that your project has to be used by others, so you have to go that extra mile. Use modules to make it look good, and use Jupyter notebooks only for showing demos. If you're using Jupyter notebooks, they must import from your project. I don't want to open a Jupyter notebook that has thousands of cells and hundreds of thousands of lines of code. If you do that, you will be building a very good data science project. Most public notebooks that I see, maybe on Kaggle or Google Colab or on GitHub, make you fall asleep. That's because there are a lot of things in these notebooks that are not required. They can be imported from your own project: if you have done it in a modular way, you just import from there and then use it in a notebook. It's super simple and easy, it's good on the eyes, and it doesn't make you fall asleep.

So now you have chosen a project and you have good, understandable code. Now it's time to share the project with others. Share your code with good documentation. When you think your codebase is in a good state, you can publish the code on GitHub, for example. But before you publish your code on GitHub, make a good README, and if you don't know what a README is, go take a look right now. Whenever someone opens your GitHub repository, the first thing that they
see is the README. Your README should cover things like what the project is about, how to run it, how it is different from other similar projects, and what makes your project more interesting. All these questions can be answered by the README, either directly or indirectly. People are not going to look at your code and engage if they don't know what the project is about, so they have to know what the project is about.

Another thing that you should consider while creating a GitHub repository is a license. Do you want your code to be used by others? Then think about the license, because most of the time people are not going to use your code if it doesn't have a license associated with it. The license is really very important. You have different licenses, like the Apache License, the MIT License, BSD licenses, and the GPL, and the choice depends on what kind of libraries you have used for your project, that is, on the licenses of those libraries. So take some time to see what kind of license suits your project, and you must add a license to your project. When you add a license, you enable others to use it. If you don't, then maybe, if your project is interesting, people will contact you and ask you to add a license. But if you add a license, people won't need to contact you; they will use it directly.

Once you're done with all this, the next step is packaging the project. Now that you have shared your code and it has good documentation, the next step is shipping, and shipping in the Python world means packaging the project and making it available on pip. So package your project and publish it on PyPI, so that if somebody has to use your project, they can just pip install it. Many projects cannot be packaged, and that's perfectly fine. If you think
that your project cannot be packaged and cannot be published on PyPI, that's okay. Just learn how it's done, and don't do this packaging and publishing thing just for the sake of it, because if it cannot be packaged, don't do it. Learning how to package your project is one thing, but overdoing it is another. When you package your project, you must always remember that you should have good automation. If I pip install your project and I want to run it, I don't want to run 10 more steps after pip install. If I have to do that, I would not use your package; maybe I will use something else. So you need to take care of automation too. One example: let's say you have created a deep learning model to detect faces, and someone is using your package. They call a function to detect faces, and it downloads some weights. The person doesn't have to download the weights manually; they can be downloaded automatically when the function is called for the first time, and cached so that they can be used over and over again without being downloaded every time the function is called. That's what makes your Python package more interesting and useful.

Now you have come very far, and it's time to make a good-looking web application. You have worked hard on your code, and not just that, you have also packaged it. So now it's time to use the package that you have created and build a web application from it. It must come from your Python project; it should not be a separate application. Most of the time, recruiters or managers don't want to look at the code. They would probably read about your project, if you have shared it somewhere, and then they want to see it in action. Seeing something in action is much better than just reading the code and
trying to use it. I would like to see a project in action first, and then I would think, okay, maybe I can use it for what I'm doing. You can choose any kind of framework you want, and you can make the web application super simple or super complicated. You can use Streamlit, Dash, Flask, Jinja2, whatever really. As a data scientist you are not a front-end developer, but you have to try to go that extra mile to learn something new, and if you are learning some CSS or HTML while doing it, it's like a cherry on top, isn't it? You can also use it at your work. If you're building some algorithms at work, maybe you can try to show them using a web application, and it's much more interesting. This approach is also going to help you when you're presenting your work in a company. Most managers won't care about complicated mathematics or your algorithms; they would like to see your stuff in action, and I'm pretty sure that when people see it, they would move faster on deploying it in production. So that's very useful at work too.

Now let's say you have built a model or project that is used for detecting faces of people in an image. So where is the demo? Make a short video. You can make a short video from your web application, and it must be connected to your web application and your project. So demonstrate it, and remember that if I have to reproduce it, I must be able to reproduce it just by looking at your GitHub repository. I look at your README, and I must be able to reproduce that demo of yours. If I can do that, that's super awesome. Many times a project cannot be demoed in a video. There are many such projects, and don't worry about it. But if you cannot make a short video demoing your web application or your project, you
can always make a short video explaining the different parts of your project and how it's useful. Demos always help, but don't try to do the same demos as others. Think outside the box and be innovative. As I've already said, it's not about whether the project has been done before or not; it's all about execution. You just need to do it better than others.

When you have everything in place, write a blog post: write an article describing what you have done. These days people write a lot of blog posts and articles about machine learning and data science, but most of the articles are not very good, and they show a lot of desperation. 90 percent of the content in the article is not even related to the title. Let's say you have done a project about semantic similarity using BERT. I don't want to read the history and geography of BERT, what BERT is, or how BERT works, because if I didn't know that, I wouldn't come to your article. I already know the background of BERT; now I want to see it in action and see what you have done. You have done something about semantic similarity, and I want to see that. Try to talk only about what you have done, after giving a small introduction, and that's it. Don't describe your project using stories; that's just a big no-no, at least from my side. Keep these things in mind when you're writing your next blog post, for your project or in general. It should not be that only 10 percent of your content is relevant to the title or does justice to the title. I think all of your content must do justice to the title, 99, 95 percent of it, and if you do it that way, your blog post is also going to be quite famous. For writing blog posts you can use any kind of platform you want, so don't care much about the platform. You can use free GitHub Pages (github.io). As long as your content is good, traffic will come; that's how it works.

In the end, when you have everything in place, you can share your project
on LinkedIn or on Twitter. Instead of tagging millions of people, you can use hashtags. Write the post in a very interesting manner and there will be engagement; post your demos and there will be engagement. You don't need to tag thousands of people. You can also write a message to experts that you know, maybe on Twitter or LinkedIn, and ask for their opinion on what you have done. Don't ask for likes, comments, and retweets. If they find your project interesting, they are going to do it anyway; you don't have to ask. Sharing in a proper way is very useful in the long run.

And that's all, my friends. That's all you need to do. It might look very difficult in the beginning; you will feel like there are so many things that you have to do for one single project. Yes, you have to do all these things if you want to present it in a good way. After the first time, the second and third times will be much easier, and you will know how to present each and every project in your portfolio. Maybe you don't even need 10 different projects; maybe you just need a couple of projects, done well and presented in a good manner. That's all you need. With the next project you will be able to tick off the checkboxes very fast, and you will also improve your workflow. So go ahead, give this a try, and let me know if it works for you.

Oh, there's just one more thing I almost forgot: like, subscribe, and share my video if you like it. See you next time. Goodbye!
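
The automation tip in the transcript (weights downloaded on the first call, then cached) could look roughly like the sketch below. This is only an illustration under stated assumptions: the URL, the cache directory, and the function names `get_weights_path` and `detect_faces` are hypothetical and do not come from the video or from any real package.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical values: neither the URL nor the cache location is from the video.
WEIGHTS_URL = "https://example.com/face_detector_weights.bin"
CACHE_DIR = Path.home() / ".cache" / "my_face_detector"


def get_weights_path(url=WEIGHTS_URL, cache_dir=CACHE_DIR, fetch=urlretrieve):
    """Return the local path of the weights file, downloading it only if missing."""
    cache_dir = Path(cache_dir)
    target = cache_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        # Download under a temporary name first, so an interrupted download
        # is never mistaken for a complete cached file.
        partial = target.with_name(target.name + ".part")
        fetch(url, partial)
        partial.replace(target)
    return target


def detect_faces(image, **kwargs):
    """Hypothetical entry point: the weights are resolved on the first call."""
    weights = get_weights_path(**kwargs)
    # A real project would load `weights` into a model and run it on `image`.
    return {"weights_file": str(weights), "faces": []}
```

The `fetch` parameter exists only so the download step can be swapped out (for example in tests); a user of the package would simply call `detect_faces(image)` and never deal with the weights file directly.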

Original Description

In this special video, I share my views on what's required for your data science project to make it good. These are a few tips that will help you build a good #DataScience / #MachineLearning projects and will be helpful in building a good #Portfolio Please subscribe and like the video to help me keep motivated to make awesome videos like this one. :) To buy my book, Approaching (Almost) Any Machine Learning problem, please visit: https://bit.ly/buyaaml Follow me on: Twitter: https://twitter.com/abhi1thakur LinkedIn: https://www.linkedin.com/in/abhi1thakur/ Kaggle: https://kaggle.com/abhishek

Playlist

Uploads from Abhishek Thakur · Abhishek Thakur · 50 of 60

1 Episode 1.1: Intro and building a machine learning framework
2 Episode 1.2: Building an inference for the machine learning framework
3 Episode 2: A Cross Validation Framework
4 Tips N Tricks #2: Setting up development environment for machine learning
5 Episode 3: Handling Categorical Features in Machine Learning Problems
6 BERT on Steroids: Fine-tuning BERT for a dataset using PyTorch and Google Cloud TPUs
7 Special Announcement: Approaching (almost) any machine learning problem
8 Training BERT Language Model From Scratch On TPUs
9 Bengali.AI: Handwritten Grapheme Classification Using PyTorch (Part-1)
10 Bengali.AI: Handwritten Grapheme Classification Using PyTorch (Part-2)
11 Episode 4: Simple and Basic Binary Classification Metrics
12 Training Sentiment Model Using BERT and Serving it with Flask API
13 Episode 5: Entity Embeddings for Categorical Variables
14 Tips N Tricks #5: 3 Simple and Easy Ways to Cache Functions in Python
15 Multi-Lingual Toxic Comment Classification using BERT and TPUs with PyTorch
16 Text Extraction From a Corpus Using BERT (AKA Question Answering)
17 10K Subscribers: Approaching (almost) Any Machine Learning Problem and Talk Show
18 Data Processing For Question & Answering Systems: BERT vs. RoBERTa
19 Tips N Tricks #6: How to train multiple deep neural networks on TPUs simultaneously
20 Sentencepiece Tokenizer With Offsets For T5, ALBERT, XLM-RoBERTa And Many More
21 Talks # 1: Andrey Lukyanenko - Handwritten digit recognition w/ a twist & topic modelling over time
22 Episode 6: Simple and Basic Evaluation Metrics For Regression
23 Talks # 2: Subhaditya Mukherjee - Image restoration using Deep Learning: Dehazing
24 Basic git commands everyone should know about
25 How do I start my career in Data Science?
26 Talks # 3: Lorenzo Ampil - Introduction to T5 for Sentiment Span Extraction
27 Detecting Skin Cancer (Melanoma) With Deep Learning
28 Talks # 4: Sebastien Fischman - Pytorch-TabNet: Beating XGBoost on Tabular Data Using Deep Learning
29 Build a web-app to serve a deep learning model for skin cancer detection
30 Talks # 5: Parul Pandey: Data Science, Diversity and Kaggle
31 Implementing original U-Net from scratch using PyTorch
32 Tips N Tricks # 8: Using automatic mixed precision training with PyTorch 1.6
33 Talks # 6: Mani Sarkar: From backend development to machine learning
34 Dockerizing the skin cancer detection web application
35 How to train a deep learning model using docker?
36 Building an entity extraction model using BERT
37 Train custom object detection model with YOLO V5
38 Talks # 7: Moez Ali: Machine learning with PyCaret
39 How to convert almost any PyTorch model to ONNX and serve it using flask
40 Hyperparameter Optimization: This Tutorial Is All You Need
41 I finally got a copy of "Approaching (Almost) Any Machine Learning Problem"
42 Captcha recognition using PyTorch (Convolutional-RNN + CTC Loss)
43 Live Q&A: Getting Started With Data Science
44 WTFML: Simple, reusable code for PyTorch models
45 Talks # 8: Sebastián Ramírez; Build a machine learning API from scratch with FastAPI
46 Data Science PC Configs: From Low Range to Super-High Range
47 BERT Model Architectures For Semantic Similarity
48 I just got access to GitHub's Codespaces and it's amazing!
49 Talks # 9: Vladimir Iglovikov; Detecting Masked Faces In The Pandemic World
50 Tips To Build A Good Data Science / Machine Learning Project (For Your Portfolio)
51 Docker For Data Scientists
52 How To Become A Data Scientist In 1 Year (Learn From A Real World Example)
53 Talks # 10: Tanishq Abraham; What are CycleGANs? (a novel deep learning tool in pathology)
54 Deploy Any Machine Learning Or Deep Learning Model On Google Cloud Platform (App Engine)
55 Pair Programming: Deep Learning Model For Drug Classification With Andrey Lukyanenko
56 VS Code (codeserver) on Google Colab / Kaggle / Anywhere
57 Talks # 11: Jean-François Puget; Did you know GPUs are not just for Deep Learning?
58 End-to-End: Automated Hyperparameter Tuning For Deep Neural Networks
59 Deploy Any Machine Learning (or Deep Learning) Endpoint on Google Cloud Platform In 10 minutes
60 Ensembling, Blending & Stacking

This video provides tips on building a good data science and machine learning project for a portfolio. It covers aspects such as project selection, code quality, execution, and presentation. By following these tips, viewers can create a well-structured and effective project that showcases their skills.

Key Takeaways
  1. Choose a project that excites you and encourages you to work hard
  2. Run many experiments, and keep track of them, to achieve the best results
  3. Write good code that is readable and well-documented
  4. Use tools like flake8 and pylint to help with writing better code
  5. Create a web application to showcase your project in action
  6. Make a short video demo of your project
  7. Write a blog post describing your project and its execution
💡 A well-structured and effective project is key to a strong portfolio, and can be achieved by following a set of best practices and using the right tools.
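
Keeping track of experiments, as the second takeaway asks, does not need special tooling to get started. The sketch below appends each run to a CSV file and looks up the best one; the file layout and the function names `log_experiment` and `best_experiment` are assumptions made for illustration, not something the video prescribes.

```python
import csv
import json
from datetime import datetime, timezone
from pathlib import Path


def log_experiment(log_file, name, params, accuracy):
    """Append one experiment (name, parameters, accuracy) to a CSV log."""
    log_file = Path(log_file)
    is_new = not log_file.exists()
    with log_file.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            # Write the header only when the log is created.
            writer.writerow(["timestamp", "name", "params", "accuracy"])
        writer.writerow([
            datetime.now(timezone.utc).isoformat(),
            name,
            json.dumps(params, sort_keys=True),
            accuracy,
        ])


def best_experiment(log_file):
    """Return the logged row (as a dict of strings) with the highest accuracy."""
    with Path(log_file).open(newline="") as f:
        rows = list(csv.DictReader(f))
    return max(rows, key=lambda row: float(row["accuracy"]))
```

A run would be recorded with something like `log_experiment("experiments.csv", "baseline", {"model": "random_forest"}, 0.75)`; the values here are made up. A dedicated experiment tracker can replace this once the number of runs grows.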
