Using Computer Code to Decipher Genetic Code - Part 2 (Bioinformatics 101)

Data Professor · Beginner ·🛠️ AI Tools & Apps ·6y ago

Key Takeaways

This video, Part 2 of the Bioinformatics 101 series, is a non-technical introduction to bioinformatics. It covers why computational models are needed in drug discovery, structure-activity relationship analysis, QSAR modeling with machine learning, and the six steps for developing your own bioinformatics tool.

Full Transcript

Welcome back to the Data Professor YouTube channel. If you're new here, my name is Chanin Nantasenamat and I'm an associate professor of bioinformatics. On this YouTube channel we cover data science concepts and practical tutorials, so if you're into this type of content, please consider subscribing.

Okay, so let's compare bioinformatics and data science. On the left, bioinformatics is aimed at making sense of and gaining insights from biological data, while data science is just a more generic term meaning to make sense of and gain insights from data. So we can see that bioinformatics and data science are quite similar: statistics and computer science are essentially the same in both, and biology takes the place of the domain knowledge. This is just a crude comparison between the two fields. Bioinformatics might not always entail the use of machine learning algorithms, and there are other parts of bioinformatics that are not accounted for in this simplified comparison. Just a point to note.

So you might ask, why do we need computational models for drug discovery? Let's look at some case studies from other areas. Computers such as IBM's Deep Blue and Watson have defeated human beings at chess and Jeopardy. Google released a self-driving car. NASA uses computers to simulate space missions. Computers are being used to design aircraft and cars. Supermarkets and shopping malls use purchase history to analyze and predict our spending behavior. For example, with the membership cards that they give us, they can track our purchases and then analyze the data, for example by market basket analysis, seeing which products we normally buy together. And Amazon Go has made it possible to use AI and face recognition to allow shoppers to just walk in and walk out: you walk in, grab what you need, put it in your bag and just leave. With all of these case studies in mind, the natural question is: why not use computers for discovering and developing new drugs?

The thing is, we do use computers for drug discovery. The first example is to discern the structure-activity relationship of a chemical library. This means that the chemical structures of compounds or molecules, represented by molecular features (quantitative or qualitative descriptions of the chemical structure), are correlated with their biological activity. Such predictive computational models are now being used as an alternative to some experimental work. A notable example here is the use of deep learning to encode and decode the SMILES notation of molecules in order to analyze the structure-activity relationship, and such models can be used to generate new molecules, so this is a very interesting application. On the right we see a principal component analysis (PCA) plot, which shows the clustering of the active and inactive drugs, meaning compounds that have good activity and bad activity toward the target protein. Green is good, red is bad.

As additional examples, computational models can be quickly built to predict the pharmacokinetics and bioactivity of query compounds. Pharmacokinetics means the absorption, distribution, metabolism, excretion and toxicity of drugs, and bioactivity is the inhibition or activation of the target protein. The data that you collected in Part 1 essentially falls under this bullet point, particularly the bioactivity of query compounds. The compounds that you compiled are compounds that have been tested experimentally to inhibit a protein from the coronavirus. You could change this target protein to another protein, such as aromatase or any other protein that you are interested in. The compounds are modulating the target protein, and by modulating I mean they are trying to control it by either inhibiting it or activating it. You could think of these molecules as the on and off switch of the target protein: when you apply the molecule, you turn the function of the protein off or on.

Your predictive model using machine learning will allow biologists and chemists to understand which molecular features give rise to the biological activity, and such predictive models can be applied to personalized medicine. This figure is an example of the functional group analysis from structure-activity relationship model building in one of our papers.

Other specific questions that can be answered by computational models include: what target proteins can my compound bind to? The target protein could be aromatase, a protease, glucosidase, amylase, et cetera. With the target protein of your interest, you want to know which compound or small molecule could modulate this target protein's activity, either to inhibit or to activate it. So the second question could be: what type of compound can bind and modulate the bioactivity of the target protein of my interest? And the third question would be: are there any compounds similar to my query compound that may potentially exert similar binding behavior? Let's say that you have an FDA-approved drug that is known to inhibit the aromatase enzyme, and you want to know whether there are any other small molecules that have a similar structure to this FDA-approved drug. The reason is that the FDA-approved drug might be a good drug, but it might have some side effects that are undesirable. The goal of drug discovery and drug design is to develop drugs with minimal side effects, and that entails the optimization of the pharmacokinetics, comprising absorption, distribution, metabolism, excretion and toxicity. So it would be interesting to see whether there is any structure similar to the FDA-approved drug that has similar binding effects but a safer pharmacokinetic profile. All of these questions could be answered by applying machine learning and data science.

As we know, data science is the process of identifying and making sense of hidden patterns found in large amounts of data, and these hidden patterns could become knowledge. Typically the data follows a hierarchy, going from raw, unstructured data to more structured data. Once we structure the data we can uncover the patterns, once we have the patterns we can gain knowledge, and finally we can apply it to gain wisdom.

Now on to QSAR modeling, which stands for quantitative structure-activity relationship modeling. This is mathematical modeling that tries to find the relationship between chemical structure and bioactivity, or biological activity. The chemical structure can be represented by a set of molecular descriptors, which can be quantitative or qualitative, and which describe the physicochemical properties of the molecule. A molecular descriptor can be either a global feature or a local feature. Global means at the holistic level, such as molecular weight, and local features are the small details such as functional groups, a hydroxy group, nitrogen atoms, or the charge of a specific portion of the molecule. I'm going to cover that more in the subsequent video tutorials where we generate the molecular descriptors, and I'll tell you which ones are global features and which ones are local features.

This is the workflow of QSAR modeling, taken from my first review article, which I wrote back in 2009 about QSAR modeling. You can copy this and paste it into Google to read more details about it, and I could also provide it in the video description so you can check that out. Another term related to QSAR modeling is proteochemometric modeling. I'm not going to go into much detail about this; it's more advanced and would be better saved for future videos. Let's think of the proteochemometric (PCM) model as several QSAR models combined together. That's the essential concept of PCM, so we're going to skip it for now.

This is a holistic view of all of the resources and tools available for drug discovery, from one of my editorial articles. And this is a summary of the procedure for developing a QSAR model, which is essentially the development of a machine learning model. When we use machine learning to make sense of chemical and biological data we change the name to QSAR, but it's essentially a machine learning model. As you can see, we have biological data collection, descriptor generation (feature generation), feature selection, data pre-processing, splitting the data into training and test sets, validating the model internally, developing the model, evaluating the model performance, and also performing external validation on an external set.

This is a list of chemical databases and a list of molecular descriptor software. For future tutorial videos in this bioinformatics project series, we're going to use free or open source software, so don't worry about having to buy expensive software to follow along. And this is the list of computational chemistry software.

Back in 2016 we organized the first International Conference on Pharmaceutical Bioinformatics. It was a collaboration with Uppsala University, and we brought together enthusiasts of drug discovery and design. This was the poster advertisement for the conference, and yours truly.

The question here is: why do we need to develop our own bioinformatics tools? You might notice that there are several thousand bioinformatics tools already in existence, so do we think that all possible tools have already been developed? What do you think, is it true or false? Let me know in the comments and we can discuss this. The second question: bioinformatics tools will be available forever. Will they be available forever? Let me know in the comments, true or false. Existing tools may lack certain features that we need in our own project. What do we do? Do we develop our own tools, or do we proceed without the feature and just ignore the feature that we wanted? Some more questions for you: which path will you take? Will you hire a programmer to develop the bioinformatics tool, or will you learn how to program? There are two possible answers, so let me know in the comments down below which is yours.

These are selected web servers and software that we have developed in our research group, and as you can see they are all related to bioinformatics. The first one wrapped the WEKA software inside Python in order to automate the development of neural network models and support vector machine models. We built this back in early 2010 or 2011, at a time when no AutoML was available, so perhaps it was among the first automated data mining software. The software was mentioned in one of the book chapters published by Springer in the book Artificial Neural Network. We also developed other bioinformatics web servers including osFP, CryoProtect, pret, bio curator and Pi max. Our research group practices research reproducibility, whereby we try to share the code and data that were used to prepare the analyses in the papers that we publish, so that other interested users can reproduce the work and perhaps make use of it in their own research. These are some of the book chapters and review articles that we have contributed.

And this is the bonus I was talking about earlier, which is the steps to developing a bioinformatics tool. As you can see, there are essentially six steps. The first step is to come up with the concept of the bioinformatics tool, so you want to figure out what bioinformatics tool or software you want to develop. Normally the idea will come from an unsolved problem that you have, or from an inconvenience in your project that you figure would be an interesting topic to explore further. For example, it could be something that slows down your analysis, and if you are able to solve it by developing a bioinformatics tool, then it could save not only your time but other people's time as well. So the concept of the bioinformatics tool can come from the problems that you encounter.

Once you come up with the tool, step number two is to make a wish list of the features that you want to see. Step number three is to list the sequential workflow or methodology, meaning the protocol or the pseudocode of the bioinformatics tool: step one, step two, step three, what do you do, how do you process the data. Step number four is to realize that bioinformatics tools are essentially a collection of small tasks. If you click on one button it invokes a particular task, performed by a particular function that you develop, and if you click on another button it performs another function. So it is essentially a collection of tasks that you weave together to get the software. Step number five is to work on the coding of each of these small tasks individually. Think of it like a chapter in a book: many chapters make up a book, so you work on individual chapters and in no time you will have completed all of the chapters to form the entire book. The last step is to make sure that the entire workflow works as desired. This is about testing and debugging: you want to start from the beginning, input the data, and see whether it produces the desired intermediate data and the output data, which is the final data.

If you find value in this video, please give it a thumbs up, and if you haven't yet subscribed, please subscribe to the channel. As always, the best way to learn data science is to do data science, and please enjoy the journey. Thank you for watching, please like, subscribe and share, and I'll see you in the next one. But in the meantime, please check out these videos.
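
The QSAR procedure summarized in the transcript (descriptor generation, splitting the data into training and test sets, building a model, evaluating it on the held-out set) can be sketched in a few lines of plain Python. This is only a minimal illustration under loud assumptions, not the workflow from the review article: the single descriptor is a crude character count on SMILES strings, the activity values are synthetic (generated from a made-up linear rule so the result is easy to check), and every function name here is hypothetical.

```python
import math
import random

# Data collection: a handful of SMILES strings for small, well-known molecules.
SMILES = ["CCO", "CCC", "CC(=O)O", "c1ccccc1", "CCCCO", "CN", "CC(C)=O", "Oc1ccccc1"]


def heavy_atom_count(smiles):
    """Descriptor generation (toy): count atom symbols in a SMILES string.

    A crude global descriptor for illustration only. It handles single-letter
    atoms and aromatic c/n/o/s, not the full SMILES grammar.
    """
    return sum(1 for ch in smiles if ch.isupper() or ch in "cnos")


def synthetic_activity(n_atoms):
    """SYNTHETIC activity from a made-up rule, standing in for measured bioactivity."""
    return 0.5 * n_atoms + 1.0


def fit_line(xs, ys):
    """Model building: ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def run_workflow(seed=0):
    # Build the (descriptor, activity) table.
    data = [(heavy_atom_count(s), synthetic_activity(heavy_atom_count(s))) for s in SMILES]
    # Data splitting: shuffle reproducibly, then hold out the last 25% as a test set.
    random.Random(seed).shuffle(data)
    cut = int(0.75 * len(data))
    train, test = data[:cut], data[cut:]
    slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
    # Evaluation: root-mean-square error on the held-out test set.
    rmse = math.sqrt(sum((slope * x + intercept - y) ** 2 for x, y in test) / len(test))
    return slope, intercept, rmse


if __name__ == "__main__":
    slope, intercept, rmse = run_workflow()
    print(f"slope={slope:.3f} intercept={intercept:.3f} test RMSE={rmse:.3g}")
```

In a real project the toy descriptor would be replaced by descriptors from dedicated open source software, the synthetic activity by experimentally measured bioactivity, and the straight-line fit by whichever machine learning model suits the data; the order of the steps is the part that carries over.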

Original Description

This is a 2 Part series (Bioinformatics 101). I will provide a Non-Technical Introduction to the Exciting field of Bioinformatics so that you can get started in applying Data Science / Machine Learning to explore and model interesting data sets in biology, medicine and the life sciences. This is Part 2 and make sure to watch Part 1 (https://www.youtube.com/watch?v=p5iZxIT16KQ) first.

🌟 Buy me a coffee: https://www.buymeacoffee.com/dataprofessor

⭕ Timeline
0:20 Bioinformatics vs Data science
1:09 Why do we need computational models in drug discovery
2:12 Why do we need computational models in drug discovery (2)
3:26 Why do we need computational models in drug discovery (3)
5:04 Specific questions that can be answered by computational models
6:41 Data Science
7:19 QSAR modeling
8:18 Workflow of QSAR modeling
8:44 Proteochemometric modeling
9:10 Overview of Computational tools in Drug Discovery
10:33 1st International Conference on Pharmaceutical Bioinformatics (ICPB 2016)
10:55 Why develop our own Bioinformatics tools
11:39 How to develop a Bioinformatics tools
13:11 Steps to developing a Bioinformatics tool

After completing this video, make sure to get started in our hands-on Bioinformatics Project series:
✅ Watch Part 1 (Bioinformatics Project): https://youtu.be/plVLRashaA8

⭕ Playlist: Check out our other videos in the following playlists.
✅ Data Science 101: https://bit.ly/dataprofessor-ds101
✅ Data Science YouTuber Podcast: https://bit.ly/datascience-youtuber-podcast
✅ Data Science Virtual Internship: https://bit.ly/dataprofessor-internship
✅ Bioinformatics: http://bit.ly/dataprofessor-bioinformatics
✅ Data Science Toolbox: https://bit.ly/dataprofessor-datasciencetoolbox
✅ Streamlit (Web App in Python): https://bit.ly/dataprofessor-streamlit
✅ Shiny (Web App in R): https://bit.ly/dataprofessor-shiny
✅ Google Colab Tips and Tricks: https://bit.ly/dataprofessor-google-colab
✅ Pandas Tips and Tricks: https://bit.ly/dataprofessor-pandas
✅ Python Data Science Proje

This video series provides an introduction to bioinformatics, covering topics such as computational models, machine learning algorithms, and structure-activity relationship analysis, with practical applications in drug discovery and personalized medicine.

Key Takeaways
  1. Come up with the concept of the bioinformatics tool
  2. Make a list of the features that you want to see
  3. List the sequential workflow methodology
  4. Realize that bioinformatics tools are a collection of small tasks
  5. Work on the coding of each of these small tasks individually
  6. Test and validate the bioinformatics tool
💡 Bioinformatics tools are a collection of small tasks, each performed by a function: code each task individually, weave the tasks together, then test the whole workflow from input to final output.
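
The six steps above can be illustrated with a tiny example, shown here as a hedged sketch and not as one of the tools from the video. The concept (step 1) is a hypothetical GC-content reporter for DNA sequences, the feature list (step 2) is "read FASTA text, compute GC content, print a table", and the workflow (step 3) is parse, compute, report. Each small task is its own function (steps 4 and 5), and `run_tool` weaves them together while returning the intermediate data so the whole workflow can be checked (step 6). All function names are assumptions made for this illustration.

```python
def parse_fasta(text):
    """Task 1: turn FASTA-formatted text into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else "unnamed"
            records[name] = ""
        elif name is not None:
            # Sequence lines are appended to the most recent header.
            records[name] += line.upper()
    return records


def gc_content(sequence):
    """Task 2: fraction of G and C bases in one sequence (0.0 for an empty one)."""
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def format_report(stats):
    """Task 3: render {name: gc_fraction} as tab-separated lines, sorted by name."""
    return "\n".join(f"{name}\t{gc:.2f}" for name, gc in sorted(stats.items()))


def run_tool(text):
    """Weave the tasks together; return intermediate and final data for testing."""
    records = parse_fasta(text)
    stats = {name: gc_content(seq) for name, seq in records.items()}
    return records, stats, format_report(stats)


if __name__ == "__main__":
    demo = ">seq1 demo\nATGC\nGGCC\n>seq2\nATAT\n"
    records, stats, report = run_tool(demo)
    print(report)
```

Because each task is a separate function, it can be written and debugged on its own, which is the "chapters of a book" idea from the transcript; a button in a web app would simply call one of these functions.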

Chapters (14)

0:20 Bioinformatics vs Data science
1:09 Why do we need computational models in drug discovery
2:12 Why do we need computational models in drug discovery (2)
3:26 Why do we need computational models in drug discovery (3)
5:04 Specific questions that can be answered by computational models
6:41 Data Science
7:19 QSAR modeling
8:18 Workflow of QSAR modeling
8:44 Proteochemometric modeling
9:10 Overview of Computational tools in Drug Discovery
10:33 1st International Conference on Pharmaceutical Bioinformatics (ICPB 2016)
10:55 Why develop our own Bioinformatics tools
11:39 How to develop a Bioinformatics tools
13:11 Steps to developing a Bioinformatics tool