Using Computer Code to Decipher Genetic Code - Part 1 (Bioinformatics 101)

Data Professor · Beginner · 🛠️ AI Tools & Apps · 6y ago

Key Takeaways

The video series introduces bioinformatics, a field that applies data science and machine learning to explore and model biological data. It covers tools such as GenBank, the Protein Data Bank, and BLAST, and discusses the importance of understanding omics data and their applications in precision medicine.

Full Transcript

Welcome back to the Data Professor YouTube channel. If you're new here, my name is Chanin Nantasenamat, and I'm an associate professor of bioinformatics. On this channel we cover data science concepts and practical tutorials, so if you're into this type of content, please consider subscribing.

In this video I'm going to give you a brief introduction to the field of bioinformatics. In the first part I talked about how to collect the data, but you might be wondering what the data is all about. So we're going to take a step back and look at the big picture: what bioinformatics is, along with a brief look at what biology is. If you're not a biology major, this video is for you. If you're a computer science major, stick around to the end of the video, where I will share some of the tips and tricks I use in developing bioinformatics tools, which are the exact same methodology I use in my own research group. So without further ado, let's get started. If you haven't yet subscribed, please subscribe to the channel for the subsequent parts of this bioinformatics project series.

Before beginning, let's have a look at a wonderful quote that I really like from Florian Markowetz: "All biology is computational biology." In this article, Florian argues that all biology is really computational biology, and he puts it very eloquently: "I will argue that computational thinking and computational methods are so central to the quest of understanding life that today all biology is computational biology." This is particularly true given that biology and medicine are very data-intensive. The life sciences, a collective name for biology, medicine, and related natural sciences, are data-intensive in the sense that they generate high-throughput data. Advances in science and technology have given rise to high-throughput equipment, where robotics, automation, and computer programming are used to generate high volumes of data, and this is coupled with the lowered cost of storage. When high volumes of data are generated from biological experiments and measurements, all of this information is stored in big data lakes. At the time, researchers might not have known what to do with all of this big data in biology, so the natural step was to just store it and figure out later how to make use of it. That might have been the thinking before, but with the advancement of bioinformatics and computational biology, there has been a paradigm shift towards data-intensive biology, where we use data to drive biological insights. I read somewhere that if Carl Linnaeus, the Swedish botanist, were to live today, he would probably be a computational biologist. He is the father of modern taxonomy, whereby organisms such as animals and plants are classified into kingdoms, genera, species, and so on.

Let's proceed to the next slide: the quest for understanding life. Humankind has made a constant effort to understand the world around us. Humans have always been curious, and this led to worldwide expeditions by Christopher Columbus, Amerigo Vespucci, and Magellan, which facilitated the discovery of previously unknown parts of the world. In modern times, the search for extraterrestrial life, as well as the effort to understand our planet and galaxy, has led to the development and launch of satellites and space expeditions. In the life sciences, as I have already mentioned, the big data being generated is called omics data, and omics data allow us to further understand the molecular basis of life and how we can treat diseases. Before we proceed further, let's start with some general terminology.
What is biology? Biology is the study of life and living organisms. The term biology originates from the Greek words bios, which means life, and logia, which means study, and the field essentially studies the biological processes that sustain life. This is a classic biology textbook, Campbell Biology, which brings back memories of my first year of college. The following bullets are extracted from its table of contents, and I rearranged some of them to go from the macro level to the micro level.

At the big-picture level we have ecology and the ecosystem: the habitats where living organisms thrive, whether symbiotically or not. Ecology deals with populations of different organisms. Moving on to the next bullet point, we look at the individual organism, which has several levels of organization. A human being is composed of organs; organs are made up of tissues; tissues are made up of cells; cells are made up of organelles such as the nucleus; the nucleus and organelles contain DNA and proteins; proteins are made up of amino acids; amino acids are made up of atoms; and atoms are made up of electrons, neutrons, and protons.

The next bullet point is evolution and biological diversity. Over time, living organisms evolve by adapting to their environment, and evolution can be either convergent or divergent. In convergent evolution, two organisms that are different at first but live in the same environment can evolve similar traits and converge. In divergent evolution, two organisms from the same genus or species that live in different environments can diverge, because each has to adapt to its own environment. At the molecular level, their biochemical pathways might have adapted to fit the environment they live in, and so they diverge in terms of evolution.

The book also covers microorganisms, vertebrates, and invertebrates, that is, animals having or not having backbones. Microbes, or microorganisms, are tiny organisms that we need a microscope to see. They can be pathogenic, meaning that they cause disease, as viruses do, or they can be probiotics, such as those found in yogurt and in the gut. The book also covers animal and plant structure, growth, and development, meaning the various structural components of animals and plants as well as how they grow and develop. Zooming in to the molecular level, we look at cellular structure, molecular metabolism, and how cells communicate. Zooming in further, we look at the central dogma and genetic information. The central dogma describes the flow of information from DNA to RNA to protein. That is essentially what biology entails, and if you would like to read further, I would recommend Campbell Biology.

Previously I talked about omics, so what are omics? Historically, omics started with the first omic, genomics, which entails information about genes. There was a big project called the Human Genome Project, and initially scientists had big hopes for it: it was thought that completing the project would bring about an understanding of the basis of life. But the completion of the Human Genome Project was just the beginning, just scratching the surface of what genomics has to offer in understanding life. This led to other omics as well, such as proteomics, which is information about proteins; glycomics, information about sugars; lipidomics, information about lipids; metabolomics, information about metabolites; and interactomics, information about the interactions between molecules.

All of these omics represent big data, so there has never been a better time to learn data science and apply it to biology. There is a lot of interesting data that you could play around with and try to make sense of. One area that I'm involved in is drug design and drug discovery, and the big data on the interactome is what I use in my research. I have already mentioned these points, so let's move on to the next slide.

With omics comes precision medicine. Omics provide the basis: information about individual patients. Conceptually, if you take a patient's tissue samples, you could perform various types of omic analyses, and the resulting big data could then be used to design an antibody that is tailor-made for the patient. Alternatively, you could take cells or tissue samples from the patient to create an organoid that mimics the patient's organ, and then test an FDA-approved drug to see whether the organoid responds favorably or unfavorably to it. This is done in vitro: not in the host, that is, the patient, but in the organoid in a test tube. The benefit of precision medicine is that it allows us to use the data generated from the patient to create tailor-made, customized medicine, because no two human genomes are identical; they differ by a commonly cited figure of about 0.1 percent. We can harness the big omics data derived from the patient to figure out the optimal medicine, the optimal treatment plan, or a customized drug or antibody tailored specifically to that patient. So this field has immense utility.

One of the challenges of this big data is that omics data are often large and complex. This is particularly true for next-generation sequencing, which generates enormous volumes of data. The data can be too large to download over the internet, so to perform such an analysis you may have to have someone send you a hard drive containing the genomic data. If the data is in the terabyte or petabyte range, downloading it is not feasible, and it's more convenient and economical to send the hard drive by post. In addition, the curse of dimensionality, owing to the large number of variables, makes conventional statistical methods difficult to apply, so we need machine learning and artificial intelligence to make sense of this big data.

So you might ask: what is bioinformatics? As I mentioned in the previous slide, what once seemed impossible and formidable is now possible thanks to a field called bioinformatics. We can think of bioinformatics as a field that applies statistics and information theory to make sense of big biological data, where information theory here encompasses machine learning, databases, and other informatics approaches. Bioinformatics allows us to harness the available big data to understand the molecular basis of how diseases arise, for example how mutated genes work and how they give rise to downstream effects. Examples would be identifying which gene is responsible for a disease, and comparing gene frequencies between two cohorts, meaning two populations: one comprising people who have the disease and one comprising people who do not. Bioinformatics lies at the interface between biology and computer science: we're essentially applying concepts from computer science to make sense of big data in biology. Bioinformatics is especially important in this post-genomic era, where there are various omes, as I have mentioned: the genome, proteome, metabolome, microbiome, metagenome, and interactome.
Let's have a look at some of the common tasks in bioinformatics. The first common task is to search, by which I mean searching public databases for information about genes, proteins, RNA, and biochemical pathways. This comes in the form of databases like GenBank, the Protein Data Bank, the KEGG database, and UniProt, and services from the NCBI such as BLAST, where you take a query sequence, either a gene or a protein, and run BLAST on it. BLAST allows you to identify an unknown sequence, such as the name of the gene or of the putative protein, based on a similarity search, which is essentially sequence alignment. This brings us to the second task, which is to compare: we can compare the similarity between two or more sequences by performing sequence alignments. The third task is to construct models, such as structural models of proteins, and also to build predictive models, particularly using machine learning, to make sense of retrospective data. Finally, the fourth task is to integrate and curate data. This takes the most time, as it essentially covers data collection and preprocessing. As they say, garbage in, garbage out: the most important part of data science is high-quality data, so the integration and curation of data is crucial and is a pillar for the success of bioinformatics.

Next, computational biology versus bioinformatics. The two terms are often used interchangeably; they're very similar. Let's think of computational biology as the application of computational techniques to understand biology, and of bioinformatics as the development of algorithms and tools to analyze biological data and solve biological problems. Bioinformatics is the more technical of the two, meaning it involves developing algorithms and building tools to solve biological problems, whereas computational biology is taking an existing tool or software that a bioinformatician has developed and using it to solve a biological problem.

So what are bioinformatics tools? They are databases, software, and web servers that help us analyze and gain insights from biological data. Bioinformatics tools can be commercial, where we have to pay to use them, or free, either free for everyone or free for academic institutions. Here is a breakdown of commercial versus free bioinformatics software in terms of cost, features, support, and ease of use.

In terms of cost, for commercial software you pay either a one-time fee or a recurring subscription fee. Free software either has no cost at all, whether you're from industry or academia, or is free for academia only, meaning that users from industry have to pay.

In terms of features, commercial software tends to be a bit more reliable, because the company is paid to make progress. Free software, often coming from open-source projects, relies on volunteers: people who spend their free time developing the features of the software together as a community. So features might not be released on as strict or periodic a schedule as with a commercial company. That is changing, owing to the growing community base of some open-source projects, but there is no guarantee, because volunteers can come and go, which might affect the reliability and timing of feature releases.

In terms of support, a commercial company has dedicated staff to provide support, whereas a free project has no dedicated support and relies on community support, that is, users helping one another; think of Stack Overflow.

In terms of ease of use, commercial software is usually intuitive and has few bugs, because it has to be debugged and tested rigorously before it is rolled out, whereas free projects might have bugs and might be more difficult to use. This really depends on the popularity of the open-source project and its number of contributors. So you have to weigh the pros and cons of commercial and free software. Sometimes I see that commercial software essentially consists of features from free software woven together seamlessly: it might just stitch those features into a uniform workflow that's easy to use and friendly to non-coders and non-programmers.

If you find value in this video, please give it a thumbs up, and if you haven't yet subscribed, please subscribe to the channel. As always, the best way to learn data science is to do data science, so please enjoy the journey. Thank you for watching, please like, subscribe, and share, and I'll see you in the next one. In the meantime, please check out these videos.
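The central dogma mentioned in the transcript (genetic information flowing from DNA to RNA to protein) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a tool from the video. The input is taken to be the coding (sense) strand read 5' to 3' from its first base. Transcription is modeled as replacing T with U, and translation uses the standard genetic code (NCBI translation table 1). The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the first base varying slowest and the third base fastest, in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """DNA -> RNA (hypothetical helper). Assumes the input is the coding strand,
    so the mRNA has the same sequence with uracil (U) in place of thymine (T)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str, to_stop: bool = True) -> str:
    """RNA -> protein (hypothetical helper). Reads non-overlapping codons from the
    first base and maps each to a one-letter amino acid code. '*' marks a stop
    codon and 'X' an unrecognized codon. Leftover bases that do not form a full
    codon are ignored."""
    mrna = mrna.upper()
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino_acid == "*" and to_stop:
            break  # translation terminates at the first stop codon
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGCTGA"  # illustrative sequence, not a real gene
    mrna = transcribe(dna)
    print(mrna)             # AUGGCCAUUGUAAUGGGCCGCUGA
    print(translate(mrna))  # MAIVMGR
```

In practice a library such as Biopython (whose `Seq` objects provide `transcribe()` and `translate()`) would normally be used instead of a hand-written codon table. The sketch only makes each step of the DNA to RNA to protein flow explicit.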

Original Description

This is a 2 Part series (Bioinformatics 101). I will provide a Non-Technical Introduction to the Exciting field of Bioinformatics so that you can get started in applying Data Science / Machine Learning to explore and model interesting data sets in biology, medicine and the life sciences. This is Part 1 and please also stay tuned for Part 2 which is coming up really soon.

🌟 Buy me a coffee: https://www.buymeacoffee.com/dataprofessor

⭕ Timeline
1:14 Quote (All biology is computational biology)
3:40 Quests for understanding about life (1)
4:02 Quests for understanding about life (2)
4:34 What is Biology? (1)
4:56 What is Biology? (2)
8:06 OMICs give rise to Big Data in Life Science
9:36 Biology and Big Data
11:12 Challenges of Big data
12:07 What is Bioinformatics? (1)
12:35 What is Bioinformatics? (2)
13:20 What is Bioinformatics? (3)
13:33 Common tasks in Bioinformatics
15:10 Computational Biology vs Bioinformatics
15:55 What are Bioinformatics Tools
16:18 Commercial vs Free Bioinformatics Tools

After completing this video, make sure to get started in our hands-on Bioinformatics Project series:
✅ Watch Part 1 (Bioinformatics Project): https://youtu.be/plVLRashaA8

⭕ Playlist: Check out our other videos in the following playlists.
✅ Data Science 101: https://bit.ly/dataprofessor-ds101
✅ Data Science YouTuber Podcast: https://bit.ly/datascience-youtuber-podcast
✅ Data Science Virtual Internship: https://bit.ly/dataprofessor-internship
✅ Bioinformatics: http://bit.ly/dataprofessor-bioinformatics
✅ Data Science Toolbox: https://bit.ly/dataprofessor-datasciencetoolbox
✅ Streamlit (Web App in Python): https://bit.ly/dataprofessor-streamlit
✅ Shiny (Web App in R): https://bit.ly/dataprofessor-shiny
✅ Google Colab Tips and Tricks: https://bit.ly/dataprofessor-google-colab
✅ Pandas Tips and Tricks: https://bit.ly/dataprofessor-pandas
✅ Python Data Science Project: https://bit.ly/dataprofessor-python-ds
✅ R Data Science Project: https://bit.ly/dataprofessor-r-ds

⭕ Subscribe:


This video series introduces bioinformatics, a field that applies data science and machine learning to explore and model biological data, and discusses the importance of understanding omics data and their applications in precision medicine. The series covers the basics of bioinformatics, including the use of tools such as GenBank and BLAST, and the application of machine learning and artificial intelligence to biological data. By watching this series, viewers can gain an understanding of how bioinf

Key Takeaways
  1. Understand the basics of bioinformatics and its applications
  2. Learn about omics data and its importance in understanding the molecular basis of life
  3. Apply machine learning and artificial intelligence to biological data
  4. Use bioinformatics tools such as GenBank and BLAST to analyze biological data
  5. Integrate and curate biological data to gain insights
💡 Bioinformatics is a field that applies statistics and information theory to make sense of big biological data, and its applications in precision medicine have the potential to revolutionize human health.
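Takeaway 4 mentions BLAST, whose similarity search is built on local sequence alignment. The idea can be illustrated with a small, exact Smith-Waterman local alignment in plain Python. This is a teaching sketch, not BLAST: BLAST uses heuristic word seeding and extension plus statistical scoring (E-values) to search whole databases quickly, and it scores with substitution matrices. The function names and the match/mismatch/gap scores (+2/-1/-2, with a linear gap penalty) are illustrative assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Exact local alignment of a and b (hypothetical helper, linear gap penalty).

    Returns (best_score, aligned_a, aligned_b), with '-' marking gaps."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,  # start a fresh alignment here (this is what makes it "local")
                H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,  # gap in b
                H[i][j - 1] + gap,  # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the highest-scoring cell until the score drops to zero.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(top: str, bottom: str) -> float:
    """Share of alignment columns (gaps included) holding identical residues."""
    if not top:
        return 0.0
    same = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * same / len(top)


if __name__ == "__main__":
    score, top, bottom = smith_waterman("ACGTACGT", "ACGACGT")
    print(score)   # 12
    print(top)     # ACGTACGT
    print(bottom)  # ACG-ACGT
    print(percent_identity(top, bottom))  # 87.5
```

The full scoring table costs time and memory proportional to the product of the two sequence lengths. That is fine for short sequences like these but too slow for searching entire databases, which is the gap BLAST's heuristics are designed to close.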
