The Data Show Ep1 | Elucidating Data Science in Drug Discovery - A CTO's Account
Key Takeaways
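The video discusses how data science and AI are applied in drug discovery, with a focus on Elucidata's work. Topics include where machine learning fits alongside traditional statistical modeling, the preclinical and clinical phases of drug development, Elucidata's EPIC framework (ingestion, processing, interpretation, collaboration), and what the company looks for when hiring data scientists.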
Full Transcript
[Music]

Hello everyone, welcome to The Data Show, a video podcast featuring interviews with professionals in data science. As I mentioned in my first video, I'll be sharing anything and everything about data science, which includes connecting you with specialists in the field. So I'll be rolling out a few video podcasts with CTOs, senior data scientists, and founders of data-backed startups. The agenda of these podcasts is to discuss the role of data science in different domains and fields, the challenges we are trying to solve, and how data science is being incorporated at different organizations.

For the first video podcast I have my friend and mentor Swetabh Pathak, who is the co-founder and CTO of Elucidata. Elucidata is a biotech company that enables and supplements scientists at some of the world's best biomedical labs with its AI-enabled cloud platform called Polly. We are on a video call to discuss how data science is helping us solve problems in healthcare, and we are going to learn about the data science framework that Elucidata has designed to enable the world's best pharmaceutical labs and medical institutions. Later on we will learn from Swetabh himself how his career has panned out to become the CTO of Elucidata. So let's hear it from Swetabh himself.

Hello Swetabh, thanks for taking out your time today. I'm really excited to have you to talk about the rise of applied data science and AI to tackle the challenges in healthcare, and especially how Elucidata is advancing in this direction. To start off, before we discuss all of those questions, I'd like to ask you this very fundamental question, as there are a lot of notions around it: how do you unpack the term data science?

It means very different things for different people. For us, in our work, it essentially means going from semi-structured information to insights.
Very broadly, that whole journey, from semi-structured data to insights, is what we call data science. In what I do, the term can be used interchangeably with bioinformatics, though there are differences as well. I think data science is a more domain-agnostic term, which probably represents doing whatever you do to go from data to some kind of actionable insight, a step that you can take. That's how I would describe it.

Okay. I think of a data scientist somewhere along the same lines: someone who is coding literate and data savvy, loves problem-solving, and is curious about what's underneath the data set. Now, I know that you have a lot of hats to wear, being the co-founder and CTO of a tech company, and you have a packed schedule almost every day. What are your responsibilities as the CTO of the organization, and how would you describe a regular day at Elucidata as a data scientist?

I don't have the regular day of a data scientist at Elucidata. A lot of my time is spent working with the teams that are doing a lot of this. As an organization we have two sides: one is engineering and product, the other is data science, where you are trying to get insights from data. My work to a large extent involves making sure that we are on the right track, delivering time and again, and, most critically, that we are making the right decisions about the direction that we want to move in. That's mainly it. That looks very different for somebody who is doing data science day in, day out. My sense is that, at a company like ours, they spend a lot of time creating models, making sure the data is prepared to generate the insights, and then finally
communicating it. Communicating is a big part of what we do, and substantial time is spent communicating what we get from the data. That is what the team does. For me, it is further away from what goes on day to day; it is about ensuring that, as a data science and analytics company, we are creating the right tech product and solving the right problems, and making sure that we are delivering time and again. If we are generating IP in that process, generating things that we can claim as a company, that is important too. But of course the most important part is to deliver to customers and make sure they are getting value from what we do.

It must feel fulfilling, given the impact that you are making and the kind of challenging problems that Elucidata is solving. We are all witnessing the rise of data, and along with it its complexity, in healthcare, which is also why data science and AI are increasingly applied within the field: machine learning is being used in almost all clinical settings, imaging analytics is used for disease diagnosis, and AI is running strong in radiology. So how is data science, or artificial intelligence, making fundamental advances in healthcare, and how is Elucidata incorporating data science to democratize healthcare?

I can talk about what we are doing in life sciences, and early-stage life sciences, where data science means a lot of very different things. One of them is machine learning, but there are others. A lot of the work we do is still very statistical in nature: very traditional modeling, differential equations, statistics, and so on. That has a lot to do with the nature of the data that we work with.
Machine learning is usually seen to be more effective when the data is repeatable, as well as the final outcomes. So far, the way we have used machine learning is to automate a lot of manual tasks, things that have to be done again and again. For example, curation of certain kinds of data: can you automate that curation, whether it comes to images, the actual data, or semi-structured data? That is how we have used machine learning a lot at this point. We have also used it to pick out patterns where they are not apparent: from multiple kinds of data, how do you find a pattern? But a lot else of what we do is very traditional statistical analysis and data science analysis, where we are creating models day in and day out to take data from particular experiments and try to make sense of it. That is what the bulk of what we do looks like.

As for how we are incorporating that, it is about asking this question: most scientists who work with the kind of data that we work with still work with a couple of features, and they cannot make use, in real time, of all of the features and all of the data that they are generating. How do you solve that? How do you guide decision making in drug discovery by looking at the whole system rather than just a small part? We try to scale that; that is what we do most of the time. A big part of what we do is thinking about how to look at different data sets and merge them together to make sense of a system, which is a very cool thing, especially when you try to model phenomena that happen at the subcellular level. These are very, very small molecules, roughly 10^-9 to 10^-10 meters in size, so it is a very small entity that you are trying to model.
So what we are trying to do is scale that up: go from looking at one or two features to looking at hundreds of features at the same time, and doing that over and over again in a reproducible manner. This is where a lot of the tech that we do also comes in, and for us the two are inseparable. A lot of what we do is scale what is being done on a couple of features, which is data engineering, and then of course there is product design, when you talk about how you share data and how you work collaboratively. So I guess that is what it means.

I'm glad you mentioned drug discovery and the entire workflow that you have, and there is a lot of work going on in drug development these days. In the times we are living in, when the whole world is trying to get a new vaccine out for the novel coronavirus, can you tell us what this process looks like, or how a drug comes to market? And on which phase of the entire drug development process does Elucidata's work have an impact?

A pertinent question for these times. Broadly, the drug discovery process is split into two parts: the preclinical phase and the clinical phase. The broad difference is that at the clinical stage you are trying a drug, a molecule, or a therapy in the clinic, on real human beings. In the preclinical phase you are trying to come up with it, which is in some sense a more open-ended problem, and that is the part that we are most involved in. We work on a lot of different parts of the preclinical drug discovery process. That itself can be broken down into further stages, but I won't go into that for brevity. Drug discovery looks like this: you have a hypothesis, you have a system, you have a way to attack it which you think is interesting, and you want to
work on that, and you start from there. It typically takes about four or five years for a drug to get to clinical trials, and then about five to six years to get approved as a drug. So in total a drug typically takes about ten to twelve years on average, depending on what the drug is, to land on the market. That is not quite the same as the vaccines that we are talking about for the novel coronavirus; that is a more well-understood problem, and it typically takes about a couple of years to get a vaccine out for something like a virus. So it is an interesting time to be working in drugs. We ourselves are working with some approaches to try out existing drugs on corona, and I think that is still the best shot the world has to get out of this before a vaccine comes out, which typically takes a while. So we are, for example, trying to see if existing drugs could be effective against the coronavirus, and that is a problem that a lot of people are working on at this point. So that is the split of the drug discovery stages, preclinical and clinical, and we work at the preclinical stage. As a snapshot, the process end to end usually takes more than a billion dollars, and it is one of the few processes that over time has become lengthier and more expensive. Making a new car or a new spaceship costs less and is faster today, but finding a drug has become costlier and slower over the last 20 to 30 years.

To be honest, having this conversation makes me realize how important it is to demystify these processes, considering the vast amount of investment being made in drug development. We have learned that Elucidata enables a lot of pharmaceutical and academic labs across the world, so I'd like to go a little deeper here. I read about this framework on Elucidata's website that enables
research scientists to process and interpret their lab-generated data. Could you throw some light on what this data science framework is all about? You call it the EPIC framework, is that right?

Yes, you got it: the EPIC framework, which we think of as the life story of data when it comes to our domain, and that is how we see our product and what we do. It starts with ingestion, which is what the E stands for. One of the frequent problems with this kind of data, which is very different from business analytics or GIS data, is that the data is often not ready to just be analyzed. That is why ingestion is a key part of the problem. It is not just about how much data you are ingesting, which can be large, but also that the data needs to go through a process of structuring before it can be analysis ready. So that is ingestion, that is E.

Then P is for processing, which all of us understand: essentially you create large models and run them. The problem with everything that we deal with is that often the data is not high velocity but high volume, which requires a lot of modeling power to be able to run. Some of it looks like machine learning, GPU-level modeling, but a lot of it is pure statistical modeling done on very high RAM, high-spec machines. That is the processing stage. There are of course other problems apart from hardware at the processing stage as well: a lot of the scientists we work with are trying to streamline processes that can often be very slow. A run of data can take days, and how do you bring that down to something that takes minutes?

Then you go to I, which we call interpretation. That is partly a human exercise and partly something that can be aided by machines or data.
What we try to do is put the data that you have generated in the context of data that is available in the world. That is important for all kinds of data analysis, but it is especially important for scientific analysis, because whatever you have learned from your particular experiment always has to be understood in the context of all that was known before; otherwise it is kind of meaningless. So that is interpretation, and of course interpretation has other parts, like visualizing data and trying to articulate it.

Then there is collaboration, the C, which is where you are working with your team or your collaborators. Science is often very, very collaborative, so a lot of the work we do is between consortiums which are spread across ten different countries and 15 different organizations, and they are trying to come up together with, for example, a particular molecule, or a class of proteins as possible drug targets. If you have 15 different organizations working on a problem across geographies, collaboration becomes a really important exercise. And of course, with the data that we deal with, there are legitimate concerns around security and so on. So that is the framework, EPIC. That is how we think of a lot of problems of data: starting with the data and finally ending up with something that can be used, moving through these four stages.

A pretty comprehensive framework, and I am sure it must be making things easier for the research scientists who use it. I'm sure the viewers will be jotting down all these points. Now, shifting the focus from Elucidata to your career trajectory: from a mathematics and computing engineer to being the CTO of a biotech organization, and then augmenting labs at big pharma companies and
medical institutions, what was this journey like?

[unclear] The first job that I had after college was as a consultant. I worked for organizations spread across a few different geographies, doing data-based business consulting. I did that for a very short time because I didn't quite enjoy it as much, and then I left to work for a nonprofit organization working in distance education, which, interestingly, is a theme that is very critical to the times that we live in. Interestingly, that nonprofit that I was part of just got mentioned by Michael Dell, who is the founder of the Dell foundation. It is pretty cool work. It's called Avanti, and they do some cool work. So I did that. It was a very different sort of problem, very different from the things that we are trying to do now, and after that I shifted to working in this particular field.

So it's been quite a journey. I would say [unclear] a very clear direction of where exactly I wanted to end up and have been following that, but more than that, it has been about doing what I enjoy doing, doing something that I think can be impactful, and just following that, and not looking too far ahead as well. Things can be very hard to predict over a very long-term career trajectory. It is just important to do things that you like and do them the best you can. That is really how I look at my career journey.

I personally draw a lot of inspiration from your journey and how you carry yourself each day, aspiring to learn and to help others around you. The next question that I have is related to something that is commonly misunderstood or
not properly explained in the data science community: how much do you think your mathematical aptitude contributes to your work in data science? This is a very common question among all aspiring data scientists and data professionals.

Personally, for me, my undergrad was also in that, and that being a choice, it was one of those things: I enjoyed math, and I probably enjoyed it more than the computer science part of what I studied. So for me it is an important part, and it is something that I enjoy. I enjoy reading about a problem or a model and how it was built, and what the math behind it is. Recently I have been reading about some models in natural language processing, and again I enjoy the mathematical aspects of it, so I do read it.

But when you talk about data science, there is a whole spectrum: there are people who are mathematically inclined, people who are a lot more inclined towards data engineering, and people who are inclined towards understanding the domain and making sense of the data in the domain they are working in, whether that be drug discovery or finance. So it really varies, and depending on what you enjoy and what you like, you choose different directions or put different emphasis on different parts. Some level of understanding of math, and being sure about it, is important, because you are often interpreting results, and it won't really add up if you don't understand the fundamental math behind, for example, what a p-value means. But I think you can get through [unclear]; it is not the end of the world. So it really depends. It takes a whole village, it takes a whole team with different kinds of inclinations, different people coming together to build
something very compelling. So again, a fundamental level of math is really, really important, and that level of understanding needs to be developed. But beyond that, [unclear] especially if you are applying models, you rarely need that level of fundamental math day to day. But again, that might depend upon the problems that you are solving.

Okay. I know that Elucidata keeps hiring, so when you are interviewing, what qualities do you look for in a candidate appearing for the position of a data scientist?

At Elucidata, given the spectrum, we look for a bunch of things. We do look for some fundamental level of understanding of mathematical concepts. We check very exhaustively for what I would call scientific fundamentals, not even math: how do you interpret data, how do you know which conditions you should use and which ones you should not. Those things are important. We expect some basic coding skills; you don't have to be a rock star, but you need to code a little bit. As an organization we also look at your ability to consume, for example, publications. This is very important in our work. I assume that if you are working for a financial institution [unclear], but for us, publications and so forth are really important, so we have a round with everyone where those things are also checked. So it's a spectrum; different people are strong in different places. Some people are strong at being able to code things and build packages or put together web applications and so on. So it is a pretty diverse set. Of course, communication is really important for anybody; you need to be able to communicate.
Exactly. The viewers will be making a note of all the focal points you just shared, and if someone wants to apply for the position of data scientist or data analyst at Elucidata, they will have a lot to take from this conversation. Next up we have the rapid-fire round, where I will keep firing questions your way and you will have four or five seconds to respond to each, so keep it brief, short, and crisp. All right, let's start. The first question is: R or Python?

Maybe R, because in bioinformatics [unclear].

Given a chance to change the field of your work, which other field would you want to work in?

Interesting, I haven't thought about that. Given the times that we live in, public health would definitely be an interesting area. But I think I would also enjoy, for example, automated cars.

Okay. Next up: a recent book that you have read and enjoyed reading?

I have read quite a few of late. I really enjoyed Open by Andre Agassi. It is a sort of biography, so it is a good one if you are into sports or into tennis. It is the biography and legacy of a character, I should say, and I really enjoyed it.

Okay. What sources or platforms do you use to stay informed about new technologies or libraries?

I subscribe to a bunch of email newsletters, which just land in my inbox, and I follow them. Bioconductor has a newsletter, which I think is interesting. News at Y Combinator is very interesting for getting a sense of general things. And there is a lot else; a lot of it is also driven by people in my team recommending papers. Of course, I also subscribe to the newsletters by Nature and Science.

Given a chance to hang out with one famous personality, with whom would you like to spend an evening?

Given the times, I would say
Dr. Anthony Fauci. I would love to spend an evening with him. He is doing incredible work, and the way he has been able to do it is amazing. I learned more about him when I read a New Yorker profile on him, and I would suggest that anybody interested check it out. He is an amazing character: a top scientist for a long, long time and also very active in public life. That is a very hard intersection, to be a scientist of the caliber that he is and also participate in public affairs.

How many hours do you usually work in a day?

I don't know, 10 to 12. [unclear]

And how do you decompress?

I run, usually when it is allowed to run outside; otherwise I work out. But I enjoy running.

That was a wrap on our rapid-fire round, and I have one last request. Do you have a final message or comment for all the aspiring data professionals?

Really take a problem which interests you. It is not just about the tools and techniques. The best data scientists that I have come across apply those tools to a problem that they really feel connected to. So my advice would be to apply it to a problem area that you would enjoy going to work on.

Okay. Thank you so much, Swetabh. Thanks again for taking out the time, sharing your deep experience in the field, and demystifying these conceptions as well as misconceptions around data science. I'm sure the viewers have a lot to take away from this edifying conversation that we've had. Thank you so much.

From now on you'll see experts from different data-driven organizations, and if you have any suggestions for questions that you might want to ask these luminaries and experts in their fields, feel free to comment down below. Please give this video a thumbs up and subscribe to the channel for more interesting
data science content. I will be releasing a video every week. So, until my next video, keep learning data science with Harshit.
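
The four EPIC stages described in the interview (ingestion, processing, interpretation, collaboration) can be illustrated with a small Python sketch. This is only a toy under stated assumptions, not Elucidata's implementation or the Polly API: the record layout, the function names, the `KNOWN_CONTEXT` lookup, and the choice of an exact permutation test as the "model" are all hypothetical. It also shows the kind of question a p-value answers: how often a group difference at least this large would appear if the group labels were assigned arbitrarily.

```python
import json
from itertools import combinations
from statistics import mean

# Hypothetical semi-structured input: inconsistent key casing, numbers as strings.
RAW_RECORDS = [
    {"Sample": "c1", "Group": "Control", "metabolite_a": "1.0", "metabolite_b": "5.1"},
    {"Sample": "c2", "Group": "Control", "metabolite_a": "1.1", "metabolite_b": "4.9"},
    {"sample": "c3", "group": "control", "metabolite_a": 0.9, "metabolite_b": 5.0},
    {"sample": "c4", "group": "control", "metabolite_a": 1.2, "metabolite_b": 5.2},
    {"Sample": "t1", "Group": "Treated", "metabolite_a": "2.0", "metabolite_b": "5.0"},
    {"Sample": "t2", "Group": "Treated", "metabolite_a": "2.1", "metabolite_b": "5.1"},
    {"sample": "t3", "group": "treated", "metabolite_a": 1.9, "metabolite_b": 4.9},
    {"sample": "t4", "group": "treated", "metabolite_a": 2.2, "metabolite_b": 5.2},
    {"sample": "x1", "metabolite_a": "3.3"},  # no group label: dropped at ingestion
]

# Hypothetical stand-in for "all that was known before" about each feature.
KNOWN_CONTEXT = {"metabolite_a": "reported as treatment-responsive (made-up note)"}


def ingest(raw_records):
    """E: structure semi-structured records so they are analysis ready."""
    rows = []
    for record in raw_records:
        clean = {key.strip().lower(): value for key, value in record.items()}
        if "group" not in clean:
            continue  # cannot be assigned to a group, so skip it
        features = {}
        for key, value in clean.items():
            if key in ("sample", "group"):
                continue
            try:
                features[key] = float(value)
            except (TypeError, ValueError):
                pass  # leave unparseable measurements out
        rows.append({"sample": clean.get("sample"),
                     "group": str(clean["group"]).strip().lower(),
                     "features": features})
    return rows


def permutation_p_value(control, treated):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = control + treated
    observed = abs(mean(treated) - mean(control))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(control)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(group_b) - mean(group_a)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def process(rows):
    """P: run the same model over every feature, not just one or two."""
    names = sorted({name for row in rows for name in row["features"]})
    results = {}
    for name in names:
        control = [r["features"][name] for r in rows
                   if r["group"] == "control" and name in r["features"]]
        treated = [r["features"][name] for r in rows
                   if r["group"] == "treated" and name in r["features"]]
        if not control or not treated:
            continue  # nothing to compare for this feature
        results[name] = {"mean_difference": mean(treated) - mean(control),
                         "p_value": permutation_p_value(control, treated)}
    return results


def interpret(results, reference, alpha=0.05):
    """I: read each result in the context of what is already known."""
    return [{"feature": name, **stats,
             "significant": stats["p_value"] < alpha,
             "prior_knowledge": reference.get(name, "no prior record")}
            for name, stats in results.items()]


def collaborate(report):
    """C: serialize the report so collaborators can share and re-read it."""
    return json.dumps(report, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(collaborate(interpret(process(ingest(RAW_RECORDS)), KNOWN_CONTEXT)))
```

In this toy data every treated value of `metabolite_a` exceeds every control value, so only 2 of the 70 possible relabelings are as extreme as the observed one, while `metabolite_b` has identical values in both groups and shows no difference at all. The permutation test was chosen here only because it needs no libraries and makes the meaning of a p-value concrete; the interview does not say which statistical models the team uses.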
Original Description
In this interview, we talk about the rise of applied data science and AI to tackle the challenges in drug discovery and how Elucidata is advancing in this direction.
Elucidata - http://elucidata.io/
Learning Resources from Elucidata - https://elucidata.io/resources/?category=publications
You can also follow me on:
Twitter where I share tips & tricks and what I find intriguing: https://twitter.com/tyagi_harshit24
Medium where I write: https://medium.com/@harshit_tyagi
LinkedIn: https://www.linkedin.com/in/tyagiharshit/