Genetic Algorithms - Learn Python for Data Science #6
Key Takeaways
The video demonstrates how to use genetic algorithms and the TPOT library in Python to build a gamma radiation classifier, automating the selection of the best machine learning model and hyperparameters. TPOT uses genetic programming to optimize the machine learning pipeline, and the video shows how to use it to train a model and predict whether a signal is gamma radiation.
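The workflow summarized above could look roughly like the sketch below. It is not the video's exact code. It assumes the classic (pre-1.0) TPOT interface, in which TPOTClassifier accepts generations and verbosity and offers an export method; newer TPOT releases changed parts of this. The CSV path, the label column name Class with values g and h, and the function name train_gamma_classifier are also assumptions made for illustration.

```python
def train_gamma_classifier(csv_path, generations=5, seed=None):
    """Shuffle, label, split, then let TPOT search for a pipeline.

    Hypothetical helper: csv_path, the "Class" column name, and the
    g/h label values are assumptions about the data file.
    """
    # Imports are deferred so this sketch can be loaded and read even
    # when the third-party libraries are not installed.
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier  # assumed: classic (pre-1.0) TPOT API

    telescope = pd.read_csv(csv_path)

    # Shuffle the rows by position, then rebuild a clean 0..n-1 index.
    rng = np.random.default_rng(seed)
    tele = telescope.iloc[rng.permutation(len(telescope))].reset_index(drop=True)

    # Encode the two classes as integers: gamma -> 0, hadron -> 1.
    tele["Class"] = tele["Class"].map({"g": 0, "h": 1})
    tele_class = tele["Class"].values

    # Split row labels 75/25, keeping class proportions equal in both sets.
    train_idx, valid_idx = train_test_split(
        tele.index, stratify=tele_class, train_size=0.75, test_size=0.25
    )

    features = tele.drop("Class", axis=1)
    tpot = TPOTClassifier(generations=generations, verbosity=2)
    tpot.fit(features.loc[train_idx].values, tele.loc[train_idx, "Class"].values)

    # Accuracy on the held-out validation rows.
    score = tpot.score(
        features.loc[valid_idx].values, tele.loc[valid_idx, "Class"].values
    )

    # Write the best pipeline found as standalone Python code.
    tpot.export("pipeline.py")
    return tpot, score
```

Calling train_gamma_classifier with the path to the telescope CSV would run the search and write pipeline.py; with five generations the transcript reports a runtime of roughly 25 minutes on a modest laptop.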
Full Transcript
Hello world, it's Siraj. In this video, we're going to use genetic programming to identify whether some energy is gamma radiation or not. I'm getting angry. Gamma ray now, I wish.
Data science is a way of thinking about discovery. A data scientist needs to decide the right question to ask, like "Who's the best candidate to vote for in the US election?", then decide what data set to use, like the tweet history of candidates and the past endorsements of each candidate, and lastly decide what machine learning model to use on the data to discover the right answer. Life goes on. With the right data, computing power, and machine learning model, you can discover a solution to any problem, but knowing which model to use can be challenging for new data scientists. There are so many of them. That's where genetic programming can help.
Genetic algorithms are inspired by the Darwinian process of natural selection, and they're used to generate solutions to optimization and search problems. They have three properties: selection, crossover, and mutation. You have a population of possible solutions to a given problem and a fitness function. Every iteration, we evaluate how fit each solution is with our fitness function. Then we select the fittest ones and perform crossover to create a new population. We take those children and mutate them with some random modification, and repeat the process until we get the fittest, or best, solution.
So take this problem, for instance. Let's say you want to take a road trip across a bunch of cities. What's the shortest possible path you could take to hit up each city once and then return back to your home city? This is popularly called the traveling salesman problem in computer science, and we can use a genetic algorithm to help us solve it. Let's look at some high-level Python code. We have the number of generations set to 5,000 and the population size set to 100. We start by initializing our population using our size parameter. Each individual in our population represents a different solution path. Then, for each generation, we compute the fitness of each solution and store it in our population fitness array. Now we'll perform selection by only taking the top 10% of the population, which are our shortest road trips, and produce offspring from them by performing crossover, then mutate those offspring randomly and repeat the process. As you can see in the animation, eventually we will get an optimal solution using this process, unlike Apple Maps.
All right, so how does this all fit into data science? Well, it turns out that choosing the right machine learning model and all the best hyperparameters for that model is itself an optimization problem. We're going to use a Python library called TPOT, built on top of scikit-learn, that uses genetic programming to optimize our machine learning pipeline. So after formatting our data properly, we need to know what features to input to our model and how we should construct those features. Once we have those features, we'll input them into our model to train on, and we'll want to tune our hyperparameters, or tuning knobs, to get the optimal results. Instead of doing this all ourselves through trial and error, TPOT automates these steps for us with genetic programming, and it'll output the optimal code for us when it's done so we can use it later.
So we're going to create a classifier for gamma radiation using TPOT after installing our dependencies, and then analyze the results. TPOT is built on the popular scikit-learn machine learning library, so we'll want to make sure that we have that installed first. Then we'll install pandas to help us analyze our data and NumPy to perform math calculations.
Our first step is to load our data set. We use pandas' read_csv method and set the parameter to the name of our saved CSV file. This is data collected from a scientific instrument called a Cherenkov telescope that measures radiation in the atmosphere, and these are a bunch of features of whatever type of radiation it picks up. Thanks, Putin. Since the data is already organized by class, we'll shuffle it to get a better result. The iloc indexer of the telescope variable is pandas' way of selecting rows by position, and we'll generate a sequence of random indices the size of our data using the permutation function of NumPy's random submodule. Since all the instances are now randomly rearranged, we'll reset all the indices so they are ordered even though the data is now shuffled, using the reset_index method of pandas with the drop parameter set to True. We'll now let our tele variable know what our two classes are by mapping both of them to an integer with the map method, so g, or gamma, is set to zero and h, or hadron, is set to one. Let's store those class labels, which we're going to predict, in a separate variable called tele_class and use the values attribute to retrieve them.
Before we train our model, we need to split our data into training and validation sets. We use the train_test_split method of scikit-learn that we imported to create the indices for both. The first parameter is the index of our data set, which spans its full size. We want both sets to keep the same class proportions, so we'll set the stratify parameter to our class array. Then we'll define what percent of our data we want to be training and testing with these last two parameters. We have a 75/25 split now in our data, and we're ready to train our model.
We'll initialize the tpot variable using the TPOT class with the number of generations set to five, so TPOT's genetic algorithm knows how many iterations to run for. On a standard laptop with four gigs of RAM, it takes 5 minutes per generation to run, so this will take about 25 minutes. We set verbosity to two, which just means show a progress bar in the terminal during the optimization process. Then we can call our fit method on our training data to let it perform optimization using genetic programming. The first parameter is the training feature set, which we'll retrieve from our tele variable along the first axis for every training index. The second parameter is our training class set, which we'll retrieve from our tele variable like so. We can compute the testing error for validation using TPOT's score method, with the validation feature set as the first parameter and the validation class set as the second. We'll export the computed Python code to the pipeline.py file using this method and name it in the parameter as a string.
Let's demo this thing. After training, we'll see that after five generations TPOT chose the gradient boosting classifier as the most accurate machine learning model to use. It also chose the optimal hyperparameters, like the learning rate and number of estimators, for us. Yeah, boy.
So, to break it down: with the right amount of data, computing power, and machine learning model, you can discover a solution to any problem. Genetic algorithms replicate evolution via selection, crossover, and mutation to find an optimal solution to a problem. And TPOT is a Python library that uses genetic programming to help you find the best model and hyperparameters for your use case.
The winner of the coding challenge from the last video is Peter Mitrano. He added some great deep dream samples to his repository and even deep dreamed my own video. Badass of the week. And the runner-up is Kyle Jordan. Good job stitching all the deep dreamed frames together with one line of code. The challenge for this video is to use TPOT and a climate change data set that I'll provide to predict the answer to a question you decide. This will be great practice in learning to think like a data scientist. Post your GitHub link in the comments and I'll announce the winner next time. For now, I've got to stay fit to reproduce, so thanks for watching.
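The traveling salesman walkthrough in the transcript (initialize a population, score fitness, keep the top 10%, cross over, mutate, repeat) can be sketched in plain Python as follows. This is a minimal illustration, not the code shown in the video: the function names, the ordered crossover, the swap mutation with a 20% rate, and the choice to carry the selected parents into the next generation unchanged are all assumptions. The defaults of 5,000 generations and a population of 100 come from the transcript.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of the closed tour that returns to its starting city."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def ordered_crossover(parent_a, parent_b, rng):
    """Keep a slice of parent_a in place; fill the rest in parent_b's order."""
    start, end = sorted(rng.sample(range(len(parent_a)), 2))
    kept = parent_a[start:end]
    kept_set = set(kept)
    rest = [city for city in parent_b if city not in kept_set]
    return rest[:start] + kept + rest[start:]


def mutate(tour, rng, rate=0.2):
    """With probability `rate` (an assumed value), swap two random cities."""
    tour = tour[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour


def evolve(cities, generations=5000, population_size=100,
           elite_fraction=0.1, seed=0):
    """Return the shortest tour found, as a list of city indices."""
    rng = random.Random(seed)
    n = len(cities)
    # Each individual is one candidate path: a random ordering of the cities.
    population = [rng.sample(range(n), n) for _ in range(population_size)]
    for _ in range(generations):
        # Fitness: shorter tours are fitter, so sort ascending by length.
        population.sort(key=lambda tour: tour_length(tour, cities))
        # Selection: keep only the top 10%, the shortest road trips.
        elite = population[:max(2, int(population_size * elite_fraction))]
        # Crossover and mutation: breed children to refill the population.
        children = []
        while len(elite) + len(children) < population_size:
            parent_a, parent_b = rng.sample(elite, 2)
            children.append(mutate(ordered_crossover(parent_a, parent_b, rng), rng))
        population = elite + children
    return min(population, key=lambda tour: tour_length(tour, cities))


if __name__ == "__main__":
    # Ten random cities in the unit square; fewer generations for a quick run.
    city_rng = random.Random(42)
    cities = [(city_rng.random(), city_rng.random()) for _ in range(10)]
    best = evolve(cities, generations=200)
    print(best, round(tour_length(best, cities), 3))
```

Because the selected parents survive unchanged, the best tour in the population never gets longer from one generation to the next. A genetic algorithm like this is a heuristic, so on larger maps it tends toward good tours without guaranteeing the shortest one.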
Original Description
In this video, we build a Gamma Radiation Classifier and use Genetic Programming to pick the best Machine Learning model + hyper-parameters FOR US in 40 lines of Python.
Challenge for this video:
https://github.com/llSourcell/genetic_algorithm_challenge
Peter's winning code:
https://github.com/PeterMitrano/deep_dream_challenge
Kyle's Runner up code:
https://github.com/ljlabs/deep_dream_challenge/blob/master/Dream_in_video.py
Great chapter on Genetic Algorithms:
http://natureofcode.com/book/chapter-9-the-evolution-of-code/
Link to TPOT:
https://github.com/rhiever/tpot
Join the Wizards Slack Channel:
https://wizards.herokuapp.com/
Please like + subscribe + comment!
Please support me on Patreon!:
https://www.patreon.com/user?u=3191693
Follow me:
Twitter: https://twitter.com/sirajraval
Facebook: https://www.facebook.com/sirajology
Instagram: https://www.instagram.com/sirajraval/
Signup for my newsletter for exciting updates in the field of AI:
https://goo.gl/FZzJ5w