Train your own language model with nanoGPT | Let’s build a songwriter

Sophia Yang · Intermediate ·🧠 Large Language Models ·3y ago

Key Takeaways

Sophia Yang demonstrates how to train a language model with nanoGPT to build a songwriter. She walks through the repository's Shakespeare example on an Apple M1 (MPS) and on a GPU, notes that the README also covers fine-tuning from a pretrained GPT-2 model, and then prepares a lyrics dataset with pandas and tiktoken and trains on it with PyTorch.

Full Transcript

In this video we're going to take a look at nanoGPT and how we can use our own dataset to build a songwriter using nanoGPT. The reason why I'm looking at nanoGPT and doing this video is because this morning I was watching this legendary video by Andrej Karpathy, "Let's build GPT: from scratch, in code, spelled out." It is such an amazing video. He basically built an entire GPT model in a couple hundred lines of code from scratch, which is really impressive. Then he organized all his code into this repository called nanoGPT. So today I'd like to give it a try. Since I just watched the video, I'm so hyped.

Okay, let's take a look at this repo together. So this is the repository. You need to install some dependencies first. There are several examples, and we're going to go through them together. The first example is to build a GPT model that works on Shakespeare text. The first step is to prepare the Shakespeare data into validation and training sets in a data directory. The second step is to train the model, and the third step is to generate text with this sample.py file. You can see this example of the generated text. The author mentioned that you might need to use a GPU for this. If we don't have a GPU, if we have a MacBook, we can train a simpler version of this model, and if you have an Apple M1 MacBook you can do device MPS.

The next section of this README is talking about how to reproduce GPT-2. It looks like it's similar to the first example. The only difference is that with this one we're using a much richer dataset, OpenWebText. You will need a lot of compute power and about four days to run this model. We're not going to do that in this video. Then there are some baselines for the GPT-2 models, and the next section is fine-tuning: we can fine-tune our Shakespeare writer based on a pretrained GPT-2 model. So that's it.

There are a couple of files I want to quickly go through. There's prepare.py. This is where we prepare our data. You can see we download data from a .txt file online, process the data a little bit, and separate our data into training and validation. We have 90% in training and 10% in validation. We tokenize the data, and then finally we save our data into train.bin and val.bin.

train.py is where we train the models. It's interesting: you can see all the parameters here that we can overwrite. If you take a look at this file, there's no argument parser in this file directly. The trick is in this line of code, line 76 or 77, I guess: it's in this configurator.py. Let's take a look at configurator.py. You can see all the args right here, and the args can start with dash dash. This is interesting, I've never seen this before. Basically this file helps you write the argument parser so that you don't need to include all the lengthy argument parser text in your train.py. This is new to me. I might want to use that for my own project later.

We also have sample.py. This is where we generate samples from a trained model, where we generate text, and again we have a bunch of parameters we can overwrite. Another interesting thing is that when we train our model, we're actually calling this file from the config folder, and as you can see there's nothing here but the parameters. So I think it will be the same if you do python train.py and then list all the parameters that exist in this file in the config folder. I think this is just an interesting way to organize your different parameters, your configurations.

Okay, let's start taking a look at the code. Git clone this repository. Now let's go into this repository. In the first example we prepare the Shakespeare data first. You can see we have about one million tokens generated for the training data and more than 100,000 tokens generated for the validation data. Then we can train our model. Because I am running on my computer directly, let's run the simplified version. But before that, we can change our device to MPS since I'm running on an Apple M1 computer. If you are running on a normal laptop you can use CPU, but that's really slow. Even MPS is really, really slow. You can see the iterations going, and we'll come back to that.

Okay, it's running pretty slowly, as you can see, so I'm just going to kill it and see what the result looks like right now. This is step three, where we run sample.py to generate text. Okay, Torch gives an error because the default is using CUDA. We have to define device equals MPS, or you can use device equals CPU if you don't have an M1 computer. Ah, right, the result is not impressive. I'm not surprised, because we have only trained for 200 iterations. If you train it a little more, it will be significantly better.

I have a GPU, so I'm going to run the same example on a GPU to show you what it looks like. Again, the first step (yeah, when I copy and paste I should remove the dollar sign) is to prepare the data. It gives me the same result as before. Now we can train this model here on a GPU. I can see the loss values went down by a bunch: it was 4.26 and then 2.31. It's pretty good. Let's just kill it. Okay, so this is a checkpoint of the model. Ah, right, we're going to copy and paste; there's a dollar sign there. Let's do torch float16.
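
The override mechanism described above for configurator.py, where settings are plain variables and any `--key=value` argument replaces the default (for example `--device=mps`), can be sketched roughly as follows. This is a simplified illustration and not the repository's code: the function name `apply_overrides`, the use of a dictionary instead of module-level variables, and the default values shown are all assumptions made for the example.

```python
from ast import literal_eval


def apply_overrides(config, argv):
    """Return a copy of `config` with --key=value overrides applied.

    Hypothetical helper for illustration; it mimics the idea of an
    argparse-free configurator rather than reproducing nanoGPT's file.
    """
    updated = dict(config)
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --key=value, got {arg!r}")
        key, raw = arg[2:].split("=", 1)
        if key not in updated:
            raise ValueError(f"unknown config key: {key}")
        try:
            # Turns "200" into 200, "False" into False, "1e-3" into 0.001.
            value = literal_eval(raw)
        except (SyntaxError, ValueError):
            # Bare words such as mps or float16 stay plain strings.
            value = raw
        # Refuse overrides whose type differs from the default's type.
        if type(value) is not type(updated[key]):
            raise TypeError(f"{key} expects {type(updated[key]).__name__}")
        updated[key] = value
    return updated


# Illustrative defaults (assumed values, not copied from train.py).
defaults = {"device": "cuda", "dtype": "bfloat16", "max_iters": 5000, "compile": True}

cfg = apply_overrides(defaults, ["--device=mps", "--compile=False", "--max_iters=200"])
print(cfg)
```

The type check is the design choice worth noting: because every setting already has a default, the default's type acts as the schema, so a typo such as `--max_iters=abc` fails early instead of surfacing deep inside training.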
Cool, let's do it again. "I am for belong when we have done sing with the way." Okay, cool, looks like it's working. It's pretty cool.

In this third example I'm going to use an external dataset to train a songwriter. Let's first create a new folder called lyrics in the data folder. Then I found this Spotify Million Song Dataset I want to use. Let's just download this dataset to the folder. Okay, let's open up this CSV file. We can see all the lyrics in the text column that we can use for training.

So now we have our data. We want to create a similar prepare.py to get the data ready for training and validation in the same format. So I'm just going to copy and paste this prepare.py. Cool. We know this is a CSV file and we only want this text column, so we first need to import pandas. By the way, your environment also needs pandas, so make sure you conda install pandas into your environment. We do not need this anymore: we do not need to download the dataset, since we have the dataset in our local folder already. We do not need requests. Okay, so basically we need a data file that's giving us all the text. Let's read the data frame first: pd.read_csv, spotify_millsongdata.csv. Let's do data equals df text. We want it to be a string. We want to concatenate all the strings. Let's separate them by, I don't know, a line break. Okay, hopefully this will work. Import pandas as pd.

Hmm, that's a little weird. Why can it not find our data file? That's very weird. Maybe we need to define the path completely: lyrics. Is it running? I'm not sure. Is it running? Yay, that was easy. So as you can see, if you have a new data file you can fit it into this prepare.py based on the original example pretty easily. Again, here I have a CSV file. I just combine the whole text column into one big text and then feed it into the training data and the validation data, get the tokens with this tiktoken API, and we were able to create a train.bin and val.bin. If we take a look at data/lyrics, we can see we have train.bin and val.bin.

Okay, so we got our data ready. Now we can start training. To train on our data I kind of want to copy this file. Okay, train_lyrics.py. Let's just copy this one. The out directory, I want it to be out-lyrics, and lyrics. [Music] Okay, anything else we want to change? I think we're all good. Let's train lyrics. Let's let it run for a while. We'll come back.

Okay, I stopped it at 500 or so iterations because it's running so slowly, and then we can run sample.py to generate our lyrics. I started the lyrics with "love" to see what the generator will give us. Okay: "Love will light away, take me back, give it, give me it." It doesn't really make sense, but it's fine for only 500 iterations of training. So that's it for today's coding. Thank you for watching. Bye.
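
The lyrics preparation step described in the transcript (read the CSV, join the text column with line breaks, split 90/10, tokenize, and write train.bin and val.bin) might look something like the sketch below. It is a reconstruction under assumptions, not the script typed in the video: the function names are hypothetical, and the tokenizer is passed in as an `encode` callable so the same code works with tiktoken or with any stand-in.

```python
import os

import numpy as np
import pandas as pd


def lyrics_to_text(df, column="text"):
    """Join every lyric in `column` into one string, one song per line break."""
    return "\n".join(df[column].dropna().astype(str))


def split_text(data, train_frac=0.9):
    """Split the corpus by position: first 90% for training, the rest for validation."""
    n = int(len(data) * train_frac)
    return data[:n], data[n:]


def write_bin(token_ids, path):
    """Store token ids as raw uint16, which covers GPT-2's 50,257-token vocabulary."""
    np.array(token_ids, dtype=np.uint16).tofile(path)


def prepare(csv_path, out_dir, encode):
    """Build train.bin and val.bin in `out_dir` from the CSV's text column.

    `encode` is any callable that maps a string to a list of integer token ids.
    Returns the number of training and validation tokens.
    """
    df = pd.read_csv(csv_path)
    train_text, val_text = split_text(lyrics_to_text(df))
    train_ids = encode(train_text)
    val_ids = encode(val_text)
    os.makedirs(out_dir, exist_ok=True)
    write_bin(train_ids, os.path.join(out_dir, "train.bin"))
    write_bin(val_ids, os.path.join(out_dir, "val.bin"))
    return len(train_ids), len(val_ids)
```

With tiktoken installed, a call along the lines of `prepare("data/lyrics/spotify_millsongdata.csv", "data/lyrics", tiktoken.get_encoding("gpt2").encode_ordinary)` would produce GPT-2 BPE tokens; the CSV path here is an assumption about where the download was saved. Giving the full path to the CSV also sidesteps the "cannot find our data file" problem that comes up in the video when the script is run from a different working directory.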

Original Description

Real-time coding and exploring nanoGPT with me! See detailed model explanation in Andrej Karpathy's legendary video (best GPT explanation on the internet): https://www.youtube.com/watch?v=kCc8FmEb1nY 🌼 About me 🌼 Sophia Yang is a Senior Data Scientist working at a tech company. 🔔 SUBSCRIBE to my channel: https://www.youtube.com/c/SophiaYangDS?sub_confirmation=1 ⭐ Stay in touch ⭐ 📚 DS/ML Book Club: http://dsbookclub.github.io/ ▶ YouTube: https://youtube.com/SophiaYangDS ✍️ Medium: https://sophiamyang.medium.com 🐦 Twitter: https://twitter.com/sophiamyang 🤝 Linkedin: https://www.linkedin.com/in/sophiamyang/ 💚 #datascience

This video shows how to train a language model with nanoGPT to build a songwriter, touching on fine-tuning from a pretrained GPT-2 model along the way. Sophia Yang demonstrates how to prepare data, train a model, and generate lyrics using the trained model.

Key Takeaways
  1. Prepare the Shakespeare data into training and validation sets
  2. Train a simplified version of the GPT model on a MacBook
  3. Fine-tune from a pretrained GPT-2 model (covered in the README walkthrough)
  4. Use a GPU or MPS for training
  5. Prepare data with the prepare.py script
  6. Train the model on an Apple M1 computer with MPS
  7. Train the model on a GPU for faster performance
  8. Use the Spotify Million Song Dataset to train a songwriter model
  9. Copy and adapt prepare.py to process the lyrics CSV
  10. Copy an existing training config to train_lyrics.py and set the output directory and dataset to lyrics
💡 Fine-tuning a pre-trained language model on a specific dataset can significantly improve its performance on a particular task, such as generating lyrics for a songwriter.
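
The training config from the last takeaway is only described in the video, not shown in full, so the following is a guess at what a lyrics config could contain. The variable names follow settings that nanoGPT's train.py exposes; `out_dir` and `dataset` come from the transcript, while every other value is an illustrative assumption chosen for a short laptop-sized run.

```python
# Hypothetical config/train_lyrics.py: a sketch, not the file from the video.

out_dir = "out-lyrics"   # where checkpoints are written (named in the video)
dataset = "lyrics"       # reads data/lyrics/train.bin and data/lyrics/val.bin

# Evaluate and log often, since the run in the video was stopped early.
eval_interval = 100
eval_iters = 20
log_interval = 10

# Small model and short context so a laptop can make progress (assumed sizes).
batch_size = 12
block_size = 64
n_layer = 4
n_head = 4
n_embd = 128             # must be divisible by n_head
dropout = 0.0

max_iters = 500          # roughly where the video's run was stopped
lr_decay_iters = 500     # kept equal to max_iters
learning_rate = 1e-3

device = "mps"           # "cuda" on a GPU machine, "cpu" otherwise
compile = False          # skip torch.compile as a conservative choice
```

A file like this would be passed to training as `python train.py config/train_lyrics.py`, and any `--key=value` argument placed after it would take precedence over the values in the file.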
