Stoic Philosophy Text Generation with TensorFlow

James Briggs · Beginner · 🧬 Deep Learning · 6y ago


Key Takeaways

The video demonstrates how to build a character-level text generator with TensorFlow and Python, trained on Stoic philosophy texts. It uses character embeddings, fixed-length sequence windows, and an ensemble of models whose outputs compete. Key tools include TensorFlow (a tf.data pipeline and an LSTM/GRU model), BeautifulSoup for scraping the letters, and sparse categorical cross-entropy as the training loss.
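Sparse categorical cross-entropy is the training loss named here. "Sparse" means each target is a plain integer character index rather than a one-hot vector. The pure-Python sketch below shows what it computes for a single next-character prediction. It assumes the model outputs raw logits (as with Keras' `SparseCategoricalCrossentropy(from_logits=True)`), which the video does not confirm, and the function name is illustrative.

```python
import math

def sparse_categorical_crossentropy(logits, target_index):
    """Cross-entropy for one prediction with an integer (sparse) target.

    Computes -log(softmax(logits)[target_index]).
    """
    # Subtract the largest logit before exponentiating, for numerical stability.
    m = max(logits)
    log_sum_exp = math.log(sum(math.exp(z - m) for z in logits))
    # log_softmax(target) = (z_target - m) - log(sum(exp(z - m)))
    return -((logits[target_index] - m) - log_sum_exp)

# A uniform guess over a 4-character vocab costs log(4), about 1.386.
print(sparse_categorical_crossentropy([0.0, 0.0, 0.0, 0.0], target_index=2))
# A confident, correct guess costs close to zero.
print(sparse_categorical_crossentropy([5.0, 0.0, 0.0, 0.0], target_index=0))
```

During training, this per-character loss is computed at every position of every sequence and, with Keras' default reduction, averaged over the batch.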

Full Transcript

Hi. In this video I'm going to redesign this code here, which is the code we're currently using to train a recurrent neural network model. It's been trained on these two datasets, which I can show you over here. The model is trained on Meditations by Marcus Aurelius and also Letters from a Stoic by Seneca. We're pulling the texts from these two sources, and both are open source, which is really good. For Letters from a Stoic, this is the parent page, and I'm using BeautifulSoup to crawl the individual letters. There are 124 of them, and it puts them all into a dictionary. From that dictionary we extract the text we need and join it all together, then join that with Meditations, and that is our training data: one big string containing all of Meditations and all of the letters. You can see here I'm just joining it together. This is the format we get: the letter name, then the web address, then the text itself. We join all of those into one big string, and after that we join Meditations and the letters into the data that we use for training.

So let me quickly summarize what we do to pre-process the data and build the recurrent neural network, and then we'll get started. First we take the vocab, which is the set of all the unique tokens in our data. For this we're using characters, not words. In NLP you can use either: sometimes you use words, sometimes characters. Characters are easier to set up because you don't have to do as much with them. There are drawbacks and benefits to both, and usually you'd probably use words, but in this case we're using characters.

Then we create a character-to-index dictionary. When we feed characters into the model they need to be numbers, specifically integers, and each integer represents an index into an array. The first layer in the model is an embedding layer. Each character has its own row of values, and across that row is the embedding dimension, so in our case the character "a" is represented by 256 floating-point values. The output of this layer is a three-dimensional tensor (batch size × sequence length × embedding dimension) because we also have a batch size of 64: we feed in 64 sequences at any one time. That's why we have the character-to-index dictionary, to convert each character into an index that the embedding layer can read. We also need to go back the other way: after training, we want to convert our predictions into human-readable characters, so we use this second dictionary, which goes from index to character.

Here we're converting our data into indices. Next is the sequence length, which is the window of text that we read at any one point. Let me give you an example. Imagine our window is four characters and the input is "hello" (two L's, not three). Our input has a window size of four, so it reads "hell". Because we're using this to predict, or generate, text, the output is shifted one index along. So where our input is "hell", our output, the target, is "ello". We do this throughout the entire text. The only difference is that we work in sequences of 101 characters: the sequence length is defined here as 100, and we add one so that the input and the target are each 100 characters long. Then we define a few other things: like I said, the batch size is 64 and the embedding dimension is 256.

Here we're using a TensorFlow Dataset object. We put our data into this dataset object, and then we can use its useful methods, like batching, shuffling, and batching again here. This part is what I described before with "hello". I've actually put a comment in: say the chunk is "hello", then the input data will be "hell" and the target will be "ello". That's why we split into input and target data here. Then we map the dataset through this function to get a correctly formatted input/output dataset.

Let me go through this quickly. This is where we split the text into sequences of 101 at a time; you can see it's the sequence length plus 1. The text is very unlikely to divide exactly into sets of 101, so we just drop any remaining characters. There probably will be some, but not many, so that's fine. Then, as I said, we map, and then we shuffle the dataset using the buffer size we defined up here. The reason we shuffle is to give a better representation of the data in every single batch. We train on 64 sequences at a time and then update the model weights. That matters especially with this dataset, because the first part is Meditations and the second part is the letters. If we didn't shuffle the data, we would be training on 64 sequences of Meditations at a time, and within that it would probably be covering one specific topic, or a few specific topics. That doesn't give a good overview of everything we're training on, so the weights would be updated according to that specific topic and that specific text, Meditations or the Letters. So we shuffle the dataset to get a better representation of the data in every batch: instead of just Meditations on just one topic, each batch has a few different topics from Meditations and a few different topics from the Letters, and it works a lot better. Then we batch. We already used batch here to split into sequences, and here we use batch again to split into batches of 64 sequences.

Here we're building the model. I'm going to change all of this, by the way. We have a GRU model or an LSTM model, and you switch between them here. I'm not going to go into too much detail, to be quick. The embedding layer, as I said before, takes the vocab size, which is how many characters there are; in this case I think it's 85. The embedding dimension is 256, which is how detailed the representation of every single character is. The batch input shape is the number of sequences we put in at any one time, which is 64 in this case.

Now we have our LSTM units, which is where the actual sequence learning comes into play. LSTM stands for long short-term memory. The point of using these over a plain recurrent neural network is that they retain a sense of memory over the long term, which they do through the different gates within each unit, and that's obviously really useful for text. The other point is that we're using dropout of 10% on both of these layers. These units can overfit really easily, so a dropout of 10% means we mask 10% of the inputs at any one time. You can think of it like masking one letter in a word, so "hello" becomes "h_llo", and that helps the model generalize.

Then here is our classification layer. It's just a typical densely connected neural network layer, hence "Dense", and it outputs our vocab size, which I think is 85. That essentially means output 0 maps to "a", output 1 maps to "b", and so on, and after that we use the index-to-character dictionary again.

Here we're just building the model inside a function, summarizing the model, creating the loss function, and compiling the model using the Adam optimizer, with sparse categorical cross-entropy as the loss function. Here we save the model weights every epoch, as defined by this checkpoint callback, which is passed to TensorFlow during training. At the end we restore the final checkpoint and rebuild the model. So here we're building it again, but this time instead of a batch size of 64 it has a batch size of 1. The reason is that we don't want a batch size of 64 when we're predicting, because then we'd have to pass in a list of 64 starting strings, like putting "From" in 64 times, which doesn't make sense and would take a lot more computing power as well. So we essentially flatten the model a little, and it only takes one sequence at a time, a batch size of one. Then we just feed in a word, like "From", and it will generate text. So we rebuild it and load the weights into it. Now we summarize the model, which is the same as above but, like I said, with a batch size of one instead of 64.

Here I'm just clearing out the checkpoints, because there are a lot of them and they take up memory, so I delete them once the most recent one is loaded. Then here we save the model and the character-to-index dictionary, which I'll go through later. Then here we generate text. This is an old text generation function; I've updated it a lot since, and I'll show you that later.

What we've done here is kind of interesting. I haven't seen it done often in this scenario; I've seen someone else do it for a chatbot, but other than that I haven't seen it. We use multiple neural networks, score their outputs based on their English, how grammatically correct everything is and so on, and then choose a winner. You can see some really rubbish outputs here, because those models aren't properly trained, but some of them are obviously a lot better. The ones labelled "meditations" have only been trained on Meditations by Marcus Aurelius, and they tend to do worse a lot of the time; these others are trained on both texts. This one has the top score, almost 20, and here's one scoring 14. If we scroll up, we can see one of them went a little crazy and got a really bad score of -128. I don't know why this happens, it's very weird, but sometimes they just go crazy, which is why I built this ensemble learning method. With it, you have three other models in this case to back it up, so if one of them goes crazy, one of the other models takes over, which is really useful. I haven't really trained these models very much, and the output is already so much better.

The question you might have is: with different models, how do you keep them all talking about the same thing? They do that through the winning sentence. We split the output into sentences and rate each one, and the winning sentence goes back into the models and is used to generate more text. So they're continuously updated with the best new sentence, which works pretty well.

This is the new text generation function. It still needs some work; it was put together really quickly. Let me go over the rating function quickly. If the text is empty, we just return it. I've run into errors every now and again when the text was empty. I'm not sure whether that was an error in my code or the models just being weird; I think it must be an error in the code, so I need to remove this and actually figure it out. For now, if it's empty we just return it, because otherwise it throws an error when rating the rest of it. Then we normalize the text: we remove all punctuation and lowercase everything. Then we check for correct punctuation: right at the end, is there a full stop, exclamation mark, or question mark? Or near the end, e.g. a full stop followed by a newline character? This isn't fully working at the moment, because text generation currently stops only at a full stop, for example, so I need to update it to stop at a full stop, exclamation mark, question mark, or newline character. The other case where it stops text generation is when it has generated too many characters, which is a limit of around 500 at the moment.

Then we check for too much repetition: if it says "there there there", that's probably a problem. That happens occasionally, not so much with these models, but I've seen it quite often before, so it's quite a good rating check as well. Then we check that all the words are actual words according to our vocabulary. We saved a vocab, which I built separately: I read in all the data we have, split it into words, and saved them to a text document, which is here. You can see it's just a list of all of the words. Then we use a regex to check whether each word is real; if it is, it gets a good rating, otherwise it doesn't.

Then here is our ensemble class. The predict method initializes a prediction dictionary, which holds the model name, the score, and the text, as you can see here. The method that controls this function uses that dictionary. This function actually generates text and scores it; the controlling method runs it and then finds the highest-scoring sentence or sequence. The highest-scoring one is added to the text, and it's also set as the new start sequence. Initially we keep the start string, because it's something we typed in, like "From", and we need to keep that for the first iteration. After the first iteration we don't want to keep feeding it back in, because it's a previous sentence and we only want new text from that point, so we set that flag to False. Then it loops as many times as we've set; here it's ten, so it loops ten times and produces our winning output.

So that's what we have so far. What we need to do now is refactor this into something clean and not so messy. I'm going to rebuild it into a new Python file called train. I think train is fine, so we'll call it train, and we'll use it whenever we're building and training a new model, which keeps everything a bit nicer. I'll go ahead and get on with that. I'll describe what I'm doing every now and again, but for the most part you can fast-forward through the code being written.

[Music]
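The rating checks walked through above (empty-text guard, sentence-final punctuation, repetition, and a vocabulary lookup) can be combined into one scoring function. This is a minimal, hypothetical version: the name `rate_sentence`, the score weights, and the repetition threshold are assumptions, and it checks words against a Python set instead of the regex lookup used in the video.

```python
import re
import string

def rate_sentence(text, vocab, max_repeats=2):
    """Score one generated sentence; higher is better (hypothetical weights)."""
    # Empty output can break the later checks, so return a neutral score early.
    if not text.strip():
        return 0.0
    score = 0.0
    # Reward . ! or ? at the very end, optionally followed by whitespace or a newline.
    if re.search(r"[.!?]\s*$", text):
        score += 5.0
    # Normalize: drop punctuation and lowercase everything.
    cleaned = text.translate(str.maketrans("", "", string.punctuation)).lower()
    words = cleaned.split()
    # Penalize the same word repeated more than max_repeats times in a row.
    run = 1
    for prev, cur in zip(words, words[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeats:
            score -= 10.0
            break
    # Reward known words and penalize words missing from the saved vocabulary.
    for word in words:
        score += 1.0 if word in vocab else -2.0
    return score

vocab = {"the", "wise", "man", "is", "content"}
print(rate_sentence("The wise man is content.", vocab))
print(rate_sentence("there there there", vocab))
```

Each model's candidate sentence would be passed through a function like this, and the highest score wins the round.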
Okay, so what we've done now is refactor the code into a class, as you can see. The model is initialized here, we format the data, and then we build the model. My intention with splitting it like this is that we can format the data once, then pass multiple sets of model build parameters to it and build several models at once. I haven't built that yet, but the idea is to feed in new parameters and have it loop through the different model parameters and build them all, overnight or something. Then we'll have several different models that we can use in the ensemble class up here, and after that several (hopefully good) models competing to produce the best text. I think that's pretty good so far. There's one more thing I just noticed here, so I'm going to set that up, run it, and see what we get.

[Music]
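The refactor described here (format the data once, then build several models from different parameter sets) can be sketched as a small grid loop. Everything below is hypothetical: the `Trainer` and `ModelConfig` names, their fields, and the example values are assumptions, and `build_fn` stands in for whatever function actually constructs the TensorFlow model.

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical fields; frozen so a config can be used as a dict key.
    rnn_type: str       # "lstm" or "gru"
    rnn_units: int
    embedding_dim: int
    dropout: float

def config_grid(**options):
    """Yield one ModelConfig per combination of the given option lists."""
    keys = list(options)
    for values in itertools.product(*(options[k] for k in keys)):
        yield ModelConfig(**dict(zip(keys, values)))

class Trainer:
    """Format the data once, then build one model per config."""

    def __init__(self, text, build_fn):
        self.vocab = sorted(set(text))  # shared data formatting, done once
        self.build_fn = build_fn        # stands in for the real TensorFlow model builder
        self.models = {}

    def build_all(self, configs):
        for cfg in configs:
            self.models[cfg] = self.build_fn(len(self.vocab), cfg)
        return self.models

configs = list(config_grid(rnn_type=["lstm", "gru"], rnn_units=[512, 1024],
                           embedding_dim=[256], dropout=[0.1]))
trainer = Trainer("hello world",
                  build_fn=lambda vocab_size, cfg: f"{cfg.rnn_type}-{cfg.rnn_units} (vocab={vocab_size})")
for cfg, model in trainer.build_all(configs).items():
    print(model)
```

Keeping the data formatting in `__init__` means an overnight run pays that cost once, however many configurations it builds.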

Original Description

Explanation of key parts of an RNN text generator built in TensorFlow with Python.

🤖 70% Discount on the NLP With Transformers in Python course: https://bit.ly/3DFvvY5

I've written a couple of Medium articles on this project; if you're interested, check them out here:
Stoic Philosophy - Built by Algorithms https://towardsdatascience.com/stoic-philosophy-built-by-algorithms-9cff7b91dcbd
Supercharged Prediction with Ensemble Learning https://towardsdatascience.com/recurrent-ensemble-learning-caffdcd94092

Music used by Lakey Inspired.
1 - Blue Boi
2 - Falling
https://www.youtube.com/channel/UCOmy8wuTpC95lefU5d1dt2Q

This video teaches how to build a character-level text generator with TensorFlow and Python, trained on Stoic philosophy texts, using character embeddings, fixed-length sequence windows, and ensemble methods. It walks through pre-processing the data, building and training the model, and combining several models so that their outputs compete.
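Character embeddings work as a lookup table. The layer's weights form a 2D matrix with one row per character (vocabulary size × embedding dimension), and looking up a batch of index sequences produces the 3D tensor described in the video. The pure-Python sketch below uses the video's sizes (about 85 characters, 256 dimensions, batch size 64, sequences of 100). The random starting values stand in for weights that training would learn, and the helper names are hypothetical.

```python
import random

def make_embedding_table(vocab_size, embedding_dim, seed=0):
    """Weights of an embedding layer: one row of embedding_dim floats per character."""
    rng = random.Random(seed)
    # Small random starting values; training would learn the real ones.
    return [[rng.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
            for _ in range(vocab_size)]

def embed(batch_of_indices, table):
    """Look up each index: shape (batch, seq_len) -> (batch, seq_len, embedding_dim)."""
    return [[table[i] for i in seq] for seq in batch_of_indices]

VOCAB_SIZE, EMBEDDING_DIM, BATCH_SIZE, SEQ_LENGTH = 85, 256, 64, 100
table = make_embedding_table(VOCAB_SIZE, EMBEDDING_DIM)
rng = random.Random(1)
batch = [[rng.randrange(VOCAB_SIZE) for _ in range(SEQ_LENGTH)] for _ in range(BATCH_SIZE)]
out = embed(batch, table)
print(len(out), len(out[0]), len(out[0][0]))  # 64 100 256
```

The weight matrix itself stays 2D; the third dimension only appears because a whole batch of sequences is looked up at once.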

Key Takeaways

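Pre-process the text by joining Meditations and Letters from a Stoic into a single string
Create a character-to-index dictionary to convert characters to indices, and an index-to-character mapping to turn predictions back into text
Use character embeddings: the embedding layer outputs a 3D tensor (batch × sequence length × embedding dimension), with an embedding dimension of 256 and a batch size of 64
Build a TensorFlow Dataset: batch the text into 101-character sequences, shuffle, then batch again into batches of 64
Map the dataset through a function that splits each sequence into input and target, with the target shifted one character along
Train on batches of 64 shuffled sequences so each batch mixes topics from both texts
Use an embedding layer with a vocabulary size of about 85 characters and an embedding dimension of 256
Use LSTM (or GRU) layers with 10% dropout
Wrap model building in a function
Compile the model using the Adam optimizer and sparse categorical cross-entropy loss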
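The pre-processing steps in this list can be sketched without TensorFlow to make the shapes explicit. In the video the same steps run on a TensorFlow Dataset, roughly `Dataset.from_tensor_slices(encoded).batch(SEQ_LENGTH + 1, drop_remainder=True).map(split_input_target).shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)`. The plain-Python version below is an illustrative stand-in: all helper names are hypothetical, and it shuffles the whole list rather than using a finite shuffle buffer.

```python
import random

SEQ_LENGTH = 100  # from the video: windows of SEQ_LENGTH + 1 = 101 characters
BATCH_SIZE = 64   # from the video

def build_vocab(text):
    """Sorted unique characters plus the two lookup tables."""
    vocab = sorted(set(text))
    char2idx = {c: i for i, c in enumerate(vocab)}
    idx2char = vocab  # list position -> character
    return vocab, char2idx, idx2char

def split_input_target(chunk):
    """'hello' -> ('hell', 'ello'): the target is shifted one position along."""
    return chunk[:-1], chunk[1:]

def make_batches(text, seq_length=SEQ_LENGTH, batch_size=BATCH_SIZE, seed=0):
    _, char2idx, _ = build_vocab(text)
    encoded = [char2idx[c] for c in text]
    # 1) Cut into chunks of seq_length + 1, dropping the leftover characters.
    size = seq_length + 1
    chunks = [encoded[i:i + size] for i in range(0, len(encoded) - size + 1, size)]
    # 2) Split every chunk into an (input, target) pair.
    pairs = [split_input_target(chunk) for chunk in chunks]
    # 3) Shuffle so each batch mixes passages from both books.
    random.Random(seed).shuffle(pairs)
    # 4) Group into batches, dropping the last incomplete batch.
    return [pairs[i:i + batch_size] for i in range(0, len(pairs) - batch_size + 1, batch_size)]

_, char2idx, idx2char = build_vocab("hello")
x, y = split_input_target([char2idx[c] for c in "hello"])
print("".join(idx2char[i] for i in x), "".join(idx2char[i] for i in y))  # hell ello
```

With 10,000 characters of input, this yields 99 chunks of 101 characters and a single full batch of 64; the other 35 pairs are dropped as an incomplete batch.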

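💡 The video demonstrates an ensemble of models: each model generates a candidate sentence, the candidates are rated, and the winning sentence is kept and fed back to all models as the next seed. This keeps the output on track when a single model goes off the rails.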
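The ensemble loop fits in a few lines. Each round, every model proposes a continuation from the current seed and every candidate is scored. The winner is appended to the output and becomes the next seed, and the typed-in start string is used only for the first round. This is a minimal sketch with hypothetical names: real models would replace the `generators` functions, and any scoring function (for example a punctuation- and vocabulary-based rating) can be passed as `rate`.

```python
def ensemble_generate(generators, rate, start_string, iterations=10):
    """Hypothetical ensemble loop: each round, the best-rated candidate wins.

    generators: dict mapping a model name to a function(seed) -> new text
    rate: function(text) -> score, where higher is better
    """
    text = start_string  # the typed-in start string is kept once, at the front
    seed = start_string  # first round: generate from the start string
    for _ in range(iterations):
        predictions = {}
        for name, generate in generators.items():
            candidate = generate(seed)
            predictions[name] = {"score": rate(candidate), "text": candidate}
        winner = max(predictions, key=lambda name: predictions[name]["score"])
        best = predictions[winner]["text"]
        text += best
        seed = best  # later rounds: only the winning sentence is fed back in
    return text

generators = {"model_a": lambda seed: " Be calm.", "model_b": lambda seed: " xq zz"}
rate = lambda s: 1.0 if s.rstrip().endswith(".") else 0.0
print(ensemble_generate(generators, rate, "From", iterations=3))
# From Be calm. Be calm. Be calm.
```

Because all models receive the same winning seed, they stay on the same topic even when one of them produces a bad candidate.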
