Stoic Philosophy Text Generation with TensorFlow

James Briggs · Beginner · 🧬 Deep Learning · 6y ago

Key Takeaways

The video walks through a character-level text generator built with TensorFlow and Python and trained on Stoic philosophy texts (Meditations by Marcus Aurelius and Seneca's Letters from a Stoic). It covers character embeddings, fixed-length sequence windows, and an ensemble of models whose outputs are scored against each other. Tools used include TensorFlow and BeautifulSoup; the model is trained with a sparse categorical cross-entropy loss.

Full Transcript

Hi. In this video we're going to go through, and I'm going to redesign, this code here, which is the code we are currently using to train a recurrent neural network model. It's trained on these two texts, which I can show you over here. The model is training on Meditations by Marcus Aurelius and also Letters from a Stoic by Seneca, which has a Latin title that I'm not going to pronounce. We're pulling them from these two sources, both open source, which is really good. The only thing with Letters from a Stoic is that this is the parent page, and I'm using BeautifulSoup to pull the individual letters, and there are 124 of them. It puts them all into a dictionary, and from that dictionary we extract the text we need, join it all together, and then join it up with Meditations. That is our training data: one big string containing all of Meditations and all of the letters, which you can see here.

This is the format the letters come in: we have the letter name, then the web address, and the text itself. We join all of those together into one big string, and after that we join Meditations and the letters into the data that we use for training.

I'll quickly summarize what we do to pre-process the data and build the recurrent neural network, and then we'll get started. First we take a vocab, which is a set of all of the unique words that we have within our data, but actually for this we're using characters and not words. You can use either in NLP: sometimes you use words, sometimes you use characters. Characters are just easier to set up; you don't really have to do as much. There are drawbacks and benefits to both. Usually I'd probably use words, but in this case we're going to use characters.

Then we create a character-to-index dictionary. When we feed characters into the model they need to be numbers; they need to be integers. Those integers represent an index in an array. The first layer in the model is an embedding array, which is like a three-dimensional tensor: it has the indices of each letter going down here, and across here is the embedding dimension. So the character "a", in our case, will have 256 floating-point values that represent it. It's three-dimensional because we have a batch size of 64, so we feed in 64 of everything at any one point. That's why we have the character-to-index dictionary: to convert from a character into an index, which can then be read by the embedding layer. Then we have the reverse. After we've trained and want to convert our predictions into human-readable characters at the end, we use this, which goes from the index to the character. This here is just converting our data into indices.

Here is the sequence length, which is the window of text that we read at any one point. Let me give you an example. Imagine our window is four characters and we have the text "hello" (with two l's, not three). Our input has a window size of four, so it reads "hell". Because we're using this to predict, or generate, text, our output is pushed one index along. So where our input is "hell", our output will be "ello"; that would be our target. We do this throughout the entire text. The only difference is that we're doing it in sequences of 101.

Where do I define it? I define it here: the sequence length is 100, and then there are a few other things. Like I said, the batch size is 64, the embedding dimension is 256, and there are a few other things in there.

Here we're using a TensorFlow Dataset object, as you already saw. Essentially we put our data into this dataset object and then we can use the useful methods on that object, like batch, shuffle, and batch again here. That's really useful. And this here is what I described before with "hello". Actually, let me put a comment in. Say we're inputting "hello": the input data will be "hell" and the target will be "ello". That's why we're splitting into our input and target data here. Then we map the dataset object through this function to get our correctly formatted input/output dataset.

Sorry, let me go through this quickly. This is where we split into sequences of 101 at a time; you can see sequence length plus one. Anything remaining after that we drop, because the dataset is very unlikely to divide exactly into sets of 101. So we just drop the remaining characters, if there are any. There probably will be, but there aren't going to be many, so it's fine.

Here we're mapping, like I said, and then here we shuffle the dataset, with the buffer size we defined up here. The reason we shuffle the data is to give a better representation of the data in every single batch. We're going to train on 64 sequences at any one time and then update the model weights. Especially with this dataset, the first part of the data is Meditations and the second part is the letters. If we didn't shuffle the data, it would be training on 64 sequences of Meditations at any one time, and within that it would probably also be covering one specific topic, or a specific few topics, at any one time. That doesn't give a very good overview of everything we're training on, so it would update the weights according to that specific topic and that specific text, Meditations or the letters. So we shuffle the dataset to give a better representation of the data in every single batch, and now, instead of having just Meditations and just one topic, you have a few different topics from Meditations and a few different topics from the letters. It now works a lot better.

And then batch. We already used batch here to split into sequence lengths, and here we use batch again to split into batches of 64 sequences.

Here we're building the model. I'm going to change all of this, by the way; that's what I'm doing. We have a GRU model or an LSTM model, and you can change it here. I'm not going to go into too much depth, to be quick. The embedding layer, as I said before, takes the vocab size, which is how many characters there are; in this case I think it's 85. The embedding dimension, which is 256, is how detailed the representation of every single character is. And the batch input shape is the number of sequences we put in at any one time, which is 64 in this case.

Now we have our LSTM unit. This is where the actual sequence learning comes into play. This is a long short-term memory unit, and this is a gated recurrent unit. The point of using these over a plain recurrent neural network is that they retain a sense of memory long term. They do that through the different gates within the units, which is obviously really useful for text.

The other point is that we're using dropout of 10% on both of these. These units are naturally very deep, so they can overfit really easily. Having a dropout of 10% means that we mask 10% of the inputs at any one time. So in a word, say "hello" again, 10% isn't something you can really split up, so say we mask one letter: "hello" will become "h_llo". That just helps the model generalize.

Then here is our classification layer. It's just a typical neural network layer, densely connected, hence Dense, and it outputs into our vocab size, which I think is 85. That essentially means output 0 will map to "a", output 1 will map to "b", and so on. After that we use the index-to-character dictionary again.

Here I'm just building the model in a function, summarizing the model, creating a loss function, and compiling the model using the Adam optimizer and, for the loss function, sparse categorical cross-entropy. Here we're saving the model weights every epoch, defined by this here, which is fed into TensorFlow during training as a callback: this checkpoint callback, which is here.

Then at the end we restore the final checkpoint and rebuild the model. Here we're building it again, but this time, instead of a batch size of 64, it has a batch size of 1. The reason we do that is that we don't want a batch size of 64 when we're predicting, because then we'd have to put in a list of 64 starting strings. We'd have to put "From" 64 times in a list, which doesn't make sense, and it would take a lot more computing power as well. So we essentially flatten the model a little bit, and it takes only one at a time, a batch size of one. Then we just feed in a word, "From", and it will predict; it will generate text.

So we rebuild it, load the weights into it, and build it again. Now we just summarize the model, which is the same as above but, like I said, with 1 instead of 64 for the batch size. Here I'm just clearing out the checkpoints, because there are a lot of them and they take up space, so I delete them after the most recent one is loaded. Then here we save the model and the character-to-index dictionary, which I'll go through later. Then here we generate text. This is an old text generation function; it's been updated a lot now, and I'll show you later.

Here is what we have done that's kind of interesting. I haven't seen this done that often before; in this scenario I don't really think I've seen it done at all. I've seen someone else do it for a chatbot, but other than that I haven't seen it. We're using multiple neural networks, scoring them based on their output, based on their English, how grammatically correct everything is and so on, and then choosing a winner.

You can see here there are some really rubbish ones, because they're not properly trained, but some of them are obviously a lot better. The ones where it says "meditations" have only been trained on Meditations by Marcus Aurelius, and they tend to do worse a lot of the time. These are trained on both. This one has the top score, almost 20: "what is a cloth I thought if it were in a useful book". And then here, a score of 14: "these are a pleasure and a lot of all those in which it were" something.

If we scroll up, we can see one of them went a little bit crazy. It has a really bad score, minus 128. I don't know why this happens; it's very weird, but sometimes they just go crazy, which is why I built this ensemble learning method. Some of them occasionally go a bit crazy, so by doing this you have four other models, or three other models in this case, which are there to back it up. If one of them goes crazy, one of the other models takes over, which is really useful. I haven't really trained these very much, and the output was so much better already.

The thing you might ask is: if you're using different models, how do you keep them speaking about the same thing? They do that through the winning sentence. We split into sentences, we rate each sentence, and the winning sentence goes back into the models and is used to generate more text. So they're continuously being updated with the best new sentence, which is pretty good.

This is the new text generation function. It still needs some work; it was put together really quickly. At the moment, here is the rating function, which I'll go over really quickly. First, if the text is empty: I was running into some errors before where the text was empty every now and again. I'm not sure if that was an error in my code or the models just being weird; I think it must be an error in the code. I need to remove this and actually figure it out. So if the text is empty, we just return, because otherwise it will throw an error when rating the rest of it.

Then we normalize the text: we remove all punctuation and lowercase everything. Then here we check for correct punctuation. Right at the end, is there a full stop, exclamation mark, or question mark? Or near the end, e.g. if there's a full stop and then a newline character, is there a full stop, exclamation mark, or question mark? This isn't fully working at the moment, because it stops the text generation only when there's a full stop, for example. I need to update that so it stops the text generation when there's a full stop, exclamation mark, question mark, or newline character. The other case where it stops text generation is when it generates too many characters, which is a limit of around 500 at the moment.

Then we check for too much repetition. If it's going to say "the the the", that's probably a problem. That happens occasionally, not so much with these models, but I have seen it happen quite often before, so it's quite a good way of rating as well.

Then here we check that all the words are actual words, according to the vocabulary that we have. We actually saved a vocab; I built it separately. I think I have the code somewhere. I just read in all the data that we have, split it into words, and wrote it out to a text document, which is here. I don't know if you can see this. Here it is; you can see it's just a list of all of the words. Then we put that into a regex and check whether each word is real or not. If it is, it gets a good rating; otherwise not.

And then here is our ensemble class. This predict method here initializes a prediction dictionary called self.meditations, which will hold the model name, the score, and the text, which you can see here. That is used by the gladiator [unclear] method, which also controls this function here. This function actually generates the text and scores it; the method here just controls that function. It runs that, and then it finds the highest-scoring sentence or sequence. The highest-scoring one is added to the text, and the highest-scoring one is set as the new start sequence. Initially we keep the start string, because it's something we typed in, like "From", and we need to keep that in for the first iteration for it to make sense. After the first iteration we don't want to keep feeding it back in, because it's the previous sentence and we just want new text after that point, so we set that to false. Then it loops through however many times we've told it to; here that's ten, so it's going to loop through ten times and produce our winning output.

So that's what we have so far. What we need to do now is refactor this into something that is clean and not so messy. I'm going to rebuild it into a new Python file called train; I think train is fine. Then we'll use that whenever we're building and training a new model, so everything is a bit nicer. I'm going to go ahead and get on with that. I'll describe what I'm doing every now and again, but for most of it you can fast-forward if you don't want to see code being written. [Music]
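The rating steps described above (empty-text guard, normalization, end punctuation, repetition, real-word check) can be sketched roughly as follows. This is not the code from the video: the function name `rate`, the point values, and returning 0 for empty text are all assumptions made for illustration, and a plain Python set stands in for the regex-based word lookup mentioned in the transcript.

```python
import string


def rate(text, known_words):
    """Score generated text; higher is better. All weights are illustrative."""
    # Guard against empty generations so later steps never fail (assumed value 0).
    if not text.strip():
        return 0

    score = 0

    # Reward text that ends on sentence-closing punctuation.
    if text.rstrip().endswith((".", "!", "?")):
        score += 1

    # Normalize: lowercase and strip all punctuation.
    normalized = text.lower().translate(str.maketrans("", "", string.punctuation))
    words = normalized.split()

    # Penalize immediate repetition such as "the the the".
    for previous, current in zip(words, words[1:]):
        if previous == current:
            score -= 1

    # Reward words found in the saved vocabulary, penalize unknown ones.
    for word in words:
        score += 1 if word in known_words else -1

    return score


# Hypothetical vocabulary; the video builds its word list from the training texts.
known = {"the", "good", "man", "is", "content"}
good_score = rate("The good man is content.", known)
bad_score = rate("the the the xq", known)
empty_score = rate("   ", known)
print(good_score, bad_score, empty_score)
```

A scorer like this only needs to rank candidates relative to each other, so the absolute values matter less than the ordering they produce.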
[Music] What we have done now is refactor that code into a class here, as you can see. The model is initialized here, we format the data, and then we build the model. My intention in splitting it like this is that we can format the data once, and then I can pass multiple sets of model build parameters to it and build several models at once. I haven't built that yet, so I'm going to [unclear], and it will loop through different model parameters and build them all overnight or something. Then we have several different models that we can use in the ensemble class up here. After that we will have several good models, hopefully good models, all competing to get the best text. I think that is pretty good so far. One thing I just noticed here, actually: [unclear]. I'm going to set that up, run it, and see what we get. [Music]
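One way to picture the refactor described here (format the data once, then loop over several parameter sets) is the sketch below. Everything in it is hypothetical: the class name `TextModelBuilder`, the `build_fn` hook, the parameter names, and the unit counts are not taken from the video. The `describe_model` function is a stand-in that returns a plain dict where a real version would build and train a TensorFlow model.

```python
class TextModelBuilder:
    """Formats the text once, then builds one model per parameter set."""

    def __init__(self, text, seq_len=100):
        self.text = text
        self.seq_len = seq_len
        self.char2idx = {}
        self.ids = []

    def format_data(self):
        # Character-level vocabulary and integer encoding, done a single time.
        vocab = sorted(set(self.text))
        self.char2idx = {ch: i for i, ch in enumerate(vocab)}
        self.ids = [self.char2idx[ch] for ch in self.text]
        return self

    def build_many(self, param_sets, build_fn):
        if not self.char2idx:
            raise RuntimeError("call format_data() before build_many()")
        built = {}
        for params in param_sets:
            params = dict(params)      # copy so the caller's dict is untouched
            name = params.pop("name")
            built[name] = build_fn(vocab_size=len(self.char2idx), **params)
        return built


def describe_model(vocab_size, unit, rnn_units, dropout=0.1):
    # Stand-in for a real model builder; it only records the configuration.
    return {"vocab_size": vocab_size, "unit": unit,
            "rnn_units": rnn_units, "dropout": dropout}


sample_text = "waste no more time arguing"
builder = TextModelBuilder(sample_text, seq_len=4).format_data()
models = builder.build_many(
    [
        {"name": "lstm_512", "unit": "lstm", "rnn_units": 512},  # assumed sizes
        {"name": "gru_256", "unit": "gru", "rnn_units": 256},
    ],
    describe_model,
)
print(models)
```

Passing the builder function in as an argument keeps data formatting separate from model construction, which is the split the transcript describes.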

Original Description

Explanation of key parts to an RNN text generator built in TensorFlow with Python.

🤖 70% Discount on the NLP With Transformers in Python course: https://bit.ly/3DFvvY5

I've written a couple of Medium articles on this project, if you're interested check them out here:
Stoic Philosophy - Built by Algorithms: https://towardsdatascience.com/stoic-philosophy-built-by-algorithms-9cff7b91dcbd
Supercharged Prediction with Ensemble Learning: https://towardsdatascience.com/recurrent-ensemble-learning-caffdcd94092

Music used by Lakey Inspired (1 - Blue Boi, 2 - Falling): https://www.youtube.com/channel/UCOmy8wuTpC95lefU5d1dt2Q


This video shows how to build a character-level text generator with TensorFlow and Python, trained on Stoic philosophy texts. It covers pre-processing, character embeddings, sequence windows, the recurrent model, and ensemble generation, and ends by refactoring the training code into a reusable class.

Key Takeaways
  1. Pre-process the text by joining Meditations and the Letters into a single string
  2. Create a character-to-index dictionary to convert characters to integer indices, and an index-to-character mapping to decode predictions
  3. Embed each character as a vector of 256 floats; with a batch size of 64 the embedded input is a 3D tensor
  4. Define a TensorFlow Dataset object and apply batch, shuffle, and batch again
  5. Map the dataset through a split function to get correctly formatted input/target pairs
  6. Shuffle so that each batch of 64 sequences mixes passages from both texts
  7. Use an embedding layer with a vocab size of about 85 and an embedding dimension of 256
  8. Use an LSTM (or GRU) unit with 10% dropout
  9. Build the model inside a function so it can be rebuilt with a batch size of 1 for generation
  10. Compile the model with the Adam optimizer and a sparse categorical cross-entropy loss
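The first few steps in the list (character indexing and splitting into input/target windows) can be sketched in plain Python. The video does this with a TensorFlow Dataset object; the version below only mirrors the logic, and the helper names `build_vocab` and `make_windows` are made up for this example. A window of 4 is used so the "hello" example from the transcript is visible; the video uses a sequence length of 100.

```python
def build_vocab(text):
    """Return character-to-index and index-to-character mappings."""
    vocab = sorted(set(text))
    char2idx = {ch: i for i, ch in enumerate(vocab)}
    idx2char = {i: ch for ch, i in char2idx.items()}
    return char2idx, idx2char


def make_windows(ids, seq_len):
    """Cut ids into chunks of seq_len + 1 and split each into input/target."""
    chunk = seq_len + 1
    n_chunks = len(ids) // chunk          # leftover characters are dropped
    pairs = []
    for k in range(n_chunks):
        window = ids[k * chunk:(k + 1) * chunk]
        pairs.append((window[:-1], window[1:]))  # target is shifted by one
    return pairs


text = "hello hello"
char2idx, idx2char = build_vocab(text)
ids = [char2idx[ch] for ch in text]
pairs = make_windows(ids, seq_len=4)

# Decode back to characters to make the shift visible.
decoded = [
    ("".join(idx2char[i] for i in x), "".join(idx2char[i] for i in y))
    for x, y in pairs
]
print(decoded)
```

The trailing "o" does not fill a complete chunk of five characters, so it is dropped, which matches the "drop the remaining characters" step in the transcript.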
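💡 The video combines several models in an ensemble: each model generates a candidate sentence, the candidates are scored, and the highest-scoring sentence is kept and fed back to all models as the next starting sequence, so one misbehaving model cannot derail the output.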
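The generate, score, and feed-back loop can be sketched as below. This is an illustration under stated assumptions rather than the video's ensemble class: the models are replaced by hypothetical toy callables that take a seed string and return a sentence, and `toy_score` is a deliberately simple stand-in for the real rating function.

```python
def ensemble_generate(models, score, start, rounds=10):
    """Each round, keep the best-scoring candidate and use it as the next seed."""
    text = start
    seed = start
    for _ in range(rounds):
        # Every model proposes a continuation of the current seed.
        candidates = [(name, generate(seed)) for name, generate in models.items()]
        # Pick the candidate with the highest score.
        best_name, best_sentence = max(candidates, key=lambda item: score(item[1]))
        text += " " + best_sentence
        seed = best_sentence  # the winner is fed back to all models
    return text


KNOWN = {"virtue", "is", "enough"}  # hypothetical word list


def toy_score(sentence):
    # +1 per known word, -1 per unknown word.
    words = sentence.lower().strip(".").split()
    return sum(1 if word in KNOWN else -1 for word in words)


# Toy stand-ins for trained models; one of them "goes crazy".
toy_models = {
    "model_a": lambda seed: "virtue is enough.",
    "model_b": lambda seed: "zzz zzz zzz",
}

output = ensemble_generate(toy_models, toy_score, "From", rounds=2)
print(output)
```

Because only the winning sentence is appended and reused as the seed, a model that produces nonsense in one round simply loses that round.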
