Daily Data Live | Sampling Fixes for Transformer Translator
Key Takeaways
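Replaces top-k sampling with a greedy decoding loop for a Transformer English-to-German translator, and finds that the model keeps predicting the unknown token, pointing to a vectorizer issue to fix in a later stream.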
Full Transcript
What's happening, guys? Oh my, I always forget to bring the mic over. How are y'all doing? Today we're going to be live streaming and doing our Daily Data, even though it hasn't quite been daily. We're going to be going through a little bit of coding, my regular five-ish lines of code a day, but I actually need some feedback from you. I'm obviously trying to improve this live stream and make it look a little bit fancier, with the cool transitions and whatnot, so I wanted to get your feedback on one of the stream overlays that I'm looking at getting at the moment. We're going to jump into some code in a sec as well, and I'll include some links below, so let me know what you guys think.

Let me switch over and show you what I'm talking about. There's a bunch of these stream overlays that I've been looking at to do the transitions and make things look a little bit more slick. This is completely unsponsored and unrelated, I'm literally just doing this all myself, but there are some great ones here. I was looking at this Clearview one and it just looks slick. I'm thinking about tightening things up so that when I'm starting I'm not looking blankly over there, not knowing if this is streaming or not. It will have the stream starting, stream ending, and we'll-be-right-back screens, and then we can have the chat as an overlay so you guys can see what I'm actually responding to in real time. So I'm thinking either this Clearview one, and I'll link it below so you guys can see it, or there was another one that I kind of liked, I think it was called Ranked. Where was it? This one. This one looks sick. Let me know what you think. If you actually watch the demo, maybe this is more for Twitch. I'm obviously not going to
be gaming and whatnot, only if you want me to game, maybe, who knows, we'll see. But let me know. I'll link them below, and if you guys have any feedback, honestly, it'll be super appreciated.

Okay, but what are we going to be going through today? Let's quickly go through the chat and see what's happening. "Hey Nick, can you make a video on a total end-to-end project on deep learning, building, deployment, backend, cloud and all that stuff?" I don't know if I've told you my big plans or what I'm working on. If you go and check the channel right now, I actually released a nine-hour compilation of all the reinforcement learning for gaming videos that we've been working on. The game plan is to do one more of those, which is probably going to be even bigger than that one, but it's going to be on all of the deep learning projects. If you take a look, there's a whole bunch of end-to-end deep learning projects: there's the capuchin bird classification, which is an audio-based project (the mic is a little bit in the way), the image classification one, the face detection one, and the iris detection one. There are going to be the three transformer ones that are coming up soon: one on text classification, one on image classification, so that's using ViTs, and then the one that I'm going to show a little bit today, the language transformer one, which I think is pretty cool. There's one thing that I figured out that we probably need to fix up. As I'm building all these massive projects, I'm going to be consolidating them and putting them together in a structured course, so you know which order to do them in, from easiest to hardest effectively. The stuff that I'm going through is, from my perspective, pretty advanced. I wouldn't expect
a beginner to try to go and do this stuff, but you get the idea: I'm trying to stack them together so that you guys have courses that you can follow and learn from. If that's the type of thing that you like, let me know. I'm always open to change and I'm doing this for you guys, so if there's something else that you'd like to see, do let me know.

Something about self-driving cars? Yeah, I definitely want to do something about self-driving cars. There's a bunch of really good data sets out there and I know there are a couple of frameworks. The thing that I noticed when I was taking a look at it a while ago, what was it, OpenAI Gym, let me switch over so you can see what I'm searching: "OpenAI Gym self-driving car". There's an environment for it, I can't remember what it's called. There's a monster one, it's huge and it's really, really good. The only thing is that it's built for Linux, so I'm going to have to get Windows Subsystem for Linux spun up so that we can run it. But yeah, the game plan is to do some more stuff. There's a whole heap of additional projects.

I don't know if you've noticed as well that I'm definitely posting a ton more compared to usual, and that's probably because before my holiday, I'll be honest with you, I was massively burnt out. But now I'm good. I've relaxed, I'm doing baby steps, I'm working hard during the week and relaxing during the weekends. That's the game plan that I've got happening at the moment, so we're looking good.

Right, some food recommendations here. Down to teach some algorithms: if you want to see some algorithm stuff, let me know. I'm always open to feedback, so if you want a specific thing or you want to look at math-type stuff, let me know. The software that I'm using for streaming (Ashley, how are y'all doing?): I'm doing this on OBS right now. I thought Ranked looks good as well, and the Clearview looks good as well. Ash wants to learn SQL. "Ranked works but it does feel a little bit Twitch." Yeah. Keep learning bioinformatics. "Hey, thanks for the headphones." Yeah, I use these all the time; it felt weird live streaming without headphones. Real project? Yeah, the game plan is to just keep making a bunch of content, because I'm really enjoying it and I'm back into it. If you like these live streams, do let me know. I really like making them because I get to interact with you guys a whole heap more rather than just making videos, but the game plan is to still do both, so don't think that just because I'm doing live streams I'm going to drop videos. I love doing both, but I like live streams because I get to interact with you guys. That's my favorite bit.

What did I do to recover from burnout? Honestly, I just stepped away, took a bit of a break, and started chilling out a little bit more. I'm meditating every single day as well now, in the morning and at night, just to clear my head, because I'm obviously working a full-time job and doing this as well. Finding a little headspace has definitely helped in that sense. I think that's a common problem in tech. I was going to make a video about it, I don't know how many people would be interested, but burnout is a huge thing in tech, and particularly in a highly competitive field like deep learning and data science it is just absolutely brutal. Maybe people don't talk about it as much, I mean this is probably the first time you've heard me talk about it, but I think it's really common. Maybe you guys want to see a video on that, or on what I'm doing to ensure that it ideally doesn't happen again. It might happen sometime, but you've got to enjoy life, to your point. Oops, typing along. To your
point, you've got to enjoy life. "I'd like to see a yoga session with Nick." Yeah, thanks Ash, you can join in on that one. Oh yeah, imposter syndrome is another real one. I actually had a video planned for it, but I never ended up doing it. Maybe I will.

Okay, so what is it that we're going to talk about from a coding perspective? But do you guys like me talking about this other stuff, just life and stuff outside of code? I really wish we had the chat on the screen, because then you guys could see it in real time, but I guess if you're watching you can probably see the chat. Okay, what are we doing? So one thing that I noticed as I was working on this... oh, Dan! Dan is one of my really good friends, and Ashley is actually my girlfriend; she's joining in on my live streams now, sneakily. My best mate Dan is in Hungary right now. How are you doing, Dan? Have a beer for me. Alrighty, cool. Oh man, I love the Power Cube. I'm loving that you're enjoying this.

Okay, so what are we doing? This is the language translator deep learning model (I had a massive brain fart there), and it's based on a data set out of the TensorFlow Datasets repo. A lot of comments about yoga; I don't do yoga, guys. I was actually reading Deep Learning with Python by François Chollet. I was reading this book this morning, and it's what I used to get started with transformers. I noticed that when he was building his language translator using transformers, he used a greedy algorithm instead of a top-k sampling algorithm. If you think about a top-k sampling algorithm, it's
basically grabbing the top x most likely predictions and sampling from that distribution. You can see how it's done here: we pass in a source input and make a prediction, and then rather than just taking the highest-probability prediction, which would effectively be a greedy algorithm, we take the top 10, or the top k in this case. Let me zoom in, as that's probably pretty small. We take the top 10 most likely words (k is 10 here; we could change it to 5 if we wanted to), then we pass them through a softmax function so that they all sum to one, because that's the basis of probability, and then we make a random choice based on that distribution. Now, François didn't do that, which makes me think that I should just follow what he did and use a greedy algorithm.

So let's actually try to do this. The translation is going to kick off with the word "start" (I'll go through this in more detail when we do the full video), and the source input equals the en vectorizer, which takes words and converts them into a vector, applied to "my name is nicholas". If we take a look at that source input (is that big enough for you guys to see? I spelt that wrong), you can see we've converted it into a numeric representation, which will eventually be passed through to an embedding layer that is part of the deep neural network. You can see that the model I trained has a really high degree of accuracy, 0.9791, so I was super confused when I was generating predictions as to why it looked like it sucked. This is something that I go through whenever I'm building models. Don't think that I'm building perfect models from the get-go; that's not
me. It's probably useful for you guys to see this as well, because it's a little bit more realistic.

All right, so we've got our source input. Now we're going to loop. The maximum sequence length that we can pass through to our model is 200 words, and the maximum output sequence is going to be 200 words as well, so we can go for x in range(200). We are going to pass through our source input, so let's check model.input_shape. It expects the source and the target; I just can't remember the order. The source is going to be this and the target is going to be that, and surely we need to pass the target through the vectorizer as well. Yeah, we do. So we call model.predict and pass in our target translation. This is an English-to-German translator, by the way; I just wanted to give this a crack. So the de source input equals the de vectorizer (I think that's what it's called, yep, it is) applied to the translation. The order that we need to pass it through should be the en source input, then the de source input. No, this needs to be expanded, so np.expand_dims, because the model expects that we pass through multiple sets of inputs. You guys have probably seen me do the np.expand_dims trick a ton. Then np.expand_dims on the de source input as well.

You're probably wondering why we are vectorizing inside of the loop. That's because we're going to keep appending the translations from our model back to that translation string and iteratively generating predictions. Let's just check this works for now before going through the loop. Surely I've got an error there? Nope. Did that work? Yeah, that should be okay. Okay, so let's take a look at
our predictions. We only want the first one, so if we go np.argmax of the first prediction... it should be [0][0], shouldn't it, because we want the first. Okay, so now we should have a vocabulary dictionary, which is over here. Beautiful. Then we look up whatever the prediction is. All right, so it's generating a prediction of unknown. Am I reading that right? Yeah, I am reading that right.

Let me explain the vocab a little bit more. When we create a vectorizer, think of it as taking words and converting them to numeric representations. We can reverse that process and convert the numeric representation back to words. That's what our vocab is doing over here, and that is what I'm using down there. "List is not iterable"? What am I actually passing through? We need np.argmax, I'm missing a close, and these should be square brackets, my bad. Okay, so that's generating a blank space.

All right, let's do this iteratively. If we want to generate iteratively, we should be taking the x-th prediction, and this is going to give us our predicted word back. Let's say greedy predicted word equals the vocab lookup, and then we append that word back onto the translation, so translation plus-equals a space plus the greedy predicted word. Let's just set x to zero for now. If I take a look at our translation, that's got "start" and then unknown. Okay, so we're looking okay. Now if we did this iteratively 200 times (let me tab that in), that's effectively our greedy algorithm. You can see it's iteratively going through 200 times. What you typically do is have an end statement, so you stop if the predicted word is "end". It can be a little bit problematic if "end" is part of the
sentence. Okay, we've clearly got issues here. You can see it's generating unknown a bunch of times, which makes me think that we have not vectorized correctly. All right, we've at least got our greedy algo written, so this means that we've got an issue with our vectorizer. Typically the last thing that you'd apply here is an end statement: if the greedy predicted word equals "end", then you'd break and you wouldn't append to the translation. Oh, what have I done there? We can grab this and bring it down, so it would be something like that, and we need to reset our translation. Let me delete that, delete that, delete that. What happens if I run it now? Okay, so "start" and then unknown, unknown. All right, this is what we're going to need to fix up, so maybe we'll do it at the next live stream. At least we've got our greedy algorithm written. I think I can get rid of this one, because I'm not going to use the top k, but if you guys want me to share that code I'll share it anyway. Let me comment this through.

Let's quickly check the chat. Oh my god, we've got a ton of comments. "What's your take on Andrew Ng's Deep Learning Specialization course?" The Deep Learning Specialization is really, really good. We're going to come back to the code, but let me have a quick check. It's very good and very math heavy, but I liked it. The one thing that I noticed is that from theory to implementation there was maybe a little bit of a gap, but that's just the nature of that course; it is very theory heavy. From Power Cube: "Yo Nick, since you made a comment classifier, is there something like a text generator with Keras/TF coming soon?" Yep, the Stormzy generator is going to do that; I'm just making sure it actually works and works well. How are you doing, Rakesh? How are you doing, Sanket? All right, you guys are
just chatting. "Can we use a custom tokenization rule with the vectorizer?" You can. Alrighty, cool, let's go back to the code. What I'm going to start doing in live streams, even though I'm just coding randomly or writing my five lines of code, is commenting, so you guys can see what's going on. So: this is the beginning of the translation sequence. Then we generate the source input, which is the sentence that we want to convert. This is the iterative process to generate a prediction. Then we tokenize the German bit, which is initially just the word "start". We then make a prediction using the source and target vectors, greedily extract the result, break if the predicted word is "end", and then append the predicted word to the full target sequence.

Yeah, I'm going to need to go and look into that. There's something going wrong, because the model has a really high degree of accuracy and we're still generating unknowns. Does that mean that we've got a bias towards unknown? That's what I'm thinking. I'm going to go dig into that.

Anyway guys, that's the five lines of code today: me winging it, doing a little bit of top-k sampling. I'm going to drop the links to those overlays in the comments after the live chat. Let me know which ones you like, and I'll probably pick that one and you'll probably see it in the live stream next week. Hopefully you're enjoying this live stream, and at least you got to see a little bit of what's coming up in some of the next videos. Let's quickly take a look at the chat. I don't know about that; Andrew Ng's an absolute beast. Anyway guys, thank you so much for tuning in to this live stream. I'll catch you next week. Peace, enjoy the weekend, have fun.
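The decoding loop walked through in the stream can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the stream: `VOCAB`, `vectorize`, and `fake_predict` are hypothetical stand-ins for the trained vectorizers and the Transformer, which the transcript does not show, and the toy "model" simply replays a fixed German sentence. The `pick` argument switches between the greedy choice (argmax) and the top-k sampling step described in the transcript (keep the k best scores, softmax them, draw at random).

```python
import numpy as np

# Hypothetical stand-ins: the stream uses trained vectorizers and a Transformer
# that the transcript does not show. These toys only mimic the shapes.
VOCAB = ["", "[UNK]", "start", "end", "mein", "name", "ist", "nicholas"]
WORD_TO_ID = {word: i for i, word in enumerate(VOCAB)}
MAX_LEN = 200  # maximum sequence length mentioned in the stream


def vectorize(text, length=MAX_LEN):
    """Map words to ids; 0 pads the sequence and 1 marks unknown words."""
    ids = [WORD_TO_ID.get(w, 1) for w in text.lower().split()][:length]
    return np.array(ids + [0] * (length - len(ids)))


def fake_predict(source_batch, target_batch):
    """Toy model: returns (1, MAX_LEN, vocab) scores; row i scores output word i."""
    script = ["mein", "name", "ist", "nicholas"]
    scores = np.zeros((1, MAX_LEN, len(VOCAB)))
    for i in range(MAX_LEN):
        word = script[i] if i < len(script) else "end"
        scores[0, i, WORD_TO_ID[word]] = 1.0
    return scores


def greedy_pick(scores):
    """Greedy decoding: always take the single highest-scoring token."""
    return int(np.argmax(scores))


def top_k_pick(scores, k=10, rng=None):
    """Top-k sampling: softmax over the k best scores, then a random draw."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=np.float64)
    k = min(k, scores.shape[-1])
    top_ids = np.argpartition(scores, -k)[-k:]       # ids of the k largest scores
    top = scores[top_ids]
    probs = np.exp(top - top.max())
    probs /= probs.sum()                             # probabilities sum to one
    return int(rng.choice(top_ids, p=probs))


def translate(sentence, pick=greedy_pick, predict=fake_predict):
    translation = "start"
    source = np.expand_dims(vectorize(sentence), 0)  # add the batch dimension
    for step in range(MAX_LEN):
        # Re-vectorize each step because the translation grows every iteration.
        target = np.expand_dims(vectorize(translation), 0)
        scores = predict(source, target)
        word = VOCAB[pick(scores[0, step])]
        if word == "end":                            # stop without appending
            break
        translation += " " + word
    return translation


print(translate("my name is nicholas"))  # start mein name ist nicholas
```

Passing `pick=top_k_pick` to `translate` gives the sampling variant instead of the greedy one. A loop like this can only surface a stream of unknown tokens, as happened in the stream; tracking down the cause still means inspecting the vectorizer and vocabulary themselves.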
Original Description
Daily coding live stream, today working on:
- Top K Callbacks
- Greedy Algos
Oh, and don't forget to connect with me!
LinkedIn: https://bit.ly/324Epgo
Facebook: https://bit.ly/3mB1sZD
GitHub: https://bit.ly/3mDJllD
Patreon: https://bit.ly/2OCn3UW
Join the Discussion on Discord: https://bit.ly/3dQiZsV
Happy coding!
Nick
P.s. Let me know how you go and drop a comment if you need a hand!
#machinelearning #ai #tech