How does Google Translate's AI work?
Key Takeaways
Google Translate's AI works using a neural network with an encoder-decoder architecture, built from long short-term memory (LSTM) recurrent neural networks (RNNs) with an attention mechanism between the encoder and the decoder. The video walks through it using English-to-French translation as the running example.
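This is a minimal sketch of the encoder half described above. It assumes the standard LSTM gate equations, reads word vectors one at a time, and keeps the final hidden state as the "sentence vector". It is an illustration, not Google's code:

- The word vectors and weights are random rather than learned.
- The sizes are tiny.
- The helper names (`lstm_step`, `encode`, `init_params`) are hypothetical.

The per-word states are returned as well, because an attention mechanism reads those rather than only the final vector.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gate_preactivation(W, U, b, x, h):
    """Compute W x + U h + b row by row for one gate."""
    return [
        sum(wk * xk for wk, xk in zip(w_row, x))
        + sum(uk * hk for uk, hk in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_step(x, h, c, params):
    """One LSTM time step: returns the new (hidden state, cell state)."""
    i = [sigmoid(z) for z in gate_preactivation(*params["i"], x, h)]    # input gate
    f = [sigmoid(z) for z in gate_preactivation(*params["f"], x, h)]    # forget gate
    o = [sigmoid(z) for z in gate_preactivation(*params["o"], x, h)]    # output gate
    g = [math.tanh(z) for z in gate_preactivation(*params["g"], x, h)]  # candidate values
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def init_params(input_size, hidden_size, rng):
    """Random, untrained weights (W, U, b) for each of the four gates (hypothetical)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {
        gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
        for gate in ("i", "f", "o", "g")
    }


def encode(word_vectors, params, hidden_size):
    """Read the sentence left to right; the last hidden state is the sentence vector."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    states = []
    for x in word_vectors:
        h, c = lstm_step(x, h, c, params)
        states.append(h)
    return h, states


if __name__ == "__main__":
    rng = random.Random(0)
    input_size, hidden_size = 3, 4
    # Four made-up 3-number word vectors standing in for an English sentence.
    sentence = [[rng.uniform(-1, 1) for _ in range(input_size)] for _ in range(4)]
    params = init_params(input_size, hidden_size, rng)
    sentence_vector, per_word_states = encode(sentence, params, hidden_size)
    print("sentence vector:", [round(v, 3) for v in sentence_vector])
```

A bidirectional encoder, which the transcript also discusses, would run a second LSTM from right to left and pair its state with the left-to-right state for each word.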
Full Transcript
You have international friends talking smack behind your back? You use Google Translate. You're looking up words for a French class you regret taking? You use Google Translate. You're in a foreign country and just want to ask the waiter for some extra cheese on your taco? You use Google Translate. Google Translate has quite an eclectic set of applications, but have you ever wondered how it translates stuff? How is that all working? We're going to answer these questions today. Techie or non-techie, I'm going to make sure you all follow along and learn something interesting in the end. This is CodeEmporium, and with that, let's get started.
Language translation: how do we translate a sentence in one language to another language? To make things concrete, let's say that we're translating from English to French. Our first try would be to take every word in the English sentence, find the corresponding French translation for it, spit it out, and repeat this for every word in the sentence. It's a simple strategy, and honestly, we don't need machine learning for this. If we just have a curated database of English-to-French word translations, then we're all set: for every English word, look it up in the database, get the corresponding French word, and repeat.
That's great, but there's a problem with this. If you're bilingual, or even if you just know English, then you know that language has two important components: tokens and grammar. Tokens are the smallest units of language, and grammar defines how these tokens should appear so that they make sense. In this context, tokens are words, so every word is a token: "It's a beautiful day" has five word tokens (it, 's, a, beautiful, day). Grammar is basically a guide, or a set of rules, that defines an ordering for these words. If language were constructed only from tokens and grammar didn't matter, then language translation would be so much easier, and the simple word translation system we came up with would actually be the state-of-the-art translator.
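The dictionary-lookup strategy described above fits in a few lines of Python. The tiny lexicon below is hypothetical and only covers one example sentence. It shows that translating each token independently gets the vocabulary right but the word order wrong, because French usually places adjectives like "rouge" after the noun. It also hard-codes "the" as "la", which only works for feminine nouns.

```python
# Hypothetical toy English-to-French lexicon, just big enough for the example below.
LEXICON = {
    "the": "la",
    "red": "rouge",
    "car": "voiture",
    "is": "est",
    "fast": "rapide",
}


def word_by_word(sentence, lexicon):
    """Translate each token independently by dictionary lookup.

    Unknown words are passed through unchanged; word order is never touched.
    """
    tokens = sentence.lower().split()
    return " ".join(lexicon.get(token, token) for token in tokens)


if __name__ == "__main__":
    print(word_by_word("The red car is fast", LEXICON))
    # → la rouge voiture est rapide   (a French speaker would say "la voiture rouge est rapide")
```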
However, that isn't the case. Grammar exists, and we need to incorporate it into the translator's logic. To incorporate grammar, we have to ensure many things.

The first is syntax analysis. Syntax is basic structure; it's basically asking the question, "Does the structure of the sentence look correct?" In English, we could have an adverb followed by an adjective followed by a noun, like "very big cloud."

Then we have semantic analysis. Semantics is meaning, and it asks the question, "Does this sentence make sense in context?" If we don't follow these rules, then we're just outputting gibberish.

Language translation adds to the chaos, as we need to make sure the translated French sentence follows similar rules. Clearly, language is more complex than simply an assortment of tokens.

Instead of trying to explicitly define our own grammar, what if we let a neural network do it for us? If you haven't heard of neural networks, don't worry about it too much. Just think of one as a component that learns to solve problems by looking at hundreds of thousands of examples. This allows the network to learn patterns in data, and eventually it would be able to translate a given English sentence to French all on its own.

Now, this sounds interesting, but what exactly is this network? We can actually derive the neural network architecture required from the problem we are trying to solve. In this case, we need a neural network that solves the problem of language translation: some English sentence is the input, and it should spit out some French sentence.

The first thing you notice is that the inputs and outputs are both sentences, or sequences of words. But computers don't understand sentences like humans do, so we need to convert them into a form that they do understand, and that's numbers. More specifically, that means vectors and matrices, which are just assortments of numbers representing data.

And so we have the first part of our network: a sentence-to-vector mapper. This part of the network takes an English sentence and spits out a vector of numbers that the computer can understand. This box here is a neural network, and since we're dealing with sequences, or sentences, we use what's called a recurrent neural network. Again, if you haven't heard of a recurrent neural network, think of it as a neural network that learns to solve problems that involve sequences such as sentences. Since language translation involves sentences, we use a recurrent neural network.

So we took our English sentence, and with our recurrent neural network we converted it into a vector. Now we need to convert this vector into a French sentence. This vector-to-sentence mapping can be done with another network, and once again, since we're dealing with a sentence transformation, we use another recurrent neural network. Together, these two recurrent neural nets make up the backbone of our language translator.

What we've constructed here is a fundamental structure for translation called the encoder-decoder architecture. The first network encodes the English sentence into computer data, and the second decodes that computer data into the French sentence.

But what are these boxes, these RNNs, exactly? They are actually long short-term memory recurrent neural networks, or LSTM RNNs. We use LSTM cells specifically because they can deal with longer sentences fairly well. It's a very interesting neural network that was conceived way back in the '90s.

As simple as it sounds, this encoder-decoder network with LSTM cells was the basis of several papers and was a state-of-the-art network in 2014, not too long ago. This was the first time recurrent neural networks became wildly successful for language translation.

In fact, let's take a look at performance. The x-axis here represents the number of words in the sentence, and the y-axis is the BLEU score, which roughly measures how much a machine translation's wording overlaps with human reference translations. The higher the BLEU score, the better the performance. So it looks like this encoder-decoder architecture works well for medium-length sentences with about 15 to 20 words.

Let's see how this does with longer sentences, with an example. Say we have an English sentence that we want to translate to French: "An admitting privilege is the right of a doctor to admit a patient to a hospital or a medical center to carry out a diagnosis or a procedure, based on his status as a health care worker at a hospital." This is a long-winded sentence, but a valid one; it's saying a doctor has the right to admit a patient for further testing.

If we were to pass this into the LSTM RNN encoder-decoder that we talked about, we would get this French translation. Now, I don't know French, so I can't directly verify how correct this is, but let's pop it into Google Translate and see its English translation: "A privilege of admission is the right of a physician to recognize a patient in the hospital or medical center of a diagnosis or to make a diagnosis according to his state of health."

By comparing this with the original, we can see that the meaning of the sentence breaks just after the term "medical center." The phrase "medical center of a diagnosis" doesn't make much sense. But still, it's not bad; it was able to keep up for about 20 words.

Now let's try another one. Consider the English sentence: This kind of experience is part of Disney's efforts to "extend the lifetime of its series and build new relationships with audiences via digital platforms that are becoming ever more important," he added. When we pop it into the RNN encoder-decoder, we get this French translation.

Let's once again pop this French translation into Google Translate and see what it spits out in English: This type of experience is part of Disney's initiatives to "extend the life of its news and develop links with digital players that are becoming more complex.

First off, it didn't generate a closing quotation mark. Instead of audiences on digital platforms, they were addressed as "digital players." That's okay, I guess, but then it says the links are becoming more complex. That isn't the case in the original sentence, which says the relationships are becoming more important. Once again, not too bad, but you can clearly see the quality of the model isn't quite optimal when translating much longer sentences.

So what can we do to improve this translation? Remember what I said before about language: it has two components, tokens and grammar, and it is this grammar that makes language so complex. The problem with the current model is that it doesn't entirely address this complexity.

The thing with recurrent neural networks is that they use past information to make decisions about the present. This means that when the encoder reads the 10th word of the English source sentence, its representation only reflects that word and the nine words before it. But we know that a word depends not only on the words that come before it in a sentence but also on the words that come after it, and all of this gives rise to the context of the word.

So, in order to look in both directions, forward and backward, we replace the normal recurrent neural network with a bidirectional recurrent neural network. Interestingly, these BRNNs were introduced back in 1997 but gained popularity more recently with the emergence of deep learning. So if we're performing English-to-French translation, then while generating some word in the French translation, we can look at the English words that come before a given position and the words that come after it.

Sweet, but which words exactly should we focus on more? In a long sentence, this could be difficult to figure out. A method to figure this out was devised in the paper "Neural Machine Translation by Jointly Learning to Align and Translate" (Bahdanau et al.). I'll explain what this is, so don't worry.

Consider the English sentence "The agreement on the European Economic Area was signed in August 1992" and its French translation, "L'accord sur la zone économique européenne a été signé en août 1992." Our translator generates the translated French sentence one word at a time. While generating the i-th French word, which words in the English sentence should be considered?

One option would be: for the i-th French word, consider the i-th English word. But then we get the old word-by-word translator that we talked about at the beginning of the video, and that's no fun. Translation is more complicated than this, so it needs to be something the translator learns on its own.

So, given the English sentence and its French translation, our translator will try to align them. In this example, "été" is lined up with the English words "was" and "signed." A really white cell means strong alignment, that is, more attention is focused on that English word while generating the French word. While generating the French word "européenne," it looks like the only word it consults is the English word "European." The same goes for "août": the model learns to focus its attention only on the English word "August" while generating the French word "août."

In this way, the model looks at thousands of other English sentences and their corresponding French translations. It learns which English words to focus its attention on while generating each word of the French translation. This alignment is learned by an extra unit called an attention mechanism, which sits between the encoder and the decoder.

So during translation, an English sentence is fed to the encoder. It's encoded into some vectors, which are just numbers the computer understands; it's basically the same English sentence in the computer's eyes. Then we use an attention mechanism, basically asking which English words each French word should be generated from. The decoder then generates the French translation one word at a time, focusing its attention on the words determined by the attention mechanism.

So that's sweet. This actually performs better than the original encoder-decoder architecture, and the translations of long sentences are now more closely aligned with the originals.

Google Translate's AI works essentially like this. The main difference is that everything is scaled up: instead of using one LSTM layer for the encoder and one for the decoder, it uses 8 stacked layers for each. We do this because deeper networks help better model complex problems, so this network is more capable of understanding the semantics and grammar of language.

Just a recap on the final network. You want to translate English to French, so you pass the English text word by word to the encoder. The encoder converts these words into word vectors, that is, numbers that represent the words of the sentence. These vectors are then passed into an attention mechanism, which determines the English words to focus on while generating each French word. This data is passed to the decoder, which generates the translated French sentence one word at a time.

And that's it. If you understood this, you understood how Google Translate's AI works, so yay! Just know that every time you use Google Translate from now on, something not so magical is actually happening behind the scenes. Thank you guys so much for watching, and if you liked the video, show us some love with a like, and subscribe for more awesome content. I'll see you in the next one. Bye-bye!
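The attention step from the recap can be sketched in a few lines. This sketch assumes Bahdanau-style additive scoring, score_j = v · tanh(W s + U h_j), where s is the decoder's current state and h_j is the encoder vector for English word j. The weights here are random rather than learned, and the names (`additive_attention`, `W`, `U`, `v`) are illustrative assumptions. Google's production system is far larger and differs in detail. The point is the shape of the computation:

1. Compute one score per English word.
2. Apply a softmax that turns the scores into attention weights summing to 1.
3. Form a context vector as the weighted average of the encoder vectors.

The decoder would use that context vector when picking the next French word.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(decoder_state, encoder_states, W, U, v):
    """One attention step with additive (Bahdanau-style) scoring.

    score_j = v . tanh(W s + U h_j)
    weights = softmax(scores)
    context = sum_j weights_j * h_j
    """
    ws = matvec(W, decoder_state)  # projection of the decoder state, shared by every j
    scores = []
    for h in encoder_states:
        uh = matvec(U, h)
        hidden = [math.tanh(a + b) for a, b in zip(ws, uh)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights in [-1, 1] (hypothetical)."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    hidden_size, attn_size = 4, 3
    words = ["the", "agreement", "was", "signed"]
    # Made-up encoder vectors, one per English word; a real encoder would produce these.
    encoder_states = random_matrix(len(words), hidden_size, rng)
    decoder_state = [rng.uniform(-1, 1) for _ in range(hidden_size)]
    W = random_matrix(attn_size, hidden_size, rng)
    U = random_matrix(attn_size, hidden_size, rng)
    v = [rng.uniform(-1, 1) for _ in range(attn_size)]

    weights, context = additive_attention(decoder_state, encoder_states, W, U, v)
    for word, weight in zip(words, weights):
        print(f"{word:>10}: {weight:.3f}")
    print("context vector:", [round(c, 3) for c in context])
```

With trained weights, one set of these attention weights per generated French word is what an alignment heat map like the one in the video visualizes.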
Original Description
Let’s take a look at how Google Translate’s Neural Network works behind the scenes! Read these references below for the best understanding of Neural Machine Translation!
REFERENCES
[1] Landmark LSTM paper (Hochreiter & Schmidhuber, 1997): https://www.bioinf.jku.at/publications/older/2604.pdf
[2] Landmark paper of neural machine translation (NMT) (Kalchbrenner et al., 2013): https://arxiv.org/abs/1306.3584
[3] Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (Cho et al., 2014): https://arxiv.org/abs/1406.1078
[4] Sequence to Sequence Learning with Neural Networks (Sutskever et al., 2014): https://arxiv.org/abs/1409.3215
[5] Bidirectional Recurrent Neural Networks, the paper that introduced bidirectional RNNs (Schuster & Paliwal, 1997): https://pdfs.semanticscholar.org/4b80/89bc9b49f84de43acc2eb8900035f7d492b2.pdf
[6] On the Properties of Neural Machine Translation: Encoder-Decoder Approaches (Cho et al., 2014): https://arxiv.org/pdf/1409.1259.pdf (Fig. 4(a))
[7] Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2016): https://arxiv.org/pdf/1409.0473.pdf (Sec. 5.2.2)
[8] Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, the main Google Translate paper (Wu et al., 2016): https://ai.google/research/pubs/pub45610