Can language models understand? Bender and Koller argument.
Key Takeaways
The video discusses the limits of language models when it comes to understanding meaning, drawing on the 2020 paper by Bender and Koller. It explains that language models are trained on string prediction tasks rather than on understanding, and presents the paper's argument that meaning cannot be learned from text completion alone.
Full Transcript
[Music] Hey there! Nice of you to check out this video on the AI Coffee Break channel, which just surpassed 500 subscribers. Crazy! Ms. Coffee Bean and I still cannot believe it. This channel, which started as a creative and playful spin-off to my semester of teaching in corona mode, now has so many subscribers who follow the content regularly. This is 20 times as many people as I have taught in one semester. And on YouTube we can explain things to the whole globe, and do this even when I am sleeping, because Ms. Coffee Bean does the whole explaining. Thank you all for joining us in our humble beginnings. Ms. Coffee Bean will do her best not to disappoint you in the future.

Okay, enough of this. We get to the topic of today's video. This video is taking a breathing pause, because Ms. Coffee Bean is quite overwhelmed by recent advances like GPT-3 in machine learning, especially in natural language processing, or NLP for short. So if you feel a little overwhelmed too, let's take a top-down theoretical view on the whole situation of data-driven language models. In this video we will address three important questions: What is meaning? How far is the state of the art in NLP from meaning? And do publications misuse the word "meaning"?

Specifically, we are discussing the paper by Emily Bender and Alexander Koller. This is not only Ms. Coffee Bean's favorite paper of the month, it also won the best theme paper award at ACL 2020 for the theme of the conference, "Taking stock of where we've been and where we're going". Even though this video is about NLP, we think that there are important messages that everyone in AI should think about. The paper delivers theoretical considerations about what a language model trained purely on text completion can learn about meaning. What is meaning? Well, wait for it, we will come to that later.

Now we ask: how often have you heard academic publications, news, or even Ms. Coffee Bean say that large language models, especially Transformer architectures like GPT-3 or BERT, understand natural language and learn meaning? Far too often, to be honest. In Ms. Coffee Bean's defense, she also puts it like this from time to time, trying to simplify things and deliver messages quickly. Otherwise she's afraid she might be spending too much time on defining appropriate terms or opening a philosophical discussion about how far we are on the ladder towards natural language understanding. But now there is no more way around this. It is time to discuss this with the help of this paper, and at least warn that words like "meaning" and "understanding" can be very easily misused and misinterpreted when we are talking about language models like BERT or GPT. This leads to an undesirable mismatch between what research says a system does and what it really does.

So the authors write: "We argue that the language modeling task, because it only uses form as training data, cannot in principle lead to learning of meaning." This is a very strong claim to accept, especially if you have seen BERT perform magic on biology data, or seen GPT-3 translate, make copycat analogies, and many other things. But wait a little: what does this claim even mean? We will not be on the same page until we have explained the terminology, so it is time for the terminology taser.

It is very important to define what a language model is. With "language models" the authors, quote, "refer to any system trained only on the task of string prediction, whether it operates over characters, words or sentences, and sequentially or not", end quote. So BERT, predicting masked words or tokens in sentences, definitely falls into the category of a language model.

Now, what does "meaning" mean? Quote: "We take linguistic meaning to be the relation between a linguistic form and communicative intent." Wait, what? Have we just explained the term with two other ones? Yeah, but don't worry, because we are at the terminology button here. To get out of the range of the terminology taser, we will now build an example based on Alice and Bob talking to each other.

Alice is hungry and she wants Bob to bring her food. This is her communicative intent: she wants him to get her food. But for Bob to get a message, Alice needs to realize her communicative intent as a form, that is, as something in the real world. This could be imperative air vibrations that Bob's ears perceive as Alice's commanding voice ("Get me food!"), the trace of pen on paper, the pixels of this video, or the hands and the face in sign languages.

Now, for Bob to get the message, he has to understand the meaning, so he has to find the right combination out of all possible pairs of expressions and intents that Alice could have. Notice that Bob does not have access to the communicative intent, but only to the form. If he gets the right meaning of the form without access to the communicative intent, he has an understanding of the situation.

But he has to be careful, because the meaning of Alice's "get me food" expression is not to be confused with the many conventional meanings of Alice's expression. For example, she might have said "get me food" as a request for something to eat, or she might just have wanted Bob to bring her their dog called Food. This is not a conventional meaning of "food", you might say. Well, keep in mind that this is only the term for describing all potential meanings of an expression. The conventional meaning has to be neither widely accepted nor in any dictionary. Think about you and your best friend inventing words or using words as a code: this meaning is in no dictionary, but you both know what that utterance means.

But just imagine: the gap between form and meaning is bigger than one thinks, because the number of communicative intents is huge. We as humans cheat a little, because we have access to the physical, social, or mental situation of our interlocutor. A language model, so a system that was trained only on strings to do string completion, does not. And while humans, given a form, try really hard to make a mental model of the communicative intent of the interlocutor and understand the meaning, a language model never does. It gets the form and computes similarities and likelihoods of other forms that are supposed to be the right response, without building a model of the communicative intent. And this intent is crucial in the relation between form and meaning. So the authors argue that, quote, "the language modeling task, because it only uses form as training data, cannot in principle lead to learning of meaning".

Yes, so what? GPT-3 is still performing as if it knew a lot about the world. Well, it seems like, quote, "their apparent ability to reason is sometimes a mirage built on leveraging artifacts in the training data, so on form, not meaning", end quote. All that the language model has seen is an enormous amount of utterances about the world, but it does not seek clues about the communicative intent, and it also doesn't try to model it. To put it another way: would you expect a child to learn what language means if it did not interact with the world at all, but only heard a recorded tape and tried to repeat the words?

To make this even more clear, the authors propose an octopus thought experiment. Ms. Coffee Bean retells it now with minor alterations.

Once upon a later time, we have again Alice and Bob, now separated on two different islands, because Bob did not succeed in bringing Alice food. But because they still miss each other, they are sending messages to each other through an underwater cable. Only that Mr. O, this octopus, is super intelligent and has tapped into their communication line. Of course, Mr. O does not speak English, but over time he learns patterns and can predict what Bob's responses to Alice are, and he starts sending messages to Alice instead of Bob. Alice does not notice the exchange, since Mr. O is getting really good at his impersonation. He can even realize that some words are used in similar contexts and can be exchanged if needed, and he can construct whole plausible sentences. Only that Mr. O has never seen the objects that Alice and Bob are talking about. Because their communication is kind of regular, this is enough for a while, as long as not much changes on Alice's and Bob's islands.

But one day Alice says that she invented the coconut catapult and sends the building plan to Bob. Mr. O intercepts the message but cannot rebuild the invention, because he does not know what ropes or coconuts are. But from earlier conversations he noticed that Bob would react very excitedly, like this: "Great job!" Did he really understand what the coconut catapult is and what it is for? No. He did not grasp the communicative intent of Alice, because he is missing important observations about her world.

Happy with Mr. O's reply impersonating Bob, Alice asks: "What should I use the coconut catapult for? Should I use it for sending coconuts to your island, or should I use it to defend against this bear lurking in the bush?" Well, Mr. O has a problem now, not knowing how to respond. What is the friction coefficient of air? How does gravity work without the Archimedes force? How far apart are the islands? How much does Bob want a coconut? He would not even get this far with the questions, because what is a coconut, and what is a catapult, in the first place? All he knows is that coconuts are used in the context of food and catapults are used in the context of defending. But can coconuts defend against a bear? Can coconuts fly from island to island? Mr. O had a great time impersonating Bob, but now he has to admit his defeat.

The allegory of Mr. O was meant to clarify that language models are in Mr. O's situation. So what to do? We have to keep in mind that all arguments so far hold only for language models, so systems that are trained to predict missing strings in a sequence. We can also think about new tasks and datasets that are augmented with meaning, for example those that try to ground language in a visual situation. For more on this, check out our video linked right now and also in the description below.

Also, that a language model does not acquire meaning does not mean that BERT does not deliver amazing results. This paper tries to set a warning sign that reveals what meaning is, how far state-of-the-art systems are from meaning, and how publications misuse the word "meaning". In a field where after GPT-2 comes GPT-3, with even more impressive results from only being trained on language modeling, we have to take a break to think about what the system lets us think it can do and what it can really do.

Because we didn't address all aspects of the paper, Ms. Coffee Bean really recommends that you read the paper yourself, or even have the authors read it out loud for you. Links to the paper and to SoundCloud are in the description below. Let us know in the comments what you think. This is kind of a controversial and upsetting topic, and it might need some time for digestion. Thanks for watching patiently until the end. See you next time. Bye! [Music]
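To make the "trained only on string prediction" idea concrete, here is a minimal sketch in Python. It is not from the video or the paper: the toy corpus and the helper names `train_bigram` and `complete` are made up for illustration, and a bigram counter is of course far simpler than BERT or GPT-3. Like Mr. O, it only counts which form tends to follow which other form.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which token follows which. Form is all this model ever sees."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def complete(counts, prompt, max_tokens=10):
    """Greedily append the most frequent follower of the last token."""
    tokens = ["<s>"] + prompt.lower().split()
    for _ in range(max_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # the last token was never seen in training
        nxt = followers.most_common(1)[0][0]
        if nxt == "</s>":
            break
        tokens.append(nxt)
    return " ".join(tokens[1:])

# Hypothetical messages overheard on the cable (invented for this sketch).
corpus = [
    "i built a raft",
    "great job alice",
    "i caught a fish",
    "great job alice",
]
counts = train_bigram(corpus)

print(complete(counts, "great"))    # great job alice
print(complete(counts, "coconut"))  # coconut
```

The first completion looks like a sensible reply, yet it comes purely from co-occurrence counts. The second prompt contains a word the model has never seen, so it has nothing to go on, much like Mr. O when the coconut catapult shows up.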
Original Description
What is meaning? How far is state-of-the-art NLP from meaning? Do publications misuse the word “meaning”? Can large language models such as ChatGPT understand?
Bender and Koller 2020 paper explained, about language models and understanding.
➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to boost our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
📺 Transformer combining Vision and Language: https://youtu.be/dd7nE4nbxN0
📺 GPT-3 video: https://youtu.be/5fqxPOaaqi0
📺 BERTology meets Biology: https://youtu.be/pFf4PltQ9LY
Outline:
* 00:00 500 subs celebration!
* 00:53 Breathing pause
* 02:12 Misuse of “meaning”
* 04:22 What is meaning?
* 06:39 Language models and meaning
* 08:20 Octopus experiment
* 11:09 What to do?
📄 Paper explained: Bender, Emily M., and Alexander Koller. "Climbing towards NLU: On meaning, form, and understanding in the age of data." In Proc. of ACL. 2020. https://www.aclweb.org/anthology/2020.acl-main.463/
🎧 Audiopaper: https://soundcloud.com/emily-m-bender/climbingtowardsnlu-audiopaper/s-0ZT7112K1Ep
🔗 Links:
YouTube: https://www.youtube.com/AICoffeeBreak
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
#AICoffeeBreak #MsCoffeeBean #ACL2020 #NLU
Video and thumbnail contain emojis designed by OpenMoji – the open-source emoji and icon project. License: CC BY-SA 4.0