Can language models understand? Bender and Koller argument.

AI Coffee Break with Letitia · Beginner ·🧠 Large Language Models ·5y ago

Key Takeaways

The video discusses the limits of language models in understanding meaning, drawing on the 2020 paper by Bender and Koller. It explains that language models are trained on string prediction, not on understanding, and presents the paper's argument that meaning cannot be learned from text completion alone.

Full Transcript

[Music]

Hey there! Nice of you to check out this video on the AI Coffee Break channel, which just surpassed 500 subscribers. Crazy! Ms. Coffee Bean and I still cannot believe it. This channel, which started as a creative and playful spin-off of my semester of teaching in corona mode, now has so many subscribers who follow the content regularly. This is 20 times as many people as I have taught in one semester. And on YouTube we can explain things to the whole globe, and do this even when I am sleeping, because Ms. Coffee Bean does the whole explaining. Thank you all for joining us in our humble beginnings. Ms. Coffee Bean will do her best not to disappoint you in the future.

Okay, enough of this. Let's get to the topic of today's video. This video is taking a breathing pause, since Ms. Coffee Bean is quite overwhelmed by recent advances in machine learning like GPT-3, especially in natural language processing, or NLP for short. So if you feel a little overwhelmed too, let's take a top-down, theoretical view of the whole situation of data-driven language models. In this video we will address three important questions: What is meaning? How far is the state of the art in NLP from meaning? And do publications misuse the word "meaning"?

Specifically, we are discussing the paper by Emily Bender and Alexander Koller. This is not only Ms. Coffee Bean's favorite paper of the month; it also won the best theme paper award at ACL 2020 for the theme of the conference, "Taking stock of where we've been and where we're going." Even though this video is about NLP, we think that there are important messages that everyone in AI should think about. The paper delivers theoretical considerations about what a language model trained purely on text completion can learn about meaning.

What is meaning? Well, wait for it, we will come to that later. For now we ask: how often have you heard academic publications, news, or even Ms. Coffee Bean say that large language models, especially transformer architectures like GPT-3 or BERT, understand natural language and learn meaning? Far too often, to be honest. And in Ms. Coffee Bean's defense, she also puts it like this from time to time, trying to simplify things and deliver messages quickly. Otherwise she is afraid she might spend too much time on defining appropriate terms or opening a philosophical discussion about how far we are on the ladder towards natural language understanding. But now there is no way around this anymore. It is time to discuss this with the help of this paper, and at least to warn that words like "meaning" and "understanding" can very easily be misused and misinterpreted when we are talking about language models like BERT or GPT. This leads to an undesirable mismatch between what research says a system does and what it really does.

So the authors write: "We argue that the language modeling task, because it only uses form as training data, cannot in principle lead to learning of meaning." This is a very strong claim to accept, especially if you have seen BERT perform magic on biology data, or seen GPT-3 translate, make copycat analogies, and many other things. But wait a little. What does this claim even mean? We will not be on the same page until we have explained the terminology, so it is time for the terminology taser.

It is very important to define what a language model is. With "language model" the authors, quote, "refer to any system trained only on the task of string prediction, whether it operates over characters, words or sentences, and sequentially or not", end quote. So BERT, predicting masked words or tokens in sentences, definitely falls into the category of a language model.

Now, what does "meaning" mean? Quote: "We take linguistic meaning to be the relation between a linguistic form and communicative intent." Wait, what? Have we just explained the term with two other ones? Yeah, but don't worry, because we are at the terminology button here. To get out of range of the terminology taser, we will now build an example based on Alice and Bob talking to each other.

Alice is hungry and she wants Bob to bring her food. This is her communicative intent: she wants him to get her food. But for Bob to get a message, Alice needs to realize her communicative intent in a form, that is, in something in the real world. This could be imperative air vibrations that Bob's ears perceive as Alice's commanding voice ("Get me food!"), the trace of pen on paper, the pixels of this video, or the hands and the face in sign languages. Now, for Bob to get the message, he has to understand the meaning, so he has to find the right combination among all possible pairs of expressions and intents that Alice could have. Notice that Bob does not have access to the communicative intent, but only to the form. If he then gets the right meaning of the form without access to the communicative intent, he has an understanding of the situation.

But he has to be careful, because the meaning of Alice's "get me food" expression is not to be confused with the many conventional meanings of Alice's expression. For example, she might have said "get me food" as a request for something to eat, or she might just have wanted Bob to bring her their dog called Food. "This is not a conventional meaning of food," you might say. Well, keep in mind that this is only the term for describing all potential meanings of an expression. The conventional meaning does not have to be either widely accepted or in any dictionary. Think about you and your best friend inventing words or using words as a code. This meaning is in no dictionary, but you both know what that utterance means.

But just imagine: the gap between form and meaning is bigger than one thinks, because the number of communicative intents is huge. We as humans cheat a little, because we have access to the physical, social, or mental situation of our interlocutor. The language model, so a system that was trained only on strings to do string completion, does not. And while humans, given a form, try really hard to make a mental model of the communicative intent of the interlocutor and understand the meaning, a language model never does. It gets the form and computes similarities and likelihoods to other forms that have to be the right response, without building a model of the communicative intent. And this one is crucial in the relation between form and meaning. So the authors argue that, quote, "the language modeling task, because it only uses form as training data, cannot in principle lead to learning of meaning."

Yes, so what? GPT-3 is still performing as if it knew a lot about the world. Well, it seems like, quote, "their apparent ability to reason is sometimes a mirage built on leveraging artifacts in the training data", so on form, not meaning, end quote. All that the language model has seen is an enormous amount of utterances about the world, but it does not seek clues to the communicative intent and it also does not try to model it. To put it another way: would you expect a child to learn what language means if it did not interact with the world at all, but only heard a recorded tape and tried to repeat the words?

To make this even more clear, the authors propose an octopus thought experiment. Ms. Coffee Bean retells it now with minor alterations.

Once upon a later time, we have again Alice and Bob, now separated on two different islands because Bob did not succeed in bringing Alice food. But because they still miss each other, they are sending messages to each other through an underwater cable. Only that Mr. O, this octopus, is super intelligent and has tapped into their communication line. Of course Mr. O does not speak English, but over time he learns patterns, can predict what Bob's responses to Alice are, and starts sending messages to Alice instead of Bob. Alice does not notice the exchange, since Mr. O is getting really good at his impersonation. He can even realize that some words are used in similar contexts and can be exchanged, and, if needed, he can construct whole plausible sentences. Only that Mr. O has never seen the objects that Alice and Bob are talking about. Because their communication is kind of regular, this is enough for a while, as long as not much changes on Alice's and Bob's islands.

But one day Alice says that she invented the coconut catapult and sends the building plan to Bob. Mr. O intercepts the message but cannot rebuild the invention, because he does not know what ropes or coconuts are. But from earlier conversations he noticed that Bob would react very excitedly, like this: "Great job!" Did he really understand what the coconut catapult is and what it is for? No. He did not grasp the communicative intent of Alice, because he misses important observations about her world.

Happy with Mr. O's reply impersonating Bob, Alice asks: "What should I use the coconut catapult for? Should I use it for sending you coconuts to your island, or should I use it to defend against this bear lurking in the bush?" Well, Mr. O has a problem now, not knowing how to respond. What is the friction coefficient of air? How does gravity work without the Archimedes force? How far apart are the islands? How much does Bob want a coconut? He would not even get this far with the questions, because what is a coconut and what is a catapult in the first place? All he knows is that coconuts are used in the context of food and catapults are used in the context of defending. But can coconuts defend against a bear? Can coconuts fly from island to island? Mr. O had a great time impersonating Bob, but now he has to admit his defeat.

The allegory of Mr. O was meant to clarify that language models are in Mr. O's situation. So what to do? We have to keep in mind that all arguments so far hold only for language models, so systems that are trained to predict missing strings in a sequence. We can also think about new tasks and datasets that are augmented with meaning, for example those that try to ground language in a visual situation. For more on this, check out our video linked right now and also in the description below.

Also, that a language model does not require meaning does not mean that BERT does not deliver amazing results. This paper tries to set a warning sign that reveals what meaning is, how far state-of-the-art systems are from meaning, and how publications misuse the word "meaning". In a field where after GPT-2 comes GPT-3, with even more impressive results from only being trained on language modeling, we have to take a break to think about what the system lets us think it can do and what it can really do.

Because we didn't address all aspects of the paper, Ms. Coffee Bean really recommends that you read the paper yourself, or even have the authors read it out loud for you. Links to the paper and to SoundCloud are in the description below. Let us know in the comments what you think. This is kind of a controversial and upsetting topic, and it might need some time for digestion. Thanks for watching patiently until the end. See you next time. Bye!

[Music]
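As an illustration of the octopus story above (not something shown in the video or taken from the paper), here is a minimal sketch of a "Mr. O" that learns only from form. It assumes a tiny made-up message log (`CORPUS`) and uses plain bigram counts instead of a neural network; the names `train_bigrams` and `complete` are hypothetical helpers invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy log standing in for the messages Mr. O overhears.
CORPUS = [
    "i caught a fish today",
    "great job",
    "i built a hut today",
    "great job",
    "i found a coconut today",
    "great job",
]

def train_bigrams(sentences):
    """Count which token follows which. This is all the 'learning' there is."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev][nxt] += 1
    return follows

def complete(follows, prompt, max_tokens=10):
    """Greedily extend the prompt with the most frequent next token."""
    tokens = ["<s>"] + prompt.split()
    for _ in range(max_tokens):
        candidates = follows.get(tokens[-1])
        if not candidates:
            break  # unseen form: there are no counts to predict from
        nxt = candidates.most_common(1)[0][0]
        if nxt == "</s>":
            break
        tokens.append(nxt)
    return " ".join(tokens[1:])

model = train_bigrams(CORPUS)
print(complete(model, "great"))      # a reply pattern seen many times
print(complete(model, "i built a"))  # fluent-looking, chosen by counts alone
print(complete(model, "catapult"))   # a word it has never seen before
```

The second completion comes out as "i built a fish today": the form is plausible, but nothing in the counts says whether a fish can be built. For the unseen word "catapult" the sketch has nothing to go on at all, which is roughly the spot Mr. O ends up in.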

Original Description

What is meaning? How far is state-of-the-art NLP from meaning? Do publications misuse the word “meaning”? Can large language models such as ChatGPT understand? Bender and Koller 2020 paper explained, about language models and understanding.

➡️ AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/

🔥 Optionally, pay us a coffee to boost our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak

📺 Transformer combining Vision and Language: https://youtu.be/dd7nE4nbxN0
📺 GPT-3 video: https://youtu.be/5fqxPOaaqi0
📺 BERTology meets Biology: https://youtu.be/pFf4PltQ9LY

Outline:
* 00:00 500 subs celebration!
* 00:53 Breathing pause
* 02:12 Misuse of “meaning”
* 04:22 What is meaning?
* 06:39 Language models and meaning
* 08:20 Octopus experiment
* 11:09 What to do?

📄 Paper explained: Bender, Emily M., and Alexander Koller. "Climbing towards NLU: On meaning, form, and understanding in the age of data." In Proc. of ACL. 2020. https://www.aclweb.org/anthology/2020.acl-main.463/
🎧 Audiopaper: https://soundcloud.com/emily-m-bender/climbingtowardsnlu-audiopaper/s-0ZT7112K1Ep

🔗 Links:
YouTube: https://www.youtube.com/AICoffeeBreak
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/

#AICoffeeBreak #MsCoffeeBean #ACL2020 #NLU

Video and thumbnail contain emojis designed by OpenMoji – the open-source emoji and icon project. License: CC BY-SA 4.0

This video covers the basics of language models and their limits in understanding meaning. It highlights the key points of the Bender and Koller paper and explains how language models are trained and what they can and cannot do. This matters because it helps viewers judge the current state of NLP and the difference between human and machine understanding, and it gives a foundation for learning more about language models and their applications.

Key Takeaways

Understand the concept of meaning in language
Learn how language models are trained
Recognize the limitations of language models in understanding meaning
Read the Bender and Koller paper for more information
Experiment with language models to see their limitations firsthand
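
One small way to try the last takeaway without downloading any model is a toy masked-word predictor, in the spirit of the "sequentially or not" part of the paper's language model definition. This is only a sketch under stated assumptions: the sentences in `SENTENCES` are made up, the `[MASK]` token and the helpers `train_context_counts` and `fill_mask` are hypothetical, and simple neighbor counts stand in for what BERT does with a neural network.

```python
from collections import Counter, defaultdict

# Hypothetical toy sentences; any plain-text corpus would do.
SENTENCES = [
    "alice eats a coconut",
    "bob eats a fish",
    "alice throws a coconut",
    "alice fears a bear",
]

def train_context_counts(sentences):
    """Map (left word, right word) to counts of the words seen in between."""
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for left, mid, right in zip(tokens, tokens[1:], tokens[2:]):
            table[(left, right)][mid] += 1
    return table

def fill_mask(table, sentence, mask="[MASK]"):
    """Return the most frequent filler for the masked slot, or None if unseen."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    i = tokens.index(mask)
    candidates = table.get((tokens[i - 1], tokens[i + 1]))
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

table = train_context_counts(SENTENCES)
print(fill_mask(table, "bob eats a [MASK]"))     # filled from counts
print(fill_mask(table, "alice fears a [MASK]"))  # same counts, same answer
print(fill_mask(table, "a [MASK] lurks"))        # context never seen
```

With this corpus both of the first two calls return "coconut", because the predictor only sees that "coconut" is the most frequent word between "a" and the end of a sentence. It has no way to know that a coconut is food and not something to fear.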

💡 Language models are limited in their ability to understand meaning and can be fooled by cleverly designed inputs, highlighting the need for further research and development in NLP.
