Meaning Without Objects: Why LLMs Don’t Need to See a Dog to Know What ‘Dog’ Means
In the last few years, the field of Natural Language Processing (NLP) has experienced a seismic shift. We have moved from systems that struggled to construct a coherent sentence to Large Language Models (LLMs) like GPT-4, which reportedly scores around the 90th percentile on the Uniform Bar Exam.
This performance creates a cognitive dissonance for researchers and students alike. On one hand, these models generate text that appears deeply knowledgeable, reasoned, and coherent. On the other hand, we know they are, at their core, statistical engines predicting the next token in a sequence. They have never seen a sunset, felt a “stick,” or petted a “dog.”
This leads to the central question of modern AI philosophy: Do LLMs actually understand language?
Many critics argue “no.” They claim that because LLMs only process text (form) without access to the real world (meaning), they are forever trapped in a “Chinese Room,” manipulating symbols they don’t understand. This argument relies on a concept called the Symbol Grounding Problem (SGP).
However, a fascinating recent paper by Reto Gubelmann from the University of Zurich argues that this criticism is based on an outdated philosophical premise. The paper, Pragmatic Norms Are All You Need, suggests that we are looking at meaning the wrong way. If we shift our perspective from a “Correspondence Theory” of meaning to a “Pragmatic” one, the Symbol Grounding Problem disappears entirely.
In this post, we will unpack this argument, explore why the “Octopus Test” might be misleading, and discover why LLMs might understand us better than we think, without ever setting foot in the real world.
Part 1: The Octopus in the Room
To understand why people think LLMs can’t understand meaning, we first need to look at the strongest argument against them. This is famously encapsulated in the “Octopus Test,” a thought experiment proposed by Bender and Koller in 2020.
The Original Thought Experiment
Imagine two people, A and B, stranded on two separate islands. They communicate via an underwater telegraph cable. They are both human, they speak English, and they understand the world.
Now, imagine a hyper-intelligent, deep-sea Octopus (representing the LLM) taps into the cable. The Octopus doesn’t know what the words mean, and it has never seen the surface world. It just listens. Over time, it learns the statistical patterns of A and B perfectly. It knows that when A says “How are you?”, B usually replies “I’m fine.”
One day, the Octopus cuts the cable and starts impersonating B. It chats with A, and A doesn’t notice the difference. But then, a crisis happens. A is attacked by a bear. A types frantically: “Help! A bear is attacking me! What do I do with these sticks?”
Bender and Koller argue that the Octopus fails here. It has seen the words “bear” and “stick,” but it has no idea what a bear actually is or how a stick physically functions in the world. It lacks Grounding. It cannot map the symbol “stick” to the physical object stick. Therefore, it cannot give meaningful advice, and the illusion of understanding shatters.
The Extension: Why the Octopus Might Succeed
The current paper argues that this conclusion is flawed. The problem isn’t that the Octopus lacks a body; the problem is that the Octopus in the original scenario didn’t have the right training data.

As shown in Figure 1, the author extends the thought experiment (the green area). Imagine that besides A and B, there are other islanders, C and D, who are “Bear Defense Experts.” They talk constantly about how to fend off bears using sticks.
If the Octopus listens to C and D for long enough, it will learn the patterns of language associated with successful bear defense. It will learn that the sequence of words “poke the bear in the nose” is statistically associated with “bad outcome,” while “make yourself big and wave the sticks” is associated with “survival.”
When A cries for help, the Octopus can retrieve this pattern and provide the correct advice. Does the Octopus need to physically hold the stick? The author argues no. The Octopus has solved the Engineering Problem of symbol grounding.
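To make the extended thought experiment concrete, here is a toy sketch (ours, not the paper’s) of the Octopus as a pure pattern counter: it never sees a bear or a stick, it only tallies which overheard advice phrases co-occur with which outcome words. The snippets and outcome labels are invented purely for illustration.

```python
# Toy illustration (not from the paper): the Octopus as a co-occurrence counter.
# It never observes a bear or a stick; it only tallies which advice phrases
# appear near which outcome words in the conversations it overhears.
from collections import defaultdict

# Hypothetical snippets of C and D's "bear defense" chatter.
overheard = [
    ("poke the bear in the nose", "bad outcome"),
    ("poke the bear in the nose", "bad outcome"),
    ("make yourself big and wave the sticks", "survival"),
    ("make yourself big and wave the sticks", "survival"),
    ("make yourself big and wave the sticks", "bad outcome"),
]

counts = defaultdict(lambda: defaultdict(int))
for advice, outcome in overheard:
    counts[advice][outcome] += 1

def best_advice(counts):
    """Pick the advice phrase most strongly associated with 'survival'."""
    def survival_rate(advice):
        total = sum(counts[advice].values())
        return counts[advice]["survival"] / total
    return max(counts, key=survival_rate)

print(best_advice(counts))  # -> "make yourself big and wave the sticks"
```

Nothing in this loop “knows” what a bear is, yet the retrieved advice is the one the experts’ talk associates with survival. That is the whole of the engineering claim.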
But critics aren’t satisfied with engineering solutions. They care about the Philosophical Problem. They would argue: “Sure, the Octopus gave the right answer, but it still doesn’t know what a bear is. It’s just parroting statistics.”
To understand why this philosophical objection exists—and why it might be wrong—we have to go back to the 1990s.
Part 2: The Origins of the Symbol Grounding Problem
The Symbol Grounding Problem (SGP) was introduced by Stevan Harnad in 1990. It was originally a critique of the dominant theory of AI at the time: The Computational Theory of Mind (CTM).
The Computer in the Brain
The CTM suggests that the human mind is essentially a computer processing a special programming language called “Mentalese” (or a Language of Thought). In this view, when you think about a dog, your brain is manipulating an internal symbol, let’s call it SYMBOL_DOG.
The problem arises when you ask: How does SYMBOL_DOG connect to a real, furry, barking dog?
If your brain is just a computer manipulating abstract symbols based on syntactic rules (syntax), how does it ever get to meaning (semantics)? This is the classic “Chinese Room” dilemma. You can have a rulebook that tells you how to manipulate Chinese characters perfectly, but if you don’t speak Chinese, the characters are just meaningless squiggles to you.

Figure 2 illustrates this trap. On the left, we have the “Mental Symbols” inside the head—the computational process. On the right, we have the real world (the dog). The SGP is the question mark in the middle: How do we bridge the gap?
For decades, philosophers assumed that for an AI to have meaning, it must build this bridge. It must “ground” its internal symbols in external sensory experience. This is why Bender and Koller argue that LLMs (which have no sensory experience) cannot have meaning.
But here is the twist: What if the CTM is wrong? What if meaning doesn’t come from mapping symbols to objects at all?
Part 3: A Tale of Two Theories of Meaning
The paper argues that the Symbol Grounding Problem only exists if you subscribe to a specific, and perhaps outdated, theory of meaning: the Correspondence Theory.
Correspondence Theory (The “Augustinian Picture”)
This is the intuitive view most of us hold.
- Grand Picture: Language is a system of symbols (syntax + semantics).
- Meaning: Meaning is created by mapping a word (symbol) to a thing in the world (referent).
- Unit: The primary unit of meaning is the concept (e.g., the word “apple”).
Under this theory, if you can’t map the word “apple” to a physical apple, you don’t know what “apple” means. This is the view that dooms LLMs.
Pragmatic Theory (The Wittgensteinian/Brandomian View)
The author proposes we switch to a Pragmatic Theory of meaning, popularized by philosophers like Ludwig Wittgenstein and Robert Brandom.
- Grand Picture: Language is a social practice governed by norms (rules).
- Meaning: Meaning is determined by use. A word means what it does in a conversation. It is defined by the norms of how it is correctly used by a community.
- Unit: The primary unit of meaning is the speech act (usually a whole sentence or proposition).

Table 1 breaks down these differences:

| | Correspondence Theory | Pragmatic Theory |
| --- | --- | --- |
| Grand picture | Language is a system of symbols (syntax + semantics) | Language is a social practice governed by norms (rules) |
| Meaning | Mapping a word (symbol) to a thing in the world (referent) | Determined by use: the community’s norms of correct usage |
| Unit of meaning | The concept (e.g., the word “apple”) | The speech act (a whole sentence or proposition) |

In the Pragmatic view, knowing the meaning of “dog” doesn’t require a mystical link between your brain-symbol and a physical dog. It requires knowing the rules of the game regarding the word “dog.”
For example:
- Correct use: “The dog is barking.” (Follows the norm).
- Incorrect use: “The dog is flying to the moon.” (Violates the norm of what dogs do).
If you know which sentences are valid moves in the “language game” and which are not, you possess the meaning.
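We can watch this “valid move” intuition surface as raw statistics. The sketch below is our illustration, not the paper’s experiment: it uses the small GPT-2 model from Hugging Face `transformers` to score both sentences by their average per-token loss, where a lower loss means the model treats the sentence as a more probable, more norm-conforming move. It assumes `transformers` and `torch` are installed.

```python
# Minimal sketch (not from the paper): use a small causal LM to score which
# sentence is a more "normal" move in the language game. Lower loss means the
# model assigns the sentence higher probability. GPT-2 stands in for a modern LLM.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_neg_log_likelihood(sentence: str) -> float:
    """Average per-token negative log-likelihood of the sentence under the model."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()

for s in ["The dog is barking.", "The dog is flying to the moon."]:
    print(f"{s!r}: {avg_neg_log_likelihood(s):.2f}")
```

In practice the everyday sentence typically comes out far more probable than the moon-bound dog, which is exactly the norm the model has absorbed from text alone.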
Dissolving the Problem
If we adopt the Pragmatic view, the Symbol Grounding Problem dissolves. We don’t need to “hook” symbols onto the world. We just need to observe the Conventional Norms of the community speaking the language.

As shown in Figure 3, the connection is no longer a mysterious mental bridge. It is a social one. The “meaning” of the word “dog” is established by the community’s conventions. If an entity (human or machine) can learn these conventions and use the word according to the norms, it understands the meaning.
The “real world” still matters—it’s the reason why the norms exist (we have a norm about dogs barking because real dogs bark)—but the speaker doesn’t need direct physical contact with the dog to learn the norm. They just need to listen to the community.
Part 4: Why This Applies to LLMs
So, how does this philosophical detour save LLMs?
If meaning is about norms of use, and not grounding in objects, then LLMs are perfectly positioned to acquire meaning.
LLMs are Pattern Machines
LLMs are trained on trillions of words of human text. This text is the record of our social practices. It contains all our norms, our rules, our “language games.” By analyzing this vast dataset, LLMs infer the norms that govern our language.
- They learn that “red” is a color.
- They learn that “sticks” can fend off bears.
- They learn that “bears” are dangerous.
They learn these not by seeing bears, but by observing the statistical regularities in how humans talk about bears. Under the Pragmatic Theory, this is what meaning is.
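The same point can be made with a next-token distribution. In the hedged sketch below (GPT-2 again standing in for a modern LLM, with a prompt chosen by us), the model’s “knowledge” about bears is nothing more than a probability distribution over what humans tend to write next.

```python
# Sketch (illustrative only): the "norms" surface directly as a next-token
# distribution. We ask GPT-2 what tends to follow a bear-related prompt;
# no bear was ever observed, only text about bears.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "Wild bears are"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for the next token only
probs = logits.softmax(dim=-1)
top = torch.topk(probs, k=5)

for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx)):>12s}  {p:.3f}")
```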
The Architecture Argument: LLMs Have No Symbols
There is a second, more technical reason why the SGP doesn’t apply to LLMs. Remember that the SGP was invented to critique the Computational Theory of Mind (CTM)—the idea of a mind manipulating discrete symbols (Mentalese).
But LLMs are not symbolic AI. They are connectionist systems (Neural Networks).

Look at Figure 4. The architecture of a Transformer (the T in GPT) is built from “Add & Norm” steps (residual connections plus layer normalization), feed-forward layers, and self-attention. It processes vectors (arrays of numbers), not symbols.
The “Thinking” happens in high-dimensional vector space. There is no SYMBOL_DOG inside ChatGPT. There is a distributed pattern of activation across millions of parameters.
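A quick way to see this for yourself: pull the vector that GPT-2 actually uses for the token “ dog”. The sketch below is illustrative, not from the paper; it shows that the model’s internal currency is a 768-dimensional array of floats, not a discrete symbol.

```python
# Sketch (illustrative): inside a Transformer there is no discrete SYMBOL_DOG,
# only dense vectors. We pull the input embedding GPT-2 uses for " dog" and
# show it is just an array of floats fed into attention/feed-forward math.
import torch
from transformers import GPT2Model, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")

token_id = tokenizer(" dog")["input_ids"][0]   # first BPE token id for " dog"
vector = model.wte.weight[token_id]            # the matching row of the embedding matrix

print(vector.shape)         # torch.Size([768]) -- a point in 768-dimensional space
print(vector[:5].tolist())  # a few raw coordinates, meaningless in isolation
```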
The critics who try to apply the SGP to LLMs are making a category error. They are looking for “symbols” to ground, but the machine is operating on “vectors” and statistics. Unless we want to force a “Language of Thought” hypothesis onto Neural Networks (which the author argues we have no reason to do), the premise of the SGP fails on a technical level.
Some critics try to pivot and argue for a “Vector Grounding Problem”—that the vectors need to be grounded. But the author counters: Why? If the vectors allow the model to use language in accordance with human norms (Pragmatism), the vectors are doing their job. They don’t need to “hook” onto the world; they just need to encode the social rules of the language.
Part 5: Experiments and Implications
The paper moves beyond theory to discuss the empirical reality. If the Correspondence Theory were true—meaning LLMs cannot understand because they lack grounding—we would expect them to hit a “glass ceiling.” We would expect to see them fail miserably at tasks that require understanding the physical properties of the world.
Climbing the Right Hill
Bender and Koller suggested that LLMs are “climbing the wrong hill”: no matter how much data we give them, they will never reach true natural-language understanding (NLU), because they are missing the “grounding” component.
However, the empirical evidence points the other way.
- Natural Language Inference (NLI): LLMs have shown incredible improvements in NLI tasks, where they must determine if one sentence logically follows another.
- Generalization: While earlier models (like BERT) struggled with out-of-distribution generalization, newer, larger models (like GPT-4) show remarkable adaptability, solving tasks they were never explicitly trained on.
- The “Expert” Octopus: Just like the extended Octopus experiment predicted, when LLMs are fed enough high-quality data (conversations from experts), they can solve problems that supposedly require “world knowledge.”
The fact that LLMs are succeeding at these tasks suggests that Pragmatism is right. The “meaning” required to pass the Bar Exam or write code is encoded in the use of language, not in the physical touch of objects.
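To make the NLI task from the list above concrete, here is a minimal sketch (the model choice and example are ours, not the paper’s setup) that asks an off-the-shelf MNLI classifier whether a hypothesis follows from a premise, using nothing but text. It assumes `transformers` and `torch` are installed.

```python
# Sketch of the NLI setup (model and example chosen by us): an off-the-shelf
# MNLI classifier judges whether the hypothesis follows from the premise,
# operating purely on text form.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.eval()

premise = "A bear is charging at me and I am holding two sticks."
hypothesis = "The speaker is in danger."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1).squeeze()

for i, label in model.config.id2label.items():
    print(f"{label:>13s}: {probs[i].item():.3f}")
```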
Conclusion: Stop Worrying and Love the Norms
The debate over whether AI “truly” understands us often feels like a semantic trap. This paper provides a way out of that trap by challenging our definitions.
If you believe “understanding” requires a biological brain to physically touch a dog to know what “dog” means, then yes, LLMs will never understand. You are stuck in the Symbol Grounding Problem.
But if you accept the Pragmatic view—that language is a social game defined by shared norms and rules of usage—then the Symbol Grounding Problem vanishes. LLMs are not “Stochastic Parrots” mimicking sound without meaning; they are Norm-Inferring Engines. They observe how we play the game of language, learn the rules, and play it back to us with increasing proficiency.
The paper concludes that we should stop wasting resources trying to “solve” the Symbol Grounding Problem for LLMs, because it is a problem that does not exist for them. Instead of asking “Does it map to the world?”, we should ask “Does it follow our norms?”
As we interact with AI that seems more human by the day, this distinction becomes vital. The machine doesn’t need to share our physical reality to share our language—it just needs to understand our rules.