Comments, observations and thoughts from two bloggers on applied statistics, higher education and epidemiology. Joseph is an associate professor. Mark is a professional statistician and former math teacher.
Tuesday, August 12, 2025
Trip to the library
I'm a bit surprised I haven't posted this before.
Emily M. Bender is one of the leading critics of LLMs from the theoretical side, perhaps the leading critic. (On the business and social impact side, I think we'd have to give the title to Ed Zitron.) Though she is best known for coining the term "stochastic parrot," my favorite example of her work is this essay, in which she demonstrates that, even if the algorithms were intelligent, they still couldn't understand what they were saying.
[I've left out some context. If you think you've spotted a flaw in the logic, you should check out the original before weighing in.]
To try to bring the difference between form and meaning into focus, I like to lead people through a thought experiment. Think of a language that you do not speak which is furthermore written in a non-ideographic writing system that you don’t read. For many (but by no means all) people reading this post, Thai might fit that description, so I’ll use Thai in this example.
Imagine you are in the National Library of Thailand (Thai wikipedia page). You have access to all the books in that library, except any that have illustrations or any writing not in Thai. You have unlimited time, and your physical needs are catered to, but no people to interact with. Could you learn to understand written Thai? If so, how would you achieve that? (Please ponder for a moment, before reading on.)
I’ve had this conversation with many many people. Some ideas that have come up:
Look for an illustrated encyclopedia. [Sorry, I removed all books with photos, remember?]
Find scientific articles which might have English loanwords spelled out in English orthography. [Those are gone too. I was thorough.]
Patiently collate a list of all strings, locating the most frequent ones, and deduce that those are function words, like the equivalents of and, the, or to, or whichever elements Thai grammaticalizes. [Thai actually doesn’t use white space delimiters for words, so this strategy would be extra challenging. If you succeeded, you’d be succeeding because you were bringing additional knowledge to the situation, something which an LLM doesn’t have. Also, the function words aren’t going to help you much in terms of the actual content.]
Unlimited time and yummy Thai food? I’d just sit back and enjoy that. [Great! But also, not going to lead to learning Thai.]
Hunt around until you find something that from its format is obviously a translation of a book you already know well in another language. [Again, bringing in external information.]
Look at the way the books are organized in the library, and find words (substrings) that appear disproportionate in each section (compared to the others). Deduce that these are the words that have to do with the topic of that section. [That would be an interesting way to partition the vocabulary for sure, but how would you actually figure out what any of the words mean?]
Without any way to relate the texts you are looking at to anything outside language, i.e. to hypotheses about their communicative intent, you can’t get off the ground with this task. Most of the strategies above involve pulling in additional information that would let you make those hypotheses — something beyond the strict form of the language.
...
You could, if you didn’t get fed up, get really good at knowing what a reasonable string of Thai “looks like”. You could maybe even write something that a Thai speaker could make sense of. But this isn’t the same thing as “knowing Thai”. If you wanted to learn from the knowledge stored in that library, you still wouldn’t have access.
...
It doesn’t matter how “intelligent” [ChatGPT] is — it can’t get to meaning if all it has access to is form. But also: it’s not “intelligent”. Our only evidence for its “intelligence” is the apparent coherence of its output. But we’re the ones doing all the meaning making there, as we make sense of it.
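For the statistically inclined, here is a minimal sketch of two strategies from Bender's list: tallying the most frequent strings, and finding strings that are disproportionately common in one section of the library. Everything in it is my own assumption rather than anything from her essay: the function names, the use of fixed-length character n-grams as a stand-in for words (Thai gives you no spaces to split on), the add-one smoothing, and the made-up toy "shelves", which are not Thai.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def section_counts(sections, n):
    """Map each section name to the pooled n-gram counts of its texts."""
    pooled = {}
    for name, texts in sections.items():
        counts = Counter()
        for text in texts:
            counts.update(ngram_counts(text, n))
        pooled[name] = counts
    return pooled


def most_frequent(sections, n=2, top=5):
    """Strategy 1: the most common n-grams across the whole library."""
    overall = Counter()
    for counts in section_counts(sections, n).values():
        overall.update(counts)
    return overall.most_common(top)


def distinctive(sections, n=2, top=3, smoothing=1.0):
    """Strategy 2: n-grams over-represented in a section versus all the others.

    Scores are ratios of smoothed relative frequencies (an assumed, simple
    choice; other measures of association would work as well).
    """
    pooled = section_counts(sections, n)
    vocab = set().union(*pooled.values())
    result = {}
    for name, inside in pooled.items():
        outside = Counter()
        for other, counts in pooled.items():
            if other != name:
                outside.update(counts)
        in_total = sum(inside.values()) + smoothing * len(vocab)
        out_total = sum(outside.values()) + smoothing * len(vocab)
        ratio = {
            gram: ((inside[gram] + smoothing) / in_total)
            / ((outside[gram] + smoothing) / out_total)
            for gram in vocab
        }
        ranked = sorted(ratio.items(), key=lambda kv: kv[1], reverse=True)
        result[name] = ranked[:top]
    return result


# Hypothetical toy "library": invented strings, not real Thai text.
toy = {"shelf_1": ["xqxqxq"], "shelf_2": ["mnmnmn"]}
print(most_frequent(toy))
print(distinctive(toy))
```

Note what comes out: strings, counts and ratios. That is a perfectly good way to partition the vocabulary, and nothing in it says what any of the strings mean, which is Bender's bracketed objection to both strategies.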
I'd strongly recommend that anyone who cares about this debate read up on analytic philosophy to the point where they understand Quine's paradigm-busting paper, Two Dogmas of Empiricism:
https://en.wikipedia.org/wiki/Two_Dogmas_of_Empiricism
Two Dogmas hoists human language by the same petard as Bender's trying to use to hoist LLMs. This one paper was the death of logical positivism---it basically caused philosophers to give up on this notion that all language is either analytical (e.g., "a or not a" is true) or grounded in embodied observation. The more nuanced answer provided by Wittgenstein and articulated in devastating generality by Quine is that language is a web of associations. I'd also urge the same party to follow the so-called "linguistic turn" in philosophy to the pragmatism of Richard Rorty. The obstacle is that this is all embedded in the technical language (i.e., jargon) of philosophy, which is quite dense. But ultimately rewarding if you care about these issues.
Bob wrote:
"The more nuanced answer provided by Wittgenstein and articulated in devastating generality by Quine is that language is a web of associations."
Sure, but if there is one thing that LLMs can do better than humans, it is process language as a web of associations. In fact, that is pretty much all they can do at this stage of the game.
I can't connect the dots between the Thai library analogy and your comment.
I do have a comment of my own about the Thai library. If you have a human in there and you ask them what a sentence in one of the books means, they don't stand a chance of answering in a satisfactory way, which is the point of the analogy. But an LLM does have a good chance of hitting the mark, simply by treating language as a web of associations and combining those associations into a coherent-sounding response. So while the analogy explicates what an LLM can't do - gain any kind of understanding of meaning - the same analogy shows us why LLMs are so engaging. They can SEEM to do things we cannot.