Tuesday, August 12, 2025

Trip to the library

I'm a bit surprised I haven't posted this before. 

Emily M. Bender is one of the leading critics, perhaps the leading critic, of LLMs from the theoretical side. (On the business and social impact side I think we'd have to give the title to Ed Zitron.) Though she is best known for coining the term "stochastic parrot," my favorite example of her work is this essay, in which she demonstrates that, even if the algorithms were intelligent, they still couldn't understand what they were saying.

[I've left out some context. If you think you've spotted a flaw in the logic, you should check out the original before weighing in.]

From "Thought experiment in the National Library of Thailand":

To try to bring the difference between form and meaning into focus, I like to lead people through a thought experiment. Think of a language that you do not speak which is furthermore written in a non-ideographic writing system that you don’t read. For many (but by no means all) people reading this post, Thai might fit that description, so I’ll use Thai in this example.

Imagine you are in the National Library of Thailand (Thai wikipedia page). You have access to all the books in that library, except any that have illustrations or any writing not in Thai. You have unlimited time, and your physical needs are catered to, but no people to interact with. Could you learn to understand written Thai? If so, how would you achieve that? (Please ponder for a moment, before reading on.)

I’ve had this conversation with many many people. Some ideas that have come up:

  1. Look for an illustrated encyclopedia. [Sorry, I removed all books with photos, remember?]
  2. Find scientific articles which might have English loanwords spelled out in English orthography. [Those are gone too. I was thorough.]
  3. Patiently collate a list of all strings, locating the most frequent ones, and deduce that those are function words, like the equivalents of and, the, or to, or whichever elements Thai grammaticalizes. [Thai actually doesn’t use white space delimiters for words, so this strategy would be extra challenging. If you succeeded, you’d be succeeding because you were bringing additional knowledge to the situation, something which an LLM doesn’t have. Also, the function words aren’t going to help you much in terms of the actual content.]
  4. Unlimited time and yummy Thai food? I’d just sit back and enjoy that. [Great! But also, not going to lead to learning Thai.]
  5. Hunt around until you find something that from its format is obviously a translation of a book you already know well in another language. [Again, bringing in external information.]
  6. Look at the way the books are organized in the library, and find words (substrings) that appear disproportionately in each section (compared to the others). Deduce that these are the words that have to do with the topic of that section. [That would be an interesting way to partition the vocabulary for sure, but how would you actually figure out what any of the words mean?]
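
[An aside that is not part of Bender's essay: strategies 3 and 6 are the only ones on the list that amount to a procedure, so here is a minimal Python sketch of what they might look like. Everything in it is my own assumption: the function names `ngram_counts` and `distinctive_ngrams` are hypothetical, the smoothed frequency ratio is just one simple way to score "disproportionate," and fixed-length character runs stand in for words because the text has no white space to split on.]

```python
from collections import Counter

def ngram_counts(text, n):
    """Count every run of n consecutive characters in text (strategy 3)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def distinctive_ngrams(sections, n=2, top=3):
    """For each library section, list the n-grams that are most
    over-represented compared with all other sections (strategy 6).

    sections maps a section label to the raw text shelved there.
    The score is an add-one smoothed ratio of relative frequencies;
    this scoring rule is an illustrative assumption, not Bender's.
    """
    counts = {name: ngram_counts(text, n) for name, text in sections.items()}
    result = {}
    for name, own in counts.items():
        # Pool the counts from every other section.
        rest = Counter()
        for other, c in counts.items():
            if other != name:
                rest.update(c)
        own_total = sum(own.values())
        rest_total = sum(rest.values())
        score = {
            g: ((own[g] + 1) / (own_total + 1)) / ((rest[g] + 1) / (rest_total + 1))
            for g in own
        }
        # Highest score first; ties broken alphabetically for stable output.
        result[name] = sorted(score, key=lambda g: (-score[g], g))[:top]
    return result

# Toy stand-ins for two shelves of text in a script we cannot read.
shelves = {"shelf_1": "aaaa", "shelf_2": "bbbb"}
print(distinctive_ngrams(shelves, n=2, top=1))
```

[Note what comes out: lists of strings ranked by how often they occur. Nothing in the output says what any of them mean, which is the objection in the bracketed replies above.]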

Without any way to relate the texts you are looking at to anything outside language, i.e. to hypotheses about their communicative intent, you can’t get off the ground with this task. Most of the strategies above involve pulling in additional information that would let you make those hypotheses — something beyond the strict form of the language.

... 

You could, if you didn’t get fed up, get really good at knowing what a reasonable string of Thai “looks like”. You could maybe even write something that a Thai speaker could make sense of. But this isn’t the same thing as “knowing Thai”. If you wanted to learn from the knowledge stored in that library, you still wouldn’t have access.

...

It doesn’t matter how “intelligent” [ChatGPT] is — it can’t get to meaning if all it has access to is form. But also: it’s not “intelligent”. Our only evidence for its “intelligence” is the apparent coherence of its output. But we’re the ones doing all the meaning making there, as we make sense of it. 

2 comments:

  1. I'd strongly recommend that anyone who cares about this debate read up on analytic philosophy to the point where they understand Quine's paradigm-busting paper, Two Dogmas of Empiricism:

    https://en.wikipedia.org/wiki/Two_Dogmas_of_Empiricism

    Two Dogmas hoists human language with the same petard that Bender is trying to use on LLMs. This one paper was the death of logical positivism; it basically caused philosophers to give up on the notion that all language is either analytical (e.g., "a or not a" is true) or grounded in embodied observation. The more nuanced answer provided by Wittgenstein and articulated in devastating generality by Quine is that language is a web of associations. I'd also urge the same party to follow the so-called "linguistic turn" in philosophy to the pragmatism of Richard Rorty. The obstacle is that this is all embedded in the technical language (i.e., jargon) of philosophy, which is quite dense. But it is ultimately rewarding if you care about these issues.

  2. Bob wrote:

    "The more nuanced answer provided by Wittgenstein and articulated in devastating generality by Quine is that language is a web of associations."

    Sure, but if there is one thing that LLMs can do better than humans, it is process language as a web of associations. In fact, that is pretty much all they can do at this stage of the game.

    I can't connect the dots between the Thai library analogy and your comment.

    I do have a comment of my own about the Thai library. If you have a human in there and you ask them what a sentence in one of the books means, they don't stand a chance of answering in a satisfactory way, which is the point of the analogy. But an LLM does have a good chance of hitting the mark, simply by treating language as a web of associations and combining those associations into a coherent-sounding response. So while the analogy explicates what an LLM can't do - gain any kind of understanding of meaning - the same analogy shows us why LLMs are so engaging. They can SEEM to do things we cannot.
