We need to talk about tests (in the "I got an A" sense, not the RCT sense)

We are hearing all sorts of hype about large language models passing various exams, but before we can have that conversation, we need to have one about how those tests are supposed to work, the assumptions behind them, and why a good test of human understanding can be meaningless if approached in a different way.

Here's a post I wrote a few years ago about a type of test I encountered back when I was getting my BFA. As you read over the description, think about how an LLM would approach this task, and about what (if anything) its performance would tell us.

"Of course, Shakespeare was much newer at the time"

Back when I was an undergrad I took a class in Shakespeare. I'm mentioning this because a couple of aspects of it came back to me recently while thinking about education. [The second aspect was covered in this later post -- MP] The first was the format of the tests the teacher used. They consisted of a list of quotes from the four plays we had covered since the last test. Each quote had a pronoun underlined, which came with a two-part question: who was the speaker and who was the antecedent?

I've never seen that format used in another class (even by the same teacher) and I always thought it was an interesting approach. I wouldn't necessarily recommend using it widely but I'm glad I had it in at least one course. It was a method that encouraged attentive reading (particularly useful with Shakespeare).

Experiencing different styles of teaching and evaluation is part of a well-rounded education. I've seen a wide range of approaches. Some were successful. Some were not. Some were successful as one-shots but weren't models I'd suggest routinely following, like the number theory class I took that didn't allow mathematical notation (all proofs had to be written out in grammatical sentences without abbreviations or symbols -- more or less the way Fermat would have done it). That pedagogical diversity has been of immense value.

A book on quality control I read a few years ago said that quality in a QC sense was equivalent to a lack of variation; quality meant all parts came out the same. Sometimes I'm afraid that some in the education reform movement are starting to think of uniformity as an end in itself.


