Monday, February 24, 2025

How I Learned to Stop Worrying and Love the LLM, Part 2 -- a proofreader, not an editor

I have two dictation options which, being a horrible typist, I use frequently. The first is Dragon NaturallySpeaking on my laptop, which works fairly well. The second is dictating email to my phone, which does not. Capitalization rules seem to be based on some kind of random number generator. Homonyms are, of course, a problem, but so are misheard and missing words. Correcting these mistakes can eat up most, and sometimes all, of the time saved.

It is also tedious as hell.

I decided to let ChatGPT take a crack at it and see how well it worked. Here’s the prompt I used.

"Edit the following paragraphs with explanations in brackets after each change with explanations.  : "

How did it work? It depends. On the part I was most interested in — homonyms, weird capitalization, and misheard or missing words — it caught almost everything I wanted it to. The other revisions it suggested weren’t particularly helpful. I believe I used just one of them, and that was because I had used the same word twice in the paragraph, not because of the reason given in the explanation.

One of those unused suggestions struck me as a particularly interesting example of how differently ChatGPT "thinks." Here is the paragraph in question:

He used the windfall from the sale of PayPal along with funding from other investors to establish SpaceX, but the people actually in charge were highly respected aerospace veterans. They sometimes let the money guy wear an engineer’s cap and blow the whistle, but no one, including Musk himself, really thought he was running the train.

The only change the algorithm suggested was substituting "operation" for "train." Normally, I wouldn't have been that surprised that it missed the analogy (LLMs aren't really capable of creating true analogies), but I had assumed it would associate the terms "engineer’s cap" and "blow the whistle" with the word "train."

The bigger point here is that large language models do represent an impressive advance in the way we talk to computers and the way they talk to us. While they come nowhere near living up to the hype, they can provide us with some genuinely useful tools, as long as we keep their limitations in mind.

So there. I’ve now said nice things about LLMs in two posts. I hope you’re satisfied.

2 comments:

  1. Given what you said you wanted, I would ask GPT to proofread rather than edit. Let it know that you are correcting a speech recognizer. Then tell it what you told the blog reader you were most concerned about: capitalization, misspellings, and homonyms. Then give it an example of the style you want the answer marked up in. In general, include all of these steps: context, instructions, one or more specific examples (see the sketch below).

    Also, which LLM did you use? 4o should be fine for this, but o1 and o3 are a lot smarter. Also, having spent two days on the Claude jailbreak challenge, I have concluded that Claude is much better with the English language.
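
    For instance, the "context, instructions, example" structure might look something like this with the OpenAI Python client; the wording and the gpt-4o model name below are purely illustrative, not a tested recipe.

        # Sketch of a structured proofreading request: a system message for
        # context and instructions, then a one-shot example for the markup style.
        # All strings here are illustrative.
        from openai import OpenAI

        client = OpenAI()

        system = (
            "You are correcting the output of a speech recognizer. "
            "Fix capitalization, misspellings, homonyms, and misheard or "
            "missing words; leave the style alone. Put an explanation in "
            "brackets after each change."
        )

        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system},
                # One specific example showing the desired markup.
                {"role": "user", "content": "the ceo past on the offer"},
                {"role": "assistant",
                 "content": "The CEO passed [homonym: past -> passed] on the offer."},
                # The text actually being corrected.
                {"role": "user", "content": "he saw the whether report this mourning"},
            ],
        )
        print(response.choices[0].message.content)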

  2. The even-higher-volume-than-I translators used to (and presumably still do) get great mileage from Dragon and other speech recognition systems. Here, I type reasonably well (I move my hands to put my fingers over the keys; I don't stretch the way some systems teach, so I never had tendonitis or what-have-you problems; this isn't "typist" speed, but since I think about what I write (Really. I do.), it's fast enough), so I never bothered. But I've been typing since 1970 (started on an IBM keypunch, which in those days had really great keyboards (I can't believe how bad the Teletypes that you got on every DEC computer were (My dad was a site engineer for one of the first DEC LINC machines and then a PDP-7). Sheesh.)), so I'm fussy about keyboards. I always wonder how people put up with laptop keyboards. Go.Figure.City. And my condolences on having to put up with a cell phone.

    But my big concern with actually "using" LLMs is the stupid interaction cycle of "please write XYZ" (outputs something) "No, you missed/messed up the Y part." "Sorry, I'll do better." (outputs something also wrong). Repeat ad nauseam. Combined with the occasional inane stupidity, you quickly get into a situation where "using an LLM" is slower than just doing the work yourself. Especially when the question is "Please summarize this article" and you actually have to read the article to figure out if the LLM got it anywhere near correct. (Someone the other day had the perfect comment on LLM summaries: "If you can't be bothered to read the article, I can't be bothered to read your LLM summary.")

    But you seem to have found a real use case: since you know what you want to say and you've already said most of it, you instantly know when the LLM is off and when it's being helpful.

    Color me impressed.
