Friday, March 13, 2026

When it comes to agentic AI, a Newhart airline is the best-case scenario.

In a previous post, I discussed the idea of a steam airplane (an impressive technology that still represented a soon-to-be-abandoned dead end) and a Newhart airline (a genuine breakthrough prematurely commercialized).

The Wright brothers' plane was the very opposite of a dead-end technology. The basic principles and design choices were all completely sound, and you can trace a fairly direct line from those first models to the passenger planes and military aircraft of two or three decades later.

That said, for all the excitement, no serious person looked at the Wright Flyer and called it commercially viable technology. As with Edison's phonograph, which had also shocked the world some 30 years earlier, virtually everyone recognized it as a breakthrough, but it was also clear that the technology would have to evolve considerably before it could support widespread business or military applications.

On his seminal album The Button-Down Mind, Bob Newhart imagined a conversation between the Wright brothers and a postwar-era corporation trying to monetize their breakthrough. The humor of the monologue came partly from the absurdity of trying to stack multiple passengers on the wing of the Wright Flyer, or of making a coast-to-coast trip by taking off and landing every 105 feet, but much of it also came from the banality and shortsightedness of 1960s-era corporate culture in the face of a stunning, world-altering step forward. It's a comparison that's, if anything, even sharper in the age of venture capitalism.

...

Are [LLMs] Newhart's airline—a viable and important technology that isn't ready yet to support the commercial applications that people are trying to impose on it?

 

It is too early to say how LLM-based AI will play out, but I feel confident in saying that LLM-based agents are not ready for prime time. 
 

Julie Bort, writing for TechCrunch:

The now-viral X post from Meta AI security researcher Summer Yue reads, at first, like satire. She told her OpenClaw AI agent to check her overstuffed email inbox and suggest what to delete or archive.  

The agent proceeded to run amok. It started deleting all her email in a “speed run” while ignoring her commands from her phone telling it to stop. 

“I had to RUN to my Mac mini like I was defusing a bomb,” she wrote, posting images of the ignored stop prompts as receipts.  

... 

But Yue’s post serves as a warning. As others on X noted, if an AI security researcher could run into this problem, what hope do mere mortals have? 

“Were you intentionally testing its guardrails or did you make a rookie mistake?” a software developer asked her on X.  

“Rookie mistake tbh,” she replied. She had been testing her agent with a smaller “toy” inbox, as she called it, and it had been running well on less important email. It had earned her trust, so she thought she’d let it loose on the real thing. 

Yue believes that the large amount of data in her real inbox “triggered compaction,” she wrote. Compaction happens when the context window — the running record of everything the AI has been told and has done in a session — grows too large, causing the agent to begin summarizing, compressing, and managing the conversation.  

At that point, the AI may skip over instructions that the human considers quite important.  
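The compaction failure Yue describes can be made concrete with a toy sketch. This is a hypothetical illustration, not OpenClaw's actual implementation: when the transcript exceeds a token budget, the oldest messages are collapsed into a lossy summary, and an instruction that lives only in a compacted message can effectively vanish.

```python
def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 1 token per 4 characters.
    return max(1, len(text) // 4)

def compact(messages: list[str], budget: int) -> list[str]:
    """Summarize the oldest messages until the transcript fits the budget."""
    total = sum(estimate_tokens(m) for m in messages)
    dropped = 0
    while total > budget and len(messages) > 1:
        oldest = messages.pop(0)          # oldest messages go first --
        total -= estimate_tokens(oldest)  # including the original instructions
        dropped += 1
    if dropped:
        # A real agent would ask the model for a summary; here we fake one.
        summary = f"[summary of {dropped} earlier messages]"
        messages.insert(0, summary)
    return messages

# An early instruction, followed by a long run of tool output.
history = ["SYSTEM: never delete email without confirmation"]
history += [f"TOOL: scanned message {i}" * 5 for i in range(50)]

compacted = compact(history, budget=100)
# The safety instruction now survives only (if at all) inside a lossy summary.
print(compacted[0])
```

The failure mode is not that the agent disobeys; it is that, after compaction, the instruction the user considered most important may simply no longer be in the context the model sees.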

...

The point of the tale is that agents aimed at knowledge workers, at their current stage of development, are risky. People who say they are using them successfully are cobbling together methods to protect themselves.

One day, perhaps soon (by 2027? 2028?), they may be ready for widespread use. Goodness knows many of us would love help with email, grocery orders, and scheduling dentist appointments. But that day has not yet come. 

And we haven't even gotten into prompt injection.
