Wednesday, March 5, 2025

Clive James's example may be god-awful, but what about the point he was trying to make?

 Picking up on our discussion of this claim:

The best Hitchcock film was directed by someone else. Charade would not be as good as it is if Hitchcock had not developed the genre it epitomises, but Hitchcock could never have created a film so meticulous, plausible, sensitive, light-footed and funny.

Everyone now seems to agree that "the best Hitchcock film" is, at best, quite a stretch, but what about the broader claim that the best [blank] film/book/song might be the work of someone other than [blank]?

There are lots of examples where imitations were better than the original and where plagiarists, from Robin Williams to Shakespeare, put out work superior to what they were imitating, but that's not exactly what we're talking about here. 

In this context, the term "Hitchcock film" effectively defines a subgenre (think North by Northwest), with the important caveat that not every film Hitchcock made qualifies as a Hitchcock film by that standard.

Saying someone defines a subgenre is a bit of a left-handed compliment. Obviously, you have to be successful and influential to get there, but that success and influence almost always exist within a larger genre. It also suggests that someone else could do it. While Charade is a silly example, it’s not that difficult to imagine someone else theoretically making a better Hitchcock film than Hitchcock. I don’t think you could talk about a Kubrick film in the same way. That said, it is worth noting that countless enormously talented filmmakers—in some cases, arguably more talented than Hitchcock himself—have tried their hands at the subgenre and, as far as I can tell, have all fallen short. François Truffaut, Roman Polanski, and Brian De Palma all come to mind.

What about in other media? An Agatha Christie mystery would certainly qualify as one of these personal-brand subgenres, and we could probably find someone to argue that Ngaio Marsh wrote better Christie books than Christie did (I’m not taking a position on this one; I'm just saying someone might), but it's a difficult point to argue. I would be more than willing to make the case that Dorothy L. Sayers wrote better novels, but here we get into one of the big problems with "better [blank] than [blank]" claims: if you improve too much on the original, at some point, it ceases to be a [blank] work. (Tellingly, if probably unintentionally, when Kenneth Branagh wanted to make Hercule Poirot more modern and three-dimensional, he did so by giving him the backstory of Lord Peter Wimsey.) Sleuth also comes to mind. It plays with the conventions of an Agatha Christie story but mainly to subvert them.

If you're good enough to have a subgenre named after you, usually you are good enough to outshine your imitators, but I finally came up with an exception—one so obvious I don't know why it took me so long to think of it. A writer whose very name is a widely used adjective, arguably one of the most influential writers of the 20th century, and yet someone who was routinely outdone at his own game.

H.P. Lovecraft wasn’t a very good writer. There were good, even sometimes great, elements in his stories, but the stories themselves never rose above mildly inept. I went back and reread some Lovecraft, starting with Dagon, and with the exception of a few passages, it took me back to my days teaching junior high English.

We won’t even get into the racism and anti-Semitism.

Lovecraft's writing often comes across as a crude first draft of what could be a very good piece of fiction in the proper hands, which may be why we saw an extraordinary group of talented writers picking up his ideas and running with them—even as he was still writing.

Although the Mythos was not formalized or acknowledged between them, Lovecraft did correspond, meet in person, and share story elements with other contemporary writers including Clark Ashton Smith, Robert E. Howard, Robert Bloch, Frank Belknap Long, Henry Kuttner, Henry S. Whitehead, and Fritz Leiber—a group referred to as the "Lovecraft Circle".

Everyone named in that paragraph was a much better writer than H.P. Lovecraft, and it is because of them—and the others who followed—that his works are better remembered today than The Great God Pan or the stories of Lord Dunsany.

 

Tuesday, March 4, 2025

A Blacker Black Box

From Matt Levine's newsletter:

There are two basic ways to use artificial intelligence to predict stock prices:

  1. You build a deep learning model to predict stock prices: You set up a deep neural net, you feed it tons of historical data about stocks, and you train it to figure out how that data predicts stock price returns. Then you run the model on current data, it predicts future returns, and you buy the stocks that it thinks will go up.
  2. You take some deep learning model that someone else built, a large language model, one that is good at predicting text. It is trained on a huge corpus of human language, and it is good at answering questions like “write a poem about a frog in the style of W.B. Yeats.” And you ask it questions like “write a report about whether I should buy Nvidia Corp. stock in the style of Warren Buffett.” And then it trains on the writing style of Warren Buffett, which reflects his thinking style, and its answer to your question — you hope — actually reflects what Buffett might say, or what he might say if he was a computer with a lot of time to think about the question. And because Warren Buffett is good at picking stocks, this synthetic version of him is useful to you. You read the report, and if robot Warren Buffett says “buy” you buy.

The first approach makes obvious intuitive sense and roughly describes what various quantitative investment firms actually get up to: There might be patterns in financial data that predict future returns, and deep learning is a statistical technique for finding them.

The second approach seems … sort of insane and wasteful and indirect? Yet also funny and charming? It is an approach to solving the problem by first solving a much harder and more general problem: Instead of “go through a ton of data to see what signals predict whether a stock goes up,” it’s “construct a robot that convincingly mimics human consciousness, and then train that robot to mimic the consciousness of a particular human who is good at picking stocks, and then give the robot some basic data about a stock, and then ask the robot to predict whether the human would predict that the stock will go up.” 

My impression is that there are people using the first approach with significant success — this is roughly, like Renaissance Technologies — and the second approach is mostly me making a joke. But not entirely. The second approach has some critical advantages:

  1. Somebody else — OpenAI or xAI or DeepSeek or whoever — already built the large language model for you, at great expense. If you are on the cutting edge of machine learning and can afford to pay for huge quantities of data and researchers and computing capacity, go ahead and build a stock-predicting model, but if you are just, say, an academic, using someone else’s model is probably easier. The large language model companies release their models pretty widely. The stock model companies do not. You can’t, like, pay $20 a month for Renaissance’s stock price model.
  2. Because the large language model’s output is prose, its reasoning is explainable in a way that the stock model is not. The stock model is like “I have looked at every possible combination of 100,000 data time series and constructed a signal that is a nonlinear combination of 37,314 of them, and the signal says Nvidia will go up,” and if you ask why, the model will say “well, the 37,314 data sets.” You just have to trust it. Whereas robot Warren Buffett will write you a nice little report, with reasons you should buy Nvidia. The reasons might be entirely hallucinated, but you can go check. I wrote once: “One criticism that you sometimes see of artificial intelligence in finance is that the computer is a black box that picks stocks for reasons its human users can’t understand: The computer’s reasoning process is opaque, and so you can’t be confident that it is picking stocks for good reasons or due to spurious correlations. Making the computer write you an investment memo solves that problem!”
  3. I do think that the aesthetic and social appeal of typing in a little box to have a chat with your friend Robot Warren is different from the black box just giving you a list of stocks to buy. This probably doesn’t matter too much to rigorous quantitative hedge funds, but it must matter to someone. We talked last year about a startup that was launching “a chatbot that offers stock-picking advice” to retail brokerage customers, and it seemed like the goal of the project was not “the chatbot will always tell you stocks that will go up” but rather “the chatbot will offer a convincing simulacrum of talking to a human broker,” who also will not always tell you stocks that will go up. You call the broker anyway. Now you can text the chatbot instead.

And so we also talked last year about an exchange-traded-fund firm that would use large language models to simulate human experts — ones with characteristics of particular humans, like Buffett — to make stock picks. Why use LLMs rather than build a model to directly predict stock prices? Well, because the LLM is already there, and the data is already there, and the schtick is a little more human than “here’s our black box.”

Anyway here’s a paper on “Simulating the Survey of Professional Forecasters,” by Anne Lundgaard Hansen, John Horton, Sophia Kazinnik, Daniela Puzzello and Ali Zarifhonarvar:

Though Levine does a characteristically great job laying out the questions in a clear and insightful way, on at least one point, I think he's not just wrong, but the opposite of right. The LLM may appear to be less opaque, but it is actually the blacker black box.

Normally, when we use the term "black box model," we mean that we know the data that goes in and can see the scored data that comes out, but the process by which the output is arrived at is so complex and computation-intensive that we can't say exactly what happened. In practice, though, that's not entirely true. We can analyze the output, identify the main drivers of the model, and flag potential biases and other problems. We can perturb the input data, leaving out certain parts, and observe how the output is affected. In most real-world cases I've seen, you can reverse-engineer the model, creating something remarkably close that uses a manageable and, more importantly, comprehensible dataset and series of calculations. This simpler, reverse-engineered model won't use the same data as the black box, but it will be transparent, will very likely use the same categories of data, will generally capture the underlying relationships, and will sometimes perform almost as well.
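To make the reverse-engineering idea concrete, here is a minimal sketch on invented data (the model choices and numbers are illustrative, not from any actual engagement): fit a complex "black box," use perturbation to find its main drivers, then fit a small, transparent surrogate to the black box's own predictions.

```python
# Illustrative sketch of black-box reverse-engineering on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=50,
                           n_informative=8, random_state=0)
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

# Perturbation step: shuffle one feature at a time and measure how much
# the black box's accuracy degrades. The big drops are the main drivers.
imp = permutation_importance(black_box, X, y, n_repeats=5, random_state=0)
drivers = np.argsort(imp.importances_mean)[::-1][:8]

# Surrogate step: fit a small, fully readable tree to the black box's own
# predictions, using only the identified drivers.
surrogate = DecisionTreeClassifier(max_depth=4, random_state=0)
surrogate.fit(X[:, drivers], black_box.predict(X))
agreement = surrogate.score(X[:, drivers], black_box.predict(X))
print(f"surrogate agrees with the black box on {agreement:.0%} of cases")
```

The surrogate won't match the black box everywhere, but the agreement rate gives you a rough measure of how much of the complex model's behavior the simple, transparent one captures.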

I have never done anything related to stock prediction, but I have worked with models predicting consumer behavior, and I'm betting that the underlying process is somewhat similar. Let's take the example of a credit card company building a black-box model to predict which customers are likely to default on their debts. In addition to transaction and payment history, the company has access to a huge amount of data from credit bureaus, vendors such as Acxiom, publicly available information, and macroeconomic data. We're talking about tens of thousands of variables going into that model. It is not possible for a person or even a team to go through all of these fields one by one, but at a more general level, it is possible to know what kind of data is going in and to maintain some standard for quality and relevance.

If your training data is everything that can be scraped from the internet, it is effectively unknowable. In the traditional black-box scenario, we know the data and the output; only the middle part of the process is opaque. With large language models, however, everything before the final answer is shrouded in darkness.

Your training data may include the writings of Warren Buffett, the text of A Random Walk Down Wall Street, and the archives of The Wall Street Journal, but it can also contain horoscopes, blogs from "buy the dip" Robinhood day traders, and market analysis from crypto investors. The style and word choice might resemble those of the Oracle of Omaha, but the underlying ideas might come from the Rich Dad Poor Dad guy.

 

Monday, March 3, 2025

The Grandiosity/Contribution Ratio -- another newly relevant repost

One of the recurring threads at the blog for years now has been the Lords of Ithuvania, the way we have collectively treated people who stumbled upon huge fortunes in the tech industry as super-capable, often almost Messianic figures who can solve any problem and who go unchallenged when making even the most delusional boasts—like claiming they can cure all diseases. That myth is now proving extraordinarily costly.

Tuesday, January 23, 2018

The Grandiosity/Contribution Ratio

From Gizmodo [emphasis added]
Zuck and Priscilla laid out the schematics for this effort on Facebook Live. The plan will be part of the Chan Zuckerberg Initiative and will be called simply “Chan Zuckerberg Science.” The goal, Zuck said, is to “cure, prevent, or manage all diseases in our children’s lifetime.” The project will bring together a bunch of scientists, engineers, doctors, and other experts in an attempt to rid the world of disease.

“We want to dramatically improve every life in [our daughter] Max’s generation and make sure we don’t miss a single soul,” Chan said.

Zuck explained that the Chan Zuckerberg Initiative will work in three ways: bring scientists and engineers together; build tools to “empower” people around the world; and promote a “movement” to fund science globally. The shiny new venture will receive $3 billion in funds over the next decade.
...

“Can we cure, prevent, or manage all diseases in our children’s lifetime?” Zuck asked at one point. “This is a big goal,” he said soon after, perhaps answering his own question.

Obviously, any time we can get some billionaire to commit hundreds of millions of dollars a year to important basic research, that's a good thing. This money will undoubtedly do a tremendous amount of good and it's difficult to see a major downside.

In terms of the rhetoric, however, it's useful to step back and put this into perspective. In absolute terms, $3 billion, even spaced out over a decade, is a great deal of money, but in relative terms, is it enough to move us significantly closer to Zuckerberg's "big goal"? Consider that the annual budget of the NIH alone is around $35 billion. This means that Zuckerberg's initiative is promising to match a little bit less than 1% of NIH funding over the next 10 years.
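(To put numbers on it: $3 billion over 10 years comes to $300 million a year, and $300 million divided by $35 billion is roughly 0.86%.)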

From a research perspective, this is still a wonderful thing, but from a sociological perspective, it's yet another example of the hype-driven culture of Silicon Valley and what I've been calling the magical heuristics associated with it. Two of the heuristics we've mentioned before were the magic of language and the magic of will. When a billionaire, particularly a tech billionaire, says something obviously, even absurdly exaggerated, the statement is often given more rather than less weight. The unbelievable claims are treated less as descriptions of the world as it is and more as incantations to help the billionaires will a new world into existence.

Perhaps the most interesting part of Zuckerberg's language here is that it reminds us just how much the Titans of the Valley have bought into their own bullshit.

 

Friday, February 28, 2025

More thoughts on criticism -- critics and reviewers

 Both reviewer and critic are honorable professions, with, if anything, the former being more valuable. The best reviewers by a wide margin used to be the team that put together Maltin's Movie Guide. As far as I can tell, no one has stepped up since to fill the gap, another case of the internet crowding out quality with crap.

The purpose of criticism is to deepen our understanding of a work and explore its connection to larger themes, genres, social issues, politics, psychology, etc. The primary purpose of reviews is to let people know whether or not they might like a movie, book, restaurant, or whatever.

It follows that the target audience of reviews is people who have not seen or heard the work in question. This doesn't mean that people won't sometimes seek out reviews after watching, reading, or listening to the work in question. Many of us like to compare our reactions to those of people who get paid to do this, but the reviews themselves are virtually never written for this segment.

It also follows that criticism is almost always most meaningful when the reader knows at least something about the subject. Here too, there can be some gray areas, particularly when the work in question is widely known or when it connects to larger questions about other topics. The book and the movie The Grapes of Wrath would qualify under both of these criteria.

Pauline Kael was probably our best movie critic, and yes, I know I'll get some pushback on that one, but historically that pushback has mainly come from people who strongly disagree with her assessment of various movies, which is understandable since Kael was also a terrible reviewer. If you try to boil down her thoughts about a film to "this part was good, this part was bad," she will seem arbitrary and erratic. The rule of thumb for reading Kael is that long is almost always good and short is generally bad. For this reason, literally the last book of Kael's you should read is 5,001 Nights at the Movies, which unfortunately seems to be her most popular title.

Though heavily influenced by Pauline Kael, Roger Ebert was mainly a reviewer. Almost all newspaper movie critics are. It is difficult to be anything else, given the space constraints that almost all of them work under. Bob Chipman is one of those rare examples of a good critic who is also a good reviewer. Denby is a better critic than reviewer. The new guy at the New Yorker is a competent reviewer and a lousy critic, but we'll get to him in a future post.

It is important for a reviewer to have good or, perhaps even more to the point, predictable tastes. This is particularly true when the reviews break with what you'd expect. A show like The Crown getting good notices tells you virtually nothing — it was all but grown in a lab to push reviewers' buttons. On the other end of the spectrum, when Siskel and Ebert both gave thumbs up to the over-the-top sex and gore of Re-Animator or when pretty much every review singled out William Fichtner's performance in the Grindhouse homage Drive Angry, you pretty much know you've got something good.

(Fichtner is one of the most reliable and underrated actors in Hollywood, so this was always pretty much a safe bet, but seriously, he is wonderful as the coolest satanic emissary you'll ever regret running across.)

With criticism, predictable or compatible tastes are often completely unnecessary. The director Barry Sonnenfeld likes Jerry Lewis comedies. I find them difficult to choke down. I do, however, enjoy listening to Sonnenfeld explain why he admires Lewis, and hearing him discuss those films that I don't care for deepened my appreciation of a number of comedies I am very fond of, including Men in Black.

 

Thursday, February 27, 2025

Two completely unrelated news stories

From the Arkansas Times:

Bill deregulating raw dairy products passes Senate, heads to governor’s desk
 by Phillip Powell

A bill to deregulate the sale of raw, unpasteurized milk heads to Gov. Sarah Sanders’s desk after it was overwhelmingly passed by the state Senate on Monday.

If signed by the governor, House Bill 1048 would allow farmers who produce unpasteurized goat milk, sheep milk or whole milk to sell the product at farmer’s markets and to deliver the product to customers – greatly increasing consumer access to the product.

“The other side benefit of this bill is the goats whose milk is being drank might otherwise be executed, so it’s a PETA bill too, you’ll be saving goats,” the bill’s sponsor, Sen. Alan Clark (R-Lonsdale), said, referencing the People for the Ethical Treatment of Animals, ahead of Monday’s Senate vote.

The bill passed 33-1, with Sen. Stephanie Flowers (D-Pine Bluff) as the only “no” vote.

Farmers would still be limited to selling 500 gallons a month under a 2013 law, and they would still be required to properly label the product as unpasteurized, but buyers would assume liability for all illness that may occur after drinking the dairy product.

 

From Discover Magazine:

The first case of bird flu in cattle was reported on March 25, 2024. In less than a year, the virus has hit 973 herds, according to the U.S. Centers for Disease Control and Prevention.

The United States Department of Agriculture (USDA) first issued an order to test cows that farmers intended to move between states last April. Then, in December 2024, the agency issued a federal order for milk testing. The order stipulated that unpasteurized milk samples be collected from dairy processing facilities nationwide and tested, with the results being shared with the USDA.

Since then, the virus has been detected in 17 states: 747 herds in California, 64 in Colorado, 35 in Idaho, 31 in Michigan, 27 in Texas, 13 each in Iowa and Utah, nine each in Minnesota and New Mexico, seven each in Nevada and South Dakota, four in Kansas, two in Oklahoma, and one each in Arizona, North Carolina, Ohio, and Wyoming.

...

Complicating matters, it is unclear how many CDC employees are still employed to monitor the spread of the virus and how many USDA workers are still on the job to conduct the milk testing orders.

...

The virus has been spreading in humans as well — but not as rapidly as in poultry or dairy cows, according to the American Veterinary Medical Association. So far, 69 people in the U.S. have tested positive for the virus. Of those, 23 cases involved poultry farm workers, and 41 involved dairy farm workers.

 

 

Wednesday, February 26, 2025

"There seems to be a plague of jellyfish and really a plague of anything is usually not considered good."

As previously mentioned ("The city would naturally form a line as it tried to get away from itself."), Patrick Boyle is one of the smartest and funniest people on YouTube. I can recommend every video I've seen on his channel, but this survey of failed mega-projects has a special place in my heart, hitting on favorites like the hyperloop and Neom.

To give you a taste of the style (though unfortunately without Boyle's perfect deadpan delivery), I ran some excerpts from the YouTube transcript through ChatGPT, which worked perfectly (raising the question: why doesn't Google use its AI to clean up its transcriptions?). A rough sketch of the workflow appears after the excerpts.

BBC reporters who visited Forest City in 2023 described seeing a children's train that someone forgot to turn off, doing endless loops around the abandoned shopping mall, playing "Head, Shoulders, Knees and Toes" in Chinese despite there being no children inside. Residents told the reporters that they were desperate to escape the city, which they called a "lifeless husk" that quickly erodes the sanity of anyone trying to live there... They probably shouldn't put that in their real estate listing.

...

They’ve also built attractions like the stairway on the beach, which is supposed to be a tourist attraction. Apparently, it’s very Instagrammable, you know, the way tourists will travel miles out into the middle of nowhere to take a photo on the stairs and hang out on the crocodile-infested beach. Oh yeah, I forgot to mention the crocodiles, didn’t I? Yeah, there’s crocodiles. In fact, that sign on the beach near the Instagram stairs is there to warn you about the crocodiles. No wonder they call it the Stairway to Heaven—there’s a good chance that you’re not getting out of there alive. That photo might be your last Instagram upload. Sorry, we seem to have taken a bit of a dark turn here.

...

So next up, we have the Hyperloop. That'll be good—at least there won't be jellyfish and crocodiles everywhere. That's the good thing about the CGI-based mega projects; the designers can usually keep the wildlife out of them.
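For anyone who wants to replicate the transcript cleanup, here is a rough sketch of the workflow, assuming the third-party youtube-transcript-api package (its older static interface) and the official OpenAI Python client; the video ID, model name, and prompt wording are all placeholders, not what I actually used:

```python
# Hypothetical sketch: pull a YouTube transcript and ask an LLM to clean it up.
# Reads OPENAI_API_KEY from the environment.
from youtube_transcript_api import YouTubeTranscriptApi
from openai import OpenAI

segments = YouTubeTranscriptApi.get_transcript("VIDEO_ID")  # placeholder ID
raw = " ".join(seg["text"] for seg in segments)

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # any capable model should do
    messages=[{
        "role": "user",
        "content": "Add punctuation, capitalization, and paragraph breaks "
                   "to this auto-generated transcript without changing any "
                   "words:\n\n" + raw,
    }],
)
print(resp.choices[0].message.content)
```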








Tuesday, February 25, 2025

Thiel is Dr. Mabuse. Musk is Cobra Commander.

From Josh Marshall:

Against the backdrop of a month of chaos and destruction, something began to shift more or less in the middle of this week. I don’t want to overstate what it portends in the short term. Elon Musk remains firmly in the saddle. And even as many of Trump’s advisors grow concerned about the impact of Musk’s rampage, Donald Trump himself appears to be maintaining his support. The moment was captured yesterday at what are now the more or less constant CPACs where Steve Bannon tossed off a Nazi salute and Musk appeared in a “Dark MAGA” baseball cap sporting a chainsaw and basking in the adulation of the MAGA/CPAC faithful awash in the joy a certain kind of individual derives from destruction and pain. The picture itself is a key signpost in the story. Make a note of it. Musk himself posted it to Twitter, labeled with “The DogeFather” and flexing with the text: “This is a real picture.”

...

2025 might be the first time in human history where we have a genuine supervillain walking among us. Humanity has spawned numerous monsters, of course: Hitler, Stalin, Pol Pot. But I’m talking about the supervillain on the Gotham/Metropolis model. The glad-handing, fantastically rich, this-dial-goes-to-11 over-the-top weirdo with his raucous bevy of cheerleaders who is in fact evil and has a cartoonishly stupid but yet very real plan to take over the world. Look at that picture again. You can easily imagine running it over every 2025 political ad about the chaos and immiseration he unleashed on the country.



But what kind of supervillain? Peter Thiel is the shadowy, mysterious type—the one who manipulates everything from behind the scenes. Musk is the megalomaniac type, the kind who insists on telling the hero about his evil plan, with schemes so grandiose and badly thought out that they would probably foil themselves if no secret agent were available to step in.

With the possible exception of transgender rights, Peter Thiel is more ideologically extreme—sometimes much more extreme—across the board than Elon Musk (just read the infamous Cato essay where he gives his take on women's suffrage), but he is far less emotionally needy.

Musk has an insatiable craving for attention and adoration. He seeks out worshipful crowds. He tells improbable stories about punching out the high school bully (but only to those journalists unlikely to check out his stories). He pays people to play video games for him so he can go on Joe Rogan and claim to be one of the best in the world. (If he were a golfer, his partnership with Donald Trump would not survive their first game.)

Musk has limited impulse control, particularly when angry. Both men are petty and vindictive, but while Peter Thiel patiently waited years to destroy Gawker by secretly funding Hulk Hogan's lawsuit, Musk lashes out immediately at any perceived insult with public name calling, threats, and rage-firing. This last one drove managers at Tesla to advise employees to take roundabout routes around the building in order to avoid walking past the CEO.

The list of possible motivations for Elon Musk is long and complicated, but one thing you should keep in mind is that everything we've seen so far—the bullying, the erratic behavior, the certainty, the apparently deliberate dickishness—is all absolutely consistent with everything we've seen from him in the past.

If you are shocked by any of this, it just means you haven't been paying attention.


Monday, February 24, 2025

How I Learned to Stop Worrying and Love the LLM part 2 -- a proof-reader, not an editor

I have two dictation options which, being a horrible typist, I use frequently. The first is Dragon NaturallySpeaking on my laptop, which works fairly well. The second is dictating email to my phone, which does not. Capitalization rules seem to be based on some kind of random number generator. Homonyms are, of course, a problem, but so are misheard and missing words. Correcting these mistakes can eat up most, and sometimes all, of the time saved.

It is also tedious as hell.

I decided to let ChatGPT take a crack at it and see how well it worked. Here’s the prompt I used.

"Edit the following paragraphs with explanations in brackets after each change with explanations.  : "

How did it work? It depends. On the part I was most interested in — homonyms, weird capitalization, and misheard or missing words — it caught almost everything I wanted it to. The other revisions it suggested weren’t particularly helpful. I believe I used just one of them, and that was because I had used the same word twice in the paragraph, not because of the reason given in the explanation.

One of those unused suggestions struck me as a particularly interesting example of how differently ChatGPT "thinks." Here is the paragraph in question:

He used the windfall from the sale of PayPal along with funding from other investors to establish SpaceX, but the people actually in charge were highly respected aerospace veterans. They sometimes let the money guy wear an engineer’s cap and blow the whistle, but no one, including Musk himself, really thought he was running the train.

The only change the algorithm suggested was substituting "operation" for "train." Normally, I wouldn't have been that surprised that it didn't make the analogous choice—LLMs aren't really capable of creating true analogies—but I assumed it would associate the terms "engineer’s cap" and "blow the whistle" with the word "train."

The bigger point here is that large language models do represent an impressive advance in the way we talk to computers and they talk to us. While they come nowhere near living up to the hype, they can provide us with some genuinely useful tools, as long as we keep their limitations in mind.

So there. I’ve now said nice things about LLMs in two posts. I hope you’re satisfied.

Friday, February 21, 2025

"What was Clive James thinking?" Maybe the question is when was he thinking it?

From Andrew Gelman:

From The Dreaming Swimmer (1992), one of Clive James’s classic essay collections:

    The best Hitchcock film was directed by someone else. Charade would not be as good as it is if Hitchcock had not developed the genre it epitomises, but Hitchcock could never have created a film so meticulous, plausible, sensitive, light-footed and funny.

Whaaaa? We saw Charade recently, and it was . . . really bad. I mean, sure, I’ve seen worse movies, and the acting was fine for what it was, but, no, I didn’t think it was “meticulous,” “plausible,” “sensitive,” “light-footed,” or “funny.” I’d describe it more as a movie constructed to have these attributes without ever actually achieving them.

So then this makes me wonder: What was Clive James thinking? And, more generally, how to react when someone you admire has different tastes than you?

 

James' claim here may be indefensible but it's not inexplicable.

Tastes are neither fixed nor independent. They evolve over time, and though we might not like to admit it, they are influenced by peers and authority figures. The idea that Charade is, by any stretch of the imagination, a better film than North by Northwest would seem so absurd that no self-respecting critic would dare to say it in public. However, if we go back to the mid-'60s and look at what other critics were saying at the time, Clive James's take is far easier to understand.

 Admittedly, James appears to have written these comments about a quarter-century after Charade came out, but as a '60s intellectual who was interested in the arts, he probably saw the movie in the theater when it was released and followed the critical discussion surrounding both Charade and the Hitchcock films that followed, such as Marnie, Topaz, and Torn Curtain. It seems reasonable to suggest that his opinions about the film were formed in the context of the 1960s, particularly given the limited options he would have had for re-watching the film in the first couple of decades after it was released.

There were many factors that predisposed viewers like James to like Charade in 1963. In style and execution, it was the kind of slick entertainment that was fashionable at the time. The big-budget (for the time), glamorous European location shooting added an air of sophistication. The supporting cast was excellent. Matthau and Coburn were in the process of breaking big. George Kennedy was an up-and-comer, only four years away from his Cool Hand Luke Oscar. Cary Grant was Cary Grant—still one of the world’s biggest stars and the unquestioned master of this type of role.

Then there was Audrey Hepburn. It is difficult to overstate how charmed audiences were by Hepburn during her relatively short career. Critics used lots of words like "luminous." I have to admit I never really got the full appeal of Hepburn's innocent waif act, but people at the time could not get enough, particularly when she was cast in romantic comedies with much older leading men (Bogart, Harrison, Astaire, and Grant). Of course, such pairings were much more common in movies of the time, but the age differences weren't normally nearly as obvious. Lauren Bacall seemed to be 18 going on 40, while Hepburn seemed to be 25 going on 14.

Charade unsurprisingly got raves. None of Alfred Hitchcock's 1960s films received better than mixed reviews upon release. It took Psycho and The Birds years to win over the critics, while the three films that followed are still widely considered second-rate entries in the canon. Critic and general asshole John Simon made a disparaging comment about Hitchcock imitators, then added that, with films like Topaz, Hitchcock himself was now one of that group.

When Charade came out to glowing reviews, Hitchcock was in the strange position of being both too old-fashioned and, with films like Vertigo, Psycho, and The Birds, too ahead of his time. Even the well-reviewed North by Northwest (1959) was generally treated as something of an affectionate self-parody, like an old rock star showing he could still get the crowd on their feet when he ran through the hits.

If you go back and read criticism from the decade, you will find lots of people arguing that Hitchcock had peaked years ago, perhaps even around the time he left Great Britain. These comments have aged badly, but to be completely fair, the last decade of the director's career does unquestionably show signs of decline. While there are lots of critics who will champion Frenzy, everything else he produced after The Birds was decidedly minor.

Calling Charade the best "Hitchcock" film seems crazy today, but the opinion wasn't all that unheard of back when films like it and Marnie were coming out. Still just as wrong, but wrong with lots of company.

 

Thursday, February 20, 2025

In the cold war, all the cool kids had toys with nuclear warheads

 And yes, the M65 atomic cannon was real.

"Ideal Atomic Cannon" - 1958




Most people blame boomer cynicism and distrust of authority on Vietnam. I wonder if it wasn't based on the memory of getting an Honor House Polaris nuclear submarine and comparing the real thing to the ad in the comic book.

Wednesday, February 19, 2025

A tale of two tweets

 This is Joseph.

There is a new claim from Elon Musk that the US is filled with vampires. He is concerned about a database query of Social Security numbers in which a false death field turns up Americans who are hundreds of years old:



And then there is this new post showing what happens if you use a second field (in the same database) to refine the query and find just the people on the previous list who actually get benefits:


There was immediate skepticism about the magnitude of the claims here; it seems like the sort of thing that was unlikely to have been missed for decades. When Musk used the same chart in a second post, even Megan McArdle noted that these older persons wouldn't have made any contributions and so wouldn't get benefits:


So what can possibly be going on here? Well, it seems like the explanation is people using false Social Security numbers to pay into the fund, payments that don't create any eligibility for benefits. So it is true that we've located fraudulent payments, but they actually reduce the cost of funding Social Security. As it turns out, this information was buried in a formal report from 2015, in which the SSA decided it wasn't worth the cost to update old database entries that no expert would misread. I suspect that they feel foolish now for focusing on efficiency rather than clarity, but getting a correct death date for long-dead Americans seems like an expensive journey into old government archives (many of these deaths likely predate electronic record-keeping, as Deva Hazarika noted).
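To see how the two queries differ, here is a toy version in pandas (all table and column names are invented stand-ins, not the SSA's actual schema):

```python
import pandas as pd

# Toy stand-ins for the records in question. The old deaths were simply
# never backfilled, so death_recorded stays False.
people = pd.DataFrame({
    "ssn": [1, 2, 3],
    "birth_year": [1875, 1890, 1950],
    "death_recorded": [False, False, True],
})
benefits = pd.DataFrame({"ssn": [3], "monthly_benefit": [1800]})

# Query 1: death field alone. This is the "vampires" query -- it returns
# people who would now be 150 years old.
apparently_alive = people[~people["death_recorded"]]
print(apparently_alive)

# Query 2: refine with a second field, actual benefit payments.
# The join comes back empty: none of the "vampires" are being paid.
paid = apparently_alive.merge(benefits, on="ssn", how="inner")
print(paid)
```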

Ironically, such a project might well have been a target of DOGE had it been approved in 2023:


But the real lesson here is that database work is complicated (just ask Mark about his work in really exciting databases) and the key to doing anything is deep subject-matter knowledge. So why don't we have subject matter experts engaged? Until quite recently, the idea was that civil servants should be professionals and provide this expertise.

I am worried that the answer Josh Marshall provides is true as to the reasons for this approach, and I hope he will expand on these thoughts with further investigative journalism, ideally disproving them.

So kids, play safe with databases! 


Tuesday, February 18, 2025

Fifteen years ago at the blog (does anyone else feel old?)

My first degree was a BFA in creative writing. When asked why I was switching to statistics, I said I thought it would be easier to find math people who wanted to talk about literature than it would be to find lit people who wanted to talk about mathematics. At the time I thought I was joking.

 

Blockbusters, Franchises and Apostrophes

More on the economics of genre fiction

The story so far: last week Andrew Gelman had a post on a book that discussed the dominance of thrillers on best seller lists and suggested that it was due to their increased quality and respectability. I argued that the quality and respectability had, if anything, decreased (here), posted some background information (here and here), then discussed how the economics of publishing from the late Nineteenth Century through the Post-War era had influenced genre fiction. The following closes with a look at where we are now and how the current state of the market determines what we're seeing at the bookstore.

As the market shrank in the last part of the Twentieth Century, the pay scale shifted to the feast and (mostly) famine distribution of today. (The century also saw a similar shift for musicians, artists and actors.) Non-paying outlets sprang up. Fan fiction emerged (non-licensed use of characters had, of course, been around for years -- Tijuana bibles being a classic example -- but fan fiction was written for the author's enjoyment without any real expectation of payment). These changes are generally blamed on the internet, but the conventional wisdom is at least a couple of decades off. All of these trends were well established by the Seventies.

With the loss of the short story market and the consolidation of publishing, the economics of writing on spec became brutal. Writing and trying to sell a novel represents a tremendous investment of time and energy with little hope of success. By comparison, writing on spec in the Forties meant coming up with twelve to fifteen pages, then sending them off to twenty or so potential markets. The best of these markets paid good money; the worst were hungry for anything publishable.

The shift from short story to novel also meant greater risk for the publisher (and, though we don't normally think of it in these terms, for the reader, who also invested money and time). A back-pages story that most readers skipped over might hurt the sales and reputation of a magazine slightly, but as long as the featured stories were strong, the effect would be negligible. Novels, though, are free-standing, and the novel that gets skipped over is the novel that goes unsold.

When Gold Medal signed John D. MacDonald, they knew they were getting a skilled, prolific writer with a track record of artistically and commercially successful short fiction. The same could be said about the signing of Donald Westlake, Lawrence Block, Joe Gores and many others. Publishing these first time authors was a remarkably low risk proposition.

Unfortunately for publishers today, there are no potential first time authors with those resumes. Publishers now have to roll the dice on inexperienced writers of unknown talent and productivity. In response to that change, they have taken various steps to mitigate the risk.

One response was the rise of the marketable blockbuster. The earliest example I can think of is the book Lace by Shirley Conran. If memory serves, Lace got a great deal of attention in the publishing world for Conran's huge advance, her lack of fiction-writing experience, and the role marketing played in the process. The general feeling was that the tagline ("Which one of you bitches is my mother?") came first while the book itself was merely an afterthought.

More recently we have Dexter, a marketer's dream ("He's a serial killer who kills serial killers... It's torture porn you can feel good about!"). The author had a few books on his résumé, but nothing distinguished. The most notable was probably a collaboration with Star Trek actor Michael Dorn. The first book in the series, Darkly Dreaming Dexter, was so poorly constructed that all of the principals had to act completely out of character to resolve the plot (tip for new authors: when a character casually overlooks her own attempted vivisection, it's time for a rewrite*).

The problems with the quality of the novel had no apparent effect on sales, nor did they prevent the character from appearing in a successful series of sequels and being picked up by Showtime. (The TV show was handled by far more experienced writers, who managed to seal up almost all of the plot holes.)

The point here is not that Darkly Dreaming Dexter was a bad book or that publishing standards have declined. The point is that the economics have changed. Experienced fiction writers are more rare. Marketable concepts and franchises are more valuable, as is synergy with other media. The markets are smaller. There are fewer players. And much of the audience has a troublesome form of brand loyalty.

Normally, of course, brand loyalty is a plus, but books are an unusual case. If you convince a Coke drinker to also drink Sprite, you probably won't increase his overall soda consumption; you'll just have cannibalization. But readers who stick exclusively with one writer are severely underconsuming. Convince James Patterson readers to start reading Dean Koontz and you could double overall sales.

When most readers got their fiction either through magazines or by leafing through paperback racks, it was easy to introduce them to new writers. Now the situation is more difficult. One creative solution has been apostrophe series such as Tom Clancy's Op Center. Other people are credited with actually writing the books but the name above the title is there for branding purposes.

Which all leads us back to the original question: Why did thrillers become so dominant?

They tend to be easily marketable.

They are compatible with franchises.

They lend themselves to adaptation as big budget action movies.

Their somewhat impersonal style makes them suitable for ghosting or apostrophe branding.

They are, in short, what the market is looking for. As for me, I'm looking for the next reprint from Hard Case, but I might borrow the latest Turow after you're done with it.


* "Is that a spoiler?"
"No, sir. It was spoiled when I got here."

p.s. I was going to tie in with a branding situation Slim Jim snacks faced a few years ago but this post is running a bit long. Maybe I'll get back to it later.

 

Monday, February 17, 2025

How I Learned to Stop Worrying and Love the LLM part 1-- coding

I used to ask, only half-jokingly, how any of us learned to code before Google.  I went to grad school in the mid-90s, so I'm old enough to remember that literal bookshelf of not-very-helpful SAS manuals.  Getting a straightforward answer to a basic coding question often seemed insurmountable.  You can imagine, then, the revolutionary impact of the internet and various online resources.

The advance represented by LLMs has been comparable. While I would never consider using one in a situation where I needed the kind of background and understanding that comes from a textbook or course, in terms of straightforward "how do I code this" questions, I can no more imagine Googling the topic or turning to an online forum than I can imagine digging through that old stack of phone books (I just can't stop dating myself in this post).

Out of perhaps excessive caution, I never give ChatGPT any real data or metadata. The tables I rename something unimaginative like "A" or "B."  For the fields, I try to use something that would fall in the same general category.  For example, Buyer_ID might become SSN.  There are no doubt countless examples of social security numbers in the LLMs' training data, and pretty much all of them treat it as a unique identifier. 
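As an illustration (an invented example, not one of my actual sessions), a sanitized question might look like this, with "A" standing in for the real table and SSN standing in for something like Buyer_ID:

```python
# Sanitized question: "Table A has columns SSN and LOAD_DATE.
# How do I keep only the most recent row per SSN?"
import pandas as pd

A = pd.DataFrame({
    "SSN": [111, 111, 222],
    "LOAD_DATE": pd.to_datetime(["2024-01-01", "2024-06-01", "2024-03-15"]),
})

# Keep the latest record for each SSN.
latest = A.sort_values("LOAD_DATE").drop_duplicates("SSN", keep="last")
print(latest)
```

Once you have an answer, the logic transfers directly back to the real tables and fields.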

The LLM does have limitations and will get some examples wrong, particularly if you let things get too complex, but if you can keep things bite-sized and be absolutely clear with your logic, it performs remarkably well. I don't know if this makes up for the huge environmental cost of building these models, and it certainly doesn't balance out the damage generative AI has done, but if used properly, these are remarkably useful and powerful tools.

Friday, February 14, 2025

Listen to the last three minutes of Thursday's Marketplace

Kai Ryssdal's closing statement is a template for how non-partisan mainstream journalists should frame this moment. Measured and objective, but refusing to flinch when accurately describing the situation we find ourselves in. 

Marketplace is one of the very few news organizations that emerged from the past few years with its reputation not just intact but enhanced. The list is embarrassingly small. Lazy and craven journalism has become the norm, which is especially tragic at a time when we really need journalists to step up and do their jobs.





Thursday, February 13, 2025

In defense of LLMs, they are considerably more polite than the typical 9th grader

 Andrew Gelman recently shared the following from Gary Smith:

I [Smith] tested OpenAI o1, Copilot, and Gemini Flash, along with DeepSeek, on a question about Tic-Tac-Toe . . .

I’ve deleted much of the superfluous fluff that LLMs tend to generate from the excerpts below:

Gary:  I'd like to discuss a new version of tic-tac-toe called 90-degree tic-tac-toe.

OpenAI:  That sounds intriguing! What’s the twist in “90-degree Tic-Tac-Toe”? Does it involve rotating the board, or is there some other mechanic that changes the gameplay? Let me know the rules or your idea, and we can brainstorm or refine it together!

DeepSeek: Sure! 90-degree tic-tac-toe sounds like an interesting twist on the classic game. Could you explain the rules or mechanics of this new version?

Copilot:  That sounds intriguing! 90-degree tic-tac-toe adds a new twist to the classic game. Tell me more about it. Are the rules different? How do you rotate the board? I’m excited to hear how this version shakes things up!

Gemini:  I’d love to discuss 90-degree tic-tac-toe! To get us started, could you explain the basic rules of this new variation?…I’m excited to learn more about this unique twist on a classic game. Let’s dive in!

Gary: It is still a 3×3 grid but the grid is rotated 90-degrees to the right.

OpenAI:  Interesting concept! By rotating the grid 90 degrees to the right, you’re visually changing the perspective of the classic 3×3 Tic-Tac-Toe board. The gameplay itself might remain the same, but the rotated perspective could create a novel experience for players.

DeepSeek: Ah, I see! So in 90-degree tic-tac-toe, the 3×3 grid is rotated 90 degrees to the right….This could be a fun twist because the rotated grid might disorient players initially, making it harder to spot winning opportunities or block opponents.

Copilot:  That sounds like a fun twist on the classic game! By rotating the grid 90 degrees to the right, it gives players a fresh perspective and challenges them to think differently about their moves.

Gemini:  This rotation doesn’t change the core concept of getting three of your symbols in a row, but it does significantly alter the strategy and how players approach the game.

Gary:  Do you think it would be better to rotate the 3×3 grid to the right or to the left?

OpenAI:  If you want the game to feel approachable and intuitive, rotating to the right might be better. If you’re aiming for a twist that makes players rethink the game and adds a subtle layer of complexity, rotating to the left could be the way to go. Ultimately, it depends on your target audience and the experience you want to create. You could even experiment with both directions and see which feels more engaging during gameplay testing!

DeepSeek:  If the goal is to make the game feel fresh while keeping it accessible, rotating the grid to the right is likely the better choice. It maintains a natural flow for players while still offering a new visual perspective. However, if you want to introduce a bit more novelty or challenge, rotating to the left could be an interesting alternative.

Copilot:  Both options could bring their own unique challenges! Most people are right-handed. However, rotating it to the left could add an extra layer of difficulty and make the game more unpredictable.

Gemini:  Some players might find it more intuitive to mentally adjust to a rightward rotation, while others might prefer a leftward rotation. You could even experiment with both directions to see which one feels more natural or enjoyable to play!

Gary:  Which will be more difficult for expert players?

OpenAI:  To challenge expert players and make the game feel fresh and complex, a left rotation would likely be the better choice. It disrupts their ingrained patterns, encourages more active thinking, and raises the game’s difficulty level.

DeepSeek: For expert players, rotating the grid 90 degrees to the left (counterclockwise) is likely to be slightly more difficult than rotating it to the right. Here’s why:

Breaking Muscle Memory: Expert players are deeply familiar with the traditional grid layout and common winning patterns. Rotating the grid to the left disrupts their mental map more significantly than rotating it to the right, as the transformation is less intuitive.

Copilot:  For expert players, rotating the grid to the left might prove to be more challenging. Here’s why:

  • Unfamiliar Rotation: Left rotation is less common and could disrupt their established strategies, requiring more adaptability.
  • Strategic Complexity: It may force players to think several moves ahead, recalculating their strategies continuously.

Ready to test this theory in a game?

Gemini:  Breaking Habit: Long-time tic-tac-toe players might have deeply ingrained habits and strategies associated with the traditional grid orientation. Overcoming these ingrained patterns could require a brief period of readjustment.

One of the problems with the current large language model discussion is that, while there are loads of brilliant people in the conversation, very few of them have taught ninth-grade math. K-12 teachers (at least the better ones) are used to thinking about answers not just in terms of being right or wrong, but in terms of what we can infer about the underlying thought processes that produced an answer.

We previously discussed the example of the alarmed toddler who rushed to tell their parents "Anna hurt Anna-self!" when their sister fell down the stairs. The statement was wrong -- it showed confusion over when to use a noun versus a pronoun -- but it also demonstrated the child's correct grasp of how "self" worked.

By comparison, LLMs' mistakes are often notable for their out-of-nowhere quality, like the infamous "There are no countries in Africa that start with the letter 'K'. The closest is Kenya, which starts with a 'K' sound." This one is a bit like the evil (or, in this case, stupid) twin of "Anna-self." The words are strung together correctly but reveal a complete lack of underlying comprehension.

LLMs don't have "thought processes," and approaching them as if they did will not lead anywhere productive, but that doesn't mean we can't get some insights, perhaps even do a little reverse engineering, from those "wrong" responses.

Smith said that though these LLMs had curiously similar responses, they "almost surely" did not see tic-tac-toe rotation in their training. This is technically true, but they probably did see some discussion of similar modifications to other games in the broader class of abstract strategy games. The go-to example for this class is, of course, chess, and you'll notice that if you take the responses and substitute that game for tic-tac-toe, almost all of the answers seem perfectly reasonable.

[Assuming it hasn't been done already, it might be interesting to conduct some experiments on how changing perspectives on the chessboard (such as playing from the opponent's side) affects strategy and style of play. You would probably also want to consider skill level and age, especially when looking at how quickly players adapt to changes in orientation. The great YouTube engineering channel Smarter Every Day built a bicycle that turned left when the handlebars turned right and vice versa. One of their findings was that it could take months for an adult to master the reverse steering, and afterwards it was extremely difficult to go back to normal bikes. Kids, on the other hand, could make the switch far more easily.]

Giving an answer that is appropriate for chess but absurd for tic-tac-toe strongly suggests that when the training data is thin with respect to a particular prompt, the LLM will fall back on data from a more general class to generate a response. Nothing especially surprising about this (it's more or less what you'd expect), but it does explain why this approach generated such absurd responses in this specific case.
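Incidentally, it's easy to verify mechanically that the rotation changes nothing about the game itself: a 90-degree turn maps tic-tac-toe's eight winning lines onto the same eight lines. A quick check (illustrative code, not from Smith's piece):

```python
# The 8 winning lines of tic-tac-toe: 3 rows, 3 columns, 2 diagonals.
lines = ([{(r, c) for c in range(3)} for r in range(3)] +
         [{(r, c) for r in range(3)} for c in range(3)] +
         [{(i, i) for i in range(3)}, {(i, 2 - i) for i in range(3)}])

def rotate(cell):
    r, c = cell
    return (c, 2 - r)  # rotate the board 90 degrees clockwise

rotated = {frozenset(rotate(cell) for cell in line) for line in lines}
print(rotated == {frozenset(line) for line in lines})  # True: same game
```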

This is a good moment to step back and remind ourselves of the constant effort we have to make to avoid anthropomorphizing AI in general and large language models in particular. LLMs can generate responses that seem uncannily human, but we have to keep reminding ourselves that the underlying processes are completely non-analogous. Any meaningful conversation about the applications and limitations of these systems has got to start and stop with that fact.

People think about the attributes, patterns, and shapes of things, but more importantly, they think about things' purpose, causality, functionality, history, and motivation, and the stories we can tell ourselves about them. Generative AI at its current level of development works almost entirely with the first group. Sometimes this allows it to come up with responses that seem to be "thinking" about the second group, but these can be explained as regurgitation.

Most of the absurd answers we get from generative AI come from it not having any capacity to deal directly with how and why. You can find AI pictures of automobiles with wheels perpendicular to each other or of campfires inside tents because how a car works or a fire behaves is factored only indirectly into the process.

The algorithm only looks at patterns in the training data. They may have fixed it by now, but recently at least one of the big LLMs would give you a detailed and very familiar-sounding proof arguing that the square root of four was an irrational number. It's not difficult to see how this happened: the training data must contain countless examples of the wonderfully elegant demonstration that the square root of two is irrational (it's possibly the most famous proof of all time), but no human who understood the proof would try to apply it to four.
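To spell out where the transplanted argument falls apart (a quick sketch, not something from the original post):

```latex
Suppose $\sqrt{4} = p/q$ in lowest terms. Then $p^2 = 4q^2$, so $p$ is even.
Writing $p = 2k$ gives $4k^2 = 4q^2$, hence $k = q$ and $p/q = 2/1$:
no contradiction, just the right answer. For $\sqrt{2}$, the same steps
($p^2 = 2q^2$, so $p = 2k$, so $q^2 = 2k^2$) force $q$ to be even as well,
contradicting ``lowest terms,'' and that contradiction is the whole proof.
It simply evaporates when $2$ is replaced by $4$.
```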

Likewise, there are extensive discussions in the training data about how changing the orientation of something visual would affect the way people perceive and work with it. In all of those cases, however, the generally unstated reason for the change is that the thing being reoriented would look different after the transformation. If that's not true, then all subsequent discussion and analysis is absurd. It calls to mind the old joke about the coach who promises to turn his losing team around 360°.