West Coast Stat Views (on Observational Epidemiology and more)

Friday, February 28, 2025

More thoughts on criticism -- critics and reviewers

Both reviewer and critic are honorable professions, with, if anything, the former being more valuable. The best reviewers by a wide margin used to be the team that put together Maltin's Movie Guide. As far as I can tell, no one has stepped up since to fill the gap, another case of the internet crowding out quality with crap.

The purpose of criticism is to deepen our understanding of a work and explore its connection to larger themes, genres, social issues, politics, psychology, etc. The primary purpose of reviews is to let people know whether or not they might like a movie, book, restaurant, or whatever.

It follows that the target audience of reviews is people who have not seen or heard the work in question. This doesn't mean that people won't sometimes seek out reviews after watching, reading, or listening to the work in question. Many of us like to compare our reactions to those of people who get paid to do this, but the reviews themselves are virtually never written for this segment.

It also follows that criticism is almost always most meaningful when the reader knows at least something about the subject. Here too, there can be some gray areas, particularly when the work in question is widely known or when it connects to larger questions about other topics. The book and the movie The Grapes of Wrath would qualify under both of these criteria.

Pauline Kael was probably our best movie critic, and yes, I know I'll get some pushback on that one, but historically that pushback has mainly come from people who strongly disagree with her assessment of various movies, which is understandable since Kael was also a terrible reviewer. If you try to boil down her thoughts about a film to "this part was good, this part was bad," she seems arbitrary and erratic. The rule of thumb for reading Kael is that long is almost always good and short is generally bad. For this reason, literally the last book of Kael's you should read is 5001 Nights at the Movies, which unfortunately seems to be her most popular title.

Though heavily influenced by Pauline Kael, Roger Ebert was mainly a reviewer. Almost all newspaper movie critics are. It is difficult to be anything else, given the space constraints that almost all of them work under. Bob Chipman is one of those rare examples of a good critic who is also a good reviewer. Denby is a better critic than reviewer. The new guy at the New Yorker is a competent reviewer and a lousy critic, but we'll get to him in a future post.

It is important for a reviewer to have good or, perhaps even more to the point, predictable tastes. This is particularly true when the reviews break with what you'd expect. A show like The Crown getting good notices tells you virtually nothing; it was all but grown in a lab to push reviewers' buttons. On the other end of the spectrum, when Siskel and Ebert both gave thumbs up to the over-the-top sex and gore of Re-Animator or when nearly every review singled out William Fichtner's performance in the Grindhouse homage Drive Angry, you pretty much know you've got something good.

(Fichtner is one of the most reliable and underrated actors in Hollywood, so this was always pretty much a safe bet, but seriously, he is wonderful as the coolest satanic emissary you'll ever regret running across.)

With criticism, predictable or compatible tastes are often completely unnecessary. The director Barry Sonnenfeld likes Jerry Lewis comedies. I find them difficult to choke down. I do, however, enjoy listening to Sonnenfeld explain why he admires Lewis, and hearing him discuss those films that I don't care for deepened my appreciation of a number of comedies I am very fond of, including Men in Black.

Thursday, February 27, 2025

Two completely unrelated news stories

From the Arkansas Times:

Bill deregulating raw dairy products passes Senate, heads to governor’s desk
by Phillip Powell

A bill to deregulate the sale of raw, unpasteurized milk heads to Gov. Sarah Sanders’s desk after it was overwhelmingly passed by the state Senate on Monday.
If signed by the governor, House Bill 1048 would allow farmers who produce unpasteurized goat milk, sheep milk or whole milk to sell the product at farmer’s markets and to deliver the product to customers – greatly increasing consumer access to the product.
“The other side benefit of this bill is the goats whose milk is being drank might otherwise be executed, so it’s a PETA bill too, you’ll be saving goats,” the bill’s sponsor, Sen. Alan Clark (R-Lonsdale), said, referencing the People for the Ethical Treatment of Animals, ahead of Monday’s Senate vote.
The bill passed 33-1, with Sen. Stephanie Flowers (D-Pine Bluff) as the only “no” vote.
Farmers would still be limited to selling 500 gallons a month under a 2013 law, and they would still be required to properly label the product as unpasteurized, but buyers would assume liability for all illness that may occur after drinking the dairy product.

From Discover Magazine:

The first case of bird flu in cattle was reported on March 25, 2024. In less than a year, the virus has hit 973 herds, according to the U.S. Centers for Disease Control and Prevention.
The United States Department of Agriculture (USDA) first issued an order to test cows that farmers intended to move between states last April. Then, in December 2024, the agency issued a federal order for milk testing. The order stipulated that unpasteurized milk samples be collected from dairy processing facilities nationwide and tested, with the results being shared with the USDA.
Since then, the virus has been detected in 17 states: 747 herds in California, 64 in Colorado, 35 in Idaho, 31 in Michigan, 27 in Texas, 13 each in Iowa and Utah, nine each in Minnesota and New Mexico, seven each in Nevada and South Dakota, four in Kansas, two in Oklahoma, and one each in Arizona, North Carolina, Ohio, and Wyoming.
...
Complicating matters, it is unclear how many CDC employees are still employed to monitor the spread of the virus and how many USDA workers are still on the job to conduct the milk testing orders.
...
The virus has been spreading in humans as well — but not as rapidly as in poultry or dairy cows, according to the American Veterinary Medical Association. So far, 69 people in the U.S. have tested positive for the virus. Of those, 23 cases involved poultry farm workers, and 41 involved dairy farm workers.

Wednesday, February 26, 2025

"There seems to be a plague of jellyfish and really a plague of anything is usually not considered good."

As previously mentioned ("The city would naturally form a line as it tried to get away from itself."), Patrick Boyle is one of the smartest and funniest people on YouTube. I can recommend every video I've seen on his channel, but this survey of failed mega-projects has a special place in my heart, hitting on favorites like the hyperloop and Neom.

To give you a taste of the style (though unfortunately without Boyle's perfect deadpan delivery), I ran some excerpts from the YouTube transcript through ChatGPT, which worked perfectly (raising the question: why doesn't Google use its AI to clean up its transcriptions?).

BBC reporters who visited Forest City in 2023 described seeing a children's train that someone forgot to turn off, doing endless loops around the abandoned shopping mall, playing "Head, Shoulders, Knees and Toes" in Chinese despite there being no children inside. Residents told the reporters that they were desperate to escape the city, which they called a "lifeless husk" that quickly erodes the sanity of anyone trying to live there... They probably shouldn't put that in their real estate listing.
...
They’ve also built attractions like the stairway on the beach, which is supposed to be a tourist attraction. Apparently, it’s very Instagrammable, you know, the way tourists will travel miles out into the middle of nowhere to take a photo on the stairs and hang out on the crocodile-infested beach. Oh yeah, I forgot to mention the crocodiles, didn’t I? Yeah, there’s crocodiles. In fact, that sign on the beach near the Instagram stairs is there to warn you about the crocodiles. No wonder they call it the Stairway to Heaven—there’s a good chance that you’re not getting out of there alive. That photo might be your last Instagram upload. Sorry, we seem to have taken a bit of a dark turn here.
...
So next up, we have the Hyperloop. That'll be good—at least there won't be jellyfish and crocodiles everywhere. That's the good thing about the CGI-based mega projects; the designers can usually keep the wildlife out of them.

Tuesday, February 25, 2025

Thiel is Dr. Mabuse. Musk is Cobra Commander.

From Josh Marshall:

Against the backdrop of a month of chaos and destruction, something began to shift more or less in the middle of this week. I don’t want to overstate what it portends in the short term. Elon Musk remains firmly in the saddle. And even as many of Trump’s advisors grow concerned about the impact of Musk’s rampage, Donald Trump himself appears to be maintaining his support. The moment was captured yesterday at what are now the more or less constant CPACs where Steve Bannon tossed off a Nazi salute and Musk appeared in a “Dark MAGA” baseball cap sporting a chainsaw and basking in the adulation of the MAGA/CPAC faithful awash in the joy a certain kind of individual derives from destruction and pain. The picture itself is a key signpost in the story. Make a note of it. Musk himself posted it to Twitter, labeled with “The DogeFather” and flexing with the text: “This is a real picture.”
...
2025 might be the first time in human history where we have a genuine supervillain walking among us. Humanity has spawned numerous monsters, of course: Hitler, Stalin, Pol Pot. But I’m talking about the supervillain on the Gotham/Metropolis model. The glad-handing, fantastically rich, this-dial-goes-to-11 over-the-top weirdo with his raucous bevy of cheerleaders who is in fact evil and has a cartoonishly stupid but yet very real plan to take over the world. Look at that picture again. You can easily imagine running it over every 2025 political ad about the chaos and immiseration he unleashed on the country.

But what kind of supervillain? Peter Thiel is the shadowy, mysterious type—the one who manipulates everything from behind the scenes. Musk is the megalomaniac type, the kind who insists on telling the hero about his evil plan, with schemes so grandiose and badly thought out that they would probably foil themselves if no secret agent were available to step in.

With the possible exception of transgender rights, Peter Thiel is more ideologically extreme—sometimes much more extreme—across the board than Elon Musk (just read the infamous Cato essay where he gives his take on women's suffrage), but he is far less emotionally needy.

Musk has an insatiable craving for attention and adoration. He seeks out worshipful crowds. He tells improbable stories about punching out the high school bully (but only to those journalists unlikely to check out his stories). He pays people to play video games for him so he can go on Joe Rogan and claim to be one of the best in the world. (If he were a golfer, his partnership with Donald Trump would not survive their first game.)

Musk has limited impulse control, particularly when angry. Both men are petty and vindictive, but while Peter Thiel patiently waited years to destroy Gawker by secretly funding Hulk Hogan's lawsuit, Musk lashes out immediately at any perceived insult with public name-calling, threats, and rage-firing. This last one drove managers at Tesla to advise employees to take roundabout routes around the building in order to avoid walking past the CEO.

The list of possible motivations for Elon Musk is long and complicated, but one thing you should keep in mind is that everything we've seen so far—the bullying, the erratic behavior, the certainty, the apparently deliberate dickishness—is all absolutely consistent with everything we've seen from him in the past.

If you are shocked by any of this, it just means you haven't been paying attention.

Monday, February 24, 2025

How I Learned to Stop Worrying and Love the LLM part 2 -- a proof-reader, not an editor

I have two dictation options which, being a horrible typist, I use frequently. The first is Dragon NaturallySpeaking on my laptop, which works fairly well. The second is dictating email to my phone, which does not. Capitalization rules seem to be based on some kind of random number generator. Homonyms are, of course, a problem, but so are misheard and missing words. Correcting these mistakes can eat up most, and sometimes all, of the time saved.

It is also tedious as hell.

I decided to let ChatGPT take a crack at it and see how well it worked. Here’s the prompt I used.

"Edit the following paragraphs with explanations in brackets after each change with explanations. : "

How did it work? It depends. On the part I was most interested in (homonyms, weird capitalization, and misheard or missing words), it caught almost everything I wanted it to. The other revisions it suggested weren’t particularly helpful. I believe I used just one of them, and that was because I had used the same word twice in the paragraph, not because of the reason given in the explanation.

One of those unused suggestions struck me as a particularly interesting example of how differently ChatGPT "thinks." Here is the paragraph in question:

He used the windfall from the sale of PayPal along with funding from other investors to establish SpaceX, but the people actually in charge were highly respected aerospace veterans. They sometimes let the money guy wear an engineer’s cap and blow the whistle, but no one, including Musk himself, really thought he was running the train.

The only change the algorithm suggested was substituting "operation" for "train." Normally, I wouldn't have been that surprised that it didn't make the analogous choice—LLMs aren't really capable of creating true analogies—but I assumed it would associate the terms "engineer’s cap" and "blow the whistle" with the word "train."

The bigger point here is that large language models do represent an impressive advance in the way we talk to computers and they talk to us. While they come nowhere near living up to the hype, they can provide us with some genuinely useful tools, as long as we keep their limitations in mind.

So there. I’ve now said nice things about LLMs in two posts. I hope you’re satisfied.

Friday, February 21, 2025

"What was Clive James thinking?" Maybe the question is when was he thinking it?

From Andrew Gelman:

From The Dreaming Swimmer (1992), one of Clive James’s classic essay collections:

The best Hitchcock film was directed by someone else. Charade would not be as good as it is if Hitchcock had not developed the genre it epitomises, but Hitchcock could never have created a film so meticulous, plausible, sensitive, light-footed and funny.

Whaaaa? We saw Charade recently, and it was . . . really bad. I mean, sure, I’ve seen worse movies, and the acting was fine for what it was, but, no, I didn’t think it was “meticulous,” “plausible,” “sensitive,” “light-footed,” or “funny.” I’d describe it more as a movie constructed to have these attributes without ever actually achieving them.

So then this makes me wonder: What was Clive James thinking? And, more generally, how to react when someone you admire has different tastes than you?

James's claim here may be indefensible, but it's not inexplicable.

Tastes are neither fixed nor independent. They evolve over time, and though we might not like to admit it, they are influenced by peers and authority figures. The idea that Charade is, by any stretch of the imagination, a better film than North by Northwest would seem so absurd that no self-respecting critic would dare to say it in public. However, if we go back to the mid-'60s and look at what other critics were saying at the time, Clive James's take is far easier to understand.

Admittedly, James appears to have written these comments about a quarter-century after Charade came out, but as a '60s intellectual who was interested in the arts, he probably saw the movie in the theater when it was released and followed the critical discussion surrounding both Charade and the Hitchcock films that followed, such as Marnie, Topaz, and Torn Curtain. It seems reasonable to suggest that his opinions about the film were formed in the context of the 1960s, particularly given the limited options he would have had for re-watching the film in the first couple of decades after it was released.

There were many factors that predisposed viewers like James to like Charade in 1963. In style and execution, it was the kind of slick entertainment that was fashionable in 1963. The big-budget (for the time), glamorous European location shooting added an air of sophistication. The supporting cast was excellent. Matthau and Coburn were in the process of breaking big. George Kennedy was an up-and-comer, only four years away from his Cool Hand Luke Oscar. Cary Grant was Cary Grant—still one of the world’s biggest stars and the unquestioned master of this type of role.

Then there was Audrey Hepburn. It is difficult to overstate how charmed audiences were by Hepburn during her relatively short career. Critics used lots of words like "luminous." I have to admit I never really got the full appeal of Hepburn's innocent waif act, but people at the time could not get enough, particularly when she was cast in romantic comedies with much older leading men (Bogart, Harrison, Astaire, and Grant). Of course, this kind of casting was much more common in movies of the time, but the age differences weren't normally nearly as obvious. Lauren Bacall seemed to be 18 going on 40 while Hepburn seemed to be 25 going on 14.

Charade unsurprisingly got raves. None of Alfred Hitchcock's 1960s films received better than mixed reviews upon release. It took Psycho and The Birds years to win over the critics, while the three films that followed are still widely considered second-rate entries into the canon. Critic and general asshole John Simon made a disparaging comment about Hitchcock imitators, then added that with films like Topaz, Hitchcock himself was now one of that group.

When Charade came out to glowing reviews, Hitchcock was in the strange position of being both too old-fashioned and, with films like Vertigo, Psycho and The Birds, too far ahead of his time. Even the well-reviewed North by Northwest (1959) was generally treated as something of an affectionate self-parody, like an old rock star showing he could still get the crowd on their feet when he ran through the hits.

If you go back and read criticism from the decade, you will find lots of people arguing that Hitchcock had peaked years ago, perhaps even around the time he left Great Britain. These comments have aged badly, but to be completely fair, the last decade of the director's career does unquestionably show signs of decline. While there are lots of critics who will champion Frenzy, everything else he produced after The Birds was decidedly minor.

Calling Charade the best "Hitchcock" film seems crazy today, but the opinion wasn't all that unheard of back when films like it and Marnie were coming out. Still just as wrong, but wrong with lots of company.

Thursday, February 20, 2025

In the Cold War, all the cool kids had toys with nuclear warheads

And yes, the M65 atomic cannon was real.

"Ideal Atomic Cannon" - 1958

Most people blame boomer cynicism and distrust of authority on Vietnam. I wonder if it wasn't based on the memory of getting an Honor House Polaris nuclear submarine and comparing the real thing to the ad in the comic book.

Wednesday, February 19, 2025

A tale of two tweets

This is Joseph.

There is a new claim by Elon Musk that the US is filled with vampires. He is concerned about a database query of social security numbers with the death field set to false, which shows Americans who are hundreds of years old:

And then this new post as to what happens if you use a second field (in the same database) to refine the query and just find the people in the previous list who actually get benefits:

There was immediate skepticism about the magnitude of the claims here. It seems like the sort of thing that was unlikely to have been missed for decades. Even Megan McArdle noted that these older persons wouldn't have made any contributions and so wouldn't get benefits when Musk used the same chart in a second post:

So what can possibly be going on here? Well, it seems like the explanation is people using false social security numbers to pay into the fund but that these payments don't create any eligibility for benefits. So it is true that we've located fraudulent payments that reduce the cost of funding social security. Fortunately, this information was buried in a formal report from 2015 in which the SSA decided it wasn't worth the cost to update old databases that no expert would bungle. I suspect that they feel foolish now -- focusing on efficiency rather than clarity -- but getting a correct death date for long dead Americans seems like an expensive journey into old government archives (many of these deaths likely predate electronic recording as Deva Hazarika noted).

Ironically, such a project might well have been a target of DOGE had it been approved in 2023:

But the real lesson here is that database work is complicated (just ask Mark about his work in really exciting databases) and the key to doing anything is deep, subject matter knowledge. So why don't we have subject matter experts engaged? Until quite recently, the idea was that civil servants should be professionals and provide this expertise.

I am worried that the answer Josh Marshall provides is true as to the reasons for this approach and, hopefully, he will expand on these thoughts with further investigative journalism -- hopefully disproving them.

So kids, play safe with databases!
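For anyone who wants to see the two-query point in miniature, here is a toy sketch using Python's built-in sqlite3 module. Everything in it is made up for illustration: the `records` table, its column names, the five rows, and the age cutoff are all assumptions, and none of it reflects how the SSA's actual systems are structured. It only shows how a missing death flag can make long-dead people look alive in one query, and how a second field narrows that list.

```python
import sqlite3

# Hypothetical rows: (record_id, birth_year, death_recorded, receives_benefits).
# These values are invented purely for illustration.
ROWS = [
    (1, 1950, 0, 1),  # living retiree who receives benefits
    (2, 1890, 0, 0),  # long dead, but the death was never recorded
    (3, 1875, 0, 0),  # same situation
    (4, 1940, 1, 0),  # death properly recorded
    (5, 1960, 0, 0),  # living, not yet receiving benefits
]


def run_queries(as_of_year=2025, age_cutoff=115):
    """Return (naive_ids, refined_ids) for the toy table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "record_id INTEGER PRIMARY KEY, birth_year INTEGER, "
        "death_recorded INTEGER, receives_benefits INTEGER)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", ROWS)

    # Query 1: no death recorded and implausibly old.
    naive = conn.execute(
        "SELECT record_id FROM records "
        "WHERE death_recorded = 0 AND ? - birth_year > ? "
        "ORDER BY record_id",
        (as_of_year, age_cutoff),
    ).fetchall()

    # Query 2: same filter, but only records actually receiving benefits.
    refined = conn.execute(
        "SELECT record_id FROM records "
        "WHERE death_recorded = 0 AND ? - birth_year > ? "
        "AND receives_benefits = 1 "
        "ORDER BY record_id",
        (as_of_year, age_cutoff),
    ).fetchall()

    conn.close()
    return [row[0] for row in naive], [row[0] for row in refined]


if __name__ == "__main__":
    naive_ids, refined_ids = run_queries()
    print("No death recorded and very old:", naive_ids)
    print("...and receiving benefits:", refined_ids)
```

In this toy data the first query flags two records and the second flags none, which is the general shape of the situation described above: a blank field tells you about the record keeping, not about who is getting paid.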

Tuesday, February 18, 2025

Fifteen years ago at the blog (does anyone else feel old?)

My first degree was a BFA in creative writing. When asked why I was switching to statistics, I said I thought it would be easier to find math people who wanted to talk about literature than it would be to find lit people who wanted to talk about mathematics. At the time I thought I was joking.

Blockbusters, Franchises and Apostrophes

More on the economics of genre fiction

The story so far: last week Andrew Gelman had a post on a book that discussed the dominance of thrillers on best seller lists and suggested that it was due to their increased quality and respectability. I argued that the quality and respectability had if anything decreased (here), posted some background information (here and here) then discussed how the economics of publishing from the late Nineteenth Century through the Post-War era had influenced genre fiction. The following closes with a look at where we are now and how the current state of the market determines what we're seeing at the bookstore.

As the market shrank in the last part of the Twentieth Century, the pay scale shifted to the feast and (mostly) famine distribution of today. (The century also saw a similar shift for musicians, artists and actors.) Non-paying outlets sprang up. Fan fiction emerged (non-licensed use of characters had, of course, been around for years -- Tijuana bibles being a classic example -- but fan fiction was written for the author's enjoyment without any real expectation of payment). These changes are generally blamed on the internet but the conventional wisdom is at least a couple of decades off. All of these trends were well established by the Seventies.

With the loss of the short story market and the consolidation of publishing, the economics of writing on spec became brutal. Writing and trying to sell a novel represents a tremendous investment of time and energy with little hope of success. By comparison writing on spec in the Forties meant coming up with twelve to fifteen pages then sending them off to twenty or so potential markets. The best of these markets paid good money; the worst were hungry for anything publishable.

The shift from short story to novel also meant greater risk for the publisher (and, though we don't normally think of it in these terms, for the reader who also invested money and time). A back-pages story that most readers skipped over might hurt the sales and reputation of a magazine slightly but as long as the featured stories were strong, the effect would be negligible. Novels, though, are free-standing, and the novel that gets skipped over is the novel that goes unsold.

When Gold Medal signed John D. MacDonald, they knew they were getting a skilled, prolific writer with a track record of artistically and commercially successful short fiction. The same could be said about the signing of Donald Westlake, Lawrence Block, Joe Gores and many others. Publishing these first time authors was a remarkably low risk proposition.

Unfortunately for publishers today, there are no potential first time authors with those resumes. Publishers now have to roll the dice on inexperienced writers of unknown talent and productivity. In response to that change, they have taken various steps to mitigate the risk.

One response was the rise of the marketable blockbuster. The earliest example I can think of is the book Lace by Shirley Conran. If memory serves, Lace got a great deal of attention in the publishing world for Conran's huge advance, her lack of fiction-writing experience, and the role marketing played in the process. The general feeling was that the tagline ("Which one of you bitches is my mother?") came first while the book itself was merely an afterthought.

More recently we have Dexter, a marketer's dream ("He's a serial killer who kills serial killers... It's torture porn you can feel good about!"). The author had a few books on his resume but nothing distinguished. The most notable was probably a collaboration with Star Trek actor Michael Dorn. The first book in the series, Darkly Dreaming Dexter, was so poorly constructed that all of the principals had to act completely out of character to resolve the plot (tip for new authors: when a character casually overlooks her own attempted vivisection, it's time for a rewrite*).

The problems with the quality of the novel had no apparent effect on sales, nor did it prevent the character from appearing in a successful series of sequels and being picked up by Showtime (The TV show was handled by far more experienced writers who managed to seal up almost all of the plot holes).

The point here is not that Darkly Dreaming Dexter was a bad book or that publishing standards have declined. The point is that the economics have changed. Experienced fiction writers are more rare. Marketable concepts and franchises are more valuable, as is synergy with other media. The markets are smaller. There are fewer players. And much of the audience has a troublesome form of brand loyalty.

Normally, of course, brand loyalty is a plus, but books are an unusual case. If you convince a Coke drinker to also drink Sprite you probably won't increase his overall soda consumption; you'll just have cannibalization. But readers who stick exclusively with one writer are severely underconsuming. Convince James Patterson readers to start reading Dean Koontz and you could double overall sales.

When most readers got their fiction either through magazines or by leafing through paperback racks, it was easy to introduce them to new writers. Now the situation is more difficult. One creative solution has been apostrophe series such as Tom Clancy's Op Center. Other people are credited with actually writing the books but the name above the title is there for branding purposes.

Which all leads us back to the original question: Why did thrillers become so dominant?

They tend to be easily marketable.

They are compatible with franchises.

They lend themselves to adaptation as big budget action movies.

Their somewhat impersonal style makes them suitable for ghosting or apostrophe branding.

They are, in short, what the market is looking for. As for me, I'm looking for the next reprint from Hard Case, but I might borrow the latest Turow after you're done with it.

* "Is that a spoiler?"
"No, sir. It was spoiled when I got here."

p.s. I was going to tie in with a branding situation Slim Jim snacks faced a few years ago but this post is running a bit long. Maybe I'll get back to it later.

Monday, February 17, 2025

How I Learned to Stop Worrying and Love the LLM part 1 -- coding

I used to ask, only half-jokingly, how any of us learned to code before Google. I went to grad school in the mid-90s, so I'm old enough to remember that literal bookshelf of not-very-helpful SAS manuals. Getting a straightforward answer to a basic coding question often seemed insurmountable. You can imagine, then, the revolutionary impact of the internet and various online resources.

The advance represented by LLMs has been comparable. While I would never consider using one in a situation where I needed the kind of background and understanding that comes from a textbook or course, in terms of straightforward "how do I code this" questions, I can no more imagine Googling the topic or turning to an online forum than I can imagine digging through that old stack of phone books (I just can't stop dating myself in this post).

Out of perhaps excessive caution, I never give ChatGPT any real data or metadata. The tables I rename something unimaginative like "A" or "B." For the fields, I try to use something that would fall in the same general category. For example, Buyer_ID might become SSN. There are no doubt countless examples of social security numbers in the LLMs' training data, and pretty much all of them treat it as a unique identifier.

ChatGPT does have limitations and will get some examples wrong, particularly if you let things get too complex, but if you can keep things bite-sized and be absolutely clear with your logic, the LLM performs remarkably well. I don't know if this makes up for the huge environmental cost of building these models and it certainly doesn't balance out the damage generative AI has done, but if used properly, these are remarkably useful and powerful tools.
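The renaming step mentioned earlier in this post is easy to do by hand, but it can also be scripted. What follows is only a rough sketch of one way to do it, not a description of any particular tool: the function name `mask_identifiers`, the table names `Customer_Orders` and `Store_Locations`, and the sample query are all hypothetical. The only detail taken from the post is the idea of swapping something like Buyer_ID for SSN and renaming tables to "A" or "B."

```python
import re

# Hypothetical mapping from real names to stand-ins.
# Only the Buyer_ID -> SSN swap comes from the post; the table names are invented.
MAPPING = {
    "Customer_Orders": "A",
    "Store_Locations": "B",
    "Buyer_ID": "SSN",
}


def mask_identifiers(sql: str, mapping: dict[str, str]) -> str:
    """Replace whole-word identifiers (case-insensitively) with their stand-ins."""
    if not mapping:
        return sql
    lookup = {real.lower(): alias for real, alias in mapping.items()}
    # Longest names first so a shorter name never shadows a longer one.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: lookup[match.group(0).lower()], sql)


if __name__ == "__main__":
    query = "SELECT Buyer_ID, COUNT(*) FROM Customer_Orders GROUP BY Buyer_ID"
    print(mask_identifiers(query, MAPPING))
```

Because the match is on whole words, a column like Buyer_ID_2 is left alone unless it gets its own entry in the mapping. Translating the LLM's answer back to the real names is deliberately left out here, since one-letter stand-ins like "A" can collide with ordinary table aliases and are safer to swap back by hand.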

Friday, February 14, 2025

Listen to the last three minutes of Thursday's Marketplace

Kai Ryssdal's closing statement is a template for how non-partisan mainstream journalists should frame this moment. Measured and objective, but refusing to flinch when accurately describing the situation we find ourselves in.

Marketplace is one of the very few news organizations that emerged from the past few years with its reputation not just intact but enhanced. The list is embarrassingly small. Lazy and craven journalism has become the norm, which is especially tragic at a time when we really need journalists to step up and do their jobs.

Thursday, February 13, 2025

In defense of LLMs, they are considerably more polite than the typical 9th grader

Andrew Gelman recently shared the following from Gary Smith:

I [Smith] tested OpenAI o1, Copilot, and Gemini Flash, along with DeepSeek, on a question about Tic-Tac-Toe . . .

I’ve deleted much of the superfluous fluff that LLMs tend to generate from the excerpts below:

Gary: I’d like to discuss a new version of tic-tac-toe called 90-degree tic-tac-toe.

OpenAI:  That sounds intriguing! What’s the twist in “90-degree Tic-Tac-Toe”? Does it involve rotating the board, or is there some other mechanic that changes the gameplay? Let me know the rules or your idea, and we can brainstorm or refine it together!

DeepSeek: Sure! 90-degree tic-tac-toe sounds like an interesting twist on the classic game. Could you explain the rules or mechanics of this new version?

Copilot:  That sounds intriguing! 90-degree tic-tac-toe adds a new twist to the classic game. Tell me more about it. Are the rules different? How do you rotate the board? I’m excited to hear how this version shakes things up!

Gemini: I’d love to discuss 90-degree tic-tac-toe! To get us started, could you explain the basic rules of this new variation?…I’m excited to learn more about this unique twist on a classic game. Let’s dive in!

Gary: It is still a 3×3 grid but the grid is rotated 90-degrees to the right.

OpenAI: Interesting concept! By rotating the grid 90 degrees to the right, you’re visually changing the perspective of the classic 3×3 Tic-Tac-Toe board. The gameplay itself might remain the same, but the rotated perspective could create a novel experience for players.

DeepSeek: Ah, I see! So in 90-degree tic-tac-toe, the 3×3 grid is rotated 90 degrees to the right….This could be a fun twist because the rotated grid might disorient players initially, making it harder to spot winning opportunities or block opponents.

Copilot:  That sounds like a fun twist on the classic game! By rotating the grid 90 degrees to the right, it gives players a fresh perspective and challenges them to think differently about their moves.

Gemini:  This rotation doesn’t change the core concept of getting three of your symbols in a row, but it does significantly alter the strategy and how players approach the game.

Gary: Do you think it would be better to rotate the 3×3 grid to the right or to the left?

OpenAI: If you want the game to feel approachable and intuitive, rotating to the right might be better. If you’re aiming for a twist that makes players rethink the game and adds a subtle layer of complexity, rotating to the left could be the way to go. Ultimately, it depends on your target audience and the experience you want to create. You could even experiment with both directions and see which feels more engaging during gameplay testing!

DeepSeek:  If the goal is to make the game feel fresh while keeping it accessible, rotating the grid to the right is likely the better choice. It maintains a natural flow for players while still offering a new visual perspective. However, if you want to introduce a bit more novelty or challenge, rotating to the left could be an interesting alternative.

Copilot:  Both options could bring their own unique challenges! Most people are right-handed. However, rotating it to the left could add an extra layer of difficulty and make the game more unpredictable.

Gemini:  Some players might find it more intuitive to mentally adjust to a rightward rotation, while others might prefer a leftward rotation. You could even experiment with both directions to see which one feels more natural or enjoyable to play!

Gary: Which will be more difficult for expert players?

OpenAI: To challenge expert players and make the game feel fresh and complex, a left rotation would likely be the better choice. It disrupts their ingrained patterns, encourages more active thinking, and raises the game’s difficulty level.

DeepSeek: For expert players, rotating the grid 90 degrees to the left (counterclockwise) is likely to be slightly more difficult than rotating it to the right. Here’s why:

Breaking Muscle Memory: Expert players are deeply familiar with the traditional grid layout and common winning patterns. Rotating the grid to the left disrupts their mental map more significantly than rotating it to the right, as the transformation is less intuitive.

Copilot:  For expert players, rotating the grid to the left might prove to be more challenging. Here’s why:

Unfamiliar Rotation: Left rotation is less common and could disrupt their established strategies, requiring more adaptability.
Strategic Complexity: It may force players to think several moves ahead, recalculating their strategies continuously.

Ready to test this theory in a game?

Gemini: Breaking Habit: Long-time tic-tac-toe players might have deeply ingrained habits and strategies associated with the traditional grid orientation. Overcoming these ingrained patterns could require a brief period of readjustment.

One of the problems with the current large language model discussion is that, while there are loads of brilliant people in the conversation, very few of them have taught ninth-grade math. K-12 teachers (at least the better ones) are used to thinking about answers not just in terms of being right or wrong, but in terms of what we can infer about the underlying thought processes that produced an answer.

We previously discussed the example of the alarmed toddler who rushed to tell their parents "Anna hurt Anna-self!" when their sister fell down the stairs. The statement was wrong -- it showed confusion over when to use a noun versus a pronoun -- but it also demonstrated the child's correct grasp of how "self" worked.

By comparison, LLMs' mistakes are often notable for their out-of-nowhere quality, like the infamous "There are no countries in Africa that start with the letter 'K'. The closest is Kenya, which starts with a 'K' sound." This one is a bit like the evil, or in this case stupid, twin to Anna-self. The words are strung together correctly but reveal a complete lack of underlying comprehension.

LLMs don't have "thought processes," and approaching them as if they did will not lead anywhere productive, but that doesn't mean we can't get some insights, perhaps even do a little reverse engineering, from those "wrong" responses.

Smith said that though these LLMs had curiously similar responses, they "almost surely" did not see tic-tac-toe rotation in their training. This is technically true, but they probably did see some discussion of similar modifications to other games in the broader class of abstract strategy games. The go-to example for this class is, of course, chess, and you'll notice that if you take the responses and substitute that game for tic-tac-toe, almost all of the answers seem perfectly reasonable.

[Assuming it hasn't been done already, it might be interesting to conduct some experiments on how changing perspectives on the chessboard (such as playing from the opponent's side) affects strategy and style of play. You would probably also want to consider skill level and age, especially when looking at how quickly players adapt to changes in orientation. The great YouTube engineering channel Smarter Every Day built a bicycle which turned left when the handlebars turned right and vice versa. One of their findings was that it could take months for an adult to master the reverse steering, and afterwards it was extremely difficult to go back to normal bikes. Kids, on the other hand, could make the switch far more easily.]

Giving an appropriate-for-chess/absurd-for-tic-tac-toe answer strongly suggests that when the training data is thin with respect to a particular prompt, the LLM will use data from a more general class to generate a response. Nothing especially surprising about this -- it's more or less what you'd expect -- but it does explain why this approach generated such absurd responses in this specific case.

This is a good moment to step back and remind ourselves of the constant effort we have to make to avoid anthropomorphizing AI in general and large language models in particular. LLMs can generate responses that seem uncannily human, but we have to keep reminding ourselves that the underlying processes are completely non-analogous. Any meaningful conversation about the applications and limitations of these systems has got to start and stop with that fact.

People think about the attributes, patterns, and shapes of things, but more importantly they think about things' purpose, causality, functionality, history, motivation, and the stories we can tell ourselves about them. Generative AI at its current level of development works almost entirely with the first group. Sometimes this allows it to come up with responses that seem to be "thinking" about the latter group, but these can be explained as regurgitations.

Most of the absurd answers we get from generative AI come from it not having any capacity to deal directly with how and why. You can find AI pictures of automobiles with wheels perpendicular to each other or of campfires inside tents because how a car works or a fire behaves is factored only indirectly into the process.

The algorithm only looks at patterns in the training data. They may have fixed it now, but recently at least one of the big LLMs would give you a detailed and very familiar-sounding proof arguing that the square root of four was an irrational number. It's not difficult to see how this happened -- the training data must have countless examples of the wonderfully elegant demonstration that the square root of two is irrational (it's possibly the most famous proof of all time) -- but no human who understood the proof would try to apply it to four.
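For anyone who wants to see exactly where the argument breaks, here is a side-by-side sketch of the standard proof for two and the same steps attempted for four. The symbols p, q, and k are just the usual placeholders, and the block assumes the amsmath and amssymb packages.

```latex
\begin{align*}
\sqrt{2} = \tfrac{p}{q} \text{ (lowest terms)}
  &\implies p^2 = 2q^2 \implies 2 \mid p \implies p = 2k \\
  &\implies 4k^2 = 2q^2 \implies q^2 = 2k^2 \implies 2 \mid q \\
  &\quad \text{Contradiction: } p \text{ and } q \text{ share the factor } 2. \\[1ex]
\sqrt{4} = \tfrac{p}{q} \text{ (lowest terms)}
  &\implies p^2 = 4q^2 \implies 2 \mid p
     \quad \text{(but } 4 \mid p \text{ does not follow: } 4 \mid 2^2,\ 4 \nmid 2\text{)} \\
  &\implies p = 2q \implies q = 1,\ p = 2 \\
  &\quad \text{No contradiction: } \tfrac{p}{q} = 2 \text{ is already in lowest terms.}
\end{align*}
```

The proof for two works because a prime that divides p squared must divide p. Four is not prime, so that step is simply not available.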

Likewise, there are extensive discussions in the training data about how changing the orientation of something visual would affect the way people perceive and work with it. In all of those cases, however, the generally unstated reason for the change is that the thing being reoriented would look different after the transformation. If that's not true, then all subsequent discussion and analysis is absurd. It calls to mind the old joke about the coach who promises to turn his losing team around 360°.
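If you want to check that a quarter turn really changes nothing, a few lines of code will do it. This is just a sketch: the coordinate convention (row, column, counting from the top left) and the function names are my own choices for illustration.

```python
def rotate_right(cell):
    """Position of a (row, col) square after turning the 3x3 grid 90 degrees clockwise."""
    row, col = cell
    return (col, 2 - row)

def rotate_left(cell):
    """Position of a (row, col) square after turning the grid 90 degrees counterclockwise."""
    row, col = cell
    return (2 - col, row)

def winning_lines():
    """All eight three-in-a-row lines, each stored as a set of squares."""
    rows = [frozenset((r, c) for c in range(3)) for r in range(3)]
    cols = [frozenset((r, c) for r in range(3)) for c in range(3)]
    diags = [frozenset((i, i) for i in range(3)),
             frozenset((i, 2 - i) for i in range(3))]
    return set(rows + cols + diags)

original = winning_lines()
after_right = {frozenset(rotate_right(sq) for sq in line) for line in original}
after_left = {frozenset(rotate_left(sq) for sq in line) for line in original}

# Rows become columns and the diagonals swap, but the set of winning lines is identical.
print(after_right == original, after_left == original)
```

Both comparisons come out true, which is the whole point: right or left, the rotated board is the same game on the same grid.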

Wednesday, February 12, 2025

In the aftermath of a disaster that took a great human and environmental toll, your first response should always be to loosen regulations and lower safety standards

[And yes, I realize that, despite being an urban apartment dweller who doesn't have a backyard, raising these issues will get me labeled a NIMBY, but as a former high school teacher, I've been called worse.]

From Mark Ryavec, president of the Venice Stakeholders Association, former legislative analyst for the city of Los Angeles, and Chief Deputy Assessor for Los Angeles County:

Governor Gavin Newsom is misguided in issuing an executive order to expedite rebuilding houses in the Pacific Palisades without first examining what happened there and applying lessons that may be learned to reform building codes and significantly increase the capacity of the local firefighting water system.

The governor recently issued orders to relax Coastal Commission permit requirements and environmental review for new construction as long as the replacement building is not more than 10% larger or taller than the original.

If the city of Los Angeles agrees, this will allow property owners to more quickly start rebuilding – with the same building materials and lax fire safety requirements that failed to protect over 10,000 homes.

I grew up in neighboring Santa Monica and know the community’s built environment from regularly hiking in the area. The majority of homes that were destroyed employed stucco for their exterior walls. Just a cursory online search reveals that standard stucco can only withstand extreme heat for about one hour. Thus, without a change in building codes, the governor is proposing to allow the use of the same building material that failed spectacularly.

There should be an examination by a city commission of experts to determine if standard stucco should be allowed, or limited to blocks that are a half mile or mile away from wilderness areas that are likely to be again overgrown in the next five to ten years with flammable grasses and chaparral.

In some of the few homes that survived, a fiberglass filament was added to the stucco mix, which boosts its ability to withstand heat. Another design element that was successfully deployed in some homes that withstood the fire are lumber and plywood that was coated with fire retardant during the framing stage. The city should consider making these mandatory.

There are other building materials, such as tilt-up concrete panels, full dimension 4-inch bricks (as opposed to thin brick veneer), 3-inch-thick stone walls, and metal, which when properly installed, withstand extreme heat for at least four hours, enough time for all surrounding foliage to burn out, leaving the house standing.

Asphalt roofing shingles, which contain a significant amount of petroleum, do not well withstand fire, and should be banned in favor of Class A fireproof installations like terracotta or concrete tiles or other fireproof materials. Similarly, the juncture of roof eaves with exterior walls is one of the locations most vulnerable to fire, so the city must adopt a requirement that these areas be properly sealed with fireproof material.

We saw this sort of thing after the Camp fire obliterated Paradise: an unexamined rush to rebuild, as if that could somehow undo the tragedy. These pushes are usually accompanied by calls for resolution and even defiance. "We won't let the fires beat us!" as if the fires were some terrorist organization trying to break our spirits, instead of a natural force not as predictable as the tides but every bit as inevitable. The forests of the Santa Monica Mountains will burn again, and when they do, the successors of the politicians who insisted on doing nothing to address the underlying problems will give heartfelt speeches empathizing with the latest generation of victims and pledging to make the same mistakes all over again.

Tuesday, February 11, 2025

The rich and famous of Southern California have always loved living in isolated spots with gorgeous mountain views, and it has always been a bad idea

We'll be coming back to this excellent story from the LA Times on the history of catastrophic fires in the region and how we keep ignoring the lessons we ought to be learning from them.

‘Built to burn.’ L.A. let hillside homes multiply without learning from past mistakes by Jenny Jarvie

On a hot, dry November morning in 1961, flames from a trash pile on brushland north of Mulholland Drive were picked up by Santa Ana winds and swept across the canyons of one of Los Angeles’ wealthiest enclaves.
The apocalyptic scenes that played out — of Hollywood celebrities fleeing and clambering onto their roofs — captured the world’s attention like no urban conflagration in history. Actor Kim Novak and Richard Nixon, then a former vice president who moved to L.A. to practice law, wielded garden hoses to soak their wooden roof shingles. Actor Fred MacMurray enlisted studio workers from the set of “My Three Sons” to evacuate his family and help firefighters cut down brush around his Brentwood home.
When the blaze reached the mansions of Bel-Air, thermal heat lifted burning shingles high into the air and 50-mph winds hurled them more than a mile over to Brentwood. By nightfall, the Bel-Air fire had destroyed 484 homes, including those of actor Burt Lancaster, comedian Joe E. Brown and Nobel laureate chemist Willard Libby.
After firefighters extinguished the flames, socialite and actor Zsa Zsa Gabor, wearing white kitten heels and a string of pearls as she clutched a shovel, dug through the rubble of her Bellagio Place home for a safe with jewels.

The Bel-Air fire became known as “the big one,” the event that forced everyone in Los Angeles to reckon with the dangers fire posed to their coveted hillsides.

In response, L.A. officials ushered in new fire safety measures, investing in more firefighting helicopters, new fire stations and a new reservoir. They also outlawed untreated wood shingles in high-fire-risk areas and initiated a brush clearance program to create defensible space around homes.

But they did not stop building on fire-prone ridges and canyons.

And there was no major push to radically rethink how they built. Over the next half a century, new housing tracts filled the wildland interface. And a succession of larger and more deadly fires swept through the region. But all the safety improvements prompted by the Bel-Air and subsequent fires could not outpace the escalating threat from new development and climate change.

The massive blazes that engulfed Los Angeles hillside communities Jan. 7, destroying 16,000 structures and killing at least 29 people in and around Pacific Palisades and Altadena, have prompted a new reckoning on how so many L.A. homes came to be built on land so vulnerable to fire and how, or whether, they should be rebuilt.

It’s a crossroads the region has found itself at before when the power of fire left us reeling.

“California is built to burn — it’s not unique in that — but it’s built to burn on a large scale and explosively at times,” said Stephen Pyne, a fire historian and professor emeritus at Arizona State University.

“You can live in that landscape, but how you choose to live will affect whether that fire is something that just passes through like a big thunderstorm, or whether it is something that destroys whatever you’ve got.”

Monday, February 10, 2025

F&A costs and why they matter

This is Joseph.

So on Friday afternoon this new policy dropped, to take effect Monday morning. Like many things in the new world, the lack of any delay before implementation creates an immediate sense of crisis. It seems almost naive to think back to the days when the Administrative Procedure Act was applied to major new changes.

Now what this new policy means is a little complex. When the US government funds research, it attaches the costs for facilities and administration (F&A) as a negotiated percentage of every dollar of direct costs. This is a strange way to pay for things like enforcing federal regulations and electricity for buildings, but it has the advantage of being both simple and easy to administer. It causes some oddness -- grants focusing on data analysis don't require animal research ethics infrastructure, so averaging these expenses across all grants can lead to some hard-to-explain corner cases (in both directions, as the F&A costs on a primate center may seem shockingly low).
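To make the mechanics concrete, here is a rough sketch of the arithmetic. The $1,000,000 grant is a made-up figure, and the 60% and 15% rates are round illustrative numbers. Applying the rate to every direct dollar follows the simplified description above, not the fine print of any actual negotiated agreement.

```python
def indirect_costs(direct_costs, rate_percent):
    """F&A dollars paid on top of a grant's direct costs (simplified model)."""
    return direct_costs * rate_percent / 100

direct = 1_000_000                      # hypothetical direct costs, in dollars
old_fa = indirect_costs(direct, 60)     # illustrative negotiated rate
new_fa = indirect_costs(direct, 15)     # illustrative uniform rate

lost = old_fa - new_fa
share_lost = lost / old_fa

print(f"F&A at 60%: ${old_fa:,.0f}")
print(f"F&A at 15%: ${new_fa:,.0f}")
print(f"Reduction:  ${lost:,.0f} ({share_lost:.0%} of the old F&A)")
```

Under these assumptions the direct funding for the project is untouched, while the money the host institution receives to support it falls by three quarters.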

So the real impact of this measure is a large and profound cut to the universities that host research activities. Some of this money might return in the form of an increased number of awards. I would not be surprised if there were a spike in off-campus research, given the odd decision to make the level uniform, and international institutions will be quite happy with the near doubling of the current 8% F&A rate for international awards. The comparison to the much smaller foundation grants is also challenging, as very small amounts of funding can have less generous F&A terms so long as the funding is a small piece of a greater whole and there is no comparison of allowable expenses and/or regulations to be followed (which can make a large difference).

There is a lot of confusion as to whether this is a change that can be done via executive order. But the bottom line is that it is a large and immediate cut to institutes of higher education intended to maximize pain by eliminating the money with flexibility. Is it possible that the current F&A is excessive for at least one institution of higher education, somewhere in the United States? Sure, in some cases or at some margins it is likely, as the NIH is a big place. Is an immediate cut from ~60% to 15% (all at once without warning) a good policy? No. No it is not.