Tuesday, March 4, 2025

A Blacker Black Box

From Matt Levine's newsletter:

There are two basic ways to use artificial intelligence to predict stock prices:

  1. You build a deep learning model to predict stock prices: You set up a deep neural net, you feed it tons of historical data about stocks, and you train it to figure out how that data predicts stock price returns. Then you run the model on current data, it predicts future returns, and you buy the stocks that it thinks will go up.
  2. You take some deep learning model that someone else built, a large language model, one that is good at predicting text. It is trained on a huge corpus of human language, and it is good at answering questions like “write a poem about a frog in the style of W.B. Yeats.” And you ask it questions like “write a report about whether I should buy Nvidia Corp. stock in the style of Warren Buffett.” And then it trains on the writing style of Warren Buffett, which reflects his thinking style, and its answer to your question — you hope — actually reflects what Buffett might say, or what he might say if he were a computer with a lot of time to think about the question. And because Warren Buffett is good at picking stocks, this synthetic version of him is useful to you. You read the report, and if robot Warren Buffett says “buy” you buy.

The first approach makes obvious intuitive sense and roughly describes what various quantitative investment firms actually get up to: There might be patterns in financial data that predict future returns, and deep learning is a statistical technique for finding them.

The second approach seems … sort of insane and wasteful and indirect? Yet also funny and charming? It is an approach to solving the problem by first solving a much harder and more general problem: Instead of “go through a ton of data to see what signals predict whether a stock goes up,” it’s “construct a robot that convincingly mimics human consciousness, and then train that robot to mimic the consciousness of a particular human who is good at picking stocks, and then give the robot some basic data about a stock, and then ask the robot to predict whether the human would predict that the stock will go up.” 

My impression is that there are people using the first approach with significant success — this is roughly, like, Renaissance Technologies — and the second approach is mostly me making a joke. But not entirely. The second approach has some critical advantages:

  1. Somebody else — OpenAI or xAI or DeepSeek or whoever — already built the large language model for you, at great expense. If you are on the cutting edge of machine learning and can afford to pay for huge quantities of data and researchers and computing capacity, go ahead and build a stock-predicting model, but if you are just, say, an academic, using someone else’s model is probably easier. The large language model companies release their models pretty widely. The stock model companies do not. You can’t, like, pay $20 a month for Renaissance’s stock price model.
  2. Because the large language model’s output is prose, its reasoning is explainable in a way that the stock model is not. The stock model is like “I have looked at every possible combination of 100,000 data time series and constructed a signal that is a nonlinear combination of 37,314 of them, and the signal says Nvidia will go up,” and if you ask why, the model will say “well, the 37,314 data sets.” You just have to trust it. Whereas robot Warren Buffett will write you a nice little report, with reasons you should buy Nvidia. The reasons might be entirely hallucinated, but you can go check. I wrote once: “One criticism that you sometimes see of artificial intelligence in finance is that the computer is a black box that picks stocks for reasons its human users can’t understand: The computer’s reasoning process is opaque, and so you can’t be confident that it is picking stocks for good reasons or due to spurious correlations. Making the computer write you an investment memo solves that problem!”
  3. I do think that the aesthetic and social appeal of typing in a little box to have a chat with your friend Robot Warren is different from the black box just giving you a list of stocks to buy. This probably doesn’t matter too much to rigorous quantitative hedge funds, but it must matter to someone. We talked last year about a startup that was launching “a chatbot that offers stock-picking advice” to retail brokerage customers, and it seemed like the goal of the project was not “the chatbot will always tell you stocks that will go up” but rather “the chatbot will offer a convincing simulacrum of talking to a human broker,” who also will not always tell you stocks that will go up. You call the broker anyway. Now you can text the chatbot instead.

And so we also talked last year about an exchange-traded-fund firm that would use large language models to simulate human experts — ones with characteristics of particular humans, like Buffett — to make stock picks. Why use LLMs rather than build a model to directly predict stock prices? Well, because the LLM is already there, and the data is already there, and the schtick is a little more human than “here’s our black box.”

Anyway here’s a paper on “Simulating the Survey of Professional Forecasters,” by Anne Lundgaard Hansen, John Horton, Sophia Kazinnik, Daniela Puzzello and Ali Zarifhonarvar.

Though Levine does a characteristically great job laying out the questions in a clear and insightful way, on at least one point I think he's not just wrong but the opposite of right. The LLM may appear to be less opaque, but it is actually the blacker black box.

Normally, when we use the term "black box model," we mean that we know the data that goes in and can see the scored data that comes out, but the process connecting the two is so complex and computation-intensive that we can't say exactly what happened. In practice, though, that opacity is not absolute. We can analyze the output, identify the main drivers of the model, and flag potential biases and other problems. We can perturb the input data, leaving out certain parts, and observe how the output is affected. In most real-world cases I've seen, you can reverse-engineer the model, creating something remarkably close to it that uses a manageable and, more importantly, comprehensible set of data and calculations. This simpler, reverse-engineered model won't use the same data as the black box, but it will be transparent, it will very likely use the same categories of data, it will generally capture the underlying relationships, and it will sometimes perform almost as well.
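To make that concrete, here is a minimal sketch of that perturb-and-reverse-engineer workflow, using scikit-learn on synthetic data. Everything in it is a stand-in I made up for illustration — the gradient-boosted "black box," the dataset, and the shallow surrogate tree — not anything from Levine's post or a real trading model:

```python
# Toy illustration: probe a synthetic "black box" by permuting inputs,
# then fit a small, readable surrogate model to its outputs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data: 50 features, only 8 actually informative.
X, y = make_classification(n_samples=5000, n_features=50,
                           n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

black_box = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Perturb the inputs: shuffle one feature at a time and measure how much
# the model's accuracy drops. Big drops flag the main drivers.
perm = permutation_importance(black_box, X_test, y_test,
                              n_repeats=10, random_state=0)
drivers = np.argsort(perm.importances_mean)[::-1][:5]
print("Top drivers (feature indices):", drivers)

# Reverse-engineer: train a shallow, fully readable tree to mimic the
# black box's own predictions rather than the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))
agreement = (surrogate.predict(X_test) == black_box.predict(X_test)).mean()
print(f"Surrogate matches the black box on {agreement:.0%} of test rows")
print(export_text(surrogate))
```

The surrogate tree is exactly the kind of "remarkably close" simpler model described above: it sees the same inputs, mimics the black box's decisions, and you can read its handful of rules directly.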

I have never done anything related to stock prediction, but I have worked with models predicting consumer behavior, and I'm betting that the underlying process is somewhat similar. Let's take the example of a credit card company building a black-box model to predict which customers are likely to default on their debts. In addition to transaction and payment history, the company has access to a huge amount of data from credit bureaus, vendors such as Acxiom, publicly available information, and macroeconomic data. We're talking about tens of thousands of variables going into that model. It is not possible for a person or even a team to go through all of these fields one by one, but at a more general level, it is possible to know what kind of data is going in and to maintain some standard for quality and relevance.
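As a hypothetical sketch of what that general-level oversight might look like: suppose each input column carries a tag for its source (the names and importance scores below are invented), so you can roll importances up by category instead of inspecting tens of thousands of fields one by one.

```python
# Hypothetical sketch: audit a huge feature set by category, not column.
# Feature names are assumed to be tagged with their source, e.g.
# "bureau.", "payments.", "vendor.", "macro.". All values are invented.
from collections import defaultdict

# In practice these scores would come from the trained model
# (e.g. the permutation importances computed in the sketch above).
importances = {
    "bureau.utilization_ratio":      0.21,
    "bureau.num_delinquencies":      0.18,
    "payments.months_since_late":    0.15,
    "payments.min_payment_streak":   0.07,
    "vendor.household_income_est":   0.04,
    "macro.regional_unemployment":   0.03,
    "vendor.magazine_subscriptions": 0.001,
}

# Roll importances up to the source level.
by_source = defaultdict(float)
for name, score in importances.items():
    by_source[name.split(".")[0]] += score

# A quick sanity check on what kind of data the model leans on.
for source, total in sorted(by_source.items(), key=lambda kv: -kv[1]):
    print(f"{source:10s} {total:.3f}")
```

At this level of aggregation, the team can't name every field, but it can see whether the model is leaning on bureau data or magazine subscriptions — which is the standard for quality and relevance I mean.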

If your training data is everything that can be scraped from the internet, it is effectively unknowable. In the traditional black-box scenario, we know the data and the output; only the middle part of the process is opaque. With large language models, however, everything before the final answer is shrouded in darkness.

Your training data may include the writings of Warren Buffett, the text of A Random Walk Down Wall Street, and the archives of The Wall Street Journal, but it can also contain horoscopes, blogs from "buy the dip" Robinhood day traders, and market analysis from crypto investors. The style and word choice might resemble those of the Oracle of Omaha, but the underlying ideas might come from the Rich Dad Poor Dad guy.


1 comment:

  1. "The LLM may appear to be less opaque, but it is actually the blacker black box."

    Hmm. We know what's in an LLM: a lot of text and a random number generator. While the formal definition of what an LLM does is "process sequences of undefined tokens", you could also see it as a pattern extraction and random instantiation process. "Write me an article in the style of investor X" will give you a parody of investor X writing about some other stock: all it has is that investor's text about past things, and what you get back is something like that text with the nouns changed.

    So it's worse than a black box: it's a hallucinating parody generator.
