West Coast Stat Views (on Observational Epidemiology and more)

Wednesday, April 13, 2011

More on deficit reduction

Following up on Joseph's previous post, the Center on Budget and Policy Priorities has (via DeLong) an analysis of the Ryan plan that argues he has greatly exaggerated the savings that would be produced by his recommendations. The part about defense spending was particularly instructive:

About $1.3 trillion of the claimed $5.8 trillion reduction in spending, however, comes simply from taking credit for spending less in future years for the wars in Iraq and Afghanistan, as a result of the already-planned drawdown in the number of troops fighting in those countries. While this accurately reflects the difference between spending for the wars in Ryan’s plan and spending for the wars projected in CBO’s baseline, it does not represent savings or deficit reduction resulting from any change in policy proposed by Ryan....

CBO follows the baseline rules established in the Budget Enforcement Act of 1990 (as subsequently modified). For taxes and mandatory spending, the baseline projections generally assume that there will be no changes in current laws governing taxes and mandatory programs. But for discretionary spending... assuming current law does not make sense.... [B]aseline rules require CBO to assume that for each account and activity, Congress will provide the same amount of funding in each year the baseline projections cover as it provided in the most recently enacted appropriation bills (adjusted for inflation). This generally serves as an adequate proxy.... There is, however, one large anomaly — funding for the wars in Iraq and Afghanistan — that causes the current baseline projections to vary significantly from what it will cost to continue current policies. Following the baseline rules, CBO projects that in every year from 2012 through 2021, appropriations for the wars will remain at the current annual funding level.... Yet a drawdown in troops is already well underway in Iraq and is planned for Afghanistan.... Chairman Ryan’s budget merely plugs in the CBO’s estimate of the war costs under the President’s proposal, without changing them.

This difference of about $1.05 trillion between the war costs in the Ryan budget and those in the CBO baseline thus does not represent new savings that result from Ryan’s budget proposals. Yet Ryan counts this $1.05 trillion, plus the $250 billion reduction in interest costs that such a $1.05 trillion spending reduction would produce, as $1.3 trillion in spending cuts and deficit reduction....

Ryan himself said in a February interview that savings in the Obama budget that come from the troop drawdown should not be considered real savings or deficit reduction. Ryan commented that the Obama budget showed savings of $1.1 trillion because the costs under the proposed withdrawal were compared to a baseline that assumed “they’re going to be in Afghanistan and Iraq at current levels for ten years,” and called these “phantom savings.” Ryan was correct to term these “phantom savings.” And if the phantom savings are not counted as real savings, the amount of spending cuts that Ryan’s proposals produce is $1.3 trillion less than Ryan claims...
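For anyone who wants to check the arithmetic in that excerpt, here is a quick sketch. The dollar figures come straight from the CBPP passage; the variable names are mine, and the last figure (what remains once the phantom savings are removed) is simply derived from the other numbers, not quoted from the analysis.

```python
# Figures in billions of dollars, taken from the CBPP excerpt above.
claimed_cuts = 5800        # Ryan's claimed spending reduction
war_baseline_gap = 1050    # war costs in CBO baseline minus war costs in the Ryan budget
interest_on_gap = 250      # interest savings attributed to that gap

# The "phantom savings": the war-cost gap plus the interest counted on it.
phantom_savings = war_baseline_gap + interest_on_gap

# Derived, not quoted: the cuts left over once the phantom savings are excluded.
cuts_from_policy = claimed_cuts - phantom_savings

print(phantom_savings, cuts_from_policy)
```

Working in whole billions keeps everything in integers, so there are no rounding surprises.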

If we're going to get anywhere with this debate... Hell, if we're going to get anywhere with any of the debates that are necessary for a functioning democracy, we have to hold to certain standards like consistency about classifications, appropriate weighting of authority such as the CBO and acceptance of shared axioms like the properties of real numbers (I so wish I were being sarcastic about that last one).

To be blunt as a ball-peen hammer, Paul Ryan and a large segment of others on the right have decided to trade intellectual standards for some short-term policy gains. This is a horrible mistake, not because their policies are all bad (I'm not prepared to make a blanket condemnation), but because no policy gain is worth that cost.

Deficits

This is a point that I have also argued and it bears repeating:

Once we understand that there’s no way that one Congress can hold a future Congress to a deficit deal, the discussion can stop right there. The last time there was a surplus, Republicans literally argued that the disappearance of the national debt was a problem that needed to be solved by massive upper-class tax cuts. There’s no reason to think it wouldn’t happen again

I would find the current discussion a lot more comforting if the early George W. Bush years had not begun with an argument that a balanced budget was a serious problem. I'd at least be more sympathetic to the current discussion if it began with a lot of "we were completely misguided . . . " retractions of past policy.

Tuesday, April 12, 2011

Catching up

As mentioned before, I've gotten in the habit of using the quit-and-save option when I run across something that I'd like to blog about later. Surprisingly, this procrastination-based method can lead to some problems so I occasionally just have to clean out the cache.

Here are some of those potential posts:

You know those stories where a foreign company moves manufacturing overseas but gets in trouble because it arrogantly refuses to adapt to the host country's management practices? This is not one of those stories.

I am very close to a big post on this topic.

Strictly in terms of sound business practices, I question the wisdom of the reliance on big-budget remakes.

Frances Woolley raises a lot of questions here.

Thank goodness Joel Klein is on the case.

This Felix Salmon piece on philanthropy got me thinking about proxy metrics and how they can go wrong. Maybe I should explain that one further in a future post.

Being a country boy by birth and blood, I am routinely offended by the coverage of rural and agricultural matters. This smart piece by Monica Potts is a deeply appreciated exception to the rule.

Spade = Spade time. Some people, particularly those of a Straussian bent, find independent, in-depth journalism inconvenient. Those people want to kill NPR.

California has arguably the world's best university system and we're on the verge of dismantling it. Aaron Brady has some interesting things to say on the subject.

And on the subject of things that are great about California but may not stay that way, despite years of mismanagement by the Tribune Company, the LA Times remains one of the country's best papers, still holding up well against that over-rated paper on the other coast. Felix Salmon apparently agrees.

While on the LA Times website, make sure to check out Michael Hiltzik.

A big recommendation and a small caveat for "The Test Generation"

Dana Goldstein has a great piece of education reporting over at the American Prospect. It's balanced and well-informed, though one section did make me a bit nervous:

Colorado politicians don't need to travel as far as Seoul, however, to get a look at education reform that prioritizes good teaching without over-relying on standardized testing or punitive performance-pay schemes. In 2009, in the southwest Denver neighborhood of Athmar Park -- a Latino area studded with auto-body repair shops, tattoo parlors, and check-cashing joints -- a group of union teachers opened the Math and Sciences Leadership Academy (MSLA), the only public school in Colorado built around peer evaluation. The elementary school borrows some of the cooperative professional development tools used in other countries: Every teacher is on a three-person "peer-review team" that spends an entire year observing one another's classrooms and providing feedback. The teachers are grouped to maximize the sharing of best practices; one team includes a second-year teacher struggling with classroom management, a veteran teacher who is excellent at discipline but behind the curve on technology, and a third teacher who is an innovator on using technology in the classroom.

Each teacher in the group will spend about five hours per semester observing her peer's teaching and helping him differentiate his instruction to reach each student. (MSLA is 92 percent Latino, and more than 97 percent of its students receive free or reduced-price lunch. Sixty percent of the student population is still learning basic English.) "It's kind of like medical rounds," explains Kim Ursetta, a kindergarten and first-grade English and Spanish literacy instructor who, as former president of the Denver Classroom Teachers Association, founded MSLA. "What's the best treatment for this patient?"

Peer review accounts for a significant portion of each MSLA teacher's evaluation score; the remainder is drawn from student-achievement data, including standardized test scores, portfolios of student work, and district and classroom-level benchmark assessments. MSLA is a new school, so the state has not yet released its test-score data, but it is widely considered one of the most exciting reform initiatives in Denver, a city that has seen wave after wave of education upheaval, mostly driven by philanthropists and politicians, not teachers. Alexander Ooms, an education philanthropist and blogger at the website Education News Colorado has written that MSLA "has more potential to change urban education in Denver than any other single effort."

When I visited MSLA in November, the halls were bright and orderly, the students warm and polite, and the teachers enthusiastic -- in other words, MSLA has many of the characteristics of high-performing schools around the world. What sets MSLA apart is its commitment to teaching as a shared endeavor to raise student achievement -- not a competition. During the 2009-2010 school year, all of the school's teachers together pursued the National Board for Professional Teaching Standards' Take One! program, which focuses on using curriculum standards to improve teaching and evaluate student outcomes. This year, the staff-wide initiative is to include literacy skills-building in each and every lesson, whether the subject area is science, art, or social studies.

Don't get me wrong. I think that MSLA is a great model for education reform, but it's only one school (so the successes might be due to exceptional leaders like founder Ursetta or principal Nazareno) and it's new (so you have to figure in things like the Hawthorne effect and unsustainable practices).

Unlike many of its competitors, MSLA is based on sound ideas and I'd like to see more schools give these methods a try, but the history of education is filled with promising success stories that didn't pan out. Until a model is replicated and sustained, it should generally be approached with caution.

Conventional wisdom alert -- Who needs a graphing calculator?

Both Mike Croucher and John D. Cook have posts up questioning the use of graphing calculators in mathematics instruction. As Croucher puts it:

If you are into retro-computing then those specs might appeal to you but they leave me cold. They are slow with limited memory and the ‘high-resolution’ display is no such thing. For $100 dollars more than the NSpire CX CAS I could buy a netbook and fill it with cutting edge mathematical software such as Octave, Scilab, SAGE and so on. I could also use it for web browsing,email and a thousand other things.

I (and many students) also have mobile phones with hardware that leave these calculators in the dust. Combined with software such as Spacetime or online services such as Wolfram Alpha, a mobile phone is infinitely more capable than these top of the line graphical calculators.

They also only ever seem to be used in schools and colleges. I spend a lot of time working with engineers, scientists and mathematicians and I hardly ever see a calculator such as the Casio Prizm or TI NSpire on their desks. They tend to have simple calculators for everyday use and will turn to a computer for anything more complicated such as plotting a graph or solving equations.

One argument I hear for using these calculators is ‘They are limited enough to use in exams.‘ Sounds sensible but then I get to thinking ‘Why are we teaching a generation of students to use crippled technology?‘ Why not go the whole hog and ban ALL technology in exams? Alternatively, supply locked down computers for exams that limit the software used by students. Surely we need experts in useful technology, not crippled technology?

The only thing I'd add is the need for spreadsheet literacy. In the past twenty years, I have never seen anyone using a graphing calculator outside of a classroom and I have never come across a job, ranging from the executive to the janitorial, where Excel skills don't come in handy.

When I was teaching, I taught all my kids to graph functions in Excel. If I were still teaching high school, I would strongly consider making something like Scilab or Python part of the curriculum (particularly for calculus). I might even consider requiring all students to take at least one programming-based math class, but I would still insist that all students know their way around the graphics package of a spreadsheet.

Overall, though, I think we're in agreement that students should be taught technology that they are actually going to use.

Monday, April 11, 2011

"Do you know what 'synergy' is?"

The orientation for my first corporate job included a one-day ropes course. As an old country boy, I had spent much of my boyhood in the tops of trees and had gone on to try some rock-climbing and rappelling, so spending a spring day in the woods climbing towers was like being thrown back into the briar patch after four days of paperwork and jargon.

We were not, however, far enough from civilization to completely escape the business-speak. The medium in this case was a cheerful and bouncy instructor ('bouncy' in the literal sense -- she gave her entire memorized spiel springing from spot to spot on the balls of her feet).

With barely contained excitement she described the wonders of synergy, then asked, "Do you know what 'synergy' is?" Before anyone could answer, she continued, "It's centered energy."

[pause for comic effect]

Of the billions of dollars that corporations spend on consulting, training and motivation, a large chunk goes to what can only be described as scams. Collections of buzzwords, pseudo-science, faulty statistics and unsupportable claims that wouldn't fool the greenest mark are sold for astounding sums to CEOs who then make these programs central pillars of corporate culture.

It's difficult to estimate the true costs of these scams. The direct costs are not trivial but they are still probably smaller than what you get when you add up:

1. The opportunity costs of all these MBAs not studying something more useful;

2. The loss in productivity resulting from promotions and layoffs based partly on employees' ability to enthusiastically accept (or pretend to accept) absurd statements and claims;

3. The dangers of group-think (most of these programs emphasize teamwork, shared visions and positive attitude. This does not produce a conducive atmosphere for pointing out that the emperor has no clothes);

4. The influence these theories have had on non-corporate areas like education reform and policy making (where do you think the Bush administration got that make-your-own-reality stuff, not to mention those creepy backrubs?).

Along these lines, I recently came across this thoughtful 2006 article by Matthew Stewart, a former management consultant whose view of the field is, in some ways, darker than mine (thanks to Mike for the link). I've picked out a couple of interesting passages though you should really read the whole thing if you have the time.

The thing that makes modern management theory so painful to read isn’t usually the dearth of reliable empirical data. It’s that maddening papal infallibility. Oh sure, there are a few pearls of insight, and one or two stories about hero-CEOs that can hook you like bad popcorn. But the rest is just inane. Those who looked for the true meaning of “business process re-engineering,” the most overtly Taylorist of recent management fads, were ultimately rewarded with such gems of vacuity as “BPR is taking a blank sheet of paper to your business!” and “BPR means re-thinking everything, everything!”
Each new fad calls attention to one virtue or another—first it’s efficiency, then quality, next it’s customer satisfaction, then supplier satisfaction, then self-satisfaction, and finally, at some point, it’s efficiency all over again. If it’s reminiscent of the kind of toothless wisdom offered in self-help literature, that’s because management theory is mostly a subgenre of self-help. Which isn’t to say it’s completely useless. But just as most people are able to lead fulfilling lives without consulting Deepak Chopra, most managers can probably spare themselves an education in management theory.
...
If you believed our chief of recruiting, the consulting firm I helped to found represented a complete revolution from the Taylorist practices of conventional organizations. Our firm wasn’t about bureaucratic control and robotic efficiency in the pursuit of profit. It was about love.

We were very much of the moment. In the 1990s, the gurus were unanimous in their conviction that the world was about to bring forth an entirely new mode of human cooperation, which they identified variously as the “information-based organization,” the “intellectual holding company,” the “learning organization,” and the “perpetually creative organization.” “R-I-P. Rip, shred, tear, mutilate, destroy that hierarchy,” said über-guru Tom Peters, with characteristic understatement. The “end of bureaucracy” is nigh, wrote Gifford Pinchot of “intrapreneuring” fame. According to all the experts, the enemy of the “new” organization was lurking in every episode of Leave It to Beaver.

Many good things can be said about the “new” organization of the 1990s. And who would want to take a stand against creativity, freedom, empowerment, and—yes, let’s call it by its name—love? One thing that cannot be said of the “new” organization, however, is that it is new.

In 1983, a Harvard Business School professor, Rosabeth Moss Kanter, beat the would-be revolutionaries of the nineties to the punch when she argued that rigid “segmentalist” corporate bureaucracies were in the process of giving way to new “integrative” organizations, which were “informal” and “change-oriented.” But Kanter was just summarizing a view that had currency at least as early as 1961, when Tom Burns and G. M. Stalker published an influential book criticizing the old, “mechanistic” organization and championing the new, “organic” one. In language that eerily anticipated many a dot-com prospectus, they described how innovative firms benefited from “lateral” versus “vertical” information flows, the use of “ad hoc” centers of coordination, and the continuous redefinition of jobs. The “flat” organization was first explicitly celebrated by James C. Worthy, in his study of Sears in the 1940s, and W. B. Given coined the term “bottom-up management” in 1949. And then there was Mary Parker Follett, who in the 1920s attacked “departmentalized” thinking, praised change-oriented and informal structures, and—Rosabeth Moss Kanter fans please take note—advocated the “integrative” organization.
If there was a defining moment in this long and strangely forgetful tradition of “humanist” organization theory—a single case that best explains the meaning of the infinitely repeating whole—it was arguably the work of Professor Elton Mayo of the Harvard Business School in the 1920s. Mayo, an Australian, was everything Taylor was not: sophisticated, educated at the finest institutions, a little distant and effete, and perhaps too familiar with Freudian psychoanalysis for his own good.

A researcher named Homer Hibarger had been testing theories about the effect of workplace illumination on worker productivity. His work, not surprisingly, had been sponsored by a maker of electric lightbulbs. While a group of female workers assembled telephone relays and receiver coils, Homer turned the lights up. Productivity went up. Then he turned the lights down. Productivity still went up! Puzzled, Homer tried a new series of interventions. First, he told the “girls” that they would be entitled to two five-minute breaks every day. Productivity went up. Next it was six breaks a day. Productivity went up again. Then he let them leave an hour early every day. Up again. Free lunches and refreshments. Up! Then Homer cut the breaks, reinstated the old workday, and scrapped the free food. But productivity barely dipped at all.

Mayo, who was brought in to make sense of this, was exultant. His theory: the various interventions in workplace routine were as nothing compared with the new interpersonal dynamics generated by the experimental situation itself. “What actually happened,” he wrote, “was that six individuals became a team and the team gave itself wholeheartedly and spontaneously to cooperation … They felt themselves to be participating, freely and without afterthought, and were happy in the knowledge that they were working without coercion.” The lessons Mayo drew from the experiment are in fact indistinguishable from those championed by the gurus of the nineties: vertical hierarchies based on concepts of rationality and control are bad; flat organizations based on freedom, teamwork, and fluid job definitions are good.

On further scrutiny, however, it turned out that two workers who were deemed early on to be “uncooperative” had been replaced with friendlier women. Even more disturbing, these exceptionally cooperative individuals earned significantly higher wages for their participation in the experiment. Later, in response to his critics, Mayo insisted that something so crude as financial incentives could not possibly explain the miracles he witnessed. That didn’t make his method any more “scientific.”

Mayo’s work sheds light on the dark side of the “humanist” tradition in management theory. There is something undeniably creepy about a clipboard-bearing man hovering around a group of factory women, flicking the lights on and off and dishing out candy bars. All of that humanity—as anyone in my old firm could have told you—was just a more subtle form of bureaucratic control. It was a way of harnessing the workers’ sense of identity and well-being to the goals of the organization, an effort to get each worker to participate in an ever more refined form of her own enslavement.

So why is Mayo’s message constantly recycled and presented as something radically new and liberating? Why does every new management theorist seem to want to outdo Chairman Mao in calling for perpetual havoc on the old order? Very simply, because all economic organizations involve at least some degree of power, and power always pisses people off. That is the human condition. At the end of the day, it isn’t a new world order that the management theorists are after; it’s the sensation of the revolutionary moment. They long for that exhilarating instant when they’re fighting the good fight and imagining a future utopia. What happens after the revolution—civil war and Stalinism being good bets—could not be of less concern.

Sunday, April 10, 2011

Hamsters, fitness landscapes and an excuse for a repost

All Things Considered had an interesting little story today about the origin of the hamster. I was particularly intrigued by this part:

More troubles followed in the lab. There was more hamster cannibalism, and five others escaped from their cage — never to be found. Finally, two of the remaining three hamsters started to breed, an event hailed as a miracle by their frustrated caretakers.

Those Adam-and-Eve hamsters produced 150 offspring, Dunn says, and they started to travel abroad, sent between labs or via the occasional coat pocket. Today, the hamsters you see in pet stores are most likely descendants of Aharoni's litter.

Because these hamsters are so inbred, they typically have heart disease similar to what humans suffer. Dunn says that makes them ideal research models.

This reminded me of a post from almost a year ago on the subject of lab animals. It also reminded me that I still haven't gotten around to the follow-up I had in mind. Maybe all this reminding will translate into some motivating and I'll actually get the next post on the subject written.

In this post I discussed gradient searches and the two great curses of the gradient searcher, small local optima and long, circuitous paths. I also mentioned that by making small changes to the landscape being searched (in other words, perturbing it) we could sometimes (with luck) improve our search metrics without significantly changing the size and location of our optima.

The idea that you can use a search on one landscape to find the optima of a similar landscape is the assumption behind more than just perturbing. It is also the basis of all animal testing of treatments for humans. This brings genotype into the landscape discussion, but not in the way it's normally used.

In evolutionary terms, we look at an animal's genotype as a set of coordinates for a vast genetic landscape where 'height' (the fitness function) represents that animal's fitness. Every species is found on that landscape, each clustering around its own local maximum.

Genotype figures in our research landscape, but instead of being the landscape itself, it becomes part of the fitness function. Here's an overly simplified example that might clear things up:

Consider a combination of two drugs. If we use the dosage of each drug as an axis, this gives us something that looks a lot like our first example with drug A being north/south, drug B being east/west and the effect we're measuring being height. In other words, our fitness function has a domain of all points on our AB plane and a range corresponding to the effectiveness of that dosage. Since we expect genetics to affect the subjects' reaction [corrected a small typo here] to the drugs, genotype has to be part of that fitness function. If we ran the test on lab rats we would expect a different result than if we tested it on humans, but we would hope that the landscapes would be similar (or else there would be no point in using lab rats).
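Here is a minimal sketch of that two-drug example, just to make the idea concrete. Everything in it is made up for illustration: the `GENOTYPES` table, the single-hill shape of `effect`, and the integer dose grid are my assumptions, not anything drawn from real pharmacology.

```python
import math

# Hypothetical genotype parameters: (peak dose of A, peak dose of B, peak height).
GENOTYPES = {
    "rat": (4.0, 6.0, 0.8),
    "human": (5.0, 6.0, 1.0),
}

def effect(dose_a, dose_b, genotype):
    """Toy fitness function: one smooth hill on the AB plane whose
    position and height depend on the genotype."""
    peak_a, peak_b, height = GENOTYPES[genotype]
    dist_sq = (dose_a - peak_a) ** 2 + (dose_b - peak_b) ** 2
    return height * math.exp(-dist_sq / 8.0)

def best_dose(genotype, max_dose=10):
    """Exhaustive search over integer doses 0..max_dose for each drug."""
    grid = [(a, b) for a in range(max_dose + 1) for b in range(max_dose + 1)]
    return max(grid, key=lambda ab: effect(ab[0], ab[1], genotype))

rat_best = best_dose("rat")
human_best = best_dose("human")

# How well does the optimum found on the rat landscape do on the human one?
carried_over = effect(rat_best[0], rat_best[1], "human")
print(rat_best, human_best, round(carried_over, 3))
```

In this toy setup the two optima sit one dose step apart, so the best point on the rat landscape lands close to the top of the human hill. That is the hoped-for case; nothing guarantees real landscapes are this cooperative.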

Scientists who use animal testing are acutely aware of the problems of going from one landscape to another. For each system studied, they have spent a great deal of time and effort looking for the test species that functions most like humans. The idea is that if you could find an animal with, say, a liver that functions almost exactly like a human liver, you could do most of your controlled studies of liver disease on that animal and only use humans for the final stages.

As sound and appealing as that idea is, there is another way of looking at this.

On a sufficiently high level with some important caveats, all research can be looked at as a set of gradient searches over a vast multidimensional landscape. With each study, researchers pick a point on the landscape, gather data in the region then use their findings [another small edit] and those of other researchers to pick their next point.

In this context, important similarities between landscapes fall into two distinct categories: those involving the positions and magnitudes of the optima; and those involving the search properties of the landscape. Every point on the landscape corresponds to four search values: a max; the number of steps it will take to reach that max; a min; and the number of steps it will take to reach that min. Since we usually want to go in one direction (let's say maximizing), we can generally reduce that to two values for each point, optima of interest and time to converge.

All of this leads us to an interesting and somewhat counterintuitive conclusion. When searching on one landscape to find the corresponding optimum of another, we are vitally interested in seeing a high degree of correlation between the size and location of the optima but given that similarity between optima, similarity in search statistics is at best unimportant and at worst a serious problem.

The whole point of repeated perturbing then searching of a landscape is to produce a wide range of search statistics. Since we're only keeping the best one, the more variability the better. (Best here would generally be the one where the global optimum is associated with the largest region though time to converge can also be important.)
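The perturb-then-search loop can be sketched in a few lines. This is a toy under stated assumptions: the one-dimensional `landscape` list, the uniform noise in `perturb`, the noise scale of 0.9 and the 200 trials are all hypothetical choices of mine, and "best" is taken to mean the largest region of starting points that climb to the global optimum.

```python
import random

def climb(heights, start):
    """Steepest-ascent hill climb on a 1-D landscape.
    Returns (index of the peak reached, number of steps taken)."""
    pos, steps = start, 0
    while True:
        neighbors = [i for i in (pos - 1, pos + 1) if 0 <= i < len(heights)]
        best = max(neighbors, key=lambda i: heights[i])
        if heights[best] <= heights[pos]:
            return pos, steps
        pos, steps = best, steps + 1

def basin_size(heights):
    """Return (index of the global maximum, number of starting points
    whose climb ends on that maximum)."""
    top = max(range(len(heights)), key=lambda i: heights[i])
    reached = sum(1 for s in range(len(heights)) if climb(heights, s)[0] == top)
    return top, reached

def perturb(heights, scale, rng):
    """Add small uniform noise to every point of the landscape."""
    return [h + rng.uniform(-scale, scale) for h in heights]

# Toy landscape: a small local optimum at index 2, the global optimum at index 8.
landscape = [0, 1, 3, 2, 1, 2, 4, 6, 9, 7, 5, 3]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
orig_top, orig_basin = basin_size(landscape)
best_basin, best_landscape = orig_basin, landscape

for _ in range(200):
    candidate = perturb(landscape, 0.9, rng)
    top, basin = basin_size(candidate)
    # Keep a perturbation only if the global optimum stays put
    # and more starting points now reach it.
    if top == orig_top and basin > best_basin:
        best_basin, best_landscape = basin, candidate

print(orig_basin, best_basin)
```

The noise is kept small relative to the gaps between heights so the global optimum cannot move; what changes from one perturbation to the next is which way the borderline starting points roll. `climb` also returns a step count, so time to converge could be used as the selection criterion in place of basin size.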

The Anti-Hulu Hypothesis

In response to Joseph's comment about the availability of the Big Bang Theory online, I thought I'd mention that, for the moment, CBS is keeping the two most recent episodes up on its site.

CBS has long had the most complicated but (I suspect) best thought-out approach to online viewing, deciding whether or not to provide shows on a case-by-case basis. Of course, 'best thought-out' here is in respect to stockholders, not viewers. The interests of those two groups generally align when you're talking about quality of programming but have a way of diverging when you're talking about revenue streams and cannibalization.

Saturday, April 9, 2011

[OT] Bioware's new Star Wars RPG

Based on a perceptive review by Trollsymth, it seems like the new Star Wars multi-player online game captures the music and atmosphere of Star Wars well. But it focuses on combat as the means to gaining experience. I think that this is a very unfortunate decision as Star Wars has a strong history of using guile in the place of brute force.

I think that rewarding guile would make for a more interesting game and could have easily been accomplished with goal-based experience points. Focusing on killing makes sense for the Sith but Jedi and Rebels would benefit from minimizing casualties.

Health Care and Confusion

One of the most important underlying issues preventing efficient markets for health care is explained in a Dilbert cartoon.

Friday, April 8, 2011

Some perspective

A nice quote from the comments on Worthwhile Canadian Initiative:

I'm sure we can all agree that in any organization the size of the Canadian federal gov't ($175 billion per annum in revenue!), there must be inefficiency somewhere. But the issue at hand is not "is there inefficiency?" but rather "is there $10-12 billion of annual waste?".

I think that it really puts questions into perspective.

And if you don't follow WCI, it really does have some of the best commentary around (right up there with Marginal Revolution at its best).

It's not often epidemiologists get pandered to

A clip for everyone in our target audience except Andrew Gelman.

"Cathie Black's Departure: Totally Predictable. Plus: Meet Dennis Walcott"

Dana Goldstein has a good, pithy post on the shake-up in NYC.

Thursday, April 7, 2011

Another interesting Ryan graph from Krugman

I was listening to Talk of the Nation this evening (admittedly not NPR at its best or even second best) and I was struck by how different (and inferior) what I was hearing was to the wonk debate (Krugman, Thoma, Gelman, et al.). Journalists and pundits are arguing over whether Ryan is bold or extra bold while we should be arguing over the practicality and advisability of a recovery based on another housing boom.

Wednesday, April 6, 2011

Andrew Gelman buries the lede

As Joseph mentioned earlier, Andrew Gelman has a must-read post up at the Monkey Cage. The whole thing is worth checking out but for me the essential point came at the end:

Internal (probabilistic) vs. external (statistical) forecasts

In statistics we talk about two methods of forecasting. An internal forecast is based on a logical model that starts with assumptions and progresses forward to conclusions. To put it in the language of applied statistics: you take x, and you take assumptions about theta, and you take a model g(x,theta) and use it to forecast y. You don't need any data y at all to make this forecast! You might use past y's to fit the model and estimate the thetas and test g, but you don't have to.

In contrast, an external forecast uses past values of x and y to forecast future y. Pure statistics, no substantive knowledge. That's too bad, but the plus side is that it's grounded in data.

A famous example is the space shuttle crash in 1986. Internal models predicted a very low probability of failure (of course! otherwise they wouldn't have sent that teacher along on the mission). Simple external models said that in about 100 previous launches, 2 had failed, yielding a simple estimate of 2%.

We have argued, in the context of election forecasting, that the best approach is to combine internal and external approaches.

Based on the plausibility analysis above, the Beach et al. forecast seems to me to be purely internal. It's great that they're using real economic knowledge, but as a statistician I can see what happens when your forecast is not grounded in the data. Short-term, I suggest they calibrate their forecasts by applying them to old data to forecast the past (this is the usual approach). Long-term, I suggest they study the problems with their forecasts and use these flaws to improve their model.

When a model makes bad predictions, that's an opportunity to do better.

All too often, we treat models like the ancient Greeks might have treated the Oracle of Delphi, an ultimate and unknowable authority. If we're going to use models in our debates, we also need to talk about where they come from, what assumptions go into them, how range-of-data concerns might affect them.
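To make the internal/external distinction concrete, here is a minimal sketch built on the launch example in Gelman's post. The external numbers (2 failures in about 100 launches) are his; the internal failure probability and the weight given to it are hypothetical values I picked for illustration, and the pseudo-count blend is just one simple way to combine the two, not necessarily the approach he has in mind.

```python
# External forecast: nothing but the historical record quoted in the post.
past_launches = 100
past_failures = 2
external = past_failures / past_launches

# Internal forecast: a model-based failure probability.
# Hypothetical value; the post only says it was "very low".
internal = 0.001

# Hypothetical weight: treat the internal model as worth this many
# imaginary past launches when blending it with the real record.
internal_weight = 100

# Blend: pooled failure rate over the imaginary and the real launches.
combined = (internal * internal_weight + past_failures) / (
    internal_weight + past_launches
)

print(external, internal, combined)
```

Changing `internal_weight` makes the assumption explicit: a large weight says you trust the model over the record, a small one says the reverse. Either way, the choice is out in the open where it can be argued about.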