From Scientific American 1896
Comments, observations and thoughts from two bloggers on applied statistics, higher education and epidemiology. Joseph is an associate professor. Mark is a professional statistician and former math teacher.
Wednesday, November 7, 2018
Tuesday, November 6, 2018
Monday, November 5, 2018
This nearly century-and-a-quarter-old discussion about rapid transit has a remarkably contemporary feel to it, starting with the phrase "rapid transit" itself.
I always assumed it was a 20th-century term, but...
It's this paragraph, however, that struck me as particularly modern:
Friday, November 2, 2018
You should be concerned about the quality of the polls, but it's the likely-voter models that should worry you most.
I've been meaning to do a good, substantial, well-reasoned piece on fundamental misunderstandings about political polling. This is not that post. Things have been, let us say, busy of late and I don't have time to get this right, but I do need to get it written, and I really want to get this one out in the first five days of November.
So here's the short version.
When the vast majority of journalists (even most data journalists) talk about polls being wrong, they tend to screw up the discussion on at least two levels: first, because they do not grasp the distinction between data and model, and second, because they don't understand how either is likely to go kerplooie (okay, how would you spell it?).
The term "polls of registered voters" describes more or less raw data. A complete and detailed discussion would at this point mention weighting, stratification, and other topics, but, as previously mentioned, this is not one of those discussions. For now, we will treat the numbers you see in the paper as summary statistics of the data.
Of course, lots of things can go wrong in the collecting. Sadly, most journalists are only aware of the least worrisome issue, sampling error. Far more troubling are inaccurate/dishonest responses and, even more importantly, nonrepresentative samples (a topic we have looked into at some depth earlier). For most reporters, "inside the margin of error" translates to "revealed word of God" and when this misunderstanding leads to disaster, they conclude that "the polls were wrong."
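To make the point concrete, here is a minimal Python sketch. It assumes simple random sampling for the textbook margin-of-error formula, and the response rates (6 percent for one side, 5 percent for the other) and the 50 percent true support are invented purely for illustration. The margin of error shrinks as the sample grows; the bias from who chooses to answer does not.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample, so it measures sampling error only.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def expected_respondent_share(true_share: float, rate_a: float, rate_b: float) -> float:
    """Expected share of group A among respondents when group A answers at
    rate_a and everyone else at rate_b (i.e., a nonrepresentative sample)."""
    responders_a = true_share * rate_a
    responders_b = (1 - true_share) * rate_b
    return responders_a / (responders_a + responders_b)


if __name__ == "__main__":
    true_share = 0.50  # hypothetical true support
    # Hypothetical response rates: supporters are slightly more willing to answer.
    observed = expected_respondent_share(true_share, rate_a=0.06, rate_b=0.05)
    for n in (1_000, 10_000):
        moe = margin_of_error(observed, n)
        bias = observed - true_share
        print(f"n={n:>6}: poll ~{observed:.1%} +/- {moe:.1%}, bias {bias:+.1%}")
```

With these made-up numbers the expected poll result is about 54.5 percent against a true value of 50, a gap larger than the roughly three-point margin at n = 1,000, and one that stays put at n = 10,000 while the margin shrinks to about one point.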
The term "likely voter" brings in an entirely different concept, one which is generally even less well understood by the people covering it because now we are talking not just about data, but about models. [Quick caveat: all of my experience with survey data and response models has been on the corporate side. I'm working under the assumption that the same basic approaches are being used here, but you should always consult your physician or political scientist before embarking on prognostications of your own.]
First off, it's worth noting that the very designation of "likely" is arbitrary. A model has been produced that attempts to predict the likelihood that a given individual will vote in an upcoming election, but the cutoff between likely and unlikely is simply a number that the people in the field decided was reasonable. There's nothing scientific, let alone magical, about it.
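Here is a minimal sketch of how much can ride on that cutoff, using ten invented respondents whose turnout scores stand in for the output of some unspecified turnout model. The `Respondent` class, the scores, the preferences, and the cutoffs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Respondent:
    turnout_prob: float  # hypothetical score from some turnout model
    prefers_a: bool      # stated preference for candidate A


def likely_voter_share(respondents: list[Respondent], cutoff: float) -> float:
    """Share preferring A among respondents whose turnout score clears the cutoff."""
    likely = [r for r in respondents if r.turnout_prob >= cutoff]
    if not likely:
        raise ValueError("no respondents clear the cutoff")
    return sum(r.prefers_a for r in likely) / len(likely)


# Ten invented respondents, sorted by turnout score.
SAMPLE = [
    Respondent(0.95, False), Respondent(0.90, False), Respondent(0.85, True),
    Respondent(0.80, False), Respondent(0.70, True), Respondent(0.65, True),
    Respondent(0.55, True), Respondent(0.40, False), Respondent(0.30, True),
    Respondent(0.20, True),
]

if __name__ == "__main__":
    for cutoff in (0.5, 0.6, 0.75):
        print(f"cutoff {cutoff}: {likely_voter_share(SAMPLE, cutoff):.0%} prefer A")
```

Moving the cutoff from 0.5 to 0.75 moves candidate A's share in this toy sample from about 57 percent to 25 percent, with no change at all in the underlying responses.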
Far more important, particularly in the upcoming election, is the idea of range of data. Certain concepts somehow manage to be both painfully obvious and frequently forgotten. Perhaps the best example in statistics is that a model only describes the relationships found in the sample. When we try to extrapolate beyond the range of data, we can only hope that those relationships will continue to hold.
By its very nature, predictive modeling always has this problem, but it becomes a reason for skepticism bordering on panic when the variables you included in (or, perhaps more to the point, left out of) your model start taking on values far beyond anything you saw in the sample. 2018 appears to be a perfect example.
Will the relationships we've seen in the past hold? If not, will the shift favor the Democrats? The Republicans? Or will the relationships break down in such a way that they cancel each other out? I have no intention of speculating. What I am saying is that we are currently so far out of the range of data on so many factors that I'm not sure it makes sense to talk about likely voters at all.
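A minimal sketch of the range-of-data problem: fit an ordinary least-squares line to data drawn from a relationship that (by construction, and unknown to the modeler) levels off, then predict inside and outside the observed range. The saturating curve x/(1+x) and the sample points are hypothetical stand-ins, not a model of turnout.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model: tuple[float, float], x: float,
            observed: tuple[float, float]) -> tuple[float, bool]:
    """Return the fitted value at x and whether x lies inside the observed range."""
    intercept, slope = model
    low, high = observed
    return intercept + slope * x, low <= x <= high


def true_relationship(x: float) -> float:
    # Hypothetical "real" relationship that levels off; the sample never shows that part.
    return x / (1 + x)


XS = [0.0, 0.25, 0.5, 0.75, 1.0]
YS = [true_relationship(x) for x in XS]
MODEL = fit_line(XS, YS)
OBSERVED_RANGE = (min(XS), max(XS))

if __name__ == "__main__":
    for x in (0.5, 4.0):
        y_hat, in_range = predict(MODEL, x, OBSERVED_RANGE)
        print(f"x={x}: predicted {y_hat:.2f}, "
              f"actual {true_relationship(x):.2f}, in range: {in_range}")
```

Inside the observed range the line lands within about 0.04 of the truth; at x = 4.0 it predicts roughly 2.0 against a true value of 0.8. The range check doesn't fix anything, but it at least tells you when you have stopped estimating and started hoping.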
Thursday, November 1, 2018
Our regular repost on drinking from the wrong pipe
From Josh Marshall:
I managed to involve myself this weekend in a tiny eddy in the storm around the Pittsburgh synagogue massacre. As you can see below, early yesterday evening I happened upon this interview on Lou Dobbs’ Fox Business News show in which a guest, Chris Farrell, claimed the migrant caravan in southern Mexico was being funded and directed by the “Soros-occupied State Department.” This is, as I explained, straight out of The Protocols of the Elders of Zion, the foundational anti-Semitic tract, first circulated and perhaps authored by the Czarist secret police in the first years of the 20th century.
If you’re not familiar with this world, “ZOG” is a staple of white supremacist and neo-Nazi literature and websites. It stands for “Zionist Occupied Government” and is a shorthand for the belief that Jews secretly control the US government. Chris Farrell’s phrasing was no accident. All of this is straight out of the most rancid anti-Semitic propaganda. Rob Bowers, the shooter in the Pittsburgh massacre, appears to have been specifically inspired by this conspiracy theory. Indeed, Bowers had also reposted references to “ZOG” on his social media accounts.
All of the conspiracy theories around the caravan, particularly those involving George Soros and voter fraud, have a weird Underpants Gnomes quality to them. They make emotional sense to those deep in the conservative media bubble, but there's no way to make any kind of plausible argument for any of them.
It can be useful for the Republican Party if certain segments of the population believe these fantasies, and even disseminate them, as long as the discussion remains far enough out on the recognized fringe to give party leaders plausible deniability. It is not useful to have ranking politicians and influential conservative voices saying these things out loud on what are supposed to be respectable outlets.
Or as we said exactly two years ago...
Tuesday, November 1, 2016
In retrospect, it's surprising we don't use more sewage metaphors
A few stray thoughts on the proper flow of information (and misinformation) and a functional organization. I know we've been through all of this stuff about Leo Strauss and the conservative movement before, so I'm not going to drag this out in great detail except to reiterate that if you want to have a functional institution that makes extensive use of internal misinformation, you have to make sure things move in the right direction.
With misinformation systems as with plumbing, when the flow starts going the wrong way, the results are seldom pretty. This has been a problem for the GOP for at least a few years now. A number of people in positions of authority (particularly in the Tea Party wing) have bought into notions that were probably intended simply to keep the cannon fodder happy. This may also partly explain the internal polling fiasco at the Romney campaign.
As always, though, it is Trump who takes things to a new level. We now have a Republican nominee who uses the fringier parts of the Twitterverse as briefings.
From Josh Marshall:
Here's what he said ...
Wikileaks also shows how John Podesta rigged the polls by oversampling democrats, a voter suppression technique. That's happening to me all the time. When the polls are even, when they leave them alone and do them properly, I'm leading. But you see these polls where they're polling democrats. How is Trump doing? Oh, he's down. They're polling democrats. The system is corrupt, rigged and broken. And we're going to change it. [ Cheers and applause ]
Thank you, thank you. In an e-mail podesta says he wants oversamples for our polling in order to maximize what we get out of our media polling. It's called voter suppression because people will say, oh, gee, Trump's down. Folks, we're winning. We're winning. We're winning. These thieves and crook, the immediate, yeah not all of it, not all of it, but much of it -- they're the most crooked -- they're almost as crooked as Hillary. They may even be more crooked than Hillary because without the media, she would be nothing.
Now this immediately grabbed my attention because over the weekend I was flabbergasted to see this tweet being shared around the Trumposphere on Twitter.
Todays Wikileaks dump revealed the DNC works w/ pollsters to skew polls in their favor by over-polling Democrats & under-polling Republicans pic.twitter.com/tVA8K6n79T (Taylor Egly, @TaylorEgly, October 24, 2016)
I don't know who Taylor Egly is. But he has 250,000 followers - so he has a big megaphone on Twitter. This tweet and this new meme is a bracing example of just how many of the "scoops" from the Podesta emails are based on people simply not knowing what words mean.
Trump had already mentioned 'over-sampling' earlier. But here he's tying it specifically to the Podesta emails released by Wikileaks. This tweet above is unquestionably what he's referring to.
There are several levels of nonsense here. Let me try to run through them.
...
More importantly, what Tom Matzzie is talking about is the campaign/DNC's own polls. Campaigns do extensive, very high quality polling to understand the state of the race and devise strategies for winning. These are not public polls. So they can't affect media polls and they can't have anything to do with voter suppression.
Now you may be asking, why would the Democrats skew their own internal polls? Well, they're not.
The biggest thing here is what the word 'oversampling' means. Both public and private pollsters will often over-sample a particular demographic group to get statistically significant data on that group.
... You need to get an 'over-sample' to get solid numbers.
Whether it's public or private pollsters, the 'over-sample' is never included in the 'topline' number. So if you get 4 times the number of African-American voters as you got in a regular sample, those numbers don't all go into the mix for the total poll. They're segmented out. The whole thing basically amounts to zooming in on one group to find out more about them. To do so, to zoom in, you need to 'over-sample' their group as what amounts to a break-out portion of the poll.
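Here is a minimal sketch of the bookkeeping described above, with invented counts: a regular sample of 100 in which 12 respondents belong to the group of interest, plus an over-sample of 36 more group members (so four times as many group respondents in total). Following the description, the topline uses only the regular sample, while the break-out estimate uses every group respondent. Many pollsters handle this through weighting instead; this sketch doesn't attempt that, and the `Response` class and `block` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    in_group: bool         # member of the demographic group being over-sampled
    supports: bool         # answer to the poll question
    from_oversample: bool  # collected as part of the extra over-sample


def block(count: int, in_group: bool, supports: bool, from_oversample: bool) -> list[Response]:
    """Build `count` identical responses (hypothetical helper for the toy data)."""
    return [Response(in_group, supports, from_oversample) for _ in range(count)]


def share_supporting(responses: list[Response]) -> float:
    return sum(r.supports for r in responses) / len(responses)


def topline(responses: list[Response]) -> float:
    """Topline from the regular sample only; the over-sample is set aside."""
    return share_supporting([r for r in responses if not r.from_oversample])


def group_breakout(responses: list[Response]) -> float:
    """Break-out estimate for the group, using every group respondent."""
    return share_supporting([r for r in responses if r.in_group])


# Invented counts: regular sample of 100 (12 in the group) plus 36 over-sampled group members.
RESPONSES = (
    block(10, True, True, False) + block(2, True, False, False)
    + block(40, False, True, False) + block(48, False, False, False)
    + block(30, True, True, True) + block(6, True, False, True)
)

if __name__ == "__main__":
    print(f"topline: {topline(RESPONSES):.0%}")
    print(f"group break-out: {group_breakout(RESPONSES):.0%}")
    print(f"naive pooled (not reported): {share_supporting(RESPONSES):.0%}")
```

Naively pooling everything would push the topline from 50 percent to about 59 percent in this toy example, which is exactly the distortion that segmenting out the over-sample avoids.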
What it all comes down to is that you're talking about a polling concept the Trumpers don't seem to understand (or are relying on supporters not understanding), about polls that are by definition secret (campaign polls aren't shared) and about an election eight years ago.
Wednesday, October 31, 2018
Tuesday, October 30, 2018
We can't finish October without playing this at least once
The Danse Macabre (from the French language), also called the Dance of Death, is an artistic genre of allegory of the Late Middle Ages on the universality of death: no matter one's station in life, the Dance Macabre unites all.
The Danse Macabre consists of the dead or a personification of death summoning representatives from all walks of life to dance along to the grave, typically with a pope, emperor, king, child, and laborer. They were produced as mementos mori, to remind people of the fragility of their lives and how vain were the glories of earthly life. Its origins are postulated from illustrated sermon texts; the earliest recorded visual scheme was a now-lost mural at Holy Innocents' Cemetery in Paris dating from 1424 to 1425.
Monday, October 29, 2018
I think it's important to define subsidized journalism more broadly than just native advertising.
[This started out as a reply to Andrew Gelman's post "The Axios Turing test and the heat death of the journalistic universe," but when you break 500 words...]
First, there is the age-old problem of advertisers rewarding/punishing publications. If memory serves, Politico back in the Mike Allen days was notorious for questionable ethics along these lines.
Far more subtle and dangerous is the quid pro quo associated with access. While most news organizations have rules in place to prevent editorial influence from advertisers and generally avoid even the appearance of impropriety, giving favorable coverage to sources (often even to the extent of letting them set the narrative and distort the facts) is so widespread that many journalists don't even see the ethical problem. Politico and Axios both have bad reputations in this regard, but the worst offender may well be the New York Times.
This is both a carrot and a stick process. When Disney found itself the target of an LA Times exposé about its dealings with the city of Anaheim, they responded by publicly cutting off the paper's access to the studio's talent, even though that was an entirely different department of the paper dealing with entirely different Disney business lines. The good news is that, now that Disney has become by far the biggest and most powerful entertainment company in the world with the acquisition of Fox, I'm sure they will be much less inclined to abuse their position.
Particularly in fields such as entertainment, companies often go beyond merely providing journalists with the raw material and actually provide the stories themselves. Sometimes this is done by sending out press releases that can be repackaged as features with a minimum of work. Other times you have what can only be described as ghostwriting. Someone with the studio or one of the PR firms it employs will send a reporter an email about an upcoming project. The next day it will appear almost verbatim under the reporter's byline.
In addition to supplying money and content, companies can also provide an even more valuable service as promotion partners. That exclusive you just published about a new movie will get far more traffic because of the hundred million dollars the studio has spent on marketing the project, not to mention the effects of the SEO push and the social media blitz. If you write a story that the studios want to promote, you can literally see millions of PR dollars spent on getting you eyeballs.
While none of this is by any means new, Netflix and to a lesser extent the other streaming services have pushed things to a new level, spending billions on marketing and PR, even green-lighting hundred-million-dollar documentaries and art-house projects for no other apparent reason than that they will generate lots of coverage and might snag a few awards. This unprecedented amount of money has distorted the narrative to such an extent that it is impossible to gauge the cultural impact or commercial viability of Netflix, but most of the journalists covering this story (virtually all of those on the East Coast) remain oblivious to this aspect, perhaps because ignorance of this particular detail makes their lives much easier.
Friday, October 26, 2018
More spooky stuff (In no way chosen as filler because I had a busy week)
Who better than Goldsmith and Hermann to send us off?
Thursday, October 25, 2018
A Mercury Theatre Halloween
[repost]
The debut production of the Mercury Theatre of the Air, Dracula.
And, of course, the Mercury production of War of the Worlds.
While we're at it, here's a tour de force from Welles' favorite, Agnes Moorehead (don't let the corny intro turn you off): Sorry, Wrong Number.
Wednesday, October 24, 2018
We'll largely skip over the author's fixation on casual nudity.
That said, it's a mistake to ignore them entirely, both because of the close relationship between the scientific and science-fiction communities and because of the influence SF has had on the way we think about technology today, either directly, through books, film, and television, or indirectly, through writers who alternated between science fact and science fiction (Asimov, Clarke, and, to a degree, Willy Ley and Carl Sagan).
This mid-century essay by Robert Heinlein on his predictions for the year 2000 is worth a look for a number of reasons. First, people did tend to take the man seriously in the postwar era. Though his standing has arguably declined somewhat, at least relative to contemporaries like Asimov, at his peak he was the best-known and best-respected hard science fiction writer among mainstream audiences. (Bradbury also had a significant mainstream following, but even when writing about spaceships and aliens, his work tended to fall more in the category of fantasy.)
Second, this essay is of particular value because the author not only makes a great number of detailed predictions (including a notable amount of time spent on the appeal of socially acceptable nudity), he also explicitly spells out the assumptions that underlie much of the period's attitudes toward the future. He even states his axioms and provides a handy graph of human advancement.
As a serious attempt at describing the rate of progress, this picture is fatally flawed. The year 1900 came at the end of a huge technological and scientific spike. Extending it back a couple of decades would have completely thrown off the curve. (Interestingly, you actually can justify an exponential curve describing progress in the 19th century.) Furthermore, it is difficult to argue for a steady acceleration from the aughts to the teens, the teens to the 20s, and the 20s to the 30s.
This graph, however, is tremendously revealing when it comes to the ways people in the 1950s thought about progress. Like the end of the 19th century, the postwar era was a period when conditions lined up to cause a number of very steep S-curves to cluster together. The result was a time of explosive, ubiquitous change. There was also, as mentioned before, a tendency to look at the two world wars and the interval between (particularly the Great Depression) as anomalous. It was natural for people in the postwar era to see themselves as living on an exponential slope that was on the verge of shooting past the comprehensible.
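A minimal sketch of that clustering effect, assuming three logistic S-curves with closely spaced, purely hypothetical midpoints (0, 2, and 4 on an arbitrary time axis). It is a toy, not a model of historical progress. Summing the curves and looking at the gain per time step shows why someone standing early in the cluster could reasonably see an exponential: the gain keeps multiplying, and only later does the combined curve level off.

```python
import math


def logistic(t: float, midpoint: float, steepness: float = 1.0) -> float:
    """A single S-curve: slow start, rapid middle, saturation."""
    return 1.0 / (1.0 + math.exp(-steepness * (t - midpoint)))


def clustered_progress(t: float, midpoints: list[float]) -> float:
    """Combined 'progress' from several S-curves that happen to overlap in time."""
    return sum(logistic(t, m) for m in midpoints)


def gain_per_step(t: float, midpoints: list[float]) -> float:
    """Increase in combined progress from t to t + 1."""
    return clustered_progress(t + 1, midpoints) - clustered_progress(t, midpoints)


MIDPOINTS = [0.0, 2.0, 4.0]  # hypothetical, closely spaced breakthroughs

if __name__ == "__main__":
    previous = None
    for t in range(-6, 10):
        gain = gain_per_step(t, MIDPOINTS)
        ratio = f"{gain / previous:.2f}x" if previous else "-"
        print(f"t={t:>3}: gain {gain:.4f} (vs. previous step: {ratio})")
        previous = gain
```

Early on, each step's gain is roughly two and a half to three times the last, the signature of an exponential tail; once the curves pass their midpoints, the ratio drops below 1 and the combined curve flattens.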
Tuesday, October 23, 2018
Cult of the CEO
This is Joseph
This is a revealing symptom of the cult of the CEO:
If you have a CEO this dead to rights on securities fraud, why let him continue as CEO? According to the SEC, Musk was indispensable. In a statement, SEC Chair Jay Clayton said “holding individuals accountable is important and an effective means of deterrence,” but that he must take the interests of investors into account, and “the skills and support of certain individuals may be important to the future success of a company.”
One of the most pernicious myths is that of the irreplaceable man. We know that nobody is really irreplaceable, because in the end we are all replaced by the natural force of mortality. But it is a terrible sign of a society when it sees the importance of a person to a business enterprise as an excuse for leniency for poor conduct. The pressure to cheat to reach the top has to be high because the stakes are so incredibly meaningful in terms of wealth and status. That suggests more scrutiny, not less.
Irreplaceable men often beget disasters. Great leaders, like Napoleon, often get cocky and make grave mistakes that end up costing a great deal despite the attributes that made them successful for a long time.
I think that this line of thinking isn't ideal.
Monday, October 22, 2018
It is always useful to go back and read the contemporary accounts.
One of the nails I've pounded flush to the board recently (apologies to long-suffering regular readers) is that much of the standard 21st-century narrative of technology consists of things that were at best sometimes true in the past and are almost entirely false now. The best example is probably the idea that the advances of old invariably came as a thief in the night with almost no one imagining the magnitude of their impact and what now seem obvious applications going undiscovered for years.
There are, of course, technological developments that caught people off guard or that moved in unexpected directions, but in most cases, if you go back and read early speculations about the potential of breakthrough technologies in the late 19th/early 20th centuries or the postwar era, you'll generally find that people had a pretty good sense of what was likely to come.
The same can be said for the dawn of the personal computing era.
Friday, October 19, 2018
In case the aerospace allusions are getting a bit obscure, here's a week in video recommendation.
The Mouse on the Moon is an easy film to overlook. Between Peter Sellers' spectacular turn in The Mouse That Roared and the general tendency of the time to look at sequels as second-class cinematic citizens (particularly when none of the original stars made a return appearance), it is easy to think of the 1963 film as "the other one."
That's too bad, because the second film can easily hold its own. It's sharp and funny, and like its predecessor, it gets 3 1/2 stars in the Leonard Maltin guide. What's more, it features the direction of a young Richard Lester just before he broke through with A Hard Day's Night.
From Wikipedia:
The Mouse on the Moon is a 1963 British comedy film, the sequel to The Mouse That Roared. It is an adaptation of the 1962 novel The Mouse on the Moon by Irish author Leonard Wibberley, and was directed by Richard Lester. In it, the people of the Duchy of Grand Fenwick, a microstate in Europe, attempt space flight using wine as a propellant. It satirises the space race, Cold War and politics.
Thursday, October 18, 2018
Continuing the visionary aerospace thread, there's almost a "Mouse on the Moon" quality to India's Avatar.
No disrespect meant for India here. Quite the opposite. I think there's long been a tendency to underestimate the country and its extraordinary intellectual capital. Here's one of the projects I would definitely keep an eye on.
From Wikipedia:
The idea is to develop a spaceplane vehicle that can take off from conventional airfields. Its liquid air cycle engine would collect air in the atmosphere on the way up, liquefy it, separate oxygen and store it on board for subsequent flight beyond the atmosphere. The Avatar, a reusable launch vehicle, was first announced in May 1998 at the Aero India 98 exhibition held at Bangalore.
Avatar seems to have, if you'll pardon the metaphor, stalled out (recent tests don't seem to involve any of the really cutting-edge stuff). It could be that the technology actually has hit a wall. That would hardly be surprising for something this ambitious. There's another possibility, however, that is both encouraging and depressing at the same time, namely that it simply hasn't gotten the funding it needs. Depressing because that would mean we have unnecessarily delayed important advances. Encouraging because it suggests that we still might get this plane flying.
There's a lot of money floating around out there in the vanity aerospace industry, and it would be nice to see it go to something ambitious and important. With all due respect to the recently departed, if Paul Allen had taken the money spent on 60-year-old visions of space travel and poured it into something forward-thinking, his greater legacy might've been what he did after Microsoft.