West Coast Stat Views (on Observational Epidemiology and more): Examining the rope –

Friday, October 3, 2014

Examining the rope – – Rotten Tomatoes edition

[You can find the origin of the metaphor here]

Our last Rotten Tomatoes post may have come out a little harsher than I intended. I probably focused too much on the specific glitch and not enough on the larger point, namely that metrics almost never entirely capture what they claim to. Identifying and fixing problems is important, but we also have to acknowledge our limitations.

If we are stuck with imperfections, then we will just have to learn to live with them. A big part of that is trying to figure out when our metrics can be relied upon and when they are likely to blow up in our faces.

Let's take Rotten Tomatoes for example. In many ways, the website provides an excellent tool for quantitatively measuring the critical reaction to a movie. It is broad-based, consistent, and as objective as we can reasonably hope for.

But is it the best possible measure in all conceivable circumstances? If not, when does it break down?

When you see a 60% fresh rating, that means that 60% of the reviews examined were considered positive. You will notice that each review has been reduced to a binary variable. The most enthusiastic of reviews is put in the same category as the mildly favorable. The inevitable result is that sometimes a film will rank lower on this binary average than it would have on a straight average of star ratings.
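To make the gap between the two averages concrete, here is a minimal sketch. The two films and their star ratings are invented for illustration, and the rule that three stars or more counts as a positive review is my assumption, not a description of how Rotten Tomatoes actually classifies reviews.

```python
from statistics import mean

# Hypothetical star ratings (1 to 5) for two made-up films; not real review data.
FILM_A = [5, 5, 5, 5, 5, 5, 2, 2, 2, 2]  # loved by most, disliked by a few
FILM_B = [3, 3, 3, 3, 3, 3, 3, 3, 3, 2]  # mildly liked by nearly everyone

# Assumption for this sketch: a review counts as positive at 3 stars or more.
POSITIVE_THRESHOLD = 3


def percent_fresh(stars, threshold=POSITIVE_THRESHOLD):
    """Share of reviews at or above the threshold, as a percentage."""
    positive = sum(1 for s in stars if s >= threshold)
    return 100 * positive / len(stars)


for name, stars in [("Film A", FILM_A), ("Film B", FILM_B)]:
    print(f"{name}: {percent_fresh(stars):.0f}% fresh, mean {mean(stars):.1f} stars")
# Film A: 60% fresh, mean 3.8 stars
# Film B: 90% fresh, mean 2.9 stars
```

With these made-up numbers, Film A wins comfortably on the straight average but trails badly on the binary one.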

Just to be clear, there are some definite advantages to this yes/no approach. As anyone who has dealt with satisfaction scales knows, you can get into all sorts of trouble making interval assumptions about that one-through-five scale.

Can knowing their binary foundation help us make better use of the Rotten Tomatoes scores?

If we can make certain assumptions about the distribution of scores, we can tell a lot about which films are likely to be favored. Keep in mind that a good review counts the same as a great one. Therefore a film that is liked by everybody will do better than a film that is loved by most but leaves a few indifferent or hostile.
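One way to sketch that reasoning is to pretend each film's review scores are roughly normal and ask what share lands above the "positive" cutoff. Everything here is an assumption for illustration: the normal model (real star scores are bounded and lumpy), the cutoff of 3.0, and the means and spreads given to the two hypothetical films.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal(mu, sigma) variable."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def expected_fresh_share(mu, sigma, threshold=3.0):
    """Expected share of reviews scoring above the (assumed) positive cutoff."""
    return 1 - normal_cdf(threshold, mu, sigma)


# Hypothetical films: lower mean with tight agreement vs. higher mean with wide disagreement.
crowd_pleaser = expected_fresh_share(mu=3.8, sigma=0.4)
divisive_classic = expected_fresh_share(mu=4.2, sigma=1.2)

print(f"Crowd pleaser:    {crowd_pleaser:.0%} positive")
print(f"Divisive classic: {divisive_classic:.0%} positive")
# Crowd pleaser:    98% positive
# Divisive classic: 84% positive
```

Under these assumptions, the film with the lower average score but less disagreement among critics ends up with the higher share of positive reviews.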

Without getting into relative merits (all are great films), consider Philadelphia Story and the big three from Martin Scorsese, Taxi Driver/Raging Bull/Goodfellas. By many measures, such as the influential Sight & Sound poll (according to Ebert "by far the most respected of the countless polls of great movies--the only one most serious movie people take seriously."), all three Scorsese pictures are among the most critically hailed movies ever. All three have very good scores on the "Tomatometer" but none have a perfect score. The same goes for films like Bonnie and Clyde, The Magnificent Ambersons, and Bicycle Thieves.

Philadelphia Story, on the other hand, is much less likely to get nominated as greatest film ever, but it is a movie that virtually everyone likes. It's an excellent film, skillfully directed, starring three of the most charming actors ever to come out of Hollywood. Not surprisingly, it has a perfect score on Rotten Tomatoes.

This is not to say that Sight & Sound is better than Rotten Tomatoes. Every scoring system is arbitrary, sometimes plays favorites, and never exactly captures what we expect it to measure. The lesson here is that, if you want to use a metric in an argument, you need to know how that metric was derived and what its strengths and weaknesses are. You can't find a perfect metric, but you can have a pretty good idea where the imperfections are.
