We’re all familiar with the critiques of standardized tests and other common measures used for high-stakes decisions. Recently, somebody in my circle has started going on about measures of “grit” and their predictive power. I am willing to believe that “grit” is an excellent predictor of all sorts of things. But I wonder if much of the predictive power of “grit” comes from the fact that these measures are currently low-stakes, so people have few incentives to game them.
I really think that this is the heart of the measurement problem. Insofar as there is a way to do better on a test that takes less work than just being really good at the material, it is probable that much of your signal will be gaming. Studying the form of the questions, for example, is likely to improve performance (by reducing confusion, if nothing else), but access to these approaches may vary by context.
Even worse, some of the test prep may have nothing to do with what the test is meant to measure. So the score starts to measure things like "willingness to sacrifice learning time for test prep time".
This is a very good insight and likely to be eternally problematic in education.