(Back from two weeks in Milan, where I attended the 2015 IAAE conference, visited Expo 2015, took some time off, and saw friends I had not seen in a long time. This week’s ‘Metrics Monday is a bit different in that it is not so much about econometrics, but about the consumers of results generated by econometrics, and the need for better statistical education at an early age.)
In a conversation on this blog’s Facebook page a few weeks ago about a new working paper, a friend asked “Does this support [X] or not?” Given that the paper under discussion looked at a number of outcomes and presented a mixed bag of results, and given that the results were not causally identified, I responded: “There is no simple answer to that question. There is a little bit of everything for everyone here. Read the limitations section, too.” My friend then said she had done that but was still confused, and that “We all know it’s much easier to lie with statistics than tell the truth.”
Is it? Or is it just much easier to be misled by statistics than it is to lie with them?
What I mean by this is that we live in an age where we are increasingly confronted with statistical data in the stories that we are told. When I worked as a reporter in the mid-1990s, no one had ever heard of data-driven journalism, yet that is increasingly what editors ask from reporters at respectable media outlets, and I sat through job talks for a position focusing on exactly that area when I taught at a policy school. And the phenomenon shows no sign of going away: with the sharply decreasing costs of data collection and the consequent rise of big data, more and more of the information we will be presented with daily will have some kind of statistical component.
This means that education systems the world over will have to adjust in order to teach statistical literacy earlier in their respective curricula. The first time I was exposed to the notion of probability, I was in the eighth grade. The first time I took an actual statistics course, I was in college. And the first time I actually learned causal inference, I was an assistant professor. Up until that point, my econometrics training had largely been about properly modeling data-generating processes, and whatever discussion of causation there was simply stated “Now remember, those parameters don’t indicate causal relationships; correlation is not causation,” and stopped short of a discussion of what was required to identify causal relationships.
I am convinced that it is possible to introduce these notions at an intuitive level (i.e., without the use of math) at an early age, by providing a number of examples. It should be relatively simple, for example, to adapt the Linda problem in order to teach its lesson to elementary school-aged children. If you’re not familiar with the Linda problem, it goes as follows:
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations … The respondents are asked to rank in order of likelihood various scenarios: Linda is (1) an elementary school teacher, (2) active in the feminist movement, (3) a bank teller, (4) an insurance salesperson, or (5) a bank teller also active in the feminist movement … The remarkable finding is that (now generations of) respondents deem scenario (5) more likely than scenario (3), even though (5) is a special case of (3). The finding thus violates the most basic laws of probability theory. Not only do many students get the Linda problem wrong, but some object, sometimes passionately, after the correct answer is explained.
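For readers who would rather see the logic than take it on faith, here is a minimal sketch of why scenario (5) can never be more likely than scenario (3). The population size and the two probabilities are numbers I made up purely for illustration; they are not estimates of anything, and the function name is hypothetical. Whatever values you plug in, the feminist bank tellers are counted from among the bank tellers, so their number cannot exceed the total.

```python
import random


def simulate_population(n=10_000, p_teller=0.05, p_feminist=0.60, seed=0):
    """Count bank tellers and feminist bank tellers in a made-up population.

    All parameters are illustrative assumptions, not real-world estimates.
    """
    rng = random.Random(seed)
    tellers = 0
    feminist_tellers = 0
    for _ in range(n):
        is_teller = rng.random() < p_teller
        is_feminist = rng.random() < p_feminist
        if is_teller:
            tellers += 1
            # A feminist bank teller is first of all a bank teller,
            # so this count is a subset of the one above.
            if is_feminist:
                feminist_tellers += 1
    return tellers, feminist_tellers


tellers, feminist_tellers = simulate_population()
print(f"bank tellers: {tellers}")
print(f"bank tellers active in the feminist movement: {feminist_tellers}")
```

Changing `p_feminist` to any value between 0 and 1 moves the second count up or down, but never above the first.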
Likewise, it should be relatively simple to convince children that correlation is not causation, that in order to be reasonably certain that an estimated relationship is causal, certain conditions have to be met, and to teach them that a lot of the relationships they are presented with should be confronted with the gold standard of a randomized experiment.
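To make the point about correlation and randomization concrete, here is a small simulated sketch. Everything in it is an assumption chosen for illustration: a hidden factor `z` drives both `x` and `y`, and `x` has no effect on `y` at all. In the "observational" data the two are nonetheless clearly correlated; once `x` is assigned at random, independently of `z`, the correlation all but vanishes.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)
n = 5000

# Hidden confounder (assumed for illustration): it drives both x and y.
z = [rng.gauss(0, 1) for _ in range(n)]

# Observational data: x and y each depend on z, but x does not cause y.
x_obs = [zi + rng.gauss(0, 1) for zi in z]
y_obs = [zi + rng.gauss(0, 1) for zi in z]

# Randomized experiment: x is assigned independently of z; y is generated as before.
x_rand = [rng.gauss(0, 1) for _ in range(n)]
y_rand = [zi + rng.gauss(0, 1) for zi in z]

observational_r = pearson(x_obs, y_obs)
experimental_r = pearson(x_rand, y_rand)

print(f"correlation in observational data: {observational_r:.2f}")
print(f"correlation under random assignment: {experimental_r:.2f}")
```

The design choice here is deliberate: since `x` never enters the equation for `y`, any correlation in the first data set comes entirely from the shared dependence on `z`, which is exactly what random assignment breaks.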
Given that (i) the role of any education system should be to form responsible citizens, (ii) critical thinking skills are a prerequisite for responsible citizenship, and (iii) the ability not to be hoodwinked by the statistics one is presented with is a core component of critical thinking, education has to be reformed to teach those things earlier.
Some people might say that the state already does too much, and that we should not expect it to teach statistical literacy on top of everything else. Sadly, this is not one of those areas where we should just let people choose to educate themselves; how often do people actually willingly choose to learn some math-related skill on their own, in their spare time? I think there is a clear case to be made that statistical literacy generates a positive externality (more responsible government via politicians who can less easily get away with fallacies and misleading statistics, for one), and so it is fully within the purview of the state to pay for it.
In my statement of teaching philosophy, which I have used for various purposes over the years, I wrote:
The core of my teaching philosophy is my belief in the important role college plays in forming responsible citizens. … I often tell my students that one of the most important critical thinking skills, if not the most important such skill, is the ability to question the causal statements one is presented with. … We often hear it said that correlation is not causation, but my goal is to get students to understand that knowing whether X actually causes Y is difficult and requires a great deal of thinking.
The only thing I have changed my mind about in recent years is the age at which those things should be taught.