‘Metrics Monday: Hypothesis Testing in Theory and in Practice

Last updated on September 6, 2015

Suppose you want to study the demand for a given good. If you want your work to be grounded in theory, you probably want to start with primitives. That is, you will want to start with (i) consumer preferences, as represented by a utility function U(.) defined over the consumption x of the good, (ii) the price p of the good whose demand you want to study (for ease of notation, I am ignoring the prices of other goods, whether they are substitutes or complements), and (iii) consumer income w.

With that information, you can then maximize the consumer’s utility U(x) by choosing x such that px = w (the constraint will hold with equality if you assume that the consumer’s preferences are monotonic, i.e., consumers derive greater well-being from greater amounts of x). This yields x(p,w), the consumer’s Marshallian demand (some prefer to call it a Walrasian demand) for the good you are studying when price is equal to p and income is equal to w. From x(p,w), you can calculate how consumer demand changes as price increases or as income increases, which you would respectively denote dx/dp and dx/dw. (Yes, I am abusing notation by using d to denote partial derivatives; bear with me.)
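As a quick worked example under the simplifications above (a single good and monotonic preferences, so that the budget constraint binds), the constraint px = w alone pins down demand, and both derivatives can be signed directly. This is only meant to illustrate the notation; once other goods and their prices are brought back in, neither sign is guaranteed.

```latex
x(p, w) = \frac{w}{p}, \qquad
\frac{\partial x}{\partial p} = -\frac{w}{p^{2}} < 0, \qquad
\frac{\partial x}{\partial w} = \frac{1}{p} > 0
\quad \text{for } p > 0,\ w > 0.
```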

Hypotheses in Theory

Suppose you want to test that the Law of Demand holds. That is, you want to test that the demand curve is downward sloping, i.e., as the price of a good increases, the quantity demanded of that good decreases, everything else equal. So you hypothesize that dx/dp < 0. Or maybe you want to test the hypothesis that the good you are studying is a normal good. That is, you want to test the hypothesis that as consumers get wealthier, the quantity demanded of that good increases. So you hypothesize that dx/dw > 0.

Hypotheses in Practice

To test the hypotheses above, you would want to randomly select a sample of consumers and collect information on (i) how much of the good each consumer purchases, (ii) at what price, and (iii) each consumer’s income. Assuming there is enough variation in the price at which the good is purchased, you could estimate

x = a + bp + cw + e,

wherein b and c would respectively be estimates of dx/dp and dx/dw. (Again, bear with me as I assume away all kinds of issues which normally arise in empirical work.)
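The estimation step can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the data are simulated rather than collected, the "true" values a = 10, b = -2, c = 0.5 and the uniform ranges for price and income are arbitrary, and the helper names `invert`, `ols`, and `simulate` are hypothetical. The sketch uses only the standard library and solves the normal equations directly, which is fine for three regressors but not how you would do it in serious empirical work.

```python
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(m)
    # Augment m with the identity matrix: [m | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(m)]
    for col in range(k):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    # The right-hand half of the augmented matrix is now the inverse.
    return [row[k:] for row in aug]


def ols(y, X):
    """OLS via the normal equations; returns (coefficients, standard errors)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(r * r for r in resid) / (n - k)  # residual variance
    se = [(sigma2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se


def simulate(n=500, a=10.0, b=-2.0, c=0.5, seed=0):
    """Simulated sample of consumers (illustrative parameter values only)."""
    rng = random.Random(seed)
    rows, x = [], []
    for _ in range(n):
        p = rng.uniform(1.0, 5.0)    # price paid
        w = rng.uniform(10.0, 50.0)  # income
        e = rng.gauss(0.0, 1.0)      # error term
        rows.append([1.0, p, w])     # constant, price, income
        x.append(a + b * p + c * w + e)
    return x, rows


x, X = simulate()
(a_hat, b_hat, c_hat), (se_a, se_b, se_c) = ols(x, X)
print(f"b_hat = {b_hat:.3f} (se {se_b:.3f}), c_hat = {c_hat:.3f} (se {se_c:.3f})")
```

Because the data are generated with b < 0 and c > 0, the estimates should land near those values, which is exactly the luxury you do not have with real data.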

How would you then go about testing your theoretical hypotheses? Let’s focus only on dx/dp, since the reasoning in what follows is the same for dx/dw, but with the signs flipped. To test that the Law of Demand holds in your data in the usual way, you would specify the null hypothesis H0: b = 0 versus the alternative hypothesis HA: b ≠ 0.

But notice the difference between how your theoretical and statistical hypotheses are specified: In the theoretical case, the hypothesis you want to test assumes that a relationship is of a specific sign, i.e., dx/dp < 0. In the statistical case, the hypothesis you want to test assumes that there is no relationship, i.e., b = 0.

Because dx/dp = b, the discrepancy lies in the fact that in one case, you hypothesize a negative relationship; in the other case, you hypothesize that the same relationship is zero.

(Note: This discussion is about exact hypotheses. The case of inexact hypotheses—which encompass one-sided tests, but generally cover hypotheses about ranges of values—is obviously different, but one-sided tests are rarely conducted in empirical economics, most likely because they tend to over-reject the null compared to two-sided tests, because we are all the intellectual heirs of Ronald Fisher, or both.)

Is the discrepancy between theoretical and statistical hypothesis testing only prima facie? I’m not sure. Indeed, for a hypothesis to be scientific, it has to be falsifiable. That is, it has to be the case that it can be rejected on the basis of data. Both the theoretical and statistical hypotheses above are falsifiable: For the theoretical hypothesis above, a rejection would entail dx/dp ≥ 0; for the statistical hypothesis, a rejection would entail b < 0 or b > 0. To recapitulate by mixing the theoretical and the statistical together:

  1. Finding that b = 0 would be a failure to reject the null (statistical) hypothesis, but a rejection of the theoretical hypothesis.
  2. Finding that b < 0 would be a rejection of the null (statistical) hypothesis, but support in favor of the theoretical hypothesis.
  3. Finding that b > 0 would be a rejection of both the null (statistical) hypothesis and of the theoretical hypothesis.

But Case 1 does not really constitute a rejection of the theoretical hypothesis: when the null is true, you would expect to fail to reject it in 90, 95, or 99 percent of cases, depending on the level of confidence of your test, and so such “null results” are not very convincing.
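A small simulation can make the point about null results concrete. The sketch below is built on assumptions that are not in the discussion above: it drops income and regresses x on p alone, uses a sample of 50 consumers, relies on a large-sample normal critical value instead of the exact t distribution, and picks a "weak" true slope of -0.1 arbitrarily. The function names are hypothetical. It compares how often you fail to reject H0: b = 0 when the null is true with how often you fail to reject when the Law of Demand holds but the effect is small relative to the noise.

```python
import random
from statistics import NormalDist


def slope_and_se(p, x):
    """Slope and its standard error from a simple regression of x on p."""
    n = len(p)
    p_bar = sum(p) / n
    x_bar = sum(x) / n
    spp = sum((pi - p_bar) ** 2 for pi in p)
    b = sum((pi - p_bar) * (xi - x_bar) for pi, xi in zip(p, x)) / spp
    a = x_bar - b * p_bar
    rss = sum((xi - a - b * pi) ** 2 for pi, xi in zip(p, x))
    se = (rss / (n - 2) / spp) ** 0.5
    return b, se


def fail_to_reject_rate(true_b, n=50, reps=2000, alpha=0.05, seed=0):
    """Share of simulated samples in which H0: b = 0 is NOT rejected."""
    rng = random.Random(seed)
    # Two-sided critical value from the normal approximation (an assumption).
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    fails = 0
    for _ in range(reps):
        p = [rng.uniform(1.0, 5.0) for _ in range(n)]
        x = [10.0 + true_b * pi + rng.gauss(0.0, 1.0) for pi in p]
        b, se = slope_and_se(p, x)
        if abs(b / se) <= crit:
            fails += 1
    return fails / reps


null_rate = fail_to_reject_rate(0.0)   # the null is true
weak_rate = fail_to_reject_rate(-0.1)  # the null is false, but the effect is small
print(f"fail to reject when b = 0:    {null_rate:.3f}")
print(f"fail to reject when b = -0.1: {weak_rate:.3f}")
```

Under the true null, the first rate should sit close to 95 percent by construction. In this setup the second rate is also expected to be high, so a failure to reject says little about whether the theoretical hypothesis is false.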

Case 3 is probably the clearest case for rejection of a hypothesis broadly defined, since both the theoretical and statistical directions of the test agree and go against the theoretical hypothesis.

Case 2 is what most people are after. A rejection of the statistical hypothesis is unlikely to be due to chance (i.e., depending on the level of confidence of your test, a true null would be rejected by chance in only 10, 5, or 1 percent of cases), and this rejection is also in a direction which agrees with the theoretical hypothesis.
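The three cases above can be written as a small decision rule. This is a sketch under stated assumptions: `classify` is a hypothetical helper, it takes an estimate of b and its standard error as given, and it uses a large-sample normal approximation for the two-sided p-value instead of the exact t distribution.

```python
from statistics import NormalDist


def classify(b_hat, se_b, alpha=0.05):
    """Map an estimate of b to Case 1, 2, or 3 for the test of H0: b = 0.

    Returns (case, p_value), with the p-value from a two-sided test based on
    a normal approximation (an assumption, reasonable in large samples).
    """
    z = b_hat / se_b
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    if p_value >= alpha:
        return 1, p_value  # fail to reject the statistical null
    if b_hat < 0:
        return 2, p_value  # reject the null, sign agrees with dx/dp < 0
    return 3, p_value      # reject the null, sign contradicts dx/dp < 0


# Made-up estimates, one per case.
for b_hat, se_b in [(0.1, 0.5), (-2.0, 0.5), (1.5, 0.5)]:
    case, p_value = classify(b_hat, se_b)
    print(f"b_hat = {b_hat:+.1f}, se = {se_b}: Case {case} (p = {p_value:.4f})")
```

Note that the statistical test only decides between Case 1 and the other two; telling Case 2 from Case 3 comes from the sign of the estimate, which is where the theoretical hypothesis enters.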

This is one of those rare cases where I am not sure whether I have managed to clarify the issues or whether this post has been more confusing than anything. For me, the scientific notion of “falsifiability” should really push us (as a profession, that is, and not necessarily as individuals) to want to publish Case 3-type studies wherein the identification is solid and a compelling alternative theoretical explanation is offered. Yet most empirical studies I know of are Case 2-type studies, with only a sprinkling of Case 1-type studies (i.e., null findings) in the literature. Unfortunately, as Spanos (1986), quoted by Kennedy (2008), wrote: “No economic theory was ever abandoned because it was rejected by some empirical econometric test, nor was a clear cut decision between competing theories made in light of the evidence of such a test.”