I have been thinking about writing a post on nonlinear relationships ever since my colleague Jason Kerwin mentioned the Stata command -utest- at one of our development seminars.
As its name suggests, -utest- allows testing for the presence of a U-shaped relationship between your dependent variable and one of your explanatory variables.
For example, let [math]y[/math] denote an individual’s asset holdings and [math]x[/math] denote her age. With a sample of working-age adults, and without bringing in any additional variables, you might want to test the hypothesis that there is an inverse U-shaped relationship between an individual’s asset holdings and her age.
Indeed, few of us had any assets at 18. But as we go through our working life, our stock of assets grows: we purchase cars, buy homes, save some money for retirement, and so on. So at least in early adulthood, the relationship between asset holdings and age should be positive. Conversely, once we retire, we start selling off assets to maintain our standard of living, so after retirement the relationship between asset holdings and age should be negative.
If you are interested in the effect of some other variable [math]D[/math] on individuals’ asset holdings in the context of a regression that also includes age, such that
[math]y = \alpha + \beta{x} + \gamma{D} + \epsilon[/math],
then whether you include just [math]x[/math] or both [math]x[/math] and [math]x^2[/math] does not really matter much for your estimate of [math]\gamma[/math]: what matters is getting [math]E(y|D,x)[/math] right, and controlling for [math]x[/math] at all gets you most of the way there. I seem to recall that Angrist and Pischke discuss this briefly in Mostly Harmless Econometrics. But there are cases where you might be genuinely curious about whether there is a nonlinear relationship between [math]x[/math] and [math]y[/math], so you would estimate
[math]y = \alpha + \beta_1{x} +\beta_2{x^2} + \gamma{D} + \epsilon[/math],
and you’d then look at whether [math]\beta_2[/math] is significantly different from zero. The conventional reading is that if it is, and it is positive (negative), then there is a(n inverse) U-shaped relationship between [math]x[/math] and [math]y[/math].
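To make the quadratic specification concrete, here is a minimal sketch in Python (not Stata, and not what -utest- runs internally) that estimates [math]y = \alpha + \beta_1{x} + \beta_2{x^2} + \epsilon[/math] by OLS with NumPy, leaving out [math]D[/math] for simplicity, and reports the t-statistic on [math]\beta_2[/math]. The function name fit_quadratic and the simulated age and asset data are hypothetical, and the standard errors assume homoskedastic errors.

```python
import numpy as np

def fit_quadratic(x, y):
    """Hypothetical helper: OLS of y on a constant, x and x**2.

    Returns coefficients, classical (homoskedastic) standard errors and t-statistics.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x**2])  # design matrix [1, x, x^2]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)          # estimated error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # OLS variance-covariance matrix
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Simulated (hypothetical) data: assets rise until age 60, then fall.
rng = np.random.default_rng(0)
age = rng.uniform(18, 85, size=500)
assets = -0.05 * (age - 60) ** 2 + 100 + rng.normal(0, 5, size=500)

beta, se, t = fit_quadratic(age, assets)
print("beta_2 =", beta[2], "t =", t[2])
```

A large negative t-statistic on [math]\beta_2[/math] here only tells you the fitted curve is concave; on its own it does not tell you whether the turning point falls inside the observed age range.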
The usefulness of the -utest- command comes from the fact that it can determine whether there actually is a U-shaped relationship (inverse or not) and report a p-value for the null hypothesis that there is no such relationship. It also reports the extremum (i.e., the maximum of an inverse U-shaped relationship or the minimum of a U-shaped relationship) and whether that extremum lies within the range of your [math]x[/math] variable.
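As I understand the idea in Lind and Mehlum, a U shape requires the slope [math]\beta_1 + 2\beta_2{x}[/math] to be negative at the low end of the data and positive at the high end (the reverse for an inverse U), so the test combines the two endpoint t-statistics and keys on the weaker one. Below is a rough Python sketch in that spirit, not a reimplementation of -utest-. The function name u_shape_test and the simulated data are hypothetical, the interval is assumed to be the observed min and max of [math]x[/math], and the p-value uses a normal approximation as a simplification.

```python
import numpy as np
from statistics import NormalDist

def u_shape_test(x, y):
    """Hypothetical sketch of an endpoint-slope test for a U or inverse U in
    y = b0 + b1*x + b2*x**2 + e, evaluated over [min(x), max(x)]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x, x**2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    cov = (resid @ resid / (n - k)) * np.linalg.inv(X.T @ X)  # homoskedastic OLS covariance
    b1, b2 = beta[1], beta[2]
    lo, hi = x.min(), x.max()

    def slope_t(x0):
        # Slope at x0 is b1 + 2*b2*x0, a linear combination g'beta, so its variance is g'Vg.
        g = np.array([0.0, 1.0, 2.0 * x0])
        return (b1 + 2 * b2 * x0) / np.sqrt(g @ cov @ g)

    t_lo, t_hi = slope_t(lo), slope_t(hi)
    # U shape needs t_lo < 0 < t_hi; inverse U needs t_lo > 0 > t_hi.
    # Flip signs so that evidence for the shape is a large positive number,
    # then keep the weaker of the two endpoint statistics.
    sign = 1.0 if b2 > 0 else -1.0
    t_stat = min(-sign * t_lo, sign * t_hi)
    p_value = 1 - NormalDist().cdf(t_stat)  # one-sided, normal approximation
    extremum = -b1 / (2 * b2)               # turning point of the fitted quadratic
    return {"extremum": extremum, "in_range": bool(lo <= extremum <= hi),
            "t_lo": t_lo, "t_hi": t_hi, "t": t_stat, "p": p_value}

# Simulated (hypothetical) inverse U: assets peak at age 60.
rng = np.random.default_rng(1)
age = rng.uniform(18, 85, size=500)
assets = -0.05 * (age - 60) ** 2 + 100 + rng.normal(0, 5, size=500)
res = u_shape_test(age, assets)
print(res)
```

Taking the smaller of the two endpoint statistics means the test only rejects "no U shape" when the slope has the right sign at both ends, which is what separates a genuine turning point from a curve that merely bends.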
If you are interested in reading more on testing for U-shaped relationships and for the theory behind -utest-, see this cleverly titled (“With or Without U”) article by Lind and Mehlum (OBES, 2009).
In future installments, I’m hoping to cover the use of splines and other procedures to look at whether the relationship between two variables is nonlinear of a higher order, i.e., “more nonlinear” than what a simple second-order polynomial or quadratic function can uncover.
Update: A point I should have made clearer in the original post is that -utest- lets you determine whether you have a(n inverse) U-shaped relationship, and not just a monotonic relationship that is also convex (concave). In other words, the main takeaway here is that you can’t just look at [math]\beta_2[/math], check whether it is significant, and conclude that there is indeed a U-shaped relationship. This, too, was new to me. I thank Simon Savard for making me realize that this should be clarified.
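A small arithmetic check shows how a positive [math]\beta_2[/math] can coexist with a curve that never turns. The sketch below uses hypothetical point estimates [math]\beta_1 = 1[/math] and [math]\beta_2 = 0.1[/math] on a hypothetical range [math]x \in [0, 10][/math]. The function quadratic_shape is made up for illustration and ignores sampling uncertainty, which a formal test would account for.

```python
def quadratic_shape(b1, b2, x_min, x_max):
    """Hypothetical helper: classify b0 + b1*x + b2*x**2 on [x_min, x_max]
    using point estimates only (no inference)."""
    slope_lo = b1 + 2 * b2 * x_min   # derivative at the low end
    slope_hi = b1 + 2 * b2 * x_max   # derivative at the high end
    extremum = -b1 / (2 * b2)        # where the derivative is zero
    if slope_lo < 0 < slope_hi:
        shape = "U-shaped"
    elif slope_lo > 0 > slope_hi:
        shape = "inverse U-shaped"
    else:
        shape = "monotonic"
    return shape, extremum, x_min <= extremum <= x_max

# Convex (b2 > 0), yet the slope is positive at both ends of [0, 10].
shape, extremum, in_range = quadratic_shape(1.0, 0.1, 0.0, 10.0)
print(shape, extremum, in_range)
```

Here the slope goes from 1 at [math]x = 0[/math] to 3 at [math]x = 10[/math], and the minimum sits at [math]x = -5[/math], outside the data: convex, but monotonically increasing rather than U-shaped.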
* Little-known fact: Though “Nothing Compares 2 U” was made popular by Sinead O’Connor, it was written and composed by Prince, who was about as local an artist as can be around the Twin Cities (with the possible exception of Garrison Keillor).