{"id":11634,"date":"2016-01-18T05:00:10","date_gmt":"2016-01-18T11:00:10","guid":{"rendered":"http:\/\/marcfbellemare.com\/wordpress\/?p=11634"},"modified":"2016-01-17T12:05:42","modified_gmt":"2016-01-17T18:05:42","slug":"metrics-monday-the-tobit-temptation","status":"publish","type":"post","link":"https:\/\/marcfbellemare.com\/wordpress\/11634","title":{"rendered":"&#8216;Metrics Monday: The Tobit Temptation"},"content":{"rendered":"<blockquote><p>And because thou wast acceptable to God, it was necessary that temptation should prove thee. And now the Lord hath sent me to heal thee &#8230; &#8212; Tobit 12:13.<\/p><\/blockquote>\n<p>This week I wanted to\u00a0discuss tobit estimators.\u00a0In case you are not familiar with it, Wiki <a href=\"https:\/\/en.wikipedia.org\/wiki\/Tobit_model\">describes<\/a> the tobit estimators (people say tobit &#8220;models,&#8221; but I don&#8217;t like calling estimators models, which confuses theory with empirics a bit too much for my taste) as<\/p>\n<blockquote><p>a statistical model proposed by James Tobin (1958) to describe the relationship between a non-negative dependent variable Y\u00a0and an independent variable X. The term tobit was derived from Tobin&#8217;s name by truncating and adding -it by analogy with the probit model.<\/p>\n<p>The model supposes that there is a latent (i.e. unobservable) variable Y*. This variable linearly depends on X\u00a0via a parameter (vector) b\u00a0which determines the relationship between the independent variable (or vector) X\u00a0and the latent variable Y*\u00a0(just as in a linear model). In addition, there is a normally distributed error term U\u00a0to capture random influences on this relationship. 
The observable variable Y\u00a0is defined to be equal to the latent variable whenever the latent variable is above zero and zero otherwise.<\/p><\/blockquote>\n<p>There are many types of tobits; Wiki lists five:<!--more--><\/p>\n<ol>\n<li>Type I Tobit: &#8220;a special case of a censored regression model, because the latent variable Y*\u00a0cannot always be observed while the independent variable X\u00a0is observable.&#8221; This would be the case, for example, if you observe age for everyone, but incomes below $30,000 per year are censored. The censoring can also occur above a certain threshold, or both above and below specific thresholds.<\/li>\n<li>Type II Tobit: &#8220;Heckman (1987) falls into the Type II Tobit. In Type I Tobit, the latent variable absorb both the process of participation and &#8216;outcome&#8217; of interest. Type II Tobit allows the process of participation\/selection and the process of &#8216;outcome&#8217; to be independent, conditional on x.&#8221; This would be the case, for example, if you want to account for selection into a specific treatment. 
Taking an example from my own work, you might want to account for whether a farmer\u00a0participates\u00a0in contract farming when studying whether participation in contract farming increases welfare, since participation in contract farming\u00a0is not randomly sprinkled across farmers.<\/li>\n<li>Type III Tobit:\u00a0This is the bivariate version of the tobit, i.e., it simultaneously estimates two tobits.<\/li>\n<li>Type IV Tobit: This is the trivariate version of the tobit, i.e., it simultaneously estimates three tobits.<\/li>\n<li>Type V Tobit: &#8220;Similar to type II, in type V we only observe the sign of Y*.&#8221;<\/li>\n<\/ol>\n<p>My goal with this post is simply to discuss the temptation\u00a0among some people to control for selection with a type II tobit, also known as a Heckman selection model or a heckit (once again, following the tradition of adding &#8220;-it&#8221; at the end of those limited-dependent-variable and discrete-choice ML estimators).<\/p>\n<p>Indeed, my view is this: Assuming you have a decent variable that you can exclude from the equation of interest to explain selection into treatment, why go through the trouble of estimating a heckit when you can estimate a plain old 2SLS?<\/p>\n<p>(And\u00a0I say this as someone who in a past, more structural life, was also tempted\u00a0by\u00a0heckits and wrote\u00a0a likelihood\u00a0function that involves something similar and slapped the label of &#8220;<a href=\"http:\/\/ajae.oxfordjournals.org\/content\/88\/2\/324.short\">ordered tobit<\/a>&#8221; on it&#8211;proof that <a href=\"http:\/\/marcfbellemare.com\/wordpress\/11394\">fads and fashions<\/a> are definitely a thing in econometrics, as in almost everything else.)<\/p>\n<p>Why should you go for the 2SLS instead of the heckit? Simply because the current preference is to keep it simple, and because the 2SLS does just that relative to the heckit. Indeed, both the 2SLS and the heckit estimate two equations. 
The first equation attempts to purge treatment of its correlation with the second-equation error term due to selection by using a plausibly exogenous variable to do so,* and the second equation estimates a\u00a0treatment effect on the basis of this purged-of-endogeneity version of the treatment variable.**<\/p>\n<p>There is a difference, of course. Whereas 2SLS will just use\u00a0the exogenized version of your treatment variable as a regressor of interest, the heckit will transform said treatment variable through something called the inverse Mills ratio (IMR), i.e.,\u00a0&#8220;the ratio of the probability density function to the cumulative distribution function of a distribution,&#8221; per <a href=\"https:\/\/en.wikipedia.org\/wiki\/Inverse_Mills_ratio\">Wiki<\/a>. But this imposes quite a bit of structure on the <a href=\"http:\/\/marcfbellemare.com\/wordpress\/11349\">ecosystem<\/a>\u00a0(and it makes a distributional assumption, usually a Gaussian one), which is unnecessary. Not only is it unnecessary, it can lead to identification that rests on the specific functional form (i.e., the IMR) assumed. All of this leads to a clear case for estimating a relatively simple linear setup like 2SLS rather than a heckit.<\/p>\n<p>* And this is in those cases where the heckit setup follows that of 2SLS, viz. cases\u00a0where the variable of interest is instrumented by your IV and other variables serve as instruments for themselves, or cases where the controls are exactly the same across both equations. 
The heckit accommodates cases where the variables on the RHS are not the same across the two equations, which seems to me like it can lead to serious data mining.<\/p>\n<p>** This is assuming, of course, that you are interested in the effect of treatment itself, and not in controlling for treatment while\u00a0studying\u00a0the effect of some other variable.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>And because thou wast acceptable to God, it was necessary that temptation should prove thee. And now the Lord hath sent me to heal thee<\/p>\n<div class=\"more-link-wrapper\"><a class=\"more-link\" href=\"https:\/\/marcfbellemare.com\/wordpress\/11634\">Continue reading<span class=\"screen-reader-text\">&#8216;Metrics Monday: The Tobit Temptation<\/span><\/a><\/div>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[1],"tags":[],"class_list":["post-11634","post","type-post","status-publish","format-standard","hentry","category-uncategorized","entry"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p1gPg8-31E","_links":{"self":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11634","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-
json\/wp\/v2\/comments?post=11634"}],"version-history":[{"count":9,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11634\/revisions"}],"predecessor-version":[{"id":11643,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/posts\/11634\/revisions\/11643"}],"wp:attachment":[{"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/media?parent=11634"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/categories?post=11634"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcfbellemare.com\/wordpress\/wp-json\/wp\/v2\/tags?post=11634"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}