Last week I talked about what to do with an obviously endogenous control variable. This week, I answer a question received via email:
… [Y]ou should consider publishing a blog post about how you handle various types of missing data when you are working with secondary data. … I come across data with a lot of [missing] values when analyzing managing household data. I get confusing and contradicting responses when I search on Google as well as when I ask my peers about how to treat missing values. I feel how we handle missing values affects the reproducibility of one’s results hence I wanted to learn if you have any suggestions on how to manage missing values. I am of the view that I may not be the only one who can benefit from learning how you handle this issue when analyzing data for your various research projects.
That is a good question, and the topic is one that is rarely discussed in econometrics classes, where students are usually given data sets that have already been cleaned and have no missing values. As the email indicates, real-world data is often much messier.