It often happens in the course of doing empirical work that we wish study the relationship between some variable of interest D and some outcome Y, but that we don’t have access to a good measure of D. Rather, what we have instead is a proxy for D, which Wikipedia defines as
a variable that is not in itself directly relevant, but that serves in place of an unobservable or immeasurable variable. In order for a variable to be a good proxy, it must have a close correlation, not necessarily linear or positive, with the variable of interest.
For example, we may observe a dummy variable for whether one has started a business as a proxy for entrepreneurial ability. Or we may observe one’s IQ as a proxy for intellectual ability. Or we may observe the frequency of elections as a proxy for democracy. The possibilities here are endless.
For the sake of argument, then let’s denote our proxy variable–what we actually observe in lieu of D–as D*, so that