IV Estimation (Two Stage Least Squares)
The reg command can also be used to estimate models by IV (2SLS). After specifying the dependent variable and the explanatory variables, at least one of which is presumably correlated with the error, we list all of the exogenous variables as instruments in parentheses. Naturally, the list of instruments does not contain any endogenous variables (variables correlated with the error cannot be used as instruments).
An example of a 2SLS command is
reg lwage educ exper expersq married (motheduc fatheduc exper expersq married)
This produces 2SLS estimates, standard errors, t statistics, and so on. By looking at this command, we see that educ is an endogenous explanatory variable in the log(wage) equation, while exper, expersq, and married are assumed to be exogenous explanatory variables. The variables motheduc and fatheduc are assumed to be additional exogenous variables that do not appear in the log(wage) structural equation but should have some correlation with educ. These appear in the instrument list along with the exogenous explanatory variables. The order in which we list the instruments is not important. A necessary condition for obtaining the 2SLS estimates is that the number of terms in parentheses is at least as large as the total number of explanatory variables. In this example, there are five instruments (motheduc, fatheduc, exper, expersq, married) and four explanatory variables (educ, exper, expersq, married), and so we can obtain estimates. Heteroskedasticity-robust inference is obtained by appending robust (after a comma) to the end of the command.
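To see what the 2SLS command is doing, the two stages can be carried out by hand. The following is only a sketch for illustration, assuming the same variable names as in the example above; educhat is a hypothetical name for the first-stage fitted values and does not come from the original text.

```stata
* First stage: regress the endogenous variable on ALL exogenous variables
* (the outside instruments plus the exogenous explanatory variables).
* The sample is restricted to observations with a nonmissing lwage so that
* both stages use the same observations.
regress educ motheduc fatheduc exper expersq married if !missing(lwage)

* Save the first-stage fitted values (educhat is a hypothetical name)
predict educhat, xb

* Second stage: replace educ with its fitted values
regress lwage educhat exper expersq married
```

Provided both stages use the same observations, the second-stage coefficients should match the 2SLS estimates. The standard errors and t statistics reported by the second regression are not valid, however, because they ignore the fact that educhat is itself estimated. This is one reason to use the 2SLS command directly for inference.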
Allowing for more than one endogenous explanatory variable is also easy. Suppose caloric consumption (calories) and protein consumption (protein) are endogenous in a wage equation for people in developing countries. However, we have regional prices on five commodity groups, say price1, ..., price5, to use as instruments. The Stata command for 2SLS might look like
reg lwage educ exper male protein calories (educ exper male price1 price2 price3 price4 price5) if year == 1990
if the analysis is restricted to data for 1990. Note that educ, exper, and male are taken to be exogenous here. There are enough instruments to estimate the model (8 > 5).
After 2SLS, we can test multiple restrictions using the test command, just as with OLS.
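As a sketch of how this might look for the nutrition example, the command below tests whether protein and calories are jointly insignificant in the wage equation. It assumes the reg syntax for 2SLS described in this section and the same hypothetical variable names.

```stata
* 2SLS with two endogenous explanatory variables, 1990 data only
reg lwage educ exper male protein calories (educ exper male price1 price2 price3 price4 price5) if year == 1990

* Joint test that the coefficients on protein and calories are both zero
test protein calories
```

The test command uses the most recent estimation results, so it must be issued immediately after the 2SLS command whose restrictions are being tested.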
Recent versions of Stata have a special command for instrumental variables estimation, ivreg. The syntax of ivreg is a bit less intuitive than that of the reg command. For the example where parents’ education variables are used as IVs for own education, the command is
ivreg lwage exper expersq married (educ = motheduc fatheduc)
Note how educ does not (and should not) appear among the first set of variables; only exogenous explanatory variables do. Moreover, the exogenous explanatory variables appearing in the model need not appear in the list of instruments; only those coming from outside the model need appear. As it turns out, adding the variables exper, expersq, and married to the instrument list does not change the estimates. As before, heteroskedasticity-robust inference is obtained by appending robust (after a comma) to the end of the command.
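For illustration, the commands below show the robust variant just described, along with the earlier nutrition example rewritten in ivreg syntax. The second command is a translation made under the same assumptions as before (educ, exper, and male exogenous; protein and calories endogenous) and is not taken from the original text. The last command assumes a newer Stata release, in which IV estimation is done with ivregress 2sls; whether it is available depends on the installed version.

```stata
* Parents' education as IVs for educ, with heteroskedasticity-robust inference
ivreg lwage exper expersq married (educ = motheduc fatheduc), robust

* Two endogenous variables, five outside instruments, 1990 data only
ivreg lwage educ exper male (protein calories = price1 price2 price3 price4 price5) if year == 1990

* Assumed equivalent of the first command in newer Stata releases
ivregress 2sls lwage exper expersq married (educ = motheduc fatheduc), vce(robust)
```

In each case the endogenous variables appear to the left of the equals sign inside the parentheses, and only the instruments from outside the model appear to the right.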