Robust Estimates of Variance in Linear Regression by using Stata

The standard errors and hypothesis tests that accompany ordinary regression (such as regress or anova) assume that errors are independent and identically distributed. If this assumption is untrue, those standard errors probably will understate the true sample-to-sample variation, and yield unrealistically narrow confidence intervals or p-values that are too low. To cope with a common problem called heteroskedasticity, regress and some other model-fitting commands have an option that estimates standard errors without relying on the strong and sometimes implausible assumption of independent, identically distributed errors. This option uses an approach derived independently by Huber, White and others that is sometimes referred to as a sandwich estimator of variance. Type help vce_option, or see vce_option in the Stata Reference Manual, for technical details.
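The name "sandwich" comes from the shape of the formula. As a sketch in notation that is assumed here rather than taken from the text (X is the n-by-k matrix of predictors including the constant, x_i is its i-th row written as a column vector, and ê_i is the i-th OLS residual), the heteroskedasticity-robust variance estimate for the OLS coefficients, with the n/(n-k) small-sample adjustment that regress applies by default, can be written as:

```latex
\[
\widehat{V}_{\text{robust}}
  = \frac{n}{n-k}\,
    (X'X)^{-1}
    \left( \sum_{i=1}^{n} \hat{e}_i^{\,2}\, x_i x_i' \right)
    (X'X)^{-1}
\]
```

The two outer (X'X)^{-1} terms are the "bread" and the sum of squared-residual-weighted cross products is the "filling". If the error variance really is constant, the filling is approximately σ² X'X and the whole expression collapses to the classical s²(X'X)^{-1}.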

The previous section ended with a regression of logco2 on three predictors: loggdp0, urban0 and their product urb_gdp. To repeat this same regression but with robust standard errors, we just add the vce(robust) option:

. regress logco2 loggdp0 urban0 urb_gdp, vce(robust)

Descriptive aspects of the regression, namely the coefficients and R², are identical with or without robust standard errors. The robust standard errors themselves, along with the confidence intervals and the t and F tests based on them, differ from their non-robust counterparts seen earlier. The differences here are slight, however, so the basic results for this example do not depend on assuming errors that are independent and identically distributed across values of the predictors.
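The same pattern can be checked on data that ships with Stata. The sketch below fits one model twice and lists the results side by side. The auto dataset, the variables price, weight and mpg, and the stored-estimate names m_default and m_robust are stand-ins chosen for illustration; they are not the cross-national data analyzed in this chapter.

```stata
* Illustration only: auto.dta stands in for the chapter's cross-national data
sysuse auto, clear

* Default (classical) standard errors
regress price weight mpg
estimates store m_default

* Same model with Huber/White robust standard errors
regress price weight mpg, vce(robust)
estimates store m_robust

* Coefficients and R-squared match across columns; only the standard errors differ
estimates table m_default m_robust, b(%9.3f) se(%9.3f) stats(N r2)
```

How far the two columns of standard errors diverge gives a rough indication of how much the classical assumptions matter for a particular model.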

The rationale underlying these robust standard-error estimates is explained in the User’s Guide. Briefly, we give up on the classical goal of estimating true population parameters (β’s) for a model such as

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

Instead, we pursue the less ambitious goal of simply estimating the sample-to-sample variation that our b coefficients might have, if we drew many random samples and applied OLS repeatedly to calculate b values for a model such as

$$y_i = b_0 + b_1 x_i + e_i$$

We do not assume that these b estimates will converge on some “true” population parameter. Confidence intervals formed using the robust standard errors therefore lack the classical interpretation of having a certain probability (across repeated sampling) of containing the true value of β. Rather, the robust confidence intervals have a certain probability (across repeated sampling) of containing b, defined as the value upon which sample b estimates converge. Thus, we pay for relaxing the identically-distributed-errors assumption by settling for a less impressive conclusion.

Another robust-variance option, vce(cluster clustervar), allows us to relax the independent-errors assumption in a limited way, when errors are correlated within subgroups or clusters of the data. For example, in the cross-national data we have seen substantial differences in variation by region. Adding the option vce(cluster region) to the same regress command obtains standard errors that are robust to correlation within clusters, with the clusters defined by region.
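The clustered version keeps the sandwich shape but builds the middle term from cluster totals rather than individual observations. As a sketch in assumed notation (G clusters, n observations, k estimated coefficients, x_i the predictor values for observation i as a column vector, ê_i its OLS residual), with the finite-sample adjustment that regress applies:

```latex
\[
\widehat{V}_{\text{cluster}}
  = \frac{G}{G-1}\,\frac{n-1}{n-k}\,
    (X'X)^{-1}
    \left( \sum_{g=1}^{G} u_g u_g' \right)
    (X'X)^{-1},
\qquad
u_g = \sum_{i \in g} \hat{e}_i\, x_i
\]
```

Because residuals are summed within each cluster before the cross products are taken, correlated errors inside a cluster are allowed to reinforce one another instead of being treated as independent. The effective number of independent pieces of information is closer to G than to n, which is why few clusters call for caution.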

Again, the regression coefficients and R² are identical to those in the earlier models, but the standard errors, confidence intervals and hypothesis tests have changed. The clustered standard errors are substantially larger than those in the earlier models, resulting in smaller t statistics and larger p-values. Using vce(robust) earlier brought much less change, indicating no particular problem with assuming that errors are independent and identically distributed with respect to predictors in the model. Using vce(cluster region), however, brought larger changes, indicating that, as we suspected, errors are not independent and identically distributed with respect to region. Consequently, the vce(cluster region) estimates are more plausible, and should be reported in place of the default estimates if we were writing up these results as research.
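One way to make this three-way comparison concrete is to store each fit and tabulate them together. The sketch below uses nlsw88.dta, which ships with Stata, with industry as a stand-in cluster variable; the dataset, the variables and the stored-estimate names are assumptions made for illustration and are not part of the chapter's cross-national analysis. The if qualifier is an added precaution so that all three fits use the same observations, since vce(cluster) drops cases with a missing cluster value.

```stata
* Illustration only: nlsw88.dta and industry stand in for the chapter's data and region
sysuse nlsw88, clear

* Default standard errors, restricted to cases with a known cluster
regress wage grade tenure if !missing(industry)
estimates store m_default

* Heteroskedasticity-robust standard errors
regress wage grade tenure if !missing(industry), vce(robust)
estimates store m_robust

* Cluster-robust standard errors, clusters defined by industry
regress wage grade tenure if !missing(industry), vce(cluster industry)
display "Number of clusters: " e(N_clust)
estimates store m_cluster

* Same coefficients in every column; standard errors differ by method
estimates table m_default m_robust m_cluster, b(%9.3f) se(%9.3f) stats(N r2)
```

If the clustered column departs from the other two much more than they depart from each other, that mirrors the pattern described above, where within-cluster correlation rather than heteroskedasticity across predictors is the main concern.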

Source: Hamilton, Lawrence C. (2012), Statistics with Stata: Version 12, 8th edition, Cengage Learning.
