When a researcher estimates the parameters of a regression function using information on all 50 states in the United States, or information on all visits to a website, what is the interpretation of the standard errors? Researchers typically report standard errors that are designed to capture sampling variation, based on viewing the data as a random sample drawn from a large population of interest, even in applications where it is difficult to articulate what that population of interest is and how it differs from the sample. In this paper we explore alternative interpretations for the uncertainty associated with regression estimates. As a leading example we focus on the case where some parameters of the regression function are intended to capture causal effects. We derive standard errors for causal effects using a generalization of randomization inference. Intuitively, these standard errors capture the fact that even if we observe outcomes for all units in the population of interest, there are for each unit missing potential outcomes for the treatment levels the unit was not exposed to. We show that our randomization-based standard errors in general are smaller than the conventional robust standard errors, and provide conditions under which they agree with them. More generally, correct statistical inference requires precise characterizations of the population of interest, the parameters that we aim to estimate within such population, and the sampling process. Estimation of causal parameters is one example where appropriate inferential methods may differ from conventional practice, but there are others.
HT: Bill Sundstrom