Stepping back from the immediate question of whether the CCES in fact shows a low rate of voting among non-citizens, our analysis carries a much broader lesson, and a caution, about using big databases to study low-frequency characteristics and behaviors. Very low levels of measurement error are easily tolerated in samples of 1,000 to 2,000 persons. But in very large sample surveys, classification errors in a high-frequency category (such as citizens) can readily contaminate a low-frequency category (such as non-citizens), because a tiny fraction of a very large group can amount to a sizable share of a very small one. As a result, researchers may draw incorrect inferences about the behavior of relatively rare individuals in a population even when the level of misclassification is very low.
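The arithmetic behind this contamination can be sketched in a few lines. The example below is a minimal illustration using purely hypothetical numbers, not CCES estimates: it assumes a rare group making up 1% of respondents, a 0.1% chance that a member of the common group is recorded as belonging to the rare group, 70% turnout in the common group, and zero true turnout in the rare group. The names `Scenario` and `expected_observed` are illustrative, and only common-to-rare misclassification is modeled.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical survey parameters (illustrative values, not CCES estimates)."""
    n: int              # total respondents
    p_rare: float       # true share of the rare group (e.g. non-citizens)
    err: float          # share of the common group misrecorded as rare
    rate_common: float  # behavior rate (e.g. turnout) in the common group
    rate_rare: float    # true behavior rate in the rare group


def expected_observed(s: Scenario) -> dict:
    """Expected counts and rates, assuming only common -> rare misclassification."""
    true_rare = s.n * s.p_rare
    common = s.n - true_rare
    false_rare = common * s.err           # common-group members recorded as rare
    recorded_rare = true_rare + false_rare
    # Expected number of the behavior among everyone recorded in the rare category
    events = true_rare * s.rate_rare + false_rare * s.rate_common
    return {
        "recorded_rare": recorded_rare,
        "false_share": false_rare / recorded_rare,
        "observed_rate": events / recorded_rare,
    }


# Compare a conventional sample size with a very large one (hypothetical values).
for n in (1_500, 30_000):
    s = Scenario(n=n, p_rare=0.01, err=0.001, rate_common=0.70, rate_rare=0.0)
    out = expected_observed(s)
    print(n, round(out["recorded_rare"], 1),
          round(out["false_share"], 3), round(out["observed_rate"], 3))
```

In both cases roughly 9% of the recorded rare group are misclassified members of the common group, producing an apparent turnout of about 6.3% even though the assumed true rate is zero. The contaminated share, e(1 − p) / (p + e(1 − p)) in the sketch's notation, does not shrink as n grows; what changes is that a sample of 1,500 yields only about 16 recorded cases, too few to analyze, while a sample of 30,000 yields over 300, enough to invite analysis of a subgroup that is partly made up of misclassified respondents.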
Source: The Perils of Cherry Picking Low Frequency Events in Large Sample Surveys | CCES