The Controversy of Correcting Effect Sizes in Meta-Analysis

There is some controversy about correcting effect sizes used in meta-analyses for methodological artifacts. In this section I describe arguments for and against correction, and then attempt to reconcile these two positions.

1. Arguments for Artifact Correction

Probably the most consistent advocates of correcting for study artifacts are John Hunter (now deceased) and Frank Schmidt (see Hunter & Schmidt, 2004; Schmidt & Hunter, 1996; as well as, e.g., Rubin, 1990). Their argument, in a simplified form, is that individual primary studies report effect sizes among imperfect measures of constructs, not the constructs themselves. These imperfections in the measurement of constructs can be due to a variety of sources, including unreliability of the measures, imperfect validity of the measures, or imperfect ways in which the variables were managed in primary studies (e.g., artificial dichotomization). Moreover, individual studies contain not only random sampling error (due to their finite sample sizes), but often biased samples that do not represent the population about which you wish to draw conclusions.

These imperfections of measurement and sampling are inherent to every primary study and provide a limiting frame within which you must interpret the findings. For instance, a particular study does not provide a perfect effect size of the association between X and Y, but rather an effect size of the association between a particular measure of X and a particular measure of Y within the particular sample of the study. The heart of the argument for artifact correction is that we are less interested in these imperfect effect sizes found in primary studies and more interested in the effect sizes between latent constructs (e.g., the correlation between construct X and construct Y).
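Although this passage does not reproduce it, the standard correction for attenuation due to unreliability, which underlies the Hunter and Schmidt treatment of this particular artifact, illustrates what such a correction looks like. Here $r_{xy}$ is the observed correlation, and $r_{xx}$ and $r_{yy}$ are the reliability estimates of the two measures:

$$\rho_{XY} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}$$

Dividing by the square root of the product of the reliabilities removes the attenuation introduced by measurement error, yielding an estimate of the correlation between the latent constructs rather than between their imperfect measures.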

The argument seems reasonable and in fact provides much of the impetus for the rise of such latent variable techniques as confirmatory factor analysis (e.g., Brown, 2006) and structural equation modeling (e.g., Kline, 2005) in primary research. Our theories that we wish to evaluate are almost exclusively about associations among constructs (e.g., aggression and rejection), rather than about associations among measures (e.g., a particular self-report scale of aggression and a particular peer-report method of measuring rejection). As such, it makes sense that we would wish to draw conclusions from our meta-analyses about associations among constructs rather than associations among imperfect measures of these constructs reported in primary studies; thus, we should correct for artifacts within these studies in our meta-analyses.

A corollary to the focus on associations among constructs (rather than imperfect measures) is that artifact correction results in the variability among studies being more likely due to substantively interesting differences rather than methodological differences. For example, studies may differ due to a variety of features, with some of these differences being substantively interesting (e.g., characteristics of the sample such as age or income, type of intervention evaluated) and others being less so (e.g., the use of a reliable versus unreliable measure of a variable). Correction for these study artifacts (e.g., unreliability of measures) reduces this variability due to likely less interesting differences (i.e., noise), thus allowing for clearer illumination of differences between studies that are substantively interesting through moderator analyses (Chapter 9).

2. Arguments against Artifact Correction

Despite the apparent logic supporting artifact correction in meta-analysis, there are some who argue against these corrections. Early descriptions of meta-analysis described the goal of these efforts as integrating the findings of individual studies (e.g., Glass, 1976); in other words, the synthesis of results as reported in primary studies. Although one might argue that these early descriptions simply failed to appreciate the difference between associations among measures and associations among constructs (although this seems unlikely given the expertise Glass had in measurement and factor analysis), some modern meta-analysts have continued to oppose artifact adjustment even after the arguments put forth by Hunter and Schmidt. Perhaps most pointedly, Rosenthal (1991) argues that the goal of meta-analysis “is to teach us better what is, not what might some day be in the best of all possible worlds” (p. 25, italics in original). Rosenthal (1991) also cautions that these corrections can yield inaccurate effect sizes, such as when corrections for unreliability yield correlations greater than 1.0.
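To make Rosenthal's concern concrete, the following brief Python sketch (with hypothetical numbers, applying the attenuation correction shown earlier) shows how a plausible observed correlation combined with low reliability estimates produces a corrected correlation above 1.0, a value that cannot be a true correlation.

import math

def disattenuate(r_xy, r_xx, r_yy):
    # Correction for attenuation: divide the observed correlation by the
    # square root of the product of the two reliability estimates.
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical values: a moderate observed correlation obtained with two
# unreliable measures (reliabilities of .60 and .40).
corrected = disattenuate(r_xy=0.50, r_xx=0.60, r_yy=0.40)
print(round(corrected, 2))  # 1.02, an impossible correlation above 1.0

A corrected value above 1.0 indicates that the reliability estimates are too low relative to the observed correlation (or are themselves poorly estimated), which is precisely the kind of distortion Rosenthal warns against.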

Another, though far weaker, argument against artifact correction is simply that such corrections add another level of complexity to our meta-analytic procedures. I agree that there is little value in making these procedures more complex than is necessary to best answer the substantive questions of the meta-analysis. Furthermore, additional data-analytic complexity often requires lengthier explanation when reporting meta-analyses, and our focus in most of these reports is typically to explain information relevant to our content-based questions rather than data-analytic procedures. At the same time, simplicity alone is not a good guide to our data-analytic techniques. The more important question is whether the cost of additional data-analytic complexity is offset by the improved value of the results yielded.

3. Reconciling Arguments Regarding Artifact Correction

Many of the critical issues surrounding the controversy of artifact correction can be summarized in terms of whether meta-analysts prefer to describe associations among constructs (those for correction) or associations as found among variables in the research (those against correction). In most cases, the questions likely involve associations among latent constructs more so than associations among imperfectly measured variables. Even when questions involve measurement (e.g., are associations between X and Y stronger when X is measured in certain ways than when X is measured in other ways?), it seems likely that one would wish to base this answer on the differences in associations among constructs between the two measurement approaches rather than the magnitudes of imperfections that are common for these measurement approaches. Put bluntly, Hunter and Schmidt (2004) argue that attempting to meta-analytically draw conclusions about constructs without correcting for artifacts “is the mathematical equivalent of the ostrich with its head in the sand: It is a pretense that if we ignore other artifacts then their effects on study outcomes will go away” (p. 81). Thus, if you wish to draw conclusions about constructs, which is usually the case, it would appear that correcting for study artifacts is generally valuable.

At the same time, one must consider the likely impact of artifacts on the results. If one is meta-analyzing a body of research that consistently uses reliable and valid measures within representative samples, then the benefits of artifact adjustment are likely small. In these cases, the additional complexity of artifact adjustment is likely not warranted. To adapt Rosenthal’s (1991) argument quoted earlier, if what is matches closely with what could be, then there is little value in correcting for study artifacts.

In sum, although I do not believe that all, or even any, artifact adjustments are necessary in every meta-analysis, I do believe it is valuable to always consider each of the artifacts that could bias effect sizes. In meta-analyses in which these artifacts are likely to have a substantial impact on at least some of the included primary studies, it is valuable to at least explore some of the following corrections.

Source: Card, Noel A. (2015), Applied Meta-Analysis for Social Science Research, The Guilford Press, annotated edition.
