Perhaps no statement is more true, and humbling, than this offered as the opening of Harris Cooper’s editorial in Psychological Bulletin (and likely stated in similar words by many others): “Scientists have yet to conduct the flawless experiment” (Cooper, 2003, p. 3). I would extend this conclusion further to point out that no scientist has yet conducted a flawless study, and even further by stating that no meta-analyst has yet performed a flawless review. Each approach to empirical research, and indeed each application of such approaches within a particular field of inquiry, has certain limits to the contributions it can make to our understanding. Although full consideration of all of the potential threats to drawing conclusions from empirical research is beyond the scope of this section, I next highlight a few that I think are most useful in framing consideration of the most salient limits of primary research and meta-analysis—those of study design, sampling, methodological artifacts, and statistical power.

**1. Limits of Study Design**

Experimental designs allow inferences of causality but may be of questionable ecological validity. Certain features of the design of experimental (and quasi-experimental) studies dictate the extent to which conclusions are valid (see Shadish, Cook, & Campbell, 2002). Naturalistic (a.k.a. correlational) designs are often advantageous in providing better ecological validity than experimental designs and are often useful when variables of interest cannot, or cannot ethically, be manipulated. However, naturalistic designs cannot answer questions of causality, even in longitudinal studies that represent the best nonexperimental attempts to do so (see, e.g., Little, Card, Preacher, & McConnell, 2009).

Whatever limits of study design exist within a primary study (e.g., problems of internal validity in suboptimally designed experiments, ambiguity in causal influence in naturalistic designs) will also exist in a meta-analysis of those types of studies. For example, meta-analytically combining experimental studies that all have a particular threat to internal validity (e.g., absence of double-blind procedures in a medication trial) will yield conclusions that also suffer this threat. Similarly, meta-analysis of concurrent correlations from naturalistic studies will only tell you about the association between X and Y, not about the causal relation between these constructs. In short, limits to the design that are consistent across primary studies included in a meta-analysis will also serve as limits to the conclusions of the meta-analysis.

**2. Limits of Sampling**

Primary studies are also limited in that researchers can only generalize the results to populations represented by the sample. Findings from studies using samples homogeneous with respect to certain characteristics (e.g., gender, ethnicity, socioeconomic status, age, settings from which the participants are sampled) can only inform understanding of populations with characteristics like the sample. For example, a study sampling predominantly White, middle- and upper-class, male college students (primarily between 18 and 22 years of age) in the United States cannot draw conclusions about individuals who are ethnic minority, of lower socioeconomic status, female, of a different age range, not attending college, and/or not living in the United States.

These limits of generalizability are well known, yet widespread, in much social science research (e.g., see Graham, 1992, for a survey of ethnic and socioeconomic homogeneity in psychological research). One feature of a well-designed primary study is to sample intentionally a heterogeneous group of participants in terms of salient characteristics, especially those about which it is reasonable to expect findings potentially to differ, and to evaluate these factors as potential moderators (qualifiers) of the findings. Obtaining a heterogeneous sample is difficult, however, in that the researcher must typically obtain a larger overall sample, solicit participants from multiple settings (e.g., not just college classrooms) and cultures (e.g., not just in one region or country), and ensure that the methods and measures are appropriate for all participants. The reality is that few if any single studies can sample the wide range of potentially relevant characteristics of the population about which we probably wish to draw conclusions.

These same issues of sample generalizability limit conclusions that we can draw from the results of meta-analyses. If all primary studies in your meta-analysis sample a similar homogeneous set of participants, then you should only generalize the results of meta-analytically combining these results to that homogeneous population. However, if you are able to obtain a collection of primary studies that are diverse in terms of sample characteristics, even if the studies themselves are individually homogeneous, then you can both (1) evaluate potential differences in results based on sample characteristics (through moderator analyses; see Chapter 9) and (2) make conclusions that are generalizable to this more heterogeneous population. In this way, meta-analytic reviews have the potential to draw more generalizable conclusions than are often tractable within a primary study, provided you are able to obtain studies collectively consisting of a diverse range of participants. However, you should keep in mind the limits of the samples of studies included in your meta-analysis and be cautious not to extrapolate beyond these limits. Most meta-analyses contain some limits—intentional (specified by inclusion/exclusion criteria; see Chapter 3) or unintentional (required by the absence or unavailability—e.g., written in a language that you do not know—of primary research with some populations)—that limit the generalizability of conclusions.

**3. Limits of Methodological Artifacts**

Researchers planning and conducting primary studies do not intentionally impose methodological artifacts, but these often arise. These artifacts, described in detail in Chapter 6, can arise from imperfect measures (imperfect reliability or validity), sampling homogeneity (resulting in direct or indirect restriction of ranges among variables of interest), or poor data-analytic choices (e.g., artificial dichotomization of continuous variables). These artifacts typically attenuate, or diminish, the effect sizes estimated in primary studies. This attenuation leads to lower statistical power (higher rates of type II error) and underestimation of the magnitude, and potentially the importance, of the results.

These artifacts can be corrected in the sense that it is possible to estimate the magnitude of “true” effect sizes disattenuated for these artifacts. In primary studies, this is rarely done, with the exception of those using latent variable analyses to correct for unreliability (see, e.g., Kline, 2005). This correction for attenuation of effect sizes is more common in meta-analyses, though the practice is somewhat controversial and varies across disciplines (see Chapter 6). Whether or not you correct for certain artifacts in your own meta-analyses should guide the extent to which you view these artifacts as potential limits (by attenuating your effect sizes and potentially introducing less meaningful heterogeneity).
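To make the idea of disattenuation concrete, the following is a minimal sketch of the classic correction for unreliability, in which an observed correlation is divided by the square root of the product of the two measures' reliabilities. The function name `disattenuate` and the example values are illustrative assumptions, not taken from the text; the fuller set of artifact corrections is the subject of Chapter 6.

```python
import math


def disattenuate(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Estimate the correlation between true scores from an observed
    correlation and the reliabilities of the two measures.

    Hypothetical helper for illustration; it handles unreliability only,
    not range restriction or artificial dichotomization.
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must be in the interval (0, 1]")
    return r_observed / math.sqrt(rel_x * rel_y)


# Illustrative values (assumed): observed r = .30, reliabilities .80 and .70
r_corrected = disattenuate(0.30, 0.80, 0.70)
print(round(r_corrected, 3))  # 0.401
```

Note that the corrected value is always at least as large in magnitude as the observed value, which is why uncorrected effect sizes can be viewed as attenuated estimates.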

**4. Limits of Statistical Power**

Statistical power refers to the probability of concluding that an effect exists when it truly does. The complement of statistical power is the type II error rate (power = 1 - β), that is, the probability of failing to conclude that an effect exists when it does. Although this concept of statistical power is rooted in the Null Hypothesis Significance Testing framework (which is problematic, as I describe in Chapter 5), statistical power is also relevant in other frameworks such as reliance on point estimates and confidence intervals in describing results (i.e., low statistical power leads to wide confidence intervals).

The statistical power of a primary study depends on several factors, including the type I error rate (i.e., α) set by the researcher, the type of analysis performed, and the magnitude of the effect size within the population. However, because these other factors are typically out of the researcher’s control, statistical power is dictated primarily by sample size, where larger sample sizes yield greater statistical power. When planning primary studies, researchers should conduct power analyses to guide the number of participants needed to have a certain probability (often .80) of detecting an effect size of a certain magnitude (for details see, e.g., Cohen, 1969; Kraemer & Thiemann, 1987; Murphy & Myors, 2004).
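As a rough illustration of such a power analysis, the sketch below approximates the sample size needed to detect a population correlation of a given size with a two-tailed test, using Fisher's z transformation and the normal approximation. The function name `n_for_correlation` and its defaults (α = .05, power = .80) are assumptions for illustration; dedicated power software or the sources cited above may give slightly different values because they use exact distributions.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate N needed to detect a population correlation r.

    Uses the Fisher z approximation: the standard error of z is
    1 / sqrt(N - 3). Hypothetical helper; r must be nonzero and |r| < 1.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = std_normal.inv_cdf(power)          # quantile for desired power
    effect_z = abs(math.atanh(r))                # Fisher's z of the effect
    return math.ceil(((z_alpha + z_power) / effect_z) ** 2 + 3)


# Smaller expected effects require much larger samples.
for r in (0.10, 0.30, 0.50):
    print(r, n_for_correlation(r))
```

Under these assumptions, detecting r = .30 requires roughly 85 participants, whereas detecting r = .10 requires several hundred, which shows why an overly optimistic expected effect size leads to underpowered studies.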

Despite the potential for power analysis to guide study design, there are many instances when primary studies are underpowered. This might occur because the power analysis was based on an unrealistically high expectation of population effect size, because it was not possible to obtain enough participants due to limited resources or scarcity of appropriate participants (e.g., when studying individuals with rare conditions), or because the researcher failed to perform a power analysis in the first place. In short, although inadequate statistical power is not a problem inherent to primary research, it is plausible that in many fields a large number of existing studies do not have adequate statistical power to detect what might be considered a meaningful magnitude of effect (see, e.g., Halpern, Karlawish, & Berlin, 2002; Maxwell, 2004).

When a field contains many studies that fail to demonstrate an effect because they have inadequate statistical power, there is the danger that readers of this literature will conclude that an effect does not exist (or that it is weak or inconsistent). In these situations, a meta-analysis can be useful in combining the results of numerous underpowered studies within a single analysis that has greater statistical power. Although meta-analyses can themselves have inadequate statistical power, they will generally have greater statistical power than the primary studies comprising them (Cohn & Becker, 2003). For this reason, meta-analyses are generally less impacted by inadequate statistical power than are primary studies (but see Hedges & Pigott, 2001, 2004 for discussion of underpowered meta-analyses).
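The gain in power from combining studies can be sketched numerically. The example below compares the approximate power of one small study with the power of a fixed-effect combination of ten such studies, again using Fisher's z and the normal approximation. The function names and the scenario (a population correlation of .20 and ten studies of 40 participants each) are illustrative assumptions. The calculation also assumes a single common effect across studies; with heterogeneous effects and a random-effects model, power would generally be lower than shown here.

```python
import math
from statistics import NormalDist


def power_two_sided(effect: float, se: float, alpha: float = 0.05) -> float:
    """Power of a two-tailed z test for a given true effect and standard error."""
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / se
    # Probability that the test statistic falls in either rejection region
    return (1 - std_normal.cdf(crit - shift)) + std_normal.cdf(-crit - shift)


def study_power(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a single study of size n to detect correlation r."""
    return power_two_sided(math.atanh(r), 1 / math.sqrt(n - 3), alpha)


def pooled_power(r: float, sample_sizes: list[int], alpha: float = 0.05) -> float:
    """Approximate power of a fixed-effect mean of Fisher z values.

    Each study is weighted by n - 3, so the pooled standard error is
    1 / sqrt(sum of (n - 3)). Assumes one common population effect.
    """
    se = 1 / math.sqrt(sum(n - 3 for n in sample_sizes))
    return power_two_sided(math.atanh(r), se, alpha)


# Assumed scenario: true r = .20, ten studies with 40 participants each
sizes = [40] * 10
single = study_power(0.20, 40)
combined = pooled_power(0.20, sizes)
print(f"single study: {single:.2f}, meta-analysis: {combined:.2f}")
```

In this scenario each study on its own would detect the effect less than a third of the time, whereas the combined analysis would detect it in the large majority of cases.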

Source: Card Noel A. (2015), *Applied Meta-Analysis for Social Science Research*, The Guilford Press; Annotated edition.

25 Aug 2021
