Critiques of Meta-Analysis: When Are They Valid and When Are They Not?

As I outlined in Chapter 1, interest in meta-analysis grew in large part from the attention received by Smith and Glass’s (1977) meta-analysis of psychotherapy research (though others developed meta-analytic techniques at about the same time; e.g., Rosenthal & Rubin, 1978; Schmidt & Hunter, 1977). The controversial nature of this meta-analysis drew criticism, both of the particular paper and of the process of meta-analysis itself. Although these criticisms were likely motivated more by dissatisfaction with the results than with the approach, some of them have persisted since the early years of meta-analysis. As a result of this extensive criticism, and of efforts to address it, meta-analysis as a scientific process of reviewing empirical literature now has a deeper appreciation of its own limits. In the end, the criticism was fruitful.

In the remainder of this section, I review some of the most common criticisms of meta-analysis (see also, e.g., Rosenthal & DiMatteo, 2001; Sharpe, 1997). I also try to consider objectively the extent to which, and the conditions under which, these criticisms are valid. At the end of this section, I place these criticisms in perspective by noting that many apply to any literature review.

1. Amount of Expertise Needed to Conduct and Understand

Although not necessarily a critique, I think it is important first to address a common misperception I encounter: that meta-analysis requires extensive statistical expertise to conduct. Although very advanced, complex methods exist for various aspects of meta-analysis, most meta-analyses do not require especially complicated analyses. The techniques might seem obscure or complex when one first reads meta-analyses. I believe this is primarily because most of us received considerable training in primary analysis during our careers but have had little if any exposure to meta-analysis. However, performing a basic yet sound meta-analysis requires little more expertise than is typically acquired in a research-oriented graduate social science program: the ability to compute means and variances, and perhaps to perform an analysis of variance (ANOVA) or regression analysis, albeit with some small twists in terms of weighting and interpretation.⁶
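To illustrate what those “small twists in terms of weighting” look like, here is a minimal Python sketch of a fixed-effect mean correlation. It assumes each study reports a correlation r and a sample size n. The sketch converts r to Fisher’s z, weights each study by the inverse of its sampling variance (which for z is 1/(n - 3)), averages, and back-transforms the result. The function name and data are hypothetical, and this is one common approach rather than necessarily the exact procedure developed in later chapters.

```python
import math


def fixed_effect_mean_r(rs, ns):
    """Inverse-variance weighted mean correlation (fixed-effect model).

    rs: study correlations; ns: study sample sizes.
    Returns (mean_r, ci_low, ci_high) with a 95% confidence interval.
    """
    # Fisher's z transformation makes the sampling variance depend only on n.
    zs = [math.atanh(r) for r in rs]
    # Sampling variance of z is 1 / (n - 3), so each weight is n - 3.
    ws = [n - 3 for n in ns]
    sum_w = sum(ws)
    mean_z = sum(w * z for w, z in zip(ws, zs)) / sum_w
    se_z = math.sqrt(1 / sum_w)
    low, high = mean_z - 1.96 * se_z, mean_z + 1.96 * se_z
    # Back-transform the mean and the interval limits to the r metric.
    return math.tanh(mean_z), math.tanh(low), math.tanh(high)


# Hypothetical example: three small studies.
mean_r, low, high = fixed_effect_mean_r([0.30, 0.10, 0.25], [40, 60, 25])
print(f"mean r = {mean_r:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

Apart from the weights, this is the familiar computation of a mean and its standard error.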

Although I do not view the statistical expertise needed to conduct a sound meta-analysis as especially high, I feel obligated to make clear that meta-analyses are not easy. The time required to search adequately for and code studies is substantial (see Chapters 3-7). The analyses, though not statistically complex, must be performed with care by someone with the basic skills of meta-analysis (such as those provided in Chapters 8-11). Finally, reporting a meta-analysis can be especially difficult given that you are often trying to make broad, authoritative statements about a field (see Chapters 13-14). My intention is not to scare anyone away from performing a meta-analysis, but I think it is important to recognize some of the difficulty in this process. Needing a large amount of statistical expertise, however, is not one of these difficulties for most meta-analyses you will want to perform.

2. Quantitative Analysis May Lack “Qualitative Finesse” of Evaluating Literature

Some complain that meta-analyses lack the “qualitative finesse” of a narrative review, presumably meaning that they fail to draw creative, nuanced conclusions about the literature. I understand this critique, and I agree that some meta-analysts get too caught up in the analyses themselves at the expense of carefully considering the studies. However, this tendency is not inherent to meta-analysis, and nothing precludes the meta-analyst from engaging in this careful consideration.

To place this critique in perspective, I think it is useful to consider the general approaches of qualitative and quantitative analysis in primary research. Qualitative research undoubtedly provides rich, nuanced information that has contributed substantially to understanding in nearly all areas of the social sciences. At the same time, scientific progress would be limited if we did not also rely on quantitative methods and on methods of analyzing these quantitative data. Few scientists would collect quantifiable data from dozens or hundreds of individuals and then analyze them merely by looking at the data and “somehow” drawing conclusions about central tendency, variability, and co-occurrences of individual differences. In sum, there is substantial advantage to conducting primary research using qualitative analyses, quantitative analyses, or a combination of both.

Extending this value of qualitative and quantitative analyses in primary research to the process of research synthesis, I do not see careful, nuanced consideration of the literature and meta-analytic techniques as mutually exclusive. Instead, I recommend that you rely on the advantages of meta-analysis in synthesizing vast amounts of information and aiding in drawing probabilistic inferential conclusions, but also that you use your knowledge of your field where these quantitative analyses fall short. Furthermore, meta-analytic techniques provide results that are statistically justifiable (e.g., there is an effect size within a certain range of magnitude; some types of studies yield larger effect sizes than other types), but it is up to you to connect these findings to relevant theories in your field. In short, a good meta-analytic review requires both quantitative methodology and “qualitative finesse.”

3. The “Apples and Oranges” Problem

The critique known as the “apples and oranges problem” was first raised against Smith and Glass’s (1977) meta-analytic combination of studies using diverse methods of psychotherapy to treat a wide range of problems among diverse samples of people (see Sharpe, 1997). Critics charge that including such a diverse range of studies in a meta-analysis yields meaningless results.

I believe that this critique applies only to the extent that the meta-analyst wants to draw conclusions specifically about apples or oranges. If you want to draw conclusions only about a narrowly defined population of studies (e.g., apples), then it is problematic to include studies from a different population (e.g., oranges). However, if you wish to draw conclusions about a broad population of studies, such as all psychotherapy studies of all psychological disorders, then it is appropriate to combine a diverse range of studies. To extend the analogy: combining apples and oranges is appropriate if you want to draw conclusions about fruit. In fact, if you want to draw conclusions about fruit, you should also include limes, bananas, figs, and berries! Studies are rarely identical replications of one another, so including studies that are diverse in methodology, measures, and samples within your meta-analysis has the advantage of improving the generalizability of your conclusions (Rosenthal & DiMatteo, 2001). The apples and oranges critique, then, is less a critique of meta-analysis itself than a question of whether the meta-analyst has considered and sampled studies at an appropriate level of analysis.

In considering this critique, it is useful to recognize that moderator analysis lets a meta-analysis examine multiple levels of analysis (see Chapter 9). To invoke the fruit analogy one last time: a meta-analysis can include studies of all fruit and report results about fruit, but then systematically compare apples, oranges, and other fruit through moderator analyses (i.e., do results involving apples and oranges differ?). Additional moderator analyses can go further still, comparing studies involving, for example, McIntosh, Delicious, Fuji, and Granny Smith apples. Because you can include diverse studies in your meta-analysis and then systematically compare them through moderator analyses, the apples and oranges problem is readily addressed.
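The subgroup comparison described above can be sketched as a fixed-effect Q-between test. The total heterogeneity Q among all studies is split into the heterogeneity within each fruit category and the remainder between categories. That remainder is then compared with a chi-square distribution on (number of groups - 1) degrees of freedom. This minimal Python sketch assumes each study supplies an effect size and its sampling variance. The function names, the six effect sizes, and the apple/orange labels are hypothetical. A random-effects (mixed-effects) version would be needed if you expect true heterogeneity within subgroups.

```python
import math
from collections import defaultdict


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: start from the df = 1 tail, then add one term per extra 2 df.
    p = math.erfc(math.sqrt(x / 2))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # normal pdf at sqrt(x)
    term = math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        p += 2 * density * term
        term *= x / (2 * r + 1)
    return p


def q_between(effects, variances, groups):
    """Fixed-effect subgroup (categorical moderator) test.

    Returns (Q_between, df, p), where Q_between = Q_total - Q_within.
    """
    weights = [1 / v for v in variances]

    def q_stat(indices):
        # Weighted sum of squared deviations from the weighted mean.
        sw = sum(weights[i] for i in indices)
        mean = sum(weights[i] * effects[i] for i in indices) / sw
        return sum(weights[i] * (effects[i] - mean) ** 2 for i in indices)

    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)

    q_total = q_stat(range(len(effects)))
    q_within = sum(q_stat(idx) for idx in members.values())
    q_b = q_total - q_within
    df = len(members) - 1
    return q_b, df, chi2_sf(q_b, df)


# Hypothetical standardized mean differences (d) and sampling variances.
d = [0.45, 0.50, 0.40, 0.15, 0.20, 0.10]
var_d = [0.04, 0.05, 0.04, 0.05, 0.04, 0.05]
fruit = ["apple", "apple", "apple", "orange", "orange", "orange"]
q_b, df, p = q_between(d, var_d, fruit)
print(f"Q_between = {q_b:.2f}, df = {df}, p = {p:.3f}")
```

The same function handles finer groupings, such as apple varieties, by passing more labels. Each additional group adds one degree of freedom.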

4. The “File Drawer” Problem

The “file drawer” problem is based on the possibility that the studies included in a meta-analysis are not representative of those that have been conducted, because studies that fail to find significant or expected results are hidden away in researchers’ file drawers. Because I devote an entire chapter to this problem, also called publication bias, later in this book (Chapter 11), I do not treat this threat in detail here. Instead, I briefly note that this is indeed a threat to meta-analysis, as it is to any literature review. Fortunately, meta-analyses typically use systematic and thorough methods of obtaining studies (Chapter 3) that minimize this threat, and meta-analytic techniques exist for detecting and potentially correcting for this bias (Chapter 11).
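One example of a detection technique is Egger’s regression test for funnel-plot asymmetry. Each study’s standardized effect (effect / SE) is regressed on its precision (1 / SE) by ordinary least squares. An intercept well away from zero suggests that smaller studies report systematically different effects, which is one possible sign of publication bias. The following Python code is a minimal sketch of that test, and its function name and data are hypothetical. It returns only the intercept, its standard error, its t statistic, and the degrees of freedom (k - 2), so you would need a t table or a statistics library to obtain a p value. Asymmetry can also arise from causes other than publication bias, such as true differences between small and large studies.

```python
import math


def egger_test(effects, ses):
    """Egger's regression test for funnel-plot asymmetry.

    Regresses effect/SE on 1/SE by ordinary least squares.
    Returns (intercept, se_intercept, t, df) with df = k - 2.
    """
    k = len(effects)
    if k < 3:
        raise ValueError("need at least three studies")
    y = [e / s for e, s in zip(effects, ses)]  # standardized effects
    x = [1 / s for s in ses]                    # precisions
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with k - 2 degrees of freedom.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in resid) / (k - 2)
    se_int = math.sqrt(s2 * (1 / k + x_bar ** 2 / sxx))
    return intercept, se_int, intercept / se_int, k - 2


# Hypothetical studies in which less precise studies report larger effects.
effects = [0.60, 0.55, 0.40, 0.30, 0.25, 0.20]
ses = [0.30, 0.28, 0.20, 0.15, 0.10, 0.08]
b0, se_b0, t, df = egger_test(effects, ses)
print(f"intercept = {b0:.2f} (SE {se_b0:.2f}), t({df}) = {t:.2f}")
```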

5. Garbage In, Garbage Out

The critique of “garbage in, garbage out” is that meta-analyzing poor-quality primary studies only yields poor-quality conclusions. In many respects this critique identifies a valid threat, though there are some exceptions. First, consider what “poor quality” (i.e., garbage) really means. Suppose studies are described as being of poor quality because they are underpowered (i.e., have low statistical power to detect the hypothesized effect). In that case, meta-analysis can overcome this limitation by aggregating findings from multiple underpowered studies into a single, more powerful analysis. Studies might instead be considered poor in quality because they contain artifacts, such as measures that are less reliable or less valid than desired, or because the primary study authors used certain inappropriate analytic techniques (e.g., artificially dichotomizing continuous variables). Then methods of correcting effect sizes might help overcome these problems (see Chapter 6). For these types of “garbage,” then, meta-analyses might be able to produce high-quality findings.
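The effect size corrections mentioned above can be sketched for two common artifacts, assuming a correlation effect size. Unreliability is corrected with the classic disattenuation formula r / sqrt(rxx * ryy). Artificial dichotomization of a continuous variable split at proportion p is corrected by dividing by the attenuation factor phi(c) / sqrt(p(1 - p)), where c is the corresponding standard normal cut point and phi is the standard normal density. That second correction assumes the dichotomized variable is normally distributed. The function names and values are hypothetical. This is a sketch, not necessarily the exact form of the corrections in Chapter 6.

```python
import math
from statistics import NormalDist


def correct_for_unreliability(r, rxx, ryy):
    """Disattenuate r for unreliability in both measures: r / sqrt(rxx * ryy)."""
    return r / math.sqrt(rxx * ryy)


def dichotomization_factor(p=0.5):
    """Attenuation factor when a normal variable is split with proportion p below the cut."""
    nd = NormalDist()
    c = nd.inv_cdf(p)  # standard normal cut point
    return nd.pdf(c) / math.sqrt(p * (1 - p))


def correct_for_dichotomization(r, p=0.5):
    """Correct a correlation attenuated by artificially dichotomizing one variable."""
    return r / dichotomization_factor(p)


# Hypothetical study values.
r_obs = 0.25
print(round(correct_for_unreliability(r_obs, rxx=0.70, ryy=0.80), 3))  # prints 0.334
print(round(dichotomization_factor(0.5), 3))  # median split; prints 0.798
print(round(correct_for_dichotomization(r_obs, 0.5), 3))  # prints 0.313
```

Each correction divides r by a factor below 1. The corrected effect’s standard error should be divided by the same factor, so corrected effects carry more sampling error than uncorrected ones.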

There are other problems of study quality that meta-analyses cannot overcome. For instance, all primary studies evaluating a particular treatment might fail to assign participants randomly to conditions, or might not use double-blind procedures, or the like. These threats to internal validity in the primary studies will then remain when you combine their results in a meta-analysis. Similarly, if the primary studies included in a meta-analysis all use concurrent naturalistic designs, then no meta-analytic combination of their results can inform causality. In short, design limitations that consistently occur in the primary studies will also be limitations when you meta-analytically combine these studies.

Given this threat, some have recommended that meta-analysts exclude studies of poor quality, however that might be defined (see Chapter 4). This exclusion does ensure that your conclusions carry the same advantages that good designs afford the included primary studies. However, I think that uncritically following this advice is misguided for three reasons. First, for some research questions, so few primary studies may meet strict criteria for “quality” that combining or comparing them is not very informative, whereas many more studies with some methodological flaws may be available. In these situations, the progress of knowledge seems unnecessarily delayed by a stubborn unwillingness to consider all available evidence. I believe that most fields benefit more from an imperfect meta-analysis than from no meta-analysis at all, provided that you appropriately describe the limits of your review’s conclusions. Second, dogmatically excluding poor-quality studies assumes that certain imperfections of primary studies bias their effect sizes, yet never tests this assumption. This leads to the third reason: meta-analyses can evaluate whether systematic differences in effect sizes emerge from certain methodological features. You can code the features of primary studies that are considered markers of “quality” within your particular field (see Chapter 4). You can then evaluate, through moderator analyses (Chapter 9), whether these features systematically relate to differences in the results (effect sizes) found among studies. Having done this, you can (1) make statements about how differences in specific aspects of quality affect the effect sizes found, which can guide the future design of primary studies; (2) where differences are found, limit conclusions to the types of studies that you believe produce the most valid results; and (3) where differences are not found, gain the advantage of including all relevant studies (versus excluding a potentially large number of studies a priori).
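Testing whether a coded quality feature relates to effect sizes can be sketched as a fixed-effect weighted regression of effect size on a single 0/1 study feature. Here the feature is a hypothetical indicator of random assignment. With a 0/1 predictor, the weighted slope equals the difference between the weighted mean effect sizes of studies with and without the feature, and its standard error comes from the known sampling variances. The function name, data, and coding are hypothetical. A mixed-effects model would add a between-study variance component to each study’s variance.

```python
import math


def quality_moderator(effects, variances, feature):
    """Fixed-effect weighted regression of effect size on one 0/1 study feature.

    Returns (slope, se, z). With a 0/1 feature, the slope is the weighted mean
    effect of studies with the feature minus that of studies without it.
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, feature)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, feature))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, feature, effects))
    slope = sxy / sxx
    # Sampling variances are treated as known (fixed-effect model), so the
    # slope's standard error is sqrt(1 / Sxx) rather than residual-based.
    se = math.sqrt(1 / sxx)
    return slope, se, slope / se


# Hypothetical effect sizes, coded 1 if the study used random assignment.
effects = [0.20, 0.25, 0.15, 0.45, 0.50, 0.40]
variances = [0.02, 0.03, 0.02, 0.03, 0.02, 0.03]
randomized = [1, 1, 1, 0, 0, 0]
slope, se, z = quality_moderator(effects, variances, randomized)
print(f"difference (randomized - not) = {slope:.3f}, SE = {se:.3f}, z = {z:.2f}")
```

A slope near zero with a small z suggests that the feature does not detectably change the results, which supports including all studies (option 3 above). Such a null result should still be interpreted in light of how many studies are available. A clearly nonzero slope would support limiting conclusions to the stronger designs (option 2).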

6. Are These Problems Relevant Only to Quantitative Reviews?

Although these critiques were first raised against early meta-analyses and have since been directed primarily at meta-analytic (i.e., quantitative) reviews, most apply to all types of research synthesis. The first two I have reviewed (that meta-analyses require extensive statistical expertise and lack finesse) are, as I have argued, largely misconceptions. The remainder can be considered threats to all types of research syntheses (including narrative research reviews) and often to all types of literature reviews (see Figure 1.1). However, because these critiques have most often been applied to meta-analysis, we have arguably considered these threats more carefully than have scholars performing other types of literature reviews. It is useful to consider how each of the critiques described above threatens both quantitative and other literature reviews (considering primarily the narrative research review), and how each approach typically manages the problem.

The “apples and oranges” problem (i.e., inclusion of diverse types of studies within a review) potentially threatens both narrative and meta-analytic reviews. However, my impression is that meta-analyses more commonly attempt to draw generalized conclusions across diverse types of primary studies. Narrative reviews, by contrast, more often draw fragmented conclusions of the form “These types of studies find this. These other types of studies find this.” If practices stopped there, then the apples and oranges critique could more fairly be applied to meta-analyses than to other reviews. However, meta-analysts usually perform moderator analyses to compare the diverse types of studies, and narrative reviews often try to draw synthesized conclusions about the diverse types of studies. Both types of reviews thus typically attempt to draw conclusions at multiple levels (i.e., about fruit in general and about apples and oranges in particular). So the critique of focusing on the “wrong” level of generalization applies equally to both, if there is such a thing as a wrong level rather than simply a different level from the one another scholar might choose. However, both drawing generalizations across diverse studies and comparing diverse types of studies are more objective and lead to more accurate conclusions (Cooper & Rosenthal, 1980) when performed using meta-analytic rather than narrative review techniques.

The “file drawer” problem is the threat that unpublished studies are not included in a review, so that the available studies are a biased representation of the literature. It threatens all attempts to draw conclusions from this literature. In other words, if the available literature is biased, then this bias affects any attempt to draw conclusions from it, narrative or meta-analytic. However, narrative reviews almost never consider this threat, whereas meta-analytic reviews routinely consider it and often take steps to avoid and/or evaluate it (indeed, an entire book is devoted to this topic; see Rothstein, Sutton, & Borenstein, 2005b). Meta-analysts typically make greater efforts to search systematically for unpublished literature (and studies published in more obscure sources) than do those preparing narrative reviews (Chapter 3). Meta-analysts can also detect publication bias by comparing published and available unpublished studies, by examining funnel plots, or through regression analyses. They can also evaluate the plausibility of the file drawer threat through fail-safe numbers (see Chapter 11). None of these capabilities is available in the narrative review.
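Rosenthal’s fail-safe N, one of the fail-safe numbers mentioned above, can be sketched as follows. Assume each study contributes a one-tailed z score and the studies are combined with Stouffer’s method (the sum of the z scores divided by the square root of the number of studies). The method then solves for the number of additional unretrieved studies, each averaging z = 0, that would pull the combined z down to the one-tailed .05 criterion of 1.645. The function name and z values are hypothetical.

```python
import math


def rosenthal_failsafe_n(z_values, z_crit=1.645):
    """Rosenthal's fail-safe N.

    Number of additional studies averaging z = 0 that would lower the Stouffer
    combined z, sum(z) / sqrt(k + N), to z_crit (1.645 = one-tailed p = .05).
    Solving for N gives N = (sum z)**2 / z_crit**2 - k.
    """
    k = len(z_values)
    z_sum = sum(z_values)
    if z_sum <= 0:
        return 0.0  # the combined result is not in the predicted direction
    return max(0.0, z_sum ** 2 / z_crit ** 2 - k)


# Hypothetical one-tailed z scores from five included studies.
z_scores = [2.1, 1.8, 2.5, 1.2, 2.9]
n_fs = rosenthal_failsafe_n(z_scores)
k = len(z_scores)
print(f"fail-safe N = {n_fs:.1f}; Rosenthal's 5k + 10 benchmark = {5 * k + 10}")
```

Rosenthal suggested treating a result as robust when the fail-safe N exceeds 5k + 10, where k is the number of included studies. The computation assumes the missing studies average exactly zero effect, which is a common criticism of the approach.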

Finally, the problem of “garbage in, garbage out” (that including poor-quality studies in a review leads to poor-quality results from the review) threatens both narrative and meta-analytic reviews. However, I have described ways to overcome some problems of the primary studies in meta-analysis (low power, presence of methodological artifacts) and to evaluate systematically the presumed impact of study quality on results. Neither is an option in a narrative review.

In sum, the problems that might threaten the results of a meta-analytic review also threaten other types of reviews (e.g., narrative reviews), even though they are less commonly considered in those contexts. Moreover, meta-analytic techniques have been developed that partially or fully address these problems. Parallel techniques for narrative reviews either do not exist or are rarely used. For these reasons, you should be mindful of these potential threats when performing a meta-analytic review. Even so, these threats are not limited to meta-analytic reviews, and they are often less serious for meta-analytic reviews than for other types of research reviews.

Source: Card, Noel A. (2015). Applied Meta-Analysis for Social Science Research. The Guilford Press; Annotated edition.
