Practical Matters: Nonindependence among Effect Sizes

An important qualifier to the analyses I have described in this chapter (and those I will describe in subsequent chapters) is that they should be performed with a set of independent effect sizes. In primary data analysis, it is well known that a critical assumption is that observations are independent; that is, each case (e.g., person) is sampled from the population independently of the likelihood of any other participant being selected. In meta-analysis, the parallel assumption is that each effect size in your analysis is independent of the others; this assumption is usually considered satisfied if each study of a particular sample of individuals provides one effect size to your meta-analysis.

As you will quickly learn when coding effect sizes, this assumption is often violated: single studies often provide multiple effect sizes. This multitude of effect sizes from single studies creates nonindependence in meta-analytic datasets in that effect sizes from the same study (i.e., the same sample of individuals) cannot be considered independent.

These multiple effect sizes arise for various reasons, and the reason impacts how you handle these situations. The end goal of handling each type of nonindependence is to obtain one single effect size from each study for any particular analysis.

1. Multiple Effect Sizes from Multiple Measures

One potential source of multiple effect sizes from a single study is that the authors report multiple effect sizes based on different measures. For example, the study by Rys and Bear (1997) in the example meta-analysis of Table 8.1 provided effect sizes of the association between relational aggression and peer rejection based on peer-report (corrected r = .556) and teacher-report (corrected r = .338) measures of relational aggression. Or a single study might examine an association at two distinct time points. For example, Werner and Crick (2004) studied children in second through fourth grades and then readministered measures to these same children approximately one year later, finding concurrent correlations between relational aggression and rejection of r = .479 and .458 at the first and second occasions, respectively.

In these situations, you have two options for obtaining a single effect size. The first option is to determine if one effect size is more central to your interests and to use only that effect size. This decision should be made in consultation with your study inclusion/exclusion criteria (see Chapter 3), and you should only reach this decision if it is clear that one effect size should be included whereas the other should not. Using the two example studies mentioned, I might choose one of the two measurement approaches of Rys and Bear (1997) if I had a priori decided that peer reports of relational aggression were more important than teacher reports (or vice versa). Or I might decide to use only the first measurement occasion of the study by Werner and Crick (2004) if something occurred after this first data collection so as to make the subsequent results less relevant for my meta-analysis (e.g., if they had implemented an intervention and I was only interested in the association between relational aggression and rejection in normative situations). These decisions should not be based on which effect size estimate best fits your hypotheses (i.e., do not simply choose the largest effect size); it is best if you can make this decision without looking at the value of the effect size.

The second, and likely more common, option is to average these multiple effect sizes. Here, you should compute the average effect size (see Equation 8.2) among these multiple effect sizes and use this average as your single effect size estimate for the study (if the effect size is one that is typically transformed, such as Z_r or ln(o), then you should average the transformed effect sizes).⁹ To illustrate, I combined the two effect sizes from Rys and Bear (1997) by converting both correlations (.556 and .338 for peer and teacher reports) to Z_r (.627 and .352) and then averaged these values to yield the Z_r = .489 shown in Table 8.1; I back-transformed this value to r = .454 for summary in this table. Similarly, I converted the correlations at times 1 and 2 from Werner and Crick (2004), r = .479 and .458 to Z_r = .522 and .495, and computed the average of these two, which is shown in Table 8.1 as Z_r = .509 (and the parallel r = .469). If Rys and Bear (1997) had more than two measurement approaches, or if Werner and Crick (2004) had more than two measurement occasions, I could compute the average of these three or more effect sizes in the same way to yield a single effect size per study.
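The averaging described above can be sketched in a few lines of Python. This is a minimal illustration rather than the author's own procedure: the function name average_correlations is hypothetical, and it assumes the usual Fisher transformation Z_r = atanh(r) with back-transformation r = tanh(Z_r). The input correlations are the ones quoted in the text.

```python
import math

def average_correlations(rs):
    """Average several correlations from ONE sample on the Fisher Z_r scale.

    Assumes Z_r = atanh(r) and r = tanh(Z_r). Returns (mean Z_r, back-transformed r).
    """
    zs = [math.atanh(r) for r in rs]   # transform each r to Z_r
    mean_z = sum(zs) / len(zs)         # unweighted mean: same sample, so same N
    return mean_z, math.tanh(mean_z)   # back-transform for reporting

# Rys and Bear (1997): peer-report and teacher-report correlations
rys_bear_z, rys_bear_r = average_correlations([0.556, 0.338])

# Werner and Crick (2004): Time 1 and Time 2 correlations
werner_crick_z, werner_crick_r = average_correlations([0.479, 0.458])

print(f"Rys and Bear:     Z_r = {rys_bear_z:.3f}, r = {rys_bear_r:.3f}")
print(f"Werner and Crick: Z_r = {werner_crick_z:.3f}, r = {werner_crick_r:.3f}")
```

Because the sketch averages unrounded Z_r values, a result can differ in the third decimal from a value obtained by averaging Z_r values that were already rounded to three places, as appears to be the case for the Werner and Crick (2004) entry.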

2. Multiple Effect Sizes from Subsets of Participants

A second potential source of multiple effect sizes from a single study is that the effect sizes are separately reported for subgroups of the sample. For example, effect sizes might be reported separately by gender, ethnicity, or multiple treatment groups. If each of these groups should be included in your meta-analysis given your inclusion/exclusion criteria, then your goal is to compute an average effect size for these multiple groups.¹⁰ Two considerations distinguish this situation from that of the previous subsection, however. First, if you average effect sizes across multiple subgroups, your effective sample size for the study (used in computing the standard error for the study) is now the sum of the multiple combined groups. Second, the average in this situation should be a weighted average so that larger subgroups contribute more to the average than smaller subgroups.

To illustrate, a study by Hawley et al. (2007) used data from 407 boys and 522 girls, reporting information to compute effect sizes for boys (corrected r = .210 and Z_r = .214) and girls (corrected r = .122 and Z_r = .122), but not for the overall sample. To obtain one common effect size for this sample, I computed the weighted average effect size using Equation 8.2 to obtain the value Z_r = .162 (and r = .161) shown in Table 8.1. The standard error of this effect size is based on the total sample size, combining the sizes of the multiple subgroups (here, 407 + 522 = 929). It is important to note that this computed effect size is different from what would have been obtained if you could simply compute the effect size from the raw data. Specifically, this effect size from combined subgroups represents the association between the variables of interest controlling for the variable on which subgroups were created (in this example, gender). If you expect that this covariate control will—or even could—change the effect sizes (typically reduce them), then it would be useful to create a dichotomous variable for studies in which this method of combining subgroups was used for evaluation as a potential moderator (see Chapter 9).
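The subgroup calculation can be sketched as follows. Equation 8.2 and the standard error formula are not reproduced in this section, so the sketch assumes the conventional choices for Z_r: each subgroup is weighted by n - 3 (the inverse of its sampling variance), and the combined standard error is 1 / sqrt(N - 3) with N the summed sample size. The function name combine_subgroups is hypothetical.

```python
import math

def combine_subgroups(zs, ns):
    """Weighted mean Z_r across non-overlapping subgroups of one study.

    ASSUMPTIONS (not stated in this section): weight = n - 3 per subgroup,
    and SE = 1 / sqrt(N - 3), where N is the sum of the subgroup sizes.
    """
    weights = [n - 3 for n in ns]
    mean_z = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    total_n = sum(ns)                    # effective sample size for the study
    se = 1 / math.sqrt(total_n - 3)      # standard error based on the total N
    return mean_z, total_n, se

# Hawley et al. (2007): boys (n = 407, Z_r = .214) and girls (n = 522, Z_r = .122)
mean_z, total_n, se = combine_subgroups([0.214, 0.122], [407, 522])

print(f"Z_r = {mean_z:.3f}, r = {math.tanh(mean_z):.3f}, N = {total_n}, SE = {se:.4f}")
```

With these inputs the weighted mean comes out near the Z_r = .162 reported in the text, and the girls' larger subgroup pulls the average toward their smaller effect size.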

It is also possible that some studies will report multiple effect sizes for multiple subgroups. In fact, the Rys and Bear (1997) study I described earlier actually reported effect sizes separately by measure of aggression and gender, so that the coded data consisted of correlations of peer-reported relational aggression with rejection for 132 boys (corrected r = .590, Z_r = .678) and 134 girls (corrected r = .520, Z_r = .577) and correlations of teacher-reported relational aggression with rejection for these boys (corrected r = .270, Z_r = .277) and girls (corrected r = .402, Z_r = .427). In this type of situation, I suggest a two-step process in which you average effect sizes first within groups and then across groups (summing the sample size in the second round of averaging). For this example of the Rys and Bear (1997) study, I would first average the effect sizes from peer and teacher reports within the 132 boys (yielding Z_r = .478), and then compute this same average within the 134 girls (yielding Z_r = .502). I would then compute the weighted average of these effect sizes across boys and girls, which produces the Z_r = .489 (and transformation to r = .454) shown in Table 8.1. You could also reverse the steps of this two-step process to obtain the same results: in this example, first computing a weighted average effect size across gender for each of the two measures, and then averaging across the two measures (the order I took to produce the effect sizes described earlier).
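A short sketch can confirm that the two orders of this two-step process agree. As an assumption not stated in this section, subgroups are weighted by n - 3 when averaging across gender, while measures within the same participants are averaged without weights. The helper names are hypothetical; the Z_r values and sample sizes are those quoted above for Rys and Bear (1997).

```python
def simple_mean(values):
    """Unweighted mean, used across measures taken on the same participants."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Weighted mean, used across non-overlapping subgroups."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

groups = ["boys", "girls"]
measures = ["peer", "teacher"]
n = {"boys": 132, "girls": 134}
z = {
    ("boys", "peer"): 0.678, ("boys", "teacher"): 0.277,
    ("girls", "peer"): 0.577, ("girls", "teacher"): 0.427,
}
w = [n[g] - 3 for g in groups]  # ASSUMED weights: n - 3 per subgroup

# Order A: average measures within each gender, then weight across genders
within_group = [simple_mean([z[(g, m)] for m in measures]) for g in groups]
order_a = weighted_mean(within_group, w)

# Order B: weight across genders within each measure, then average measures
within_measure = [weighted_mean([z[(g, m)] for g in groups], w) for m in measures]
order_b = simple_mean(within_measure)

total_n = sum(n.values())  # sample size summed across subgroups
print(f"order A = {order_a:.3f}, order B = {order_b:.3f}, N = {total_n}")
```

The two orders agree because both steps are linear averages and the subgroup weights are the same for each measure. Since the inputs here are already rounded to three decimals, the result can differ slightly in the third decimal from the tabled Z_r = .489.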

3. Effect Sizes from Multiple Reports of the Same Study

A third potential source of nonindependence is when data from the same study are disseminated in multiple reports (e.g., multiple publications, a dissertation that is later published). It is important to keep in mind that when I refer to a single effect size per study, I mean one effect size per sample of participants. Therefore, the multiple reports that might arise from a single primary dataset should be treated as a single study. If the two reports provide different effect size estimates (presumably due to analysis of different measures, rather than a miscalculation in one or the other report), then you should average these as I described earlier. If the two reports provide some overlapping effect size estimates (e.g., the two reports both provide the correlation between relational aggression and rejection; both reports provide a Time 1 correlation but the second report also contains the Time 2 correlation), these repetitive values should be omitted.
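One way to organize this bookkeeping is sketched below. Everything in it is hypothetical: the record layout, the sample and report labels, and the correlation values are placeholders invented for illustration, not data from any study. The sketch also assumes you have already judged which reports share a sample, and that a repeated estimate can be recognized by its measure label.

```python
import math
from collections import defaultdict

# HYPOTHETICAL coded records: (sample_id, report_id, measure_label, r)
records = [
    ("sample_A", "dissertation", "time1", 0.30),
    ("sample_A", "journal_article", "time1", 0.30),  # repeat of the same estimate
    ("sample_A", "journal_article", "time2", 0.40),
    ("sample_B", "journal_article", "time1", 0.25),
]

def one_effect_per_sample(records):
    """Collapse multiple reports of the same sample into one effect size.

    Repeated estimates (same sample and measure label) are kept only once;
    the remaining distinct estimates are averaged on the Fisher Z_r scale.
    """
    distinct = defaultdict(dict)
    for sample_id, _report_id, label, r in records:
        # setdefault keeps the first occurrence and ignores later repeats
        distinct[sample_id].setdefault(label, r)

    combined = {}
    for sample_id, by_label in distinct.items():
        zs = [math.atanh(r) for r in by_label.values()]
        combined[sample_id] = math.tanh(sum(zs) / len(zs))
    return combined

result = one_effect_per_sample(records)
for sample_id, r in result.items():
    print(sample_id, round(r, 3))
```

Keying on the sample rather than the report is the design choice that matters here: it mirrors the rule that "one effect size per study" means one effect size per sample of participants.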

Unfortunately, the uncertainty that arises from this sort of multiple reporting is greater than I have described here. Often, it is unclear if authors of separate reports are using the same dataset. In this situation, I recommend comparing the descriptions of methods carefully and contacting the authors if you are still uncertain. Similarly, authors might report results that seem to come from the full sample in one report and only a subset in another. Here, I suggest selecting values from the full sample when effect sizes are identical. Having made these suggestions, I recognize that every meta-analyst is likely to come across unique situations. As with much of my previous advice on these difficult issues, I strongly suggest contacting the authors of the reports to obtain further information.

Source: Card, Noel A. (2015), Applied Meta-Analysis for Social Science Research, The Guilford Press; Annotated edition.
