One-Sample Tests Using Stata

One-sample t tests have two seemingly different applications:

Testing whether a sample mean ȳ differs significantly from a hypothesized value μ₀.
Testing whether the means of y₁ and y₂, two variables measured over the same set of observations, differ significantly from each other. This is equivalent to testing whether the mean of a difference score variable created by subtracting y₁ from y₂ equals zero.

We use essentially the same formulas for either application, although the second starts with information on two variables instead of one.
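To see why the two applications share one formula, here is a minimal Python sketch of the usual one-sample statistic t = (ȳ − μ₀) / (s / √n). The function name one_sample_t and both lists of scores are hypothetical illustrations; the scores are not taken from writing.dta. The paired comparison is just the same function applied to the difference scores with a null value of zero.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)      # sample SD, n - 1 in the denominator
    se = sd / math.sqrt(n)             # standard error of the mean
    return (mean - mu0) / se, n - 1


# Hypothetical scores for eight students (NOT the writing.dta values)
pre = [8, 12, 9, 11, 10, 7, 13, 9]
post = [20, 25, 19, 30, 22, 18, 28, 21]

# Application 1: does the mean of one variable differ from 10?
t_one, df_one = one_sample_t(pre, 10)

# Application 2: do two means differ? Test the difference scores against 0.
diff = [after - before for before, after in zip(pre, post)]
t_paired, df_paired = one_sample_t(diff, 0)

print(f"one-sample: t = {t_one:.3f}, df = {df_one}")
print(f"paired:     t = {t_paired:.3f}, df = {df_paired}")
```

Only the input changes between the two calls; the arithmetic is identical.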

The data in writing.dta were collected to evaluate a college writing course based on word processing (Nash and Schwartz 1987). Measures such as the number of sentences completed in timed writing were collected both before and after students took the course. The researchers wanted to know whether the post-course measures showed improvement.

Suppose that we knew that students in previous years were able to complete an average of 10 sentences. Before examining whether the students in writing.dta improved during the course, we might want to learn whether at the start of the course they were essentially like earlier students; in other words, whether their pre-test (preS) mean differs significantly from the mean of previous students (10). To see a one-sample t test of H₀: μ = 10, type ttest preS = 10.

The notation Pr(T < t) means “probability of a t-distribution value less than the observed t, if H₀ were true”; that is, the one-tail test probability. The two-tail probability of a greater absolute t appears as Pr(|T| > |t|) = 0.4084. Because this probability is high, we have no reason to reject H₀: μ = 10. Note that ttest automatically provides a 95% confidence interval for the mean, and this confidence interval includes the null-hypothesis value 10. We could see a different confidence interval, such as 90%, by adding a level(90) option to this command.
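The link between the confidence interval and the two-tail test can be checked numerically. The sketch below is an illustration under stated assumptions: the ten scores are hypothetical (not writing.dta), the helper names are invented for this example, and the critical value 2.262 is the tabled two-sided 95% value of t with 9 degrees of freedom. A null value lies inside the 95% interval exactly when |t| does not exceed that critical value.

```python
import math
import statistics

# Two-sided 95% critical value of t with 9 df, from a standard t table
T_CRIT_95_DF9 = 2.262


def mean_ci(values, t_crit):
    """Confidence interval for the mean: ybar +/- t_crit * SE."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean - t_crit * se, mean + t_crit * se


def t_stat(values, mu0):
    """One-sample t statistic for H0: mean == mu0."""
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return (mean - mu0) / se


# Hypothetical scores for ten students (NOT the writing.dta values)
sample = [8, 12, 9, 11, 10, 7, 13, 9, 14, 10]

low, high = mean_ci(sample, T_CRIT_95_DF9)
for mu0 in (10, 13):
    inside = low <= mu0 <= high
    rejected = abs(t_stat(sample, mu0)) > T_CRIT_95_DF9
    print(f"mu0 = {mu0}: inside 95% CI = {inside}, rejected at 5% = {rejected}")
```

A narrower interval, such as 90%, would use a smaller critical value and so would exclude more null values.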

A nonparametric counterpart, the sign test, employs the binomial distribution to test hypotheses about single medians. For example, we could test whether the median of preS equals 10 by typing signtest preS = 10. The result gives us no reason to reject that null hypothesis either.

Like ttest, signtest includes right-tail, left-tail, and two-tail probabilities. Unlike the symmetrical t distributions used by ttest, however, the binomial distributions used by signtest have different left- and right-tail probabilities. In this example, only the two-tail probability matters because we were testing whether the writing.dta students “differ” from the null-hypothesis median of 10.
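The binomial reasoning behind a sign test is short enough to sketch directly. The Python function below is a hypothetical illustration, not Stata's implementation: it counts values above and below the null median, drops ties, and takes tail probabilities from a Binomial(n, 0.5) distribution. The ten scores are invented for the example and are not from writing.dta.

```python
from math import comb


def sign_test(values, median0):
    """Sign test of H0: median == median0, using Binomial(n, 0.5) tails."""
    n_pos = sum(v > median0 for v in values)
    n_neg = sum(v < median0 for v in values)
    n = n_pos + n_neg                      # values equal to median0 are dropped

    def upper_tail(k):
        """P(X >= k) for X ~ Binomial(n, 0.5)."""
        return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

    p_right = upper_tail(n_pos)            # alternative: median > median0
    p_left = upper_tail(n_neg)             # alternative: median < median0
    p_two = min(1.0, 2 * min(p_right, p_left))
    return n_pos, n_neg, p_right, p_left, p_two


# Hypothetical scores for ten students (NOT the writing.dta values)
scores = [8, 12, 9, 11, 10, 7, 13, 9, 14, 6]
n_pos, n_neg, p_right, p_left, p_two = sign_test(scores, 10)

print(f"positive = {n_pos}, negative = {n_neg}")
print(f"right-tail p = {p_right:.4f}, left-tail p = {p_left:.4f}, two-tail p = {p_two:.4f}")
```

With 4 positive and 5 negative signs, the right- and left-tail probabilities differ and do not sum to 1, because both tails include the probability of the observed counts.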

Next, we can test for improvement during the course by testing the null hypothesis that the mean numbers of sentences completed before and after the course (that is, the means of preS and postS) are equal. The ttest command accomplishes this as well (ttest postS = preS), finding a significant improvement.

Because we expect “improvement,” not just “difference” between the preS and postS means, a one-tail test is appropriate. The displayed right-tail probability rounds off to zero. Students’ mean sentence completion does significantly improve. Based on this sample, we are 95% confident that it improves by between 12.7 and 18.4 sentences.

t tests ordinarily assume that variables are normally distributed around their group means. This assumption usually is not critical because the tests are moderately robust. When nonnormality involves severe outliers, however, or occurs in small samples, we might be safer turning to medians instead of means and employing a nonparametric test that does not assume normality. The Wilcoxon signed-rank test, for example, assumes only that the distributions are symmetrical and continuous. Applying a signed-rank test to these data yields essentially the same conclusion as ttest: that students’ sentence completion significantly improved. Because both tests agree on this conclusion, we can state it with more assurance.
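For readers curious about what a signed-rank test computes, the following Python sketch ranks the absolute differences, sums the ranks of the positive differences, and finds an exact right-tail probability by enumerating every possible assignment of signs. It is an illustration under assumptions, not Stata's procedure: statistical packages often report a large-sample normal approximation instead, the function name signed_rank is hypothetical, zero differences are simply dropped, and the difference scores are invented rather than taken from writing.dta.

```python
from itertools import product


def signed_rank(diffs):
    """Return (W+, exact right-tail p) for H0: differences symmetric about 0."""
    nonzero = [d for d in diffs if d != 0]      # drop zero differences
    ordered = sorted(abs(d) for d in nonzero)

    # Assign average ranks to tied absolute differences
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j

    ranks = [rank_of[abs(d)] for d in nonzero]
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)

    # Under H0 each rank is positive or negative with probability 1/2;
    # enumerate all 2^n sign patterns (practical only for small n)
    as_extreme = sum(
        1
        for signs in product((0, 1), repeat=len(ranks))
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    return w_plus, as_extreme / 2 ** len(ranks)


# Hypothetical post-minus-pre differences for eight students
diffs = [12, 13, 10, 19, 12, 11, 15, 12]
w_plus, p_right = signed_rank(diffs)
print(f"W+ = {w_plus}, exact right-tail p = {p_right}")
```

Because every hypothetical student improved, W+ takes its maximum possible value and only one of the 256 sign patterns is as extreme.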

Source: Hamilton, Lawrence C. (2012). Statistics with Stata: Version 12, 8th edition. Cengage Learning.
