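The sample variance

$$s^{2} = \frac{\sum (x_i - \bar{x})^{2}}{n - 1} \qquad (11.1)$$

is the point estimator of the population variance σ^{2}. In using the sample variance as a basis for making inferences about a population variance, the sampling distribution of the quantity (n − 1)s^{2}/σ^{2} is helpful. This sampling distribution is described as follows.

Whenever a simple random sample of size n is selected from a normal population, the sampling distribution of

$$\frac{(n - 1)s^{2}}{\sigma^{2}} \qquad (11.2)$$

is a chi-square distribution with n − 1 degrees of freedom.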

Figure 11.1 shows some possible forms of the sampling distribution of (n − 1)s^{2}/σ^{2}.

Because (n − 1)s^{2}/σ^{2} is known to have a chi-square distribution whenever a simple random sample of size n is selected from a normal population, we can use the chi-square distribution to develop interval estimates and conduct hypothesis tests about a population variance.
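The chi-square result can be checked informally by simulation. The sketch below draws repeated samples of size 20 from a normal population, computes (n − 1)s^{2}/σ^{2} for each, and compares the average with n − 1 = 19, the mean of a chi-square distribution with 19 degrees of freedom. The function name, the population standard deviation of 0.05, the number of repetitions, and the seed are all illustrative assumptions, not values from the text.

```python
import random

def simulate_scaled_variances(n, sigma, reps, seed=0):
    """Return reps simulated values of (n - 1) * s^2 / sigma^2 for normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    values = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
        values.append((n - 1) * s2 / sigma ** 2)
    return values

# Hypothetical settings: n = 20 observations, sigma = 0.05, 20,000 repetitions.
values = simulate_scaled_variances(n=20, sigma=0.05, reps=20000)
avg = sum(values) / len(values)
print(f"average of (n - 1)s^2/sigma^2: {avg:.2f} (chi-square mean is n - 1 = 19)")
```

A simulation like this only illustrates the result for one normal population; it is not a proof, and the agreement is approximate.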

**1. Interval Estimation**

To show how the chi-square distribution can be used to develop a confidence interval estimate of a population variance σ^{2}, suppose that we are interested in estimating the population variance for the production filling process mentioned at the beginning of this chapter. A sample of 20 containers is taken, and the sample variance for the filling quantities is found to be s^{2} = .0025. However, we know we cannot expect the variance of a sample of 20 containers to provide the exact value of the variance for the population of containers filled by the production process. Hence, our interest will be in developing an interval estimate for the population variance.

We will use the notation χ^{2}_{α} to denote the value for the chi-square distribution that provides an area or probability of α to the right of the χ^{2}_{α} value. For example, in Figure 11.2 the chi-square distribution with 19 degrees of freedom is shown with χ^{2}_{.025} = 32.852 indicating that 2.5% of the chi-square values are to the right of 32.852, and χ^{2}_{.975} = 8.907 indicating that 97.5% of the chi-square values are to the right of 8.907. Tables of areas or probabilities are readily available for the chi-square distribution. Refer to Table 11.1 and verify that these chi-square values with 19 degrees of freedom (19th row of the table) are correct. Table 3 of Appendix B provides a more extensive table of chi-square values.

From the graph in Figure 11.2 we see that .95, or 95%, of the chi-square values are between χ^{2}_{.975} and χ^{2}_{.025}. That is, there is a .95 probability of obtaining a χ^{2} value such that

$$\chi^{2}_{.975} \le \chi^{2} \le \chi^{2}_{.025}$$

We stated in expression (11.2) that (n − 1)s^{2}/σ^{2} follows a chi-square distribution; therefore we can substitute (n − 1)s^{2}/σ^{2} for χ^{2} and write

$$\chi^{2}_{.975} \le \frac{(n - 1)s^{2}}{\sigma^{2}} \le \chi^{2}_{.025} \qquad (11.3)$$

In effect, expression (11.3) provides an interval estimate in that .95, or 95%, of all possible values for (n − 1)s^{2}/σ^{2} will be in the interval χ^{2}_{.975} to χ^{2}_{.025}. We now need to do some algebraic manipulations with expression (11.3) to develop an interval estimate for the population variance σ^{2}. Working with the leftmost inequality in expression (11.3), we have

$$\chi^{2}_{.975} \le \frac{(n - 1)s^{2}}{\sigma^{2}}$$

Thus

$$\sigma^{2}\chi^{2}_{.975} \le (n - 1)s^{2}$$

or

$$\sigma^{2} \le \frac{(n - 1)s^{2}}{\chi^{2}_{.975}} \qquad (11.4)$$

Performing similar algebraic manipulations with the rightmost inequality in expression (11.3) gives

$$\frac{(n - 1)s^{2}}{\chi^{2}_{.025}} \le \sigma^{2} \qquad (11.5)$$

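The results of expressions (11.4) and (11.5) can be combined to provide

$$\frac{(n - 1)s^{2}}{\chi^{2}_{.025}} \le \sigma^{2} \le \frac{(n - 1)s^{2}}{\chi^{2}_{.975}} \qquad (11.6)$$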

Because expression (11.3) is true for 95% of the (n − 1)s^{2}/σ^{2} values, expression (11.6) provides a 95% confidence interval estimate for the population variance σ^{2}.

Let us return to the problem of providing an interval estimate for the population variance of filling quantities. Recall that the sample of 20 containers provided a sample variance of s^{2} = .0025. With a sample size of 20, we have 19 degrees of freedom. As shown in Figure 11.2, we have already determined that χ^{2}_{.975} = 8.907 and χ^{2}_{.025} = 32.852. Using these values in expression (11.6) provides the following interval estimate for the population variance.

$$\frac{(19)(.0025)}{32.852} \le \sigma^{2} \le \frac{(19)(.0025)}{8.907}$$

or

$$.0014 \le \sigma^{2} \le .0053$$

Taking the square root of these values provides the following 95% confidence interval for the population standard deviation.

$$.0380 \le \sigma \le .0730$$

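Thus, we illustrated the process of using the chi-square distribution to establish interval estimates of a population variance and a population standard deviation. Note specifically that because χ^{2}_{.975} and χ^{2}_{.025} were used, the interval estimate has a .95 confidence coefficient. Extending expression (11.6) to the general case of any confidence coefficient, we have the following interval estimate of a population variance.

$$\frac{(n - 1)s^{2}}{\chi^{2}_{\alpha/2}} \le \sigma^{2} \le \frac{(n - 1)s^{2}}{\chi^{2}_{(1 - \alpha/2)}}$$

where the χ^{2} values are based on a chi-square distribution with n − 1 degrees of freedom and where 1 − α is the confidence coefficient.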
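The interval computation for the filling process can be reproduced with a few lines of Python. This is a minimal sketch: the function name and argument names are hypothetical, and the two chi-square values are passed in as looked up in the table (32.852 and 8.907 for 19 degrees of freedom) rather than computed. The lower limit divides (n − 1)s^{2} by the upper tail chi-square value, and the upper limit divides it by the lower tail value.

```python
import math

def variance_interval(n, s2, chi2_upper, chi2_lower):
    """Interval estimate of a population variance.

    chi2_upper is the table value with area alpha/2 to its right,
    chi2_lower the table value with area 1 - alpha/2 to its right,
    both for n - 1 degrees of freedom.
    """
    df = n - 1
    return df * s2 / chi2_upper, df * s2 / chi2_lower

# Filling process example: n = 20, s^2 = .0025, 95% confidence.
var_lo, var_hi = variance_interval(n=20, s2=0.0025, chi2_upper=32.852, chi2_lower=8.907)
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)

print(f"{var_lo:.4f} <= sigma^2 <= {var_hi:.4f}")  # 0.0014 <= sigma^2 <= 0.0053
print(f"{sd_lo:.4f} <= sigma <= {sd_hi:.4f}")      # 0.0380 <= sigma <= 0.0730
```

For a different confidence coefficient, only the two table values change.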

**2. Hypothesis Testing**

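Using σ^{2}_{0} to denote the hypothesized value for the population variance, the three forms for a hypothesis test about a population variance are as follows:

$$H_{0}\colon \sigma^{2} \ge \sigma^{2}_{0} \qquad H_{0}\colon \sigma^{2} \le \sigma^{2}_{0} \qquad H_{0}\colon \sigma^{2} = \sigma^{2}_{0}$$

$$H_{a}\colon \sigma^{2} < \sigma^{2}_{0} \qquad H_{a}\colon \sigma^{2} > \sigma^{2}_{0} \qquad H_{a}\colon \sigma^{2} \ne \sigma^{2}_{0}$$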

These three forms are similar to the three forms used to conduct one-tailed and two-tailed hypothesis tests about population means and proportions.

The procedure for conducting a hypothesis test about a population variance uses the hypothesized value for the population variance σ^{2}_{0} and the sample variance s^{2} to compute the value of a χ^{2} test statistic. Assuming that the population has a normal distribution, the test statistic is as follows:

$$\chi^{2} = \frac{(n - 1)s^{2}}{\sigma^{2}_{0}}$$

where χ^{2} has a chi-square distribution with n − 1 degrees of freedom.

After computing the value of the χ^{2} test statistic, either the p-value approach or the critical value approach may be used to determine whether the null hypothesis can be rejected.

Let us consider the following example. The St. Louis Metro Bus Company wants to promote an image of reliability by encouraging its drivers to maintain consistent schedules. As a standard policy, the company would like arrival times at bus stops to have low variability. In terms of the variance of arrival times, the company standard specifies an arrival time variance of 4 or less when arrival times are measured in minutes. The following hypothesis test is formulated to help the company determine whether the arrival time population variance is excessive.

$$H_{0}\colon \sigma^{2} \le 4$$

$$H_{a}\colon \sigma^{2} > 4$$

In tentatively assuming H_{0} is true, we are assuming that the population variance of arrival times is within the company guideline. We reject H_{0} if the sample evidence indicates that the population variance exceeds the guideline. In this case, follow-up steps should be taken to reduce the population variance. We conduct the hypothesis test using a level of significance of α = .05.

Suppose that a random sample of 24 bus arrivals taken at a downtown intersection provides a sample variance of s^{2} = 4.9. Assuming that the population distribution of arrival times is approximately normal, the value of the test statistic is as follows.

$$\chi^{2} = \frac{(n - 1)s^{2}}{\sigma^{2}_{0}} = \frac{(24 - 1)(4.9)}{4} = 28.18$$

The chi-square distribution with n − 1 = 24 − 1 = 23 degrees of freedom is shown in Figure 11.3. Because this is an upper tail test, the area under the curve to the right of the test statistic χ^{2} = 28.18 is the p-value for the test.

Like the t distribution table, the chi-square distribution table does not contain sufficient detail to enable us to determine the p-value exactly. However, we can use the chi-square distribution table to obtain a range for the p-value. For example, using Table 11.1, we find the following information for a chi-square distribution with 23 degrees of freedom.

| Area in upper tail | .10 | .05 | .025 | .01 |
|---|---|---|---|---|
| χ^{2} value (23 df) | 32.007 | 35.172 | 38.076 | 41.638 |

Because χ^{2} = 28.18 is less than 32.007, the area in the upper tail (the p-value) is greater than .10. With the p-value > α = .05, we cannot reject the null hypothesis. The sample does not support the conclusion that the population variance of the arrival times is excessive.

Because of the difficulty of determining the exact p-value directly from the chi-square distribution table, statistical software is helpful. Appendix F, at the back of the book, describes how to compute p-values using JMP or Excel. In the appendix, we show that the exact p-value corresponding to χ^{2} = 28.18 is .2091.

As with other hypothesis testing procedures, the critical value approach can also be used to draw the hypothesis testing conclusion. With α = .05, χ^{2}_{.05} provides the critical value for the upper tail hypothesis test. Using Table 11.1 and 23 degrees of freedom, χ^{2}_{.05} = 35.172. Thus, the rejection rule for the bus arrival time example is as follows:

Reject H_{0} if χ^{2} ≥ 35.172

Because the value of the test statistic is χ^{2} = 28.18, which is less than the critical value of 35.172, we cannot reject the null hypothesis.
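The critical value approach for the bus arrival example amounts to one multiplication, one division, and one comparison. The sketch below assumes the table value 35.172 (α = .05, 23 degrees of freedom) is supplied by hand; the function name is hypothetical.

```python
def chi_square_statistic(n, s2, sigma0_sq):
    """Test statistic (n - 1) * s^2 / sigma0^2 for a test about a population variance."""
    return (n - 1) * s2 / sigma0_sq

# Bus arrival example: n = 24, s^2 = 4.9, hypothesized variance 4.
chi2 = chi_square_statistic(n=24, s2=4.9, sigma0_sq=4)
critical = 35.172           # table value for alpha = .05 with 23 degrees of freedom
reject = chi2 >= critical   # upper tail rejection rule

print(f"chi-square = {chi2:.3f}, critical value = {critical}, reject H0: {reject}")
```

The unrounded statistic is 28.175, which the text reports as 28.18; either way it falls below 35.172 and the null hypothesis is not rejected.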

In practice, upper tail tests as presented here are the most frequently encountered tests about a population variance. In situations involving arrival times, production times, filling weights, part dimensions, and so on, low variances are desirable, whereas large variances are unacceptable. With a statement about the maximum allowable population variance, we can test the null hypothesis that the population variance is less than or equal to the maximum allowable value against the alternative hypothesis that the population variance is greater than the maximum allowable value. With this test structure, corrective action will be taken whenever rejection of the null hypothesis indicates the presence of an excessive population variance.

As we saw with population means and proportions, other forms of hypothesis tests can be developed. Let us demonstrate a two-tailed test about a population variance by considering a situation faced by a bureau of motor vehicles. Historically, the variance in test scores for individuals applying for driver’s licenses has been σ^{2} = 100. A new examination with new test questions has been developed. Administrators of the bureau of motor vehicles would like the variance in the test scores for the new examination to remain at the historical level. To evaluate the variance in the new examination test scores, the following two-tailed hypothesis test has been proposed.

$$H_{0}\colon \sigma^{2} = 100$$

$$H_{a}\colon \sigma^{2} \ne 100$$

Rejection of H_{0} will indicate that a change in the variance has occurred and suggest that some questions in the new examination may need revision to make the variance of the new test scores similar to the variance of the old test scores. A sample of 30 applicants for driver’s licenses will be given the new version of the examination. We will use a level of significance α = .05 to conduct the hypothesis test.

The sample of 30 examination scores provided a sample variance s^{2} = 162. The value of the chi-square test statistic is as follows:

$$\chi^{2} = \frac{(n - 1)s^{2}}{\sigma^{2}_{0}} = \frac{(30 - 1)(162)}{100} = 46.98$$

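Now, let us compute the p-value. Using Table 11.1 and n − 1 = 30 − 1 = 29 degrees of freedom, we find the following.

| Area in upper tail | .10 | .05 | .025 | .01 |
|---|---|---|---|---|
| χ^{2} value (29 df) | 39.087 | 42.557 | 45.722 | 49.588 |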

Thus, the value of the test statistic χ^{2} = 46.98 provides an area between .025 and .01 in the upper tail of the chi-square distribution. Doubling these values shows that the two-tailed p-value is between .05 and .02. Statistical software can be used to show the exact p-value = .0374. With p-value < α = .05, we reject H_{0} and conclude that the new examination test scores have a population variance different from the historical variance of σ^{2} = 100. A summary of the hypothesis testing procedures for a population variance is shown in Table 11.2.
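The exact p-values quoted for the two worked examples can be approximated without a table. The sketch below is one way to do it using only the Python standard library; it is not the JMP or Excel procedure the book describes. It relies on a standard finite-sum identity for the upper tail area of a chi-square distribution with a whole number of degrees of freedom. The function name `chi2_upper_tail` is hypothetical, and for the two-tailed test the p-value is taken as twice the smaller tail area.

```python
import math

def chi2_upper_tail(x, df):
    """Area to the right of x (x >= 0) under a chi-square curve with integer df."""
    h = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-h) * sum_{j=0}^{k-1} h^j / j!
        term = math.exp(-h)
        total = term
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return total
    # Odd df = 2k + 1: erfc(sqrt(h)) + exp(-h) * sum_{j=1}^{k} h^(j - 1/2) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(h))
    term = math.exp(-h) * math.sqrt(h) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= h / (j + 0.5)
    return total

# Bus arrivals: upper tail test, n = 24, s^2 = 4.9, hypothesized variance 4.
bus_stat = (24 - 1) * 4.9 / 4
bus_p = chi2_upper_tail(bus_stat, df=23)

# Driver's license exam: two-tailed test, n = 30, s^2 = 162, hypothesized variance 100.
exam_stat = (30 - 1) * 162 / 100
exam_upper = chi2_upper_tail(exam_stat, df=29)
exam_p = 2 * min(exam_upper, 1 - exam_upper)

print(f"bus:  chi-square = {bus_stat:.3f}, p-value = {bus_p:.4f}")
print(f"exam: chi-square = {exam_stat:.2f}, p-value = {exam_p:.4f}")
```

The results should land close to the values reported in the text (.2091 and .0374). Small differences in the last digit are possible because the text rounds the bus statistic to 28.18 before computing the p-value.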

Source: Anderson David R., Sweeney Dennis J., Williams Thomas A. (2019), *Statistics for Business & Economics*, Cengage Learning; 14th edition.
