Measurement and Descriptive Statistics with SPSS: The Normal Curve

Figure 3.10 is an example of a normal curve. The frequency distributions of many of the variables used in the behavioral sciences are distributed approximately as a normal curve when N is large. Examples of such variables that approximately fit a normal curve are height, weight, intelligence, and many personality variables. Notice that for each of these examples, most people would fall toward the middle of the curve, with fewer people at the extremes. If the average height of men in the United States were 5'10", then this height would be in the middle of the curve. The heights of men who are taller than 5'10" would be to the right of the middle on the curve, and those of men who are shorter than 5'10" would be to the left of the middle on the curve, with only a few men 7 feet or 5 feet tall.

Fig. 3.10. Frequency distribution and areas under the normal curve

The normal curve can be thought of as derived from a frequency distribution. It is theoretically formed from counting an “infinite” number of occurrences of a variable. Usually, when the normal curve is depicted, only the X axis (horizontal) is shown. To determine how a frequency distribution is obtained, you could take a fair coin, flip it 10 times, and record the number of heads on this first set or trial. Then flip it another 10 times and record the number of heads. If you had nothing better to do, you could do 100 trials. After performing this task, you could plot the number of times that the coin turned up heads out of each trial of 10. What would you expect? Of course, the largest number of trials probably would show 5 heads out of 10. There would be very few, if any, trials where 0, 1, 9, or 10 heads occurred. It could happen, but the probability is quite low, which brings us to a probability distribution. If we performed this experiment 100 times, or 1,000 times, or 1,000,000 times, the frequency distribution would “fill in” and look more and more like a normal curve.
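The book itself uses SPSS, but the coin-flip thought experiment above is easy to run as a simulation. The following Python sketch (not from the source; function names and the trial counts are our own) performs many trials of 10 flips each and tallies how often each number of heads occurs. As the text predicts, 5 heads is the most frequent outcome and 0 or 10 heads are rare:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def count_heads(flips=10):
    """One trial: flip a fair coin `flips` times and count the heads."""
    return sum(random.random() < 0.5 for _ in range(flips))

def head_frequencies(trials):
    """Run many trials and tally how often each head count (0-10) occurs."""
    counts = [0] * 11  # possible outcomes: 0 through 10 heads
    for _ in range(trials):
        counts[count_heads()] += 1
    return counts

freqs = head_frequencies(100_000)
# The middle outcome (5 heads) is the most frequent;
# the extremes (0 or 10 heads) occur very rarely, if at all.
print(freqs)
```

With more and more trials, a bar chart of these tallies "fills in" toward the familiar bell shape, which is the point the paragraph is making.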

Properties of the Normal Curve

The normal curve has five properties that are always present.

  1. The normal curve is unimodal. It has one “hump,” and this hump is in the middle of the distribution. The most frequent value is in the middle.
  2. The mean, median, and mode are equal.
  3. The curve is symmetric. If you fold the normal curve in half, the right side would fit perfectly with the left side; that is, it is not skewed.
  4. The range is infinite. This means that the extremes approach but never touch the X axis.
  5. The curve is neither too peaked nor too flat, and its tails are neither too short nor too long; it has neither positive nor negative kurtosis. Its proportions are like those in Fig. 3.10.

Non-Normally Shaped Distributions

Skewness. If one tail of a frequency distribution is longer than the other, and if the mean and median are different, the curve is skewed. Because most common inferential statistics (e.g., t test) assume that the dependent variable is normally distributed (the data are normal), it is important that we know whether our variables are highly skewed.

Figure 3.2 showed a frequency distribution that is skewed to the left. This is called a negative skew. A perfectly normal curve has a skewness of zero (0.0). The curve in Fig. 3.2, for the competence scale, has a skewness statistic of -1.63, which indicates that the curve is quite different from a normal curve. We use a somewhat arbitrary guideline that if the skewness is more than +1.0 or less than -1.0, the distribution is markedly skewed and it would be prudent to use a nonparametric (ordinal type) statistic (or to transform the variable). However, some parametric statistics, such as the two-tailed t test and ANOVA, are quite robust, so even a skewness of more than +/-1 may not change the results much. We provide more examples and discuss this more in Chapter 4.
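Although the book computes skewness with SPSS's Frequencies command, the same statistic is available in SciPy. This illustrative sketch (the simulated "scale scores" are our own, standing in for data like the competence scale) computes skewness for a roughly normal sample and a left-skewed one, and applies the ±1.0 guideline from the paragraph above:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)

# Hypothetical data: one approximately normal sample and one with a
# long left tail (a negative skew, like the competence scale in Fig. 3.2).
normal_scores = rng.normal(loc=3.0, scale=0.5, size=5000)
left_skewed = 5 - rng.exponential(scale=0.8, size=5000)

for name, data in [("approx. normal", normal_scores),
                   ("left skewed", left_skewed)]:
    s = skew(data)
    flag = "markedly skewed" if abs(s) > 1.0 else "within guideline"
    print(f"{name}: skewness = {s:.2f} ({flag})")
```

The left-skewed sample produces a clearly negative skewness statistic (well below -1.0), which under the guideline would suggest a nonparametric statistic or a transformation of the variable.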

Kurtosis. If a frequency distribution is more peaked than the normal curve shown in Fig. 3.10, it is said to have positive kurtosis and is called leptokurtic. Note in Fig 3.1 that the scholastic aptitude test – math histogram is peaked (i.e., the bar for 500 extends above the normal curve line), and thus there is some positive kurtosis. If a frequency distribution is relatively flat with heavy tails, it has negative kurtosis and is called platykurtic. Although the program can easily compute a kurtosis value for any variable using an option in the Frequencies command, usually we do not do so because kurtosis does not seem to affect the results of most statistical analyses very much.
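For completeness, kurtosis can be computed the same way. A sketch (again our own illustration, not the book's SPSS output): SciPy's default "Fisher" kurtosis is 0 for a perfect normal curve, positive for a leptokurtic (peaked) distribution, and negative for a platykurtic (flat) one:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(1)

# Fisher (excess) kurtosis: 0 for normal, >0 leptokurtic, <0 platykurtic.
normal_data = rng.normal(size=10_000)
peaked = rng.laplace(size=10_000)        # leptokurtic: heavy peak and tails
flat = rng.uniform(-1, 1, size=10_000)   # platykurtic: flat with light tails

for name, data in [("normal", normal_data),
                   ("laplace (peaked)", peaked),
                   ("uniform (flat)", flat)]:
    print(f"{name}: kurtosis = {kurtosis(data):.2f}")
```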

Areas Under the Normal Curve

The normal curve is also a probability distribution. Visualize that the area under the normal curve is equal to 1.0. Therefore, portions of this curve could be expressed as fractions of 1.0. For example, if we assume that 5'10" is the average height of men in the United States, then the probability of a man being 5'10" or taller is .5. The probability of a man being over 6'3" or less than 5'5" is considerably smaller. It is important to be able to conceptualize the normal curve as a probability distribution because statistical convention sets acceptable probability levels for rejecting the null hypothesis at .05 or .01. As we shall see, when events or outcomes happen very infrequently, that is, only 5 times in 100 or 1 time in 100 (way out in the left or right tail of the curve), we wonder if they belong to that distribution or perhaps to a different distribution. We come back to this point later in the book.
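The height example can be made concrete with the normal cumulative distribution function. In this sketch we assume (our numbers, not the book's) a mean of 70 inches (5'10") and a standard deviation of 3 inches; the tail areas then give the probabilities the paragraph describes:

```python
from scipy.stats import norm

# Hypothetical figures: mean male height 70 in (5'10"), SD 3 in.
mean, sd = 70.0, 3.0

p_taller_than_mean = 1 - norm.cdf(70, loc=mean, scale=sd)  # exactly .5
p_over_6ft3 = 1 - norm.cdf(75, loc=mean, scale=sd)         # small right tail
p_under_5ft5 = norm.cdf(65, loc=mean, scale=sd)            # small left tail

print(f"P(taller than 5'10\") = {p_taller_than_mean:.3f}")
print(f"P(over 6'3\")         = {p_over_6ft3:.3f}")
print(f"P(under 5'5\")        = {p_under_5ft5:.3f}")
```

Under these assumed figures both tail probabilities come out below .05, exactly the kind of "rare event" threshold used for statistical significance.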

All normal curves, regardless of whether they are narrow or spread out, can be divided into areas or units in terms of the standard deviation. Approximately 34% of the area under the normal curve is between the mean and one standard deviation above or below the mean (see Fig 3.10 again). If we include both the area to the right and to the left of the mean, 68% of the area under the normal curve is within one standard deviation from the mean. Another approximately 13.5% of the area under the normal curve is accounted for by adding a second standard deviation to the first standard deviation. In other words, two standard deviations to the right of the mean account for an area of approximately 47.5%, and two standard deviations to the left and right of the mean make up an area of approximately 95% of the normal curve. If we were to subtract 95% from 100%, the remaining 5% relates to that ever-present probability or p value of 0.05 needed for statistical significance. Values not falling within two standard deviations of the mean are seen as relatively rare events.
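The 34%, 68%, and 95% figures above are rounded values of the exact areas under the standard normal curve, which can be checked directly from the cumulative distribution function (a quick verification sketch, not part of the source text):

```python
from scipy.stats import norm

# Area within k standard deviations of the mean of a standard normal curve.
for k in (1, 2, 3):
    area = norm.cdf(k) - norm.cdf(-k)
    print(f"within +/-{k} SD: {area:.4f}")
# within 1 SD ~ .6827 (the "68%"), within 2 SD ~ .9545 (the "95%"),
# leaving roughly 5% in the two tails combined.
```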

The Standard Normal Curve

All normal curves can be converted into standard normal curves by setting the mean equal to zero and the standard deviation equal to one. Because all normal curves have the same proportion of the curve within one standard deviation, two standard deviations, and so forth of the mean, this conversion allows comparisons among normal curves with different means and standard deviations. Figure 3.10, the normal distribution, has the standard normal distribution units underneath. These units are referred to as z scores. If you examine the normal curve table in any statistics book, you can find the areas under the curve for one standard deviation (z = 1), two standard deviations (z = 2), and so on. As described in Appendix A, it is easy to convert raw scores into standard scores. This is often done when one wants to aggregate or add together several scores that have quite different means and standard deviations.
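The raw-score-to-z-score conversion described in Appendix A is simply "subtract the mean, divide by the standard deviation." A minimal sketch (the sample scores are invented for illustration; we use the sample standard deviation, as SPSS reports):

```python
import numpy as np

def to_z_scores(raw):
    """Convert raw scores to z scores: (score - mean) / SD."""
    raw = np.asarray(raw, dtype=float)
    return (raw - raw.mean()) / raw.std(ddof=1)  # ddof=1: sample SD

scores = [50, 55, 60, 65, 70]
z = to_z_scores(scores)
print(np.round(z, 2))  # after conversion: mean 0, SD 1
```

Because every variable converted this way has mean 0 and SD 1, z scores from tests with very different scales can be meaningfully added together, which is the aggregation use the paragraph mentions.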

Source: Morgan, George A., Leech, Nancy L., Gloeckner, Gene W., & Barrett, Karen C. (2012). IBM SPSS for Introductory Statistics: Use and Interpretation (5th ed.). Routledge.
