Histograms as a Total Quality Tool

Histograms are used to chart frequency of occurrence. How often does something happen? Any discussion of histograms must begin with an understanding of the two kinds of data commonly associated with processes: attributes data and variables data. Although they were not introduced as such, both kinds of data have been used in the illustrations of this chapter. An attribute is something that the output product of the process either has or does not have. From one of the examples (Figure 15.6), either an electronic assembly had wiring errors or it did not. Another example (see Figure 15.30) shows that either an assembly had broken screws or it did not. These are attributes. The example of making shafts of a specified length (Figures 15.11 and 15.12) was concerned with measured data. That example used shaft length measured in thousandths of an inch, but any scale of measurement can be used, as appropriate for the process under scrutiny. A process used in making electrical resistors would use the scale of electrical resistance in ohms, another process might use a weight scale, and so on. Variables data are data that result from measurement.

Using the shaft example again, an all-too-common scenario in manufacturing plants would have been to place a Go/No-Go screen at the end of the process, accepting all shafts between the specification limits of 1.120 and 1.130 in. and discarding the rest. Data might have been recorded to keep track of the number of shafts that had to be scrapped. Such a record might have looked like Figure 15.14, based on the original data.

Figure 15.14 would tell us what we wanted to know if we were interested only in the number of shafts accepted versus the number rejected. Looking at the shaft process in this way, we are using attributes data: either they passed or they failed the screening. This reveals only that we are scrapping between 3 and 4% of all the shafts made. It does not tell us anything about the process adjustment that may be contributing to the scrap rate. Nor does it tell us anything about how robust the process is—might some slight change push the process over the edge? For that kind of insight, we need variables data.

One can gain much more information about a process when variables data are available. The check sheet of Figure 15.12 shows that both of the rejects (out-of-limits shafts) were on the low side of the specified tolerance. The peak of the histogram seems to occur between 1.123 and 1.124 in. If the machine were adjusted to bring the peak up to 1.125 in., some of the low-end rejects might be eliminated without causing any new rejects at the top end. The frequency distribution also suggests that the process as it stands now will always have occasional rejects—probably in the 2 to 3% range at best.

1. Potential Trap with Histograms

Be aware of a potential trap when using histograms. The histogram is nothing more than a measurement scale across one axis (usually the x-axis) and frequency of like measurements on the other. (Histograms are also called frequency distribution diagrams.) The trap occurs when measurements are taken over a long period of time. Too many things can affect processes over time: wear, maintenance, adjustment, material differences, operator influence, and environmental influence. The histogram makes no allowance for any of these factors. It may be helpful to consider a histogram the equivalent of a snapshot of the process performance. If the subject of a photograph is moving, the photographer must use a fast shutter speed to prevent a blurred image. Because a process's performance may change over time, histogram data that are not collected over a suitably short period will likewise produce a blurred result, just as if the camera's shutter were too slow for the action taking place.

Blurred photographs and blurred histograms are both useless. A good histogram will show a crisp snapshot of process performance as it was at the time the data were taken, not before and not after. This leads some people to claim that histograms should be used only on processes that are known to be in control. (See the section on control charts later in this chapter.)

That limitation is not necessary as long as you understand that histograms have this inherent flaw. Be careful that any interpretation you make has accounted for time and its effect on the process you are studying. For example, we do not know enough about the results of the shaft-making process from Figure 15.12 to predict with any certainty that it will do as well next week. We don't know that a machine operator didn't tweak the machine two or three times during the week, trying to find the center of the range. What happens if that operator is on vacation next week? Would we dare predict that performance will be the same? We can make these predictions only if we know the process is statistically in control; thus, the warnings. Taking this into consideration, the histogram in Figure 15.12 provides valuable information.

2. Histograms and Statistics

Understanding a few basic facts is fundamental to the use of statistical techniques for quality and process applications. We have said that all processes are subject to variability, or variation. There are many examples of this. One of the oldest and most graphically convincing is the Red Bead experiment Dr. Deming regularly used in his seminars. This involves a container with a large number of beads. The beads are identical except for the color. Suppose there are 900 white beads and 100 red beads, making a total of 1,000. The beads are mixed thoroughly (Step 1). Then 50 beads are drawn at random as a sample (Step 2). The red beads in the sample are counted. A check mark is entered in a histogram column for that number. All the beads are put back into the container, and they are mixed again (Step 3). When you repeat these steps a second time, the odds are that a different number of red beads will be drawn. When a third sample is taken, it will probably contain yet another number of red beads. The process (steps 1, 2, and 3) has not changed, yet the output of the process does change. This is process variation or variability. If these steps are repeated over and over until a valid statistical sampling has been taken, the resulting histogram will invariably take on the characteristic bell shape common to process variability (see Figure 15.15).

It is possible to calculate the process variability from the data. The histogram in Figure 15.15 was created from 100 samples of 50 beads each. The data were as shown in Figure 15.16.
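The three sampling steps can be sketched as a short simulation. This is an illustrative sketch, not the chapter's own code; the bead counts and sample size follow the description above, and each run produces a slightly different tally, which is exactly the process variation the experiment demonstrates:

```python
import random
from collections import Counter

def count_red(rng, n_red=100, n_white=900, sample_size=50):
    """Steps 1-3: mix the beads, draw a random sample of 50, count the reds."""
    beads = [1] * n_red + [0] * n_white       # 1 = red bead, 0 = white bead
    return sum(rng.sample(beads, sample_size))  # sampling without replacement

rng = random.Random(0)
draws = [count_red(rng) for _ in range(100)]  # 100 samples, as in Figure 15.15

# Tally the draws into histogram columns: red beads counted -> frequency
histogram = Counter(draws)
print(sorted(histogram.items()))
```

Plotting the tally as bars reproduces the bell shape of Figure 15.15.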

The flatter and wider the frequency distribution curve, the greater the process variability. The taller and narrower the curve, the lesser the process variability. Even though the variability may change from process to process, it would be helpful to have a common means of measuring, discussing, or understanding variability. Fortunately, we do. To express the process's variability, we need to know only two things, both of which can be derived from the process's own distribution data: standard deviation and mean. Standard deviation is represented by the lowercase Greek letter sigma (σ) and indicates a deviation from the average, or mean, value of the samples in the data set. The mean is represented by the Greek letter mu (μ). In a normal histogram, μ is seen as a vertical line from the peak of the bell curve to the base, and it is the line from which deviation is measured, minus to the left of μ and plus to the right. Standard deviation (σ) is normally plotted at −3σ, −2σ, and −1σ (left of μ) and +1σ, +2σ, and +3σ (right of μ); refer to Figure 15.18. Because mean and standard deviation are always derived from data from the process in question, standard deviation has a constant meaning from process to process. From this, we can tell what the process can do in terms of its statistical variability (assuming that it remains stable and no changes are introduced):

  • 68.26% of all sample values will be found between +1σ and −1σ.
  • 95.46% of all sample values will be found between +2σ and −2σ.
  • 99.73% of all sample values will be found between +3σ and −3σ.
  • 99.9999998% of all sample values will be found between +6σ and −6σ.

Note: As we discussed in Chapter 1, Six Sigma practitioners use 99.99966% rather than the actual statistical value.
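As a quick check, these coverage figures follow directly from the cumulative distribution function of the normal curve, which Python's standard library exposes through `math.erf`. A minimal sketch:

```python
import math

def coverage(k):
    """Fraction of a normal distribution lying within ±k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 6):
    print(f"±{k}σ covers {coverage(k) * 100:.7f}% of sample values")
```

The loop prints roughly 68.27%, 95.45%, 99.73%, and 99.9999998%, matching the bullets above to rounding.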

This information has a profound practical value, as we shall see as we develop the discussion.

In order to calculate the process mean value (μ) and standard deviation (σ), we must first use the raw process data from Figure 15.16 to develop the information required for those calculations. As we develop the information, we will post it in the appropriate columns of Figures 15.17a, b, and c.

Columns 1 and 2 of Figures 15.17a, b, and c contain the measured raw data from the colored bead process from Figure 15.16. Column 1 lists the number of red beads possible to be counted (from 0 to 10) in the various samples. Column 2 lists the number of samples that contained the corresponding number of red beads. The number of samples in column 2 is totaled, yielding n = 100.

3. Calculating the Mean

For a histogram representing a truly normal distribution between ± infinity, the mean value would be a vertical line to the peak of the bell curve. Our curve is slightly off normal because we are using a relatively small sample, so the mean (μ) must be calculated. The equation for μ is

μ = ΣX / n

where X is the product of the number of red beads in a sample times the number of samples containing that number of red beads, or for Figure 15.17a, the product of columns 1 and 2. These X values are posted in column 3 of Figure 15.17b.

Now that we have the X values, we simply add them up to give us the sum of the X values (ΣX). The figure tells us that n = 100 and ΣX = 510. Using the equation for μ,

μ = ΣX / n = 510 / 100 = 5.1

The mean (μ) is placed at 5.1 on the histogram's x-axis, and all deviations are measured relative to that. (See Figure 15.18.)
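The column arithmetic above can be sketched in a few lines of Python. The frequency table here is a hypothetical stand-in (the actual Figure 15.16 counts are not reproduced in this excerpt), chosen so that n = 100 and ΣX = 510, matching the chapter's totals:

```python
# Hypothetical frequency table {red beads per sample: number of samples};
# an illustrative stand-in for Figure 15.16, chosen so n = 100 and ΣX = 510.
freq = {0: 0, 1: 1, 2: 3, 3: 11, 4: 20, 5: 27, 6: 20, 7: 11, 8: 5, 9: 2, 10: 0}

n = sum(freq.values())                       # total number of samples
sum_x = sum(x * f for x, f in freq.items())  # ΣX: column 1 × column 2, summed
mean = sum_x / n                             # μ = ΣX / n
print(n, sum_x, mean)  # 100 510 5.1
```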

4. Calculating Standard Deviation (s)

To understand the process's variability, we must know its standard deviation. The formula for standard deviation is

σ = √(Σd² / n)

where

d = the deviation of any unit from the mean

n = the number of units sampled

We already have the value of n (100), but we have not calculated the values of d, d², or Σd². We will perform these calculations and post the information in the remaining three columns of Figure 15.17c. The values of the deviations (d) are determined by subtracting μ (5.1) from each of the red bead values (0 through 10) of column 1. The first entry in column 4 (deviation from μ) is determined by subtracting μ from the value in column 1, that is, 0 − 5.1 = −5.1. Similarly, the second entry in column 4 is the value of column 1 at the 1-bead row minus μ, or 1 − 5.1 = −4.1.

Repeating this process through the 10-bead row completes the deviation column.

Column 5 of Figure 15.17c is simply a list of the column 4 deviation values squared. For example, in the 0-bead row, column 4 shows d = −5.1. Column 5 lists the square of −5.1, or 26.01. The 1-bead row has d = −4.1. Column 5 lists its square, 16.81. This process is continued through the 10-bead row to complete column 5 of the figure.

Column 6 of Figure 15.17c lists the results of the squared deviations (column 5) multiplied by the number of samples at the corresponding deviation value (column 2). For the column 6 entry at the 0-bead row, we multiply 0 (from column 2) by 26.01 (from column 5); since 0 × 26.01 = 0, 0 is entered in column 6. For the 1-bead row, we multiply 1 by 16.81; 16.81 is the second entry in column 6. At the 2-bead row, we multiply 3 by 9.61 and enter 28.83 in column 6. This process is repeated through the remaining rows of the figure.

Next we add column 6's entries to obtain the sum of the squared deviations, Σd². Σd² for our bead process experiment is 221.

Now we have all the information we need to calculate the standard deviation (σ) for our process:

σ = √(Σd² / n) = √(221 / 100) = √2.21 ≈ 1.49

Note: Calculations are to two decimal places.

Next, calculate the positions of μ ± 1σ, 2σ, and 3σ:

1σ = 1.49  2σ = 2.98  3σ = 4.47

These values are entered in Figure 15.15 to create Figure 15.18:
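The deviation columns and the standard deviation formula can be sketched with the same hypothetical frequency table used in the mean sketch above. Because those counts are illustrative stand-ins rather than the chapter's actual Figure 15.16 data, they yield Σd² ≈ 247.0 and σ ≈ 1.57 instead of the chapter's 221 and 1.49, but the procedure is identical:

```python
import math

# Same hypothetical frequency table as in the mean sketch (a stand-in for
# Figure 15.16); the chapter's real data give Σd² = 221 and σ = 1.49.
freq = {0: 0, 1: 1, 2: 3, 3: 11, 4: 20, 5: 27, 6: 20, 7: 11, 8: 5, 9: 2, 10: 0}

n = sum(freq.values())                          # 100 samples
mean = sum(x * f for x, f in freq.items()) / n  # μ = ΣX / n = 5.1

# Columns 4-6 in one pass: deviation d = x − μ, squared, weighted by the
# sample count f, then summed to give Σd².
sum_d2 = sum(f * (x - mean) ** 2 for x, f in freq.items())
sigma = math.sqrt(sum_d2 / n)                   # σ = √(Σd² / n)
print(round(sum_d2, 2), round(sigma, 2))
```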

Suppose we have a process that is operating like the curve in Figure 15.18. We have specifications for the product output that require us to reject any part below 3.6 and above 6.6. It turns out that these limits are approximately ±1σ. We know immediately that about one-third of the process output will be rejected. (Refer to the first bullet in the list above.) If this is not acceptable, which is highly probable, we will have to improve the process or change to a completely different process. Even if more variation could be tolerated in the product and we took the specification limits out to 2 and 8, about 5 of every 100 pieces flowing out of the process would still be rejected. In a competitive world, this is poor performance indeed. Many companies no longer consider 2,700 parts per million defective (±3σ) to be good enough. A growing number of organizations are seeking the Motorola version of Six Sigma quality performance. These companies target a defect rate of 3.4 nonconformances per million opportunities (NPMO). Technically speaking, 3.4 NPMO is not very close to the statistically pure 6-sigma rate of 0.002 per million opportunities, or 1 nonconformance in 500 million. (We explain this difference in the Six Sigma section of Chapter 19.) Although the popular Six Sigma does not match the true 6-sigma, 3.4 NPMO is a remarkable achievement. Whatever the situation, with this statistical sampling tool properly applied, there is no question about what can be achieved with any process, because you will be able to predict the results.
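Those reject estimates can be checked with the normal cumulative distribution function, again via `math.erf`. This sketch assumes the process output is normally distributed with the μ = 5.1 and σ = 1.49 calculated for the bead process:

```python
import math

def norm_cdf(x, mu, sigma):
    """P(X ≤ x) for a normal distribution with the given mean and std. dev."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

mu, sigma = 5.1, 1.49  # from the bead-process calculations above

for lo, hi in [(3.6, 6.6), (2.0, 8.0)]:
    # Rejects fall in both tails: below the lower limit and above the upper.
    rejected = norm_cdf(lo, mu, sigma) + (1 - norm_cdf(hi, mu, sigma))
    print(f"spec limits [{lo}, {hi}]: about {rejected:.1%} rejected")
```

This prints roughly 31.4% and 4.5%, consistent with "about one-third" and "about 5 of every 100" in the text.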

5. Shapes of Histograms

Consider the shape of some histograms and their position relative to specification limits. Figure 15.19 is a collection of histograms. Histogram A represents a normal distribution. So does B, except it is shallower. The difference between the process characteristics of these two histograms is that process A is much tighter, whereas the looser process B will have greater variances. Process A is usually preferred. Processes C and D are skewed left and right, respectively. Although the curves are normal in shape, product will be lost because the processes are not centered. Process E is bimodal. This can result from two batches of input material, for example. One batch produces the left bell curve, and the second batch the curve on the right. The two curves may be separated for a better view of what is going on by stratifying the data by batch. (See the "Stratification" section later in this chapter.)

Histogram F suggests that someone is discarding the samples below and above a set of limits. This typically happens when there is a 100% inspection and only data that are within limits are recorded. The strange Histogram G might have used data from incoming inspection. The message here is that the vendor is screening the parts and someone else is getting the best ones. A typical case might be electrical resistors that are graded as 1, 5, and 10% tolerance. The resistors that met 1 and 5% criteria were screened out and sold at a higher price. You got what was left.

Histogram H shows a normal distribution properly centered between a set of upper and lower control limits. Histograms I and J illustrate what happens when the same normal curve is allowed to shift left or right, respectively. There will be a significant loss of product as a control limit intersects the curve higher up its slope.

Histograms K through P show a normal, centered curve that went out of control and drifted. Remember that histograms do not account for time and you must, therefore, be careful about making judgments. If all the data that produced Histograms K through P were averaged, or even if all the data were combined to make a single histogram, you could be misled. You would not know that the process was drifting. Plotting a series of histograms over time, such as K through P, clearly illustrates any drift right or left, shallowing of the bell, and the like.

The number of samples or data points has a bearing on the accuracy of the histogram, just as with other tools. But with the histogram, there is another consideration: How does one determine the proper number of intervals for the chart? (The intervals are, in effect, the data columns of the histogram.) For example, Figure 15.15 is set up for 11 intervals: 0, 1, 2, and so on. The two outside intervals are not used, however, so the histogram plots data in nine intervals. The rule of thumb is to match the number of intervals to the number of observations; for example, fewer than 75 observations calls for 5 to 7 intervals.

It is not necessary to be very precise with this. The rule is used to get close, and then you adjust one way or the other for a fit with your data.

Suppose we are using steel balls in one of our products and the weight of the ball is critical. The specification is 5 ± 0.2 grams. The balls are purchased from a vendor, and because our tolerance is tighter than the vendor's, we weigh the balls and use only those that meet our specification. The vendor is trying to tighten its tolerance and has asked for assistance in the form of data. Today 60 balls were received and weighed. The data were plotted on a histogram. To give the vendor the complete information, a histogram with intervals every 0.02 gram is established.

Figure 15.20 does not look much like a bell curve because we have tried to stretch a limited amount of data (60 observations) too far. There are 23 active or skipped intervals. Our rule of thumb suggests 5 to 7 intervals for fewer than 75 observations. If the same data were plotted into a histogram of 6 intervals (excluding the blank), it would look like Figure 15.21. At least in this version, it looks like a histogram. With more data, say 100 or more observations, one could narrow the intervals and get more granularity. Don't try to stretch data too thin, because the conversion to real information can become difficult and risky.
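The effect of interval count can be demonstrated with a small binning sketch. The chapter's 60 actual ball weights are not reproduced in this excerpt, so the weights below are simulated near the 5 ± 0.2 g specification purely for illustration:

```python
import random

# Hypothetical ball weights (grams): simulated stand-ins for the chapter's
# 60 measurements, clustered near the 5 ± 0.2 g specification.
rng = random.Random(1)
weights = [round(rng.gauss(5.0, 0.07), 3) for _ in range(60)]

def bin_counts(data, n_bins):
    """Tally data into n_bins equal-width intervals across its range."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        i = min(int((x - lo) / width), n_bins - 1)  # clamp max value into last bin
        counts[i] += 1
    return counts

print(bin_counts(weights, 23))  # too many intervals: sparse, ragged (cf. Figure 15.20)
print(bin_counts(weights, 6))   # 6 intervals: the bell shape emerges (cf. Figure 15.21)
```

With 23 intervals most bins hold only a handful of points, while 6 intervals let the bell shape show through, mirroring the contrast between Figures 15.20 and 15.21.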

Source: Goetsch, David L., and Stanley B. Davis (2016), Quality Management for Organizational Excellence: Introduction to Total Quality, 8th ed., Pearson.
