Multiple Regression of Experimental Data

Thus far, we have dealt with simple linear regression, which is adequate for most lab experiments required in college course work. For research work, one-factor-at-a-time experiments are inadequate. It is now time for us to reflect on how close, or how far, the results obtained by simple linear regression lie from the reality of the experimental situation.

  1. It has often been pointed out in this book that a single independent variable acting alone on a dependent variable is quite an idealized situation; reality is far from it.
  2. Because only one independent variable affects the dependent variable, we can represent their mutual relation as a number of points, each with two coordinates, x and y, in a two-dimensional space.
  3. If more than one independent variable affects the dependent variable, we need, for representation on the same basis, three or more dimensions.
  4. We made the assumption that the points representing the mutual relation of x and y are scattered around a straight line. To the extent that the points do not touch the straight line, we attribute the discrepancy to “experimental error” or to sacrifices made for simplification.
  5. The straight line representing the relation of x and y is taken only as an estimate.
  6. The reason for selecting the straight line, instead of a curve, which is more likely, is that we can use a polynomial of the first degree, thus avoiding the need to deal with an exponent of x that is either less or more than 1.0; this, in turn, makes the computation of the function of y easier. One could argue that thus restricting the exponent to 1.0 is not justified in view of the apparently unlimited computing capacity now available; a sketch illustrating this point follows the list.
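
To make item 6 concrete, here is a minimal sketch in Python; the data points and the use of NumPy are illustrative assumptions, not taken from the source. Fitting a polynomial of the first degree is exactly the straight-line regression discussed above, and with today's computing capacity a higher-degree fit costs no more than a changed argument.

```python
import numpy as np

# Hypothetical (x, y) observations scattered around a roughly linear trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Restricting the exponent of x to 1.0 means fitting a first-degree polynomial.
slope, intercept = np.polyfit(x, y, deg=1)

# Nothing but convention stops us from trying a higher degree instead.
quad = np.polyfit(x, y, deg=2)

print(f"first degree:  y = {slope:.3f}x + {intercept:.3f}")
print(f"second degree: y = {quad[0]:.3f}x^2 + {quad[1]:.3f}x + {quad[2]:.3f}")
```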

These considerations, besides others, serve as indicators of the limitations of linear regression. Even so, linear regressions are often useful for analyzing situations with two independent variables. The functional form of the equation, then, conforming to linear regression, is

y = a·x1 + b·x2 + p        (19.13)
In this form, the function is referred to as multiple regression, “multiple” meaning two or more independent variables.

The solution of this function consists of finding the numerical values for a, b, and p, conforming to the following:

When different values of x1 and x2 are substituted, the corresponding y values should represent the known values of the dependent variable. Further, and more importantly, the equation should become the instrument to estimate the value of the dependent variable, y, for any arbitrarily chosen values of x1 and x2, within the ranges of these variables as required in the experiment. The above equation can be solved considering p as the y intercept and a and b as the slopes of the two straight lines, each conforming to the least-squares method. The regression function will then be represented by a plane in a three-dimensional space, whose coordinates are y, x1, and x2. And, the plane will stretch between the two straight lines, somewhat like the fabric between two consecutive spokes of an umbrella.

Solutions of such and more involved problems require the help of computer software programs, a host of which are now available, such as the Statistical Package for the Social Sciences (SPSS), Biomedical Computer Programs: P Series (BMDP), and the Statistical Analysis System (SAS). Considering (19.13), several combinations, usually a limited but sufficient number, of the numerical values of y, x1, and x2, in the required order, are input to the computer. The processed output typically contains the numerical values of a, b, and p, often supplemented with such information as the standard deviation and regression coefficient of the variables, the coefficient of determination, and the standard error of the estimate.
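
In place of SPSS, BMDP, or SAS, the same computation can be sketched with NumPy's least-squares solver. The sketch below assumes invented data values for y, x1, and x2; it fits equation (19.13) and reports the coefficient of determination mentioned above.

```python
import numpy as np

# Hypothetical combinations of the two independent variables and the
# observed response (values are illustrative only, not from the source).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([7.1, 6.2, 13.0, 12.1, 18.9, 18.2])

# Design matrix for y = a*x1 + b*x2 + p: one column per slope,
# plus a column of ones for the intercept p.
X = np.column_stack([x1, x2, np.ones_like(x1)])

# Least-squares fit, as a statistics package would compute it.
(a, b, p), _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print(f"fitted plane: y = {a:.3f}*x1 + {b:.3f}*x2 + {p:.3f}")

# Coefficient of determination (R^2), one of the usual outputs.
y_hat = X @ np.array([a, b, p])
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R^2 = {r2:.3f}")

# The fitted equation then estimates y for arbitrary (x1, x2) in range.
print("estimate at x1 = 3.5, x2 = 2.5:", a * 3.5 + b * 2.5 + p)
```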

Going another step further, we may think of the situation of three independent variables, x1, x2, and x3, together affecting the dependent variable. Restricting the analysis to linear regression, we may state such a functional relation as

y = a·x1 + b·x2 + c·x3 + p
In this case, we, in fact, go beyond the limits of linear regression. This is so because, in place of the plane in the previous situation of two independent variables, we now need a three-dimensional volume to represent the result of the linear regression. Even more formidable, we need a space of four dimensions in which such a volume should be placed.
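
Although the geometry can no longer be drawn, the arithmetic extends directly. Below is a minimal sketch under the same assumptions as before: invented data, with NumPy standing in for the statistics packages named earlier.

```python
import numpy as np

# Illustrative data only: three predictors and a response built from a
# known linear relation plus noise, so the recovered coefficients can
# be checked against the values used to generate them.
rng = np.random.default_rng(0)
x1, x2, x3 = rng.uniform(0.0, 10.0, size=(3, 20))
y = 2.0 * x1 - 1.5 * x2 + 0.5 * x3 + 4.0 + rng.normal(0.0, 0.3, size=20)

# Same least-squares machinery as before, with one extra column.
X = np.column_stack([x1, x2, x3, np.ones_like(x1)])
(a, b, c, p), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"y = {a:.2f}*x1 + {b:.2f}*x2 + {c:.2f}*x3 + {p:.2f}")
```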

The obvious question is, What do we resort to when we have three or more independent variables? For the answer and the applications, the reader is advised to revisit Chapter 9 of this book.

Source: Srinagesh, K. (2005), The Principles of Experimental Research, Butterworth-Heinemann, 1st edition.
