Basic Concepts of Structural Equation Modeling

1. Latent versus Observed Variables

In the behavioral sciences, researchers are often interested in studying theoretical constructs that cannot be observed directly. These abstract phenomena are termed latent variables, or factors. Examples of latent variables in psychology are self-concept and motivation; in sociology, powerlessness and anomie; in education, verbal ability and teacher expectancy; in economics, capitalism and social class.

Because latent variables are not observed directly, it follows that they cannot be measured directly. Thus, the researcher must operationally define the latent variable of interest in terms of behavior believed to represent it. As such, the unobserved variable is linked to one that is observable, thereby making its measurement possible. Assessment of the behavior, then, constitutes the direct measurement of an observed variable, albeit the indirect measurement of an unobserved variable (i.e., the underlying construct). It is important to note that the term behavior is used here in the very broadest sense to include scores on a particular measuring instrument. Thus, observation may include, for example, self-report responses to an attitudinal scale, scores on an achievement test, in vivo observation scores representing some physical task or activity, coded responses to interview questions, and the like. These measured scores (i.e., measurements) are termed observed or manifest variables; within the context of SEM methodology, they serve as indicators of the underlying construct that they are presumed to represent. Given this necessary bridging process between observed variables and unobserved latent variables, it should now be clear why methodologists urge researchers to be circumspect in their selection of assessment measures. Although the choice of psychometrically sound instruments bears importantly on the credibility of all study findings, such selection becomes even more critical when the observed measure is presumed to represent an underlying construct.

2. Exogenous versus Endogenous Latent Variables

It is helpful in working with SEM models to distinguish between latent variables that are exogenous and those that are endogenous. Exogenous latent variables are synonymous with independent variables: they “cause” fluctuations in the values of other latent variables in the model. Changes in the values of exogenous variables are not explained by the model. Rather, they are considered to be influenced by other factors external to the model. Background variables such as gender, age, and socioeconomic status are examples of such external factors. Endogenous latent variables are synonymous with dependent variables and, as such, are influenced by the exogenous variables in the model, either directly or indirectly. Fluctuation in the values of endogenous variables is said to be explained by the model because all latent variables that influence them are included in the model specification.

3. The Factor Analytic Model

The oldest and best-known statistical procedure for investigating relations between sets of observed and latent variables is that of factor analysis. In using this approach to data analysis, the researcher examines the covariation among a set of observed variables in order to gather information on their underlying latent constructs (i.e., factors). There are two basic types of factor analysis: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). We turn now to a brief description of each.

Exploratory factor analysis (EFA) is designed for the situation where links between the observed and latent variables are unknown or uncertain. The analysis thus proceeds in an exploratory mode to determine how, and to what extent, the observed variables are linked to their underlying factors. Typically, the researcher wishes to identify the minimal number of factors that underlie (or account for) covariation among the observed variables. For example, suppose a researcher develops a new instrument designed to measure five facets of physical self-concept (e.g., Health, Sport Competence, Physical Appearance, Coordination, Body Strength). Following the formulation of questionnaire items designed to measure these five latent constructs, he or she would then conduct an EFA to determine the extent to which the item measurements (the observed variables) were related to the five latent constructs. In factor analysis, these relations are represented by factor loadings. The researcher would hope that items designed to measure Health, for example, exhibited high loadings on that factor, and low or negligible loadings on the other four factors. This factor analytic approach is considered to be exploratory in the sense that the researcher has no prior knowledge that the items do, indeed, measure the intended factors. (For texts dealing with EFA, see Comrey, 1992; Gorsuch, 1983; McDonald, 1985; Mulaik, 2009; for informative articles on EFA, see Byrne, 2005a; Fabrigar, Wegener, MacCallum, & Strahan, 1999; Floyd & Widaman, 1995; MacCallum, Widaman, Zhang, & Hong, 1999; Preacher & MacCallum, 2003; Wood, Tataryn, & Gorsuch, 1996.)
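The logic of EFA can be illustrated with a small simulation. The sketch below is not Byrne's example; it uses hypothetical loadings for two factors measured by three items each, and scikit-learn's `FactorAnalysis` as a stand-in for the EFA procedures discussed in the texts cited above (maximum-likelihood extraction, no rotation):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(42)
n = 1000

# Two hypothetical latent factors (say, Health and Sport Competence)
f = rng.standard_normal((n, 2))

# True loading pattern: items 0-2 indicate factor 1, items 3-5 factor 2
loadings = np.array([[0.8, 0.0],
                     [0.7, 0.0],
                     [0.9, 0.0],
                     [0.0, 0.8],
                     [0.0, 0.7],
                     [0.0, 0.9]])

# Observed item score = factor contribution + unique (error) variance
x = f @ loadings.T + 0.5 * rng.standard_normal((n, 6))

# EFA: no loading pattern is imposed in advance; every item is free
# to load on every factor, and the analysis estimates all loadings
fa = FactorAnalysis(n_components=2, random_state=0).fit(x)
print(np.round(fa.components_.T, 2))  # estimated loadings, items x factors
```

Because no structure is imposed, the estimated loadings are determined only up to rotation; what EFA recovers is the two-factor structure itself, i.e., a model whose implied covariance matrix reproduces the covariation among the six items.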

In contrast to EFA, confirmatory factor analysis (CFA) is appropriately used when the researcher has some knowledge of the underlying latent variable structure. Based on knowledge of the theory, empirical research, or both, he or she postulates relations between the observed measures and the underlying factors a priori and then tests this hypothesized structure statistically. For example, based on the example cited earlier, the researcher would argue for the loading of items designed to measure Sport Competence self-concept on that specific factor, and not on the Health, Physical Appearance, Coordination, or Body Strength self-concept dimensions. Accordingly, a priori specification of the CFA model would allow all Sport Competence self-concept items to be free to load on that factor, but restricted to have zero loadings on the remaining factors. The model would then be evaluated by statistical means to determine the adequacy of its goodness-of-fit to the sample data. (For texts dealing with CFA, see Brown, 2006; Mulaik, 2009; for detailed discussions of CFA, see, e.g., Bollen, 1989a; Byrne, 2003; Byrne, 2005b; Byrne, 2015; Long, 1983a.)
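The a priori restriction at the heart of CFA can be made concrete as a pattern matrix of free versus fixed-to-zero loadings. The sketch below assumes, purely for illustration, three items per factor (15 items in all); the factor names follow the physical self-concept example, but the layout is not Byrne's actual instrument:

```python
import numpy as np

# Five hypothetical physical self-concept factors, three items each
factors = ["Health", "Sport", "Appearance", "Coordination", "Strength"]

# A priori CFA pattern matrix: True = loading freely estimated,
# False = loading fixed to zero before the model is fit to data
pattern = np.zeros((15, 5), dtype=bool)
for j in range(5):
    pattern[3 * j:3 * (j + 1), j] = True

# Sport Competence items (rows 3-5) load only on the Sport factor;
# their loadings on the other four factors are constrained to zero
print(pattern.astype(int))
```

It is this restricted pattern, specified before the data are analyzed, that the goodness-of-fit test evaluates: the hypothesis is rejected if a model constrained in this way cannot adequately reproduce the sample covariances.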

In summary, then, the factor analytic model (EFA or CFA) focuses solely on how, and the extent to which, the observed variables are linked to their underlying latent factors. More specifically, it is concerned with the extent to which the observed variables are generated by the underlying latent constructs; thus, the strengths of the regression paths from the factors to the observed variables (i.e., the factor loadings) are of primary interest. Although inter-factor relations are also of interest, any regression structure among them is not considered in the factor analytic model. Because the CFA model focuses solely on the link between factors and their measured variables, within the framework of SEM it represents what has been termed a measurement model.

4. The Full Latent Variable Model

In contrast to the factor analytic model, the full latent variable (LV) model allows for the specification of regression structure among the latent variables. That is to say, the researcher can hypothesize the impact of one latent construct on another in the modeling of causal direction. This model is termed “full” (or “complete”) because it comprises both a measurement model and a structural model: the measurement model depicting the links between the latent variables and their observed measures (i.e., the CFA model), and the structural model depicting the links among the latent variables themselves.
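The two components can be seen in a minimal simulated example. The coefficient and loading values below are illustrative assumptions, not estimates from any study: one exogenous latent variable influences one endogenous latent variable (the structural model), and each latent variable is indicated by three observed measures (the measurement model):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Structural model: exogenous xi "causes" endogenous eta
gamma = 0.6                       # hypothetical structural coefficient
xi = rng.standard_normal(n)       # exogenous: variance unexplained by the model
eta = gamma * xi + np.sqrt(1 - gamma**2) * rng.standard_normal(n)

# Measurement model: three observed indicators per latent variable
lam = np.array([0.9, 0.8, 0.7])   # illustrative factor loadings
x = np.outer(xi, lam) + 0.4 * rng.standard_normal((n, 3))   # indicators of xi
y = np.outer(eta, lam) + 0.4 * rng.standard_normal((n, 3))  # indicators of eta

# Model-implied covariance between the first x and first y indicator
# is lam[0] * gamma * lam[0]; the sample value should be close to it
implied = lam[0] * gamma * lam[0]
sample = np.cov(x[:, 0], y[:, 0])[0, 1]
print(round(implied, 3), round(sample, 3))
```

Note that the covariance between any x and y indicator is carried entirely through the structural path gamma: the observed variables are related only because the latent variables they measure are related.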

A full LV model that specifies causal effects flowing in one direction only is termed a recursive model; one that allows for reciprocal or feedback effects is termed a nonrecursive model. Only applications of recursive models are considered in the present book.

5. General Purpose and Process of Statistical Modeling

Statistical models provide an efficient and convenient way of describing the latent structure underlying a set of observed variables. Expressed either diagrammatically or mathematically via a set of equations, such models explain how the observed and latent variables are related to one another.

Typically, a researcher postulates a statistical model based on his or her knowledge of the related theory, on empirical research in the area of study, or some combination of both. Once the model is specified, the researcher then tests its plausibility based on sample data that comprise all observed variables in the model. The primary task in this model-testing procedure is to determine the goodness-of-fit between the hypothesized model and the sample data. As such, the researcher imposes the structure of the hypothesized model on the sample data, and then tests how well the observed data fit this restricted structure. Because it is highly unlikely that a perfect fit will exist between the observed data and the hypothesized model, there will necessarily be a discrepancy between the two; this discrepancy is termed the residual. The model-fitting process can therefore be summarized as follows:

Data = Model + Residual

where

Data represent score measurements related to the observed variables as derived from persons comprising the sample

Model represents the hypothesized structure linking the observed vari­ables to the latent variables, and in some models, linking particular latent variables to one another

Residual represents the discrepancy between the hypothesized model and the observed data
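In SEM, this decomposition operates at the level of covariance matrices: the sample covariance matrix S plays the role of the data, the model-implied covariance matrix plays the role of the model, and their difference is the residual matrix. The numbers below are hypothetical, chosen only to illustrate the arithmetic for a one-factor model with three indicators:

```python
import numpy as np

# Sample covariance matrix S for three observed variables (hypothetical)
S = np.array([[1.00, 0.42, 0.50],
              [0.42, 1.00, 0.38],
              [0.50, 0.38, 1.00]])

# Model-implied covariance matrix for a one-factor model:
# Sigma = Lambda Lambda' + Theta (loadings plus unique variances)
lam = np.array([0.75, 0.60, 0.70])   # hypothetical loading estimates
theta = np.diag(1.0 - lam**2)        # unique (error) variances
Sigma = np.outer(lam, lam) + theta

# Data = Model + Residual, i.e., Residual = S - Sigma
residual = S - Sigma
print(np.round(residual, 3))
```

Small residual entries indicate that the restricted structure reproduces the observed covariation well; large entries point to specific covariances the hypothesized model fails to explain.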

In summarizing the general strategic framework for testing structural equation models, Jöreskog (1993) distinguished among three scenarios, which he termed strictly confirmatory (SC), alternative models (AM), and model generating (MG). In the strictly confirmatory scenario, the researcher postulates a single model based on theory, collects the appropriate data, and then tests the fit of the hypothesized model to the sample data. From the results of this test, the researcher either rejects or fails to reject the model; no further modifications to the model are made. In the alternative models case, the researcher proposes several alternative (i.e., competing) models, all of which are grounded in theory. Following analysis of a single set of empirical data, he or she selects one model as most appropriate in representing the sample data. Finally, the model generating scenario represents the case where the researcher, having postulated and rejected a theoretically derived model on the basis of its poor fit to the sample data, proceeds in an exploratory (rather than confirmatory) fashion to modify and reestimate the model. The primary focus, in this instance, is to locate the source of misfit in the model and to determine a model that better describes the sample data. Jöreskog (1993) noted that, although respecification may be either theory or data driven, the ultimate objective is to find a model that is both substantively meaningful and statistically well-fitting. He further posited that despite the fact that “a model is tested in each round, the whole approach is model generating, rather than model testing” (Jöreskog, 1993, p. 295).

Of course, even a cursory review of the empirical literature will clearly show the MG situation to be the most common of the three scenarios, and for good reason. Given the many costs associated with the collection of data, it would be a rare researcher indeed who could afford to terminate his or her research on the basis of a rejected hypothesized model! As a consequence, the SC case is not commonly found in practice. Although the AM approach to modeling has also been a relatively uncommon practice, at least two important papers on the topic (e.g., MacCallum, Roznowski, & Necowitz, 1992; MacCallum, Wegener, Uchino, & Fabrigar, 1993) have precipitated more activity with respect to this analytic strategy.

Statistical theory related to these model-fitting processes can be found (a) in texts devoted to the topic of SEM (e.g., Bollen, 1989a; Kline, 2011; Loehlin, 1992; Long, 1983b; Raykov & Marcoulides, 2000; Saris & Stronkhorst, 1984; Schumacker & Lomax, 2004); (b) in edited books devoted to the topic (e.g., Bollen & Long, 1993; Cudeck, du Toit, & Sörbom, 2001; Hoyle, 1995a; Marcoulides & Schumacker, 1996); and (c) in methodologically oriented journals such as British Journal of Mathematical and Statistical Psychology, Journal of Educational and Behavioral Statistics, Multivariate Behavioral Research, Psychological Assessment, Psychological Methods, Psychometrika, Sociological Methodology, Sociological Methods & Research, and Structural Equation Modeling.

Source: Byrne, Barbara M. (2016), Structural Equation Modeling with Amos: Basic Concepts, Applications, and Programming, Routledge, 3rd edition.
