
Category Archives: Structural Equation Modeling

Article by Chin, Wynne W., and Todd, Peter A. (1995) in MIS Quarterly, June 1995.


A few weeks ago, one of my colleagues – Mr. H2 from the Faculty of Business Management – told me about his perception of SEM while we were having breakfast together. He told me that he feels like he is cheating when he uses SEM, especially when he modifies his model using the ‘modification index’ (this index basically tells us what to do to improve the model’s fit, such as linking the ‘error’ of one ‘item’ with the ‘error’ of another ‘item’. To a certain extent, some researchers delete a few items from their initial model to get a better fit. Technically in SEM, this step is called ‘specification search’[1], or in some articles it is referred to as ‘model modification’[2].)


I think this paper provides the best answer to what my colleague raised a few weeks ago (I’m looking forward to giving him a hardcopy of this article – so far I haven’t had the opportunity; he is now very busy with his lectures and no longer has time for breakfast with me:) ).


This paper is actually a critique of the work done by Segars and Grover (1993) (hereafter S&G). S&G’s work is, in turn, a critique of earlier work by Adams et al. (1992). So this paper is basically a critique of a critique.


This paper outlines five mistakes made by S&G and Adams. We can take these five issues as a note of caution and a guideline for our future research using SEM.

  1. Adams did not validate their measurement model prior to analyzing the ‘structural’ model. S&G argued that “unless the measurement model, which postulates the relationship between observed measures (or indicators) and their underlying constructs, is both reliable and valid, its application in testing structural relationships may lead to equivocal results… This can occur due to a confounding of substantive and measurement issues.” To overcome this, S&G suggested applying Confirmatory Factor Analysis prior to model testing [note: those who are familiar with model development using SEM can easily imagine the big problem one will face if CFA is not performed at the initial stage – definitely a catastrophic ending!].
  2. S&G used a cross-validation technique [note: an explanation of cross validation is provided at the end of this entry] to confirm their findings, but it was not based on a sample of independent respondents. Therefore, it exposed their model to a high probability of bias.
  3. During the specification search, model fit was tested only after several modifications had been made to the model. Therefore, no conclusion could be drawn as to which amendment actually contributed most to the model’s fit.
  4. The sample size in the calibration set was too small to provide a stable solution for the cross validation. This paper cites MacCallum (1986), who stated that “specification searches typically show inconsistent and unstable outcomes for sample sizes of 100 to 400 observations”.[3]
  5. During the specification search, construct and model modifications were guided largely by statistical considerations and were not supported by any sound and substantive theoretical rationale. In other words, the process lacked substantive knowledge and theoretical justification.


Another valuable lesson I learned from this paper is the so-called “distribution-free resampling”. This approach frees us from the assumption of multivariate normality and the limitation of sample size. The paper provides an example of how to conduct the distribution-free resampling approach.
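To make the idea concrete, here is a minimal sketch of the resampling logic behind a distribution-free approach – a simple percentile bootstrap on a made-up sample of scores. The data, function name, and settings are my own illustration, not taken from the paper:

```python
import random
import statistics

def bootstrap_ci(sample, statistic=statistics.mean, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval: resample the observed data
    with replacement, recompute the statistic each time, and read the interval
    off the empirical distribution -- no multivariate-normality assumption."""
    rng = random.Random(seed)
    boot_stats = sorted(
        statistic([rng.choice(sample) for _ in sample])
        for _ in range(n_boot)
    )
    lo = boot_stats[int((alpha / 2) * n_boot)]
    hi = boot_stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# A small made-up sample (e.g. item scores on a 5-point scale).
data = [3.1, 2.8, 3.5, 4.0, 2.9, 3.3, 3.8, 2.7, 3.6, 3.2]
low, high = bootstrap_ci(data)
```

The key point is that the interval comes from the resampled statistics themselves, not from a theoretical sampling distribution – which is what “distribution free” buys us.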


What is the Cross-Validation Technique?

Cross validation addresses the question of how well a solution obtained by fitting a model to a given sample will fit an independent sample from the same population. It typically begins by randomly splitting a sample into two sub-samples, which yields two independent sub-samples sharing similar statistical properties. One sub-sample is then used as a calibration set for model parameter estimation. These parameter estimates are then validated by holding them constant and applying them to the second sub-sample, referred to as the validation set. This tests the predictive accuracy of a fitted model, which may have provided a good fit to one data set merely by capitalizing on the peculiar characteristics of that data set. If the model is valid, the exact parameter estimates from the first data set should predict relationships in the new sample equally well.
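The procedure described above can be sketched in a few lines. The example below uses a toy linear model on invented data – the function names and numbers are mine, purely for illustration of the calibration/validation split:

```python
import random

def split_sample(data, seed=1):
    """Randomly split one sample into two independent halves:
    a calibration set and a validation set."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def fit_line(pairs):
    """Ordinary least-squares slope and intercept on (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx

def mse(pairs, slope, intercept):
    """Mean squared prediction error with the parameters held constant."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in pairs) / len(pairs)

# Toy data with a roughly linear relationship y = 2x + 1 plus noise.
rng = random.Random(0)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.5)) for x in range(40)]

calibration, validation = split_sample(data)
slope, intercept = fit_line(calibration)           # estimate on the calibration set
holdout_error = mse(validation, slope, intercept)  # validate with frozen estimates
```

If the model is valid, the error on the validation set stays close to the error on the calibration set; a much larger validation error suggests the fit capitalized on quirks of the first sub-sample.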


Look at the illustration below for a better understanding.

[Figure: Cross Validation]

[1] This term is used by MacCallum (1986) in his article “Specification Searches in Covariance Structure Modeling”, Psychological Bulletin, 100:1, pp. 107–120. Insya-ALLAH, I’ll put my notes about this article in a coming post.

[2] Wikipedia uses this term.

[3] I would personally prefer to use what Bartlett et al. (2001) suggested for sample size determination. With regard to SEM, I usually combine Bartlett’s suggestions with those from Bentler and Chou (1987). I really hope I’ll have time to put my notes on these two papers in a coming post.


Several Sources


For those who are not familiar with Structural Equation Modeling (SEM) and are looking for some clues on what it is, I quote here descriptions of it from several sources:




“Structural equation modeling (SEM) is a statistical technique for testing and estimating causal relationships using a combination of statistical data and qualitative causal assumptions…”


“SEM encourages confirmatory rather than exploratory modeling; thus, it is suited to theory testing rather than theory development. It usually starts with a hypothesis, represents it as a model, operationalises the constructs of interest with a measurement instrument, and tests the model…”


“Among its strengths is the ability to model constructs as latent variables (variables which are not measured directly, but are estimated in the model from measured variables which are assumed to ‘tap into’ the latent variables). This allows the modeler to explicitly capture the unreliability of measurement in the model, which in theory allows the structural relations between latent variables to be accurately estimated.”


Hair, J.F., Anderson, R.E., Tatham, R.L. and Black, W.C. (1998), Multivariate Data Analysis, 5th ed., Prentice-Hall, Englewood Cliffs, NJ, p. 584.


SEM is good at estimating “multiple and interrelated dependence relationship” and it has “the ability to represent unobserved concepts in these relationships and account for measurement error in the estimation process”. Truly amazing: SEM can also estimate “a series of separate, but interdependent, multiple regression equations simultaneously”. [Note: the words in orange are mine; they are not taken from the stated source.]


Schumacker, R. E. (2005). Structural Equation Modeling: Overview. In B. S. Everitt and D. C. Howell (Eds.), Encyclopedia of Statistics in Behavioral Science, Vol. 4, pp. 1941–1947. Chichester: John Wiley & Sons.


“Structural equation modeling (SEM) has been historically referred to as linear structural relationships, covariance structure analysis, or latent variable modeling. SEM has traditionally tested hypothesized theoretical models that incorporate a correlation methodology with correction for unreliability of measurement in the observed variables. SEM models have currently included most statistical applications using either observed variables and/or latent variables... Six basic steps are involved in structural equation modeling: model specification, model identification, model estimation, model testing, model modification, and model validation.”


Schumacker, R. E. (2005). Structural Equation Modeling: Overview. In B. S. Everitt and D. C. Howell (Eds.), Encyclopedia of Statistics in Behavioral Science, Vol. 4, pp. 1941–1947. Chichester: John Wiley & Sons.


“Structural equation models (SEMs) comprise two components, a measurement model and a structural model. The measurement model relates observed responses or ‘indicators’ to latent variables and sometimes to observed covariates. The structural model then specifies relations among latent variables and regressions of latent variables on observed variables.”