Friday, April 20, 2012

What is Structural Equation Modeling (SEM)?

If any of you are like me, you're excited about the idea of using a structural equation model to analyze your data, but completely overwhelmed with its complexity. I'm hoping that this post will give you a brief introduction along with a few sources to help you develop the skills and knowledge to apply this statistical procedure to your own research. Good luck!


What is Structural Equation Modeling?

Structural equation modeling (SEM) is a statistical technique used to assess whether a proposed model (a set of specified causal and noncausal relationships among variables) accounts for the relationships observed in empirical data (Savalei & Bentler 2006). It draws on factor analysis, path analysis, measurement models, and structural models (Stoelting 2002). Factor analysis deals with constructs (AKA latent variables, factors, concepts) that cannot be measured directly but are related to measurable variables. For example, intelligence is a variable that cannot be measured directly, but it can be estimated through a series of questions or tests. Path analysis is a technique used to identify causal relationships between directly measured variables (Klem 2000). Measurement models deal with relationships between measured variables and latent variables (as in a factor analysis). Structural models deal with relationships between latent variables only (Stoelting 2002). The goal of SEM is two-fold: to obtain estimates of the parameters of the model (i.e. the factor loadings, the variances and covariances of the factors, and the residual error variances of the observed variables), and to assess the fit of the model (i.e. whether the model itself provides a good fit to the data) (Hox & Bechger 2001).
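
To make "accounts for the observed relationships" a bit more concrete, here is a minimal Python sketch of how a one-factor measurement model implies a covariance matrix for its indicators. The function name `implied_covariance` and all of the numbers are my own assumptions for illustration, not values from any of the sources cited here.

```python
import numpy as np

# Hypothetical one-factor measurement model with three indicators.
# Every number below is made up for illustration.
loadings = np.array([[0.8], [0.7], [0.6]])    # factor loadings (3 indicators x 1 factor)
factor_cov = np.array([[1.0]])                # variance of the single factor
residual_var = np.diag([0.36, 0.51, 0.64])    # residual error variances of the indicators


def implied_covariance(loadings, factor_cov, residual_var):
    """Covariance matrix implied by the model: L @ Phi @ L.T + Theta."""
    return loadings @ factor_cov @ loadings.T + residual_var


sigma = implied_covariance(loadings, factor_cov, residual_var)
print(np.round(sigma, 2))
```

Assessing fit then comes down to asking how close this model-implied matrix is to the covariance matrix actually observed in the data.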

Many books and articles on SEM discuss its ability to determine causal relationships; however, the data used to determine these relationships is correlational. Remember learning in your basic stats class that correlation does not equal causation? Well, that idea also applies to SEM. Testing a single model does not validate causality; the model must be compared to competing models and tested in multiple samples. Just because a single model fits the data does not mean that it has been proven "true" (Hox & Bechger 2001).


How to read a model created through SEM (Klem 2000, Hox & Bechger 2001, Stoelting 2002, Savalei & Bentler 2006)

  • Variables that are encased in rectangles represent variables that can be measured directly and are called indicators.
  • Variables that are encased in circles represent latent variables that cannot be measured directly and therefore are the product of some kind of instrument or measurement tool. These latent variables are abstract concepts and are referred to as factors or constructs.
  • Single-headed arrows represent regression coefficients and indicate a hypothesized pathway (causal relationship) between two variables. The variable at the tail of the arrow is hypothesized to cause the variable at the point.
  • Double-headed arrows represent covariances and indicate the relationship between two variables or error terms. These relationships are non-directional and therefore not causal.
  • Error terms represent the variance in a single variable that the model does not explain. These are depicted by circles (since they are not directly measured) or simply represented by an arrow pointing toward the variable.

We can further break down unmeasurable (latent) variables into two subcategories: exogenous and endogenous. Exogenous factors are those that the model does not try to explain, so arrows point away from these factors. Endogenous factors are the opposite: they are affected by one or more of the other latent variables, so arrows point towards them. In SEM, you can also have error variables, which are exogenous variables within the model that capture the effects of omitted variables along with the effects of measurement error (Klem 2000).
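
The exogenous/endogenous distinction can be read straight off the single-headed arrows in a diagram. Below is a small Python sketch of that rule; the helper `classify_variables` is hypothetical, and the two paths are only an illustration that borrows variable names from the examples later in this post.

```python
def classify_variables(paths):
    """Split variables into exogenous and endogenous.

    `paths` is a list of (cause, effect) pairs, one per single-headed arrow.
    A variable is endogenous if at least one arrow points at it;
    otherwise it is exogenous.
    """
    variables = {name for path in paths for name in path}
    endogenous = {effect for _, effect in paths}
    exogenous = variables - endogenous
    return sorted(exogenous), sorted(endogenous)


# Illustrative structural paths among latent variables (an assumption,
# not a model taken from the cited sources).
paths = [
    ("locus of control", "self esteem"),
    ("self esteem", "overall satisfaction"),
]
exogenous, endogenous = classify_variables(paths)
print("Exogenous:", exogenous)
print("Endogenous:", endogenous)
```

Note that a factor like self esteem is endogenous even though an arrow also leaves it: what matters is whether anything in the model points at it.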

There are 8 types of parameters that can be estimated using structural equation modeling. Four of those parameters are direct effects, and these effects are analogous to the coefficients used in multiple regression (beta values) (Klem 2000):
  • the effect of an exogenous factor on a measured variable (e.g. effect of locus of control on plan unhappy)
  • the effect of an endogenous factor on a measured variable (e.g. the effect of self esteem on worth)
  • the effect of an exogenous factor on an endogenous factor (e.g. the effect of locus of control on self esteem)
  • the effect of an endogenous factor on another endogenous factor (e.g. the effect of self esteem on overall satisfaction)
The other four parameters distinguished by SEM are all variances or covariances (Klem 2000):
  • the variance and covariance of unmeasured variables (e.g. the curved double headed arrow linking locus of control and loneliness)
  • the variance and covariance of unmeasured variables that represent error - this is interesting because it is unexplained variance (e.g. the variance of the unmeasured variable in the top left corner - the oval - is the variance of plan unhappy that is not explained by locus of control)
  • variance/covariance of errors in measured dependent variables
  • variance/covariance of errors in measured independent variables

    This diagram should help you understand both the variables and parameters used in SEM


Two assumptions for SEM
  1. the variables on which the matrix coefficients are based are measured on an interval scale
  2. the variables have a multivariate normal distribution
These assumptions can be hard to meet within social science research, but maximum likelihood, the estimation method most commonly used in SEM, is robust to violations of normality. Current SEM software provides possible remedies for unmet assumptions (Klem 2000).
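
If you want a rough feel for the second assumption before fitting anything, one common screening statistic is Mardia's multivariate kurtosis, which is close to p(p + 2) for multivariate normal data with p variables. The sketch below is my own addition rather than something from Klem (2000); the function name `mardia_kurtosis` is hypothetical and the data are simulated.

```python
import numpy as np


def mardia_kurtosis(data):
    """Return Mardia's multivariate kurtosis and its expected value under normality."""
    data = np.asarray(data, dtype=float)
    n_cases, n_vars = data.shape
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / n_cases      # covariance with divisor n
    inv_cov = np.linalg.inv(cov)
    # Squared Mahalanobis distance of each case from the centroid.
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return float(np.mean(d2 ** 2)), n_vars * (n_vars + 2)


# Simulated multivariate normal data (an assumption for illustration only).
rng = np.random.default_rng(0)
sample = rng.standard_normal((5000, 3))
observed, expected = mardia_kurtosis(sample)
print(f"observed kurtosis: {observed:.2f}, expected under normality: {expected}")
```

An observed value far above the expected one suggests heavy-tailed data, which is one of the situations where the software remedies mentioned above become relevant.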


Sample Size and SEM

Sample size is important to consider when conducting a SEM. The necessary sample size for reliable results depends upon the complexity of the model, the magnitude of the coefficients, the number of measured variables associated with the factors, and the multivariate normality of the variable distributions - more cases are needed for complex models, models with weak relationships, models with few measured variables per factor, and nonnormal distributions. The input matrix should be based on at least 150 cases, and at least 5-10 cases per parameter estimated. It is recommended to have 10 cases per parameter if the variables do not have a multivariate normal distribution (Klem 2000).
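
These rules of thumb are easy to turn into a quick check. The sketch below is only one reading of the guideline: it assumes 5 cases per parameter is the floor when the data look multivariate normal and 10 otherwise, and the function name `minimum_cases` is hypothetical.

```python
def minimum_cases(n_parameters, multivariate_normal=True):
    """Rule-of-thumb minimum sample size based on the guidelines above.

    At least 150 cases overall, and at least 5 cases per estimated
    parameter (assumed lower bound), rising to 10 per parameter when
    multivariate normality is doubtful.
    """
    cases_per_parameter = 5 if multivariate_normal else 10
    return max(150, cases_per_parameter * n_parameters)


for n_params in (20, 40):
    print(n_params, minimum_cases(n_params), minimum_cases(n_params, multivariate_normal=False))
```

Treat the result as a lower bound only; complex models or weak relationships may call for considerably more cases.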

How to estimate parameters in SEM (Klem 2000)
  1. Create a model based on the literature. If there are competing models as indicated by the literature, then they should be specified. This can be done by completing a simple diagram and indicating factors, indicators, and all relationships.
  2. Obtain parameter estimates for the model - the eight parameters discussed above that involve coefficients of direct effects and variances or covariances of unmeasured variables. For this step, you must use statistical software to estimate the parameters.
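
Step 2 is normally left to SEM software, but it can help to see what the software is minimizing. Under maximum likelihood, the usual discrepancy function compares the sample covariance matrix S with the model-implied matrix Sigma: F = ln|Sigma| - ln|S| + trace(S Sigma^-1) - p, where p is the number of measured variables. The Python sketch below only evaluates that function for a given pair of matrices; the names `ml_discrepancy` and `chi_square` and the example matrix are assumptions, and a real program would also search for the parameter values that make F as small as possible.

```python
import numpy as np


def ml_discrepancy(sample_cov, implied_cov):
    """Maximum likelihood discrepancy between sample and model-implied covariances."""
    s = np.asarray(sample_cov, dtype=float)
    sigma = np.asarray(implied_cov, dtype=float)
    n_vars = s.shape[0]
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_s = np.linalg.slogdet(s)
    return logdet_sigma - logdet_s + np.trace(s @ np.linalg.inv(sigma)) - n_vars


def chi_square(sample_cov, implied_cov, n_cases):
    """Model chi-square, commonly computed as (N - 1) times the discrepancy."""
    return (n_cases - 1) * ml_discrepancy(sample_cov, implied_cov)


# Hypothetical sample covariance matrix for three measured variables.
sample_cov = np.array([
    [1.00, 0.56, 0.48],
    [0.56, 1.00, 0.42],
    [0.48, 0.42, 1.00],
])
perfect_fit = ml_discrepancy(sample_cov, sample_cov)   # model reproduces the data exactly
poor_fit = ml_discrepancy(sample_cov, np.eye(3))       # model that ignores all covariances
print(perfect_fit, poor_fit, chi_square(sample_cov, np.eye(3), n_cases=200))
```

The discrepancy is zero when the model reproduces the sample matrix exactly and grows as the two matrices drift apart.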

How to evaluate results (Klem 2000)
  1. Make sure the results fit statistical and theoretical criteria. Any model tested by SEM should be based on theory. After the parameters have been estimated by the statistical program, each parameter should be assessed from a theoretical perspective. For example, the signs and magnitudes of the coefficients should be consistent with what is known from the literature and previous research. Results should be theoretically sensible.
  2. Determine the identification status of the model. If the model is considered "identified", then there is a unique solution for each parameter in the model.
  3. Check if parameters are reasonable. A model that is misspecified can produce improper results such as negative variances and correlations greater than one.
  4. Check whether the model fits the data.
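
Steps 2 and 3 lend themselves to simple sanity checks. The sketch below uses the standard counting rule (a model cannot be identified if it has more free parameters than there are unique variances and covariances among the measured variables) and flags out-of-range estimates. Passing the counting rule is necessary but not sufficient for identification. The function names and example estimates are hypothetical.

```python
def passes_counting_rule(n_measured, n_free_parameters):
    """Necessary (not sufficient) condition for identification.

    With p measured variables there are p * (p + 1) / 2 unique variances
    and covariances available to estimate parameters from.
    """
    unique_moments = n_measured * (n_measured + 1) // 2
    return n_free_parameters <= unique_moments


def improper_estimates(variances, correlations):
    """Return the names of estimates that fall outside their admissible range."""
    problems = [name for name, value in variances.items() if value < 0]
    problems += [name for name, value in correlations.items() if abs(value) > 1]
    return problems


# Hypothetical output from an SEM program, for illustration only.
variances = {"error: worth": 0.41, "error: plan unhappy": -0.08}
correlations = {"locus of control ~ loneliness": 1.12}

print(passes_counting_rule(n_measured=3, n_free_parameters=6))
print(passes_counting_rule(n_measured=3, n_free_parameters=7))
print(improper_estimates(variances, correlations))
```

Any name returned by `improper_estimates` is a sign that the model may be misspecified and should be revisited before interpreting fit.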

References:
*Please note that not all of the references listed below are the best resources. I simply used information that was presented in a simple and easy to understand format. If you plan on publishing your research, I would suggest finding better sources.
Hox, J.J. & Bechger, T.M. (2001). An introduction to structural equation modeling. Family Science Review. 11:354-373.
Hoyle 1995 Structural Equation Modeling
Klem, L. (2000). Structural equation modeling. In: Reading and Understanding MORE Multivariate Statistics. Edited by L.G. Grimm & P.R. Yarnold. (pp. 227-260). Washington, DC, US: American Psychological Association.

Savalei, V. & Bentler, P.M. (2006). Structural Equation Modeling. In: The Handbook of Market Research: Uses, Misuses, and Future Advances. Edited by R. Grover & M. Vriens. Sage Publications.

Stoelting, R. (2002). Webpage retrieved April 15, 2012 from: http://userwww.sfsu.edu/~efc/classes/biol710/path/SEMwebpage.htm