
Friday, April 20, 2012

What is Structural Equation Modeling (SEM)?

If any of you are like me, you're excited about the idea of using a structural equation model to analyze your data, but completely overwhelmed with its complexity. I'm hoping that this post will give you a brief introduction along with a few sources to help you develop the skills and knowledge to apply this statistical procedure to your own research. Good luck!


What is Structural Equation Modeling?

Structural equation modeling (SEM) is a statistical technique used to assess whether a proposed model (a set of specified causal and noncausal relationships among variables) accounts for the relationships we observe in empirical data (Savalei & Bentler 2006). It draws on factor analysis, path analysis, measurement models, and structural models (Stoelting 2002). Factor analysis deals with constructs (AKA latent variables, factors, concepts) that cannot be directly measured but are related to measurable variables. For example, intelligence is a variable that cannot be measured directly, but it can be estimated through a series of questions or tests. Path analysis is a technique used to identify causal relationships between directly measured variables (Klem 2000). Measurement models deal with relationships between measured variables and latent variables (like a factor analysis). Structural models deal with relationships between latent variables only (Stoelting 2002). The goal of SEM is two-fold: to obtain estimates of the parameters of the model (i.e., the factor loadings, the variances and covariances of the factors, and the residual error variances of the observed variables), and to assess the fit of the model (i.e., whether the model itself provides a good fit to the data) (Hox & Bechger 2001).
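
To make the measurement/structural distinction concrete, here is a minimal sketch of how such a model is written down in software. It uses the lavaan-style syntax of the third-party Python package semopy, and every variable name is hypothetical:

    # Hypothetical two-factor model in lavaan-style syntax (as used by the
    # Python package semopy). "=~" defines measurement relationships between
    # a latent factor and its indicators; "~" defines structural relationships
    # among latent variables.
    model_description = """
    intelligence =~ test1 + test2 + test3     # measurement model
    achievement  =~ grade1 + grade2 + grade3  # measurement model
    achievement  ~  intelligence              # structural model
    """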

Many books and articles on SEM discuss its ability to determine causal relationships; however, the data used to test these relationships are correlational. Remember learning in your basic stats class that correlation does not equal causation? That idea also applies to SEM. Testing a single model does not establish causality; the model must be compared to competing models and tested in multiple samples. Just because a single model fits the data does not mean that it has been proven "true" (Hox & Bechger 2001).


How to read a model created through SEM (Klem 2000, Hox & Bechger 2001, Stoelting 2002, Savalei & Bentler 2006)

  • Variables that are enclosed in rectangles represent variables that can be measured directly and are called indicators.
  • Variables that are enclosed in circles represent latent variables that cannot be measured directly and are therefore the product of some kind of instrument or measurement tool. These latent variables are abstract concepts and are referred to as factors or constructs.
  • Single-headed arrows represent regression coefficients and indicate a hypothesized pathway (causal relationship) between two variables. The variable at the tail of the arrow causes the variable at the point.
  • Double-headed arrows represent covariances and indicate the relationship between two variables or error terms. These relationships are non-directional and therefore not causal.
  • Error terms represent the variance within a single variable. These are depicted by circles (since they are not directly measured) or simply represented by an arrow pointing toward the variable.

We can further break down unmeasurable (latent) variables into two subcategories: exogenous and endogenous. Exogenous factors are those that the model does not try to explain, and arrows only point away from these factors. Endogenous factors are the opposite: they are affected by one or more of the other variables in the model, so arrows point toward them. In SEM, you can also have error variables, which are exogenous variables within the model that capture the effects of omitted variables along with the effects of measurement error (Klem 2000).
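
These two categories map onto a standard algebraic form, the LISREL notation (textbook material rather than something taken from the sources cited here). Writing endogenous factors as η, exogenous factors as ξ, and their indicators as y and x:

    η = B η + Γ ξ + ζ     (structural model: factors predicting factors)
    y = Λy η + ε          (measurement model for indicators of η)
    x = Λx ξ + δ          (measurement model for indicators of ξ)

Here ζ, ε, and δ are error terms, and each single-headed arrow in a diagram corresponds to a nonzero entry in B, Γ, Λy, or Λx.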

There are eight types of parameters that can be estimated using structural equation modeling. Four of them are direct effects, and these effects are analogous to the coefficients used in multiple regression (beta values) (Klem 2000):
  • the effect of an exogenous factor on a measured variable (e.g. the effect of locus of control on plan unhappy)
  • the effect of an endogenous factor on a measured variable (e.g. the effect of self esteem on worth)
  • the effect of an exogenous factor on an endogenous factor (e.g. the effect of locus of control on self esteem)
  • the effect of an endogenous factor on another endogenous factor (e.g. the effect of self esteem on overall satisfaction)
The other four parameters distinguished by SEM are all variances or covariances (Klem 2000):
  • the variance and covariance of unmeasured variables (e.g. the curved double-headed arrow linking locus of control and loneliness)
  • the variance and covariance of unmeasured variables that represent error - this is interesting because it is unexplained variance (e.g. the variance of the unmeasured variable in the top left corner - the oval - is the variance of plan unhappy that is not explained by locus of control)
  • variance/covariance of errors in measured dependent variables
  • variance/covariance of errors in measured independent variables

    [Diagram: the variables and parameters used in SEM, including the locus of control / self esteem example referenced above.]


Two assumptions for SEM
  1. the variables on which the matrix of coefficients is based are intervally scaled
  2. the variables have a multivariate normal distribution
These assumptions can be hard to meet in social science research, but maximum likelihood, the estimation method most commonly used in SEM, is robust to violations of normality. Current SEM software provides possible remedies for unmet assumptions (Klem 2000).
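
If you want to screen your own data against the multivariate normality assumption, one classical check is Mardia's multivariate skewness and kurtosis. Here is a minimal sketch in Python with NumPy; it is not taken from the sources above, just a starting point:

    import numpy as np

    def mardia_tests(X):
        # X is an (n, p) array of observations. Under multivariate normality,
        # the skewness statistic is ~chi-square with p(p+1)(p+2)/6 df and the
        # kurtosis statistic is ~standard normal.
        X = np.asarray(X, dtype=float)
        n, p = X.shape
        Xc = X - X.mean(axis=0)
        S = np.cov(Xc, rowvar=False, bias=True)   # ML covariance (divide by n)
        D = Xc @ np.linalg.inv(S) @ Xc.T          # Mahalanobis cross-products
        b1 = (D ** 3).mean()                      # multivariate skewness
        b2 = (np.diag(D) ** 2).mean()             # multivariate kurtosis
        skew_stat = n * b1 / 6.0
        kurt_stat = (b2 - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
        return skew_stat, kurt_stat

Large values of either statistic relative to their reference distributions suggest the normality assumption is questionable, which is when the 10-cases-per-parameter advice below becomes more pressing.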


Sample Size and SEM

Sample size is important to consider when conducting an SEM. The sample size needed for reliable results depends on the complexity of the model, the magnitude of the coefficients, the number of measured variables associated with each factor, and the multivariate normality of the variable distributions: more cases are needed for complex models, models with weak relationships, models with few measured variables per factor, and nonnormal distributions. The input matrix should be based on at least 150 cases and at least 5-10 cases per parameter estimated. It is recommended to have 10 cases per parameter if the variables do not have a multivariate normal distribution (Klem 2000).
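
These rules of thumb are easy to encode. A tiny helper (the function name and defaults are my own):

    def minimum_n(n_free_parameters, cases_per_parameter=10, floor=150):
        # At least 150 cases, and 5-10 cases per estimated parameter;
        # use 10 per parameter when the data are not multivariate normal.
        return max(floor, cases_per_parameter * n_free_parameters)

    # e.g. 22 free parameters with nonnormal data: minimum_n(22) -> 220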

How to estimate parameters in SEM (Klem 2000)
  1. Create a model based on the literature. If there are competing models as indicated by the literature, then they should be specified. This can be done by completing a simple diagram and indicating factors, indicators, and all relationships.
  2. Obtain parameter estimates for the model - the eight parameters discussed above that involve coefficients of direct effects and variances or covariances of unmeasured variables. For this step, you must use statistical software to calculate the estimates (one possible workflow is sketched below).
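
As a concrete example of step 2, here is how the fitting step might look with the third-party Python package semopy. The file name and variable names are hypothetical (the indicators echo the post's self esteem / worth example), so treat this as a sketch rather than the one right way:

    import pandas as pd
    import semopy

    desc = """
    self_esteem  =~ worth + ability + confidence   # hypothetical indicators
    satisfaction ~  self_esteem                    # hypothetical structural path
    """

    data = pd.read_csv("survey.csv")    # hypothetical file; columns match indicators
    model = semopy.Model(desc)
    model.fit(data)                     # maximum-likelihood estimation by default
    print(model.inspect())              # loadings, path coefficients, (co)variances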

How to evaluate results (Klem 2000)
  1. Make sure the results fit statistical and theoretical criteria. Any model tested by SEM should be based on theory. After the parameters have been estimated by the statistical program, each parameter should be assessed from a theoretical perspective; for example, the signs and magnitudes of the coefficients should be consistent with what is known from the literature and previous research. Results should be theoretically sensible.
  2. Determine the identification status of the model. If the model is considered "identified," then there is a unique solution for each parameter in the model.
  3. Check whether the parameters are reasonable. A misspecified model can produce improper results such as negative variances and correlations greater than one.
  4. Check to see how well the data fit the model (one way to obtain the usual fit indices is sketched below).
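
For step 4, SEM software reports overall fit indices alongside the parameter estimates. Continuing the semopy sketch above (again, an illustration rather than the only option):

    stats = semopy.calc_stats(model)    # one-row DataFrame of fit measures
    print(stats.T)                      # chi-square, df, CFI, TLI, RMSEA, AIC, ...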

References:
*Please note that not all of the references listed below are the best resources. I simply used information that was presented in a simple and easy to understand format. If you plan on publishing your research, I would suggest finding better sources.
Hox, J.J. & Bechger, T.M. (2001). An introduction to structural equation modeling. Family Science Review, 11, 354-373.

Hoyle, R.H. (Ed.) (1995). Structural equation modeling: Concepts, issues, and applications. Thousand Oaks, CA: Sage Publications.

Klem, L. (2000). Structural equation modeling. In L.G. Grimm & P.R. Yarnold (Eds.), Reading and understanding MORE multivariate statistics (pp. 227-260). Washington, DC: American Psychological Association.

Savalei, V. & Bentler, P.M. (2006). Structural equation modeling. In R. Grover & M. Vriens (Eds.), The handbook of marketing research: Uses, misuses, and future advances. Thousand Oaks, CA: Sage Publications.

Stoelting, R. (2002). Webpage retrieved April 15, 2012 from: http://userwww.sfsu.edu/~efc/classes/biol710/path/SEMwebpage.htm


Tuesday, January 11, 2011

Reliability and Validity - Developing a Good Instrument.

There are a lot of factors that go into developing a good instrument, like the choice of items and measurement scales, using the appropriate sample for pilot testing, implementing statistical techniques that fit the characteristics of your data, etc. Testing your instrument for reliability and validity can help to assess if your instrument is "good" or "bad" and ultimately help you to know if the interpretation of your data is accurate or misleading.

Testing for reliability and/or validity is not a simple process, and can take years of implementation in different samples to determine. Below, I will list the different types of reliability/validity and how they are assessed.

It's important to remember that reliability is a necessary but NOT sufficient condition of validity. A necessary condition of a statement must be satisfied for the statement to be true, and a sufficient condition is one that, if satisfied, assures the statement's truth. In other words, you can have an instrument that is reliable but not valid; however, if your instrument is valid, then it HAS to be reliable. It reminds me of the old adage that states: "all poodles are dogs, but not all dogs are poodles". It's the same thing in this case: all valid instruments are reliable, but not all reliable instruments are valid.


 Definitions:

Reliability - the degree to which an instrument consistently measures whatever it intends to measure. In other words, it's the statistical measure of the reproducibility or stability of the data gathered by your survey.

Validity - the degree to which an instrument measures what it is supposed to measure. If your instrument is valid, then you can feel confident in the interpretation of the data. Going back to the adage above, a valid instrument must also be reliable, but what does that mean? Here is another way to interpret it:

Precision (reliability) + Accuracy = Validity

If you spent any time studying the sciences, you were sure to come across precision and accuracy. Although the difference between these terms is clear to me now, they seemed very ambiguous when I was in Chemistry 101, so I'll use the bulls-eye analogy to explain. Precision, AKA reliability, is when all your "hits" are clustered in the same area (the degree to which repeated measures under unchanged conditions show the same result). Accuracy is when all your "hits" are close to the bulls-eye (how close the measurements are to the actual value). With both of these properties together, you have validity (your "hits" are clustered together around the bulls-eye).

Types of Reliability:
 
Test-Retest (stability) - the degree to which scores on the same test are consistent over time; in other words, the questions are worded in such a way that respondents consistently answer them the same way. To test this, you would administer the survey to a sample twice and then calculate the correlation coefficient (r) between the two sets of responses. Correlation coefficients are considered good if they are 0.70 or above, indicating that the responses are reasonably consistent from one point in time to the other. The trick is determining the amount of time to wait between administrations. Two weeks is suggested as a good interval because it's long enough for the respondents to forget their answers, but short enough that they don't gain knowledge or change behaviors before the second survey. A quick computational sketch follows this entry.
  • Intraobserver (intrajudge) - measures the stability of responses from the same person. This is a type of test-retest reliability because it looks at a single individual's scores over a period of time. It is also measured using a correlation coefficient.
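
Computationally, test-retest (and intraobserver) reliability is just a correlation between the two administrations. A minimal sketch in Python with SciPy, using made-up scores:

    from scipy.stats import pearsonr

    # Each respondent's total score at the two administrations (invented data).
    time1 = [12, 15, 11, 18, 14, 16, 13, 17]
    time2 = [13, 14, 11, 17, 15, 16, 12, 18]

    r, p_value = pearsonr(time1, time2)
    print(f"test-retest r = {r:.2f}")   # r >= 0.70 suggests acceptable stability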


Alternate Form (equivalent-form) - the degree to which two similar forms of a test produce similar scores from a single sample. The two instruments have the same structure, number of items, reading level, difficulty level, etc.; however, the individual items are not identical. Items differ in wording but still measure the same idea. You can do this by administering the two forms in two separate samples of the same population, or administering them twice in the same sample (as a pre- and post-test). Correlation coefficients (r) are compared, and high values indicate good alternate-form reliability.

Internal Consistency - indicates how well different items measure the same issue. It is applied to a group of items that are thought to measure different aspects of the same concept. This is important when measuring the reliability of latent constructs, because a single item cannot assess concepts such as knowledge, behavior, and attitude. Below are two commonly used methods to assess internal consistency; a computational sketch follows the list.
  • Split-Half - measures internal consistency by comparing two halves of a single instrument. Divide the instrument (or construct items) into two halves, compute each respondent's score on each half, and correlate the two sets of scores. A high correlation coefficient indicates high internal consistency.
  • Cronbach's Alpha - indicates how well a set of items complement each other in measuring a single construct. It can be used for dichotomous items or longer measurement scales like the Likert scale. High alpha values indicate high internal consistency. If your instrument involves only dichotomous responses (e.g. yes or no), the Kuder-Richardson 20 (KR-20) is another option for this statistic.
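
Both statistics are easy to compute directly. A minimal NumPy sketch (my own helper functions, not from the sources below):

    import numpy as np

    def cronbach_alpha(items):
        # items: (n_respondents, k_items) matrix of scores on one construct.
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)       # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    def split_half_r(items):
        # Correlate summed scores on the odd- vs. even-numbered items.
        items = np.asarray(items, dtype=float)
        half1 = items[:, 0::2].sum(axis=1)
        half2 = items[:, 1::2].sum(axis=1)
        return np.corrcoef(half1, half2)[0, 1]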


Interobserver (interjudge/interrater) - measures how well two or more evaluators agree in their assessment of a variable. It refers to the consistency of two or more independent observers and is usually reported as a correlation coefficient. This type of reliability is used in qualitative studies like interviews, focus groups, or open-ended surveys; a small example for categorical codes follows.
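
When the raters assign categorical codes rather than numeric scores, a common agreement statistic (not mentioned above) is Cohen's kappa, which corrects for chance agreement. A sketch using scikit-learn, with invented codes:

    from sklearn.metrics import cohen_kappa_score

    # Codes assigned by two independent raters to the same ten excerpts.
    rater1 = ["pos", "neg", "pos", "neu", "pos", "neg", "neu", "pos", "neg", "pos"]
    rater2 = ["pos", "neg", "pos", "pos", "pos", "neg", "neu", "pos", "neu", "pos"]

    print(cohen_kappa_score(rater1, rater2))   # 1.0 would be perfect agreement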


Types of Validity:

Face - involves the feedback of untrained reviewers. If you were to categorize validity testing into stages, this would be the first one. You're looking for incorrect spelling and grammar, ambiguous items, confusing layouts, etc. Untrained reviewers will focus on the overall aesthetics of the survey and not the content.


Content - the measure of how appropriate items or scales seem to a set of trained reviewers. As the term suggests, you're looking at the content of the survey. The more people you can have look over the survey, the better, because each person will point out something different. This should be conducted after you check for face validity and can therefore be referred to as the second stage.

Criterion - the measure of how well one instrument compares to another. This is determined by relating the performance of your instrument to another instrument (the criterion against which the validity of your instrument is judged).
  • Concurrent - the comparison of your instrument against another that is considered the gold standard for the variable in question. It is calculated using correlation coefficients, and high values indicate good concurrent validity.
  • Predictive - the degree to which a test can predict how well an individual will do in a future situation. For example, the GRE is supposed to be a good predictor of how well we will do in graduate school. However, I think that many of us will disagree with the GRE's predictive potential, but that's another subject for another time. :) Correlation coefficients are used to compare the initial score with the secondary outcome.
Construct - the degree to which an instrument measures a construct. This is the most important form of validity because it answers the question: is this instrument measuring what it was intended to measure? However, it is also the most difficult form of validity to understand, to measure, and to report.
  • Convergent - implies that several different methods for obtaining the same information will provide similar results. Assessing convergent validity is similar to assessing alternate-form reliability, but is more theoretical. It requires a great amount of work over a long period of time to determine.
  • Divergent - measures the ability of an instrument to distinguish the construct being studied from other, unrelated constructs; scores should not correlate strongly with measures of concepts the instrument was not designed to capture. This is also very theoretical and requires a lot of time and work to determine.



References:

Litwin, M.S. (2003). How to assess and interpret survey psychometrics (The Survey Kit 2). Thousand Oaks, CA: Sage Publications.

Gay, L.R., Mills, G.E., & Airasian, P. (2006). Educational research: Competencies for analysis and applications (8th ed.). Columbus, OH: Pearson Merrill Prentice Hall.