What To Do When Your C.F.A. Doesn’t Fit

One question that applied researchers often ask me is: What can I do when my confirmatory factor analysis (CFA) or structural equation model (SEM) shows a suboptimal model fit (for example, a large and highly significant chi-square value)?

First of all, it is often useful to rethink the model and to consider ways in which it may be misspecified (too restrictive). For example, is it reasonable to assume that the indicators (observed variables) for a given factor are unidimensional? Lack of unidimensionality is one of the primary reasons for CFA/SEM model misfit, in particular when there are many indicators per factor. Also, are there any omitted paths in the model, either in the measurement model (e.g., omitted cross-loadings) or the structural (latent variable) model (e.g., omitted direct effects, correlations between exogenous factors, or correlations between factor residuals for endogenous factors), that may be significantly different from zero? Each non-zero omitted path may contribute to the overall misfit of a model.
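As an illustration, consider a hypothetical two-factor CFA in Mplus (the item names y1-y8 are made up for this sketch). The commented lines show how an omitted cross-loading or residual correlation could be freed, but only if theory supports it:

```
MODEL:
  f1 BY y1-y4;    ! factor 1 measured by items y1-y4
  f2 BY y5-y8;    ! factor 2 measured by items y5-y8
                  ! the factor correlation f1 WITH f2 is estimated by default

! Possible respecifications (only if theoretically justified):
! f2 BY y4;       ! cross-loading of item y4 on factor 2
! y1 WITH y2;     ! residual correlation (e.g., due to similar item wording)
```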

To examine the sources of a suboptimal CFA/SEM model fit empirically, it can be useful to study standardized covariance residuals and/or model modification indices. The standardized residuals (e.g., OUTPUT: RESIDUAL in Mplus) tell you which observed (co)variances are substantially over- or underestimated by your model. Often, the residuals can help uncover inhomogeneities or other misspecification in the measurement model (e.g., some items within the same factor may be more highly correlated due to similar wording, polarity, or other item-specific or method effects) and/or the structural model (e.g., misspecification due to omitted paths).

Modification indices (OUTPUT: MODINDICES in Mplus) can also be helpful, as they may point you to specific cross-loadings, item residual associations (again, potentially due to shared item-specific or method variance), omitted paths, and other forms of model under- or misspecification. Not all suggested modifications may be theoretically or practically meaningful, and you should never blindly apply all suggested modifications, since they may be purely data-driven. However, like the residuals, modification indices can point you to key portions of your model that may be misspecified or in need of respecification. For further information, you can watch my YouTube videos on this topic here:

https://www.youtube.com/watch?v=yUusj7laFWE&t=1s

https://www.youtube.com/watch?v=SrI9riDr1W4
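In Mplus, both diagnostics can be requested together in the OUTPUT command; a minimal sketch:

```
OUTPUT:
  RESIDUAL           ! raw, standardized, and normalized covariance residuals
  MODINDICES (ALL);  ! modification indices for all restricted parameters
```

Instead of ALL, a numeric threshold such as MODINDICES (10) limits the printout to modification indices above that value.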

Once again, residuals and modification indices should never be applied in a completely atheoretical, data-driven way, but only in conjunction with substantive considerations. Otherwise, you risk capitalizing on chance.

Another reason for model misfit can be a large model (the so-called “model-size effect”) with many indicators and, consequently, many degrees of freedom. Simulation studies have shown that chi-square values can be inflated for large CFA/SEM models, leading to the rejection of too many correct models. See, for example:

Herzog, W., Boomsma, A., & Reinecke, S. (2007). The model-size effect on traditional and modified tests of covariance structures. Structural Equation Modeling: A Multidisciplinary Journal, 14(3), 361-390.

Moshagen, M. (2012). The model size effect in SEM: Inflated goodness-of-fit statistics are due to the size of the covariance matrix. Structural Equation Modeling: A Multidisciplinary Journal, 19(1), 86-98.

Shi, D., Lee, T., & Terry, R. A. (2018). Revisiting the model size effect in structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 25(1), 21-40.

Correction methods are available that appear to work well in practice. See:

Yuan, K. H., Tian, Y., & Yanagihara, H. (2015). Empirical correction to the likelihood ratio statistic for structural equation modeling with many variables. Psychometrika, 80, 379-405.

A suboptimal fit can also be related to your data structure and sampling design rather than to a misspecified or overly large model. For example, non-normal data can inflate the model chi-square test statistic when standard (uncorrected) maximum likelihood estimation is used. As a result, a proper model may be rejected simply due to a violation of distributional assumptions. For non-normal data, robust maximum likelihood estimation (such as the Satorra-Bentler correction; ANALYSIS: ESTIMATOR = MLM; in Mplus) can be used to obtain adjusted fit statistics and parameter standard errors. I will discuss additional options for dealing with non-normal data in CFA and SEM in one of my subsequent newsletters.
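In Mplus, the Satorra-Bentler correction is requested in the ANALYSIS command; the scaled chi-square equals the ML chi-square divided by a scaling correction factor that is reported in the output:

```
ANALYSIS:
  ESTIMATOR = MLM;  ! Satorra-Bentler scaled chi-square and robust standard errors
```

Note that MLM requires complete data; with missing data, ESTIMATOR = MLR offers a robust alternative.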

Another data feature that can lead to an inflated chi-square statistic is when you have nested (clustered, hierarchical, multilevel) data such as students nested within school classes or employees nested within companies. The non-independence that arises from clustered data can lead to bias in standard errors and fit statistics. Applying appropriate methods for clustered data (e.g., multilevel modeling) can help in this case.
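In Mplus, a simple design-based correction for clustering can be requested as follows (the cluster variable name classid is hypothetical); alternatively, TYPE = TWOLEVEL specifies a full multilevel model:

```
VARIABLE:
  CLUSTER = classid;  ! hypothetical clustering variable (e.g., class ID)
ANALYSIS:
  TYPE = COMPLEX;     ! corrects chi-square and standard errors for clustering
  ESTIMATOR = MLR;    ! robust ML (the default with TYPE = COMPLEX)
```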