Saturated Models in CFA & SEM

Tips and cautions for “just-identified” models

Many researchers are confused about just-identified (saturated) models in CFA and SEM and how they relate to model degrees of freedom (df), model testing, and parameter interpretation. You probably use just-identified statistical models routinely in your work outside of the formal CFA/SEM world, even though you may not be aware of it. For example, a multiple regression model is typically just identified. This means that the model uses up all the observed information (i.e., observed variances, covariances, and means) to estimate parameters (e.g., regression coefficients). The model reproduces the observed (co)variance and mean structure perfectly and is therefore said to be saturated.
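The counting argument for regression can be checked with a few lines of Python. This is only a sketch: it assumes the regression is specified as a full SEM in which the predictor means, variances, and covariances are also free parameters, and the function name regression_df is made up for illustration.

```python
def regression_df(n_predictors):
    """df for a linear regression treated as an SEM with a mean structure."""
    p = n_predictors
    n_observed = p + 1  # the predictors plus the outcome

    # Pieces of observed information: variances/covariances plus means.
    n_information = n_observed * (n_observed + 1) // 2 + n_observed

    # Free parameters under the assumed parameterization.
    n_parameters = (
        p                     # regression slopes
        + 1                   # intercept
        + 1                   # residual variance
        + p * (p + 1) // 2    # predictor variances and covariances
        + p                   # predictor means
    )
    return n_information - n_parameters


for p in (1, 2, 5):
    print(p, regression_df(p))  # df is 0 for every number of predictors
```

Whatever the number of predictors, the two counts match, which is why the regression model has zero df.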

Saturated CFA and SEM models always have zero df for the chi-square test of model fit (you can try this out by fitting a linear regression model in, for example, Mplus using maximum likelihood estimation). This means that the model does not imply any testable restrictions for the observed (co)variance or mean structure. In other words, a saturated model does not provide a simplification of the data. Instead, it uses all the available information and simply translates it into as many model parameters as there are pieces of information in the data. Therefore, a just-identified model (trivially) fits perfectly, resulting in a chi-square value of zero with zero df. An example from the CFA/SEM world is a single-factor model with three indicators, in which the loadings can differ between the indicators (this is referred to as a congeneric measurement model in classical test theory). Without additional variables, this model will always fit perfectly (provided the three indicators have non-zero covariances; otherwise, you may run into identification and/or estimation problems). Again, you can try this out in any CFA/SEM software.
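To see why the three-indicator model reproduces the data exactly, it helps to solve it by hand. The sketch below is not how CFA/SEM software estimates the model. It assumes the factor variance is fixed to 1 for identification, uses a made-up covariance matrix, and requires the product of the three covariances to be positive so that the loadings are real numbers. The function names are illustrative.

```python
import math


def fit_single_factor(S):
    """Closed-form estimates for a one-factor model with three indicators.

    Assumes the factor variance is fixed to 1 and that the product of the
    three covariances is positive.
    """
    s12, s13, s23 = S[0][1], S[0][2], S[1][2]
    if s12 * s13 * s23 <= 0:
        raise ValueError("covariances do not permit a real-valued one-factor solution")
    l1 = math.sqrt(s12 * s13 / s23)
    loadings = [l1, s12 / l1, s13 / l1]
    residual_variances = [S[i][i] - loadings[i] ** 2 for i in range(3)]
    return loadings, residual_variances


def implied_covariance(loadings, residual_variances):
    """Model-implied covariance matrix: loading products plus residual variances on the diagonal."""
    return [
        [loadings[i] * loadings[j] + (residual_variances[i] if i == j else 0.0)
         for j in range(3)]
        for i in range(3)
    ]


# Hypothetical covariance matrix of three standardized indicators.
S = [
    [1.00, 0.48, 0.56],
    [0.48, 1.00, 0.42],
    [0.56, 0.42, 1.00],
]

loadings, residual_variances = fit_single_factor(S)
Sigma = implied_covariance(loadings, residual_variances)

# Largest discrepancy between the observed and implied matrices.
max_abs_diff = max(abs(S[i][j] - Sigma[i][j]) for i in range(3) for j in range(3))
print([round(l, 3) for l in loadings])   # roughly 0.8, 0.6, 0.7
print(max_abs_diff < 1e-12)              # the implied matrix matches up to rounding
```

Six variances and covariances are turned into six parameters (three loadings and three residual variances), so nothing is left over to test.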

Can a just-identified/saturated model be useful? It sure can! For example, the above-mentioned single-factor model with three indicators may be useful if your goal is to determine the reliabilities of the three indicators. The model provides this information in terms of R² for the observed variables. The model can also be used to determine composite reliability (i.e., McDonald’s omega, see my YouTube video above), that is, the reliability of the sum or average of the three indicators that form a unidimensional scale. As another example, probably nobody would argue that multiple regression analysis is useless (and it typically implies a just-identified/saturated model).
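Both reliability quantities follow directly from the estimated loadings and residual variances. The following is a minimal sketch with hypothetical parameter values. It assumes the factor variance is fixed to 1 and the residuals are uncorrelated; the function names are made up for illustration.

```python
def indicator_reliabilities(loadings, residual_variances):
    """R-squared per indicator: factor-explained variance over total variance."""
    return [l ** 2 / (l ** 2 + t) for l, t in zip(loadings, residual_variances)]


def omega(loadings, residual_variances):
    """McDonald's omega for the unit-weighted sum of the indicators."""
    common = sum(loadings) ** 2
    return common / (common + sum(residual_variances))


# Hypothetical estimates from a three-indicator, single-factor model.
loadings = [0.8, 0.6, 0.7]
residual_variances = [0.36, 0.64, 0.51]

print([round(r, 2) for r in indicator_reliabilities(loadings, residual_variances)])
print(round(omega(loadings, residual_variances), 3))  # about 0.745
```

With these values, the individual indicators have reliabilities of about .64, .36, and .49, whereas their sum is more reliable than any single indicator.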

What is the most important drawback of saturated models? They are not testable. For example, the assumption of congeneric/unidimensional measurement implied by the single-factor model is not testable for three indicators without (1) further restrictions (e.g., equal loadings) or (2) additional variables (more indicators, factors, or external variables/covariates). This means that you may falsely claim that your three indicators are unidimensional measures of a single attribute when in fact they may be multidimensional.
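The effect of added restrictions or added indicators on testability can be seen by counting df. The sketch below looks at the covariance structure only (the mean structure is assumed to stay saturated) and assumes the factor variance is fixed to 1; cfa_df is a hypothetical helper, not a function from any SEM package.

```python
def cfa_df(n_indicators, n_free_loadings, n_free_residual_variances):
    """df for a one-factor covariance-structure model with factor variance fixed to 1."""
    n_information = n_indicators * (n_indicators + 1) // 2
    return n_information - (n_free_loadings + n_free_residual_variances)


print(cfa_df(3, 3, 3))  # congeneric model, three indicators: 0 (not testable)
print(cfa_df(3, 1, 3))  # equal loadings, three indicators: 2
print(cfa_df(4, 4, 4))  # congeneric model, four indicators: 2
```

Only the last two specifications leave df greater than zero, so only they impose restrictions that the data could contradict.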

Should you use saturated models? Absolutely, as long as (1) they provide the information that you are looking for and (2) the implications of the models seem reasonable for your application (e.g., can you reasonably assume that the three indicators are measures of a single common factor?)!