Differential Item Functioning Detection With Latent Classes: How Accurately Can We Detect Who Is Responding Differentially?

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.


There is a long history of differential item functioning (DIF) detection methods for known, manifest grouping variables, such as sex or ethnicity. But if the experiences or cognitive processes leading to DIF are not perfectly correlated with the manifest groups, it would be more informative to uncover the latent groups underlying DIF. The use of item response theory (IRT) mixture models to detect latent groups and estimate the DIF associated with them has been explored and interpreted with real data sets, but the accuracy of model estimation has not been thoroughly examined. The purpose of this simulation research was to assess how accurately classes, item parameters, and DIF effects were recovered in contexts where relatively small clusters of items showed DIF. Overall, the results reveal that the use of IRT mixture models for latent DIF detection may be problematic. Class membership recovery was poor in all conditions tested. Discrimination parameters were estimated well for the invariant items, as well as for the DIF items when there was no group impact. But when there was group impact, discriminations for the DIF items were positively biased. When there was no group impact, DIF effect estimates tended to be positively biased. In general, having fewer items was associated with more biased estimates and larger standard errors.
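To make the simulation design concrete, the following is a minimal sketch of how data of this kind might be generated: two latent classes, a two-parameter logistic (2PL) response model, DIF confined to a small cluster of items, and optional group impact (a difference in mean ability between classes). The 2PL form, the function names, and every default value (sample size, test length, number of DIF items, size of the difficulty shift, parameter distributions) are illustrative assumptions and are not taken from the study. The second function shows one simple way class recovery could be scored once a mixture model has assigned examinees to classes; it is likewise hypothetical.

```python
import math
import random


def simulate_latent_dif(n_persons=1000, n_items=20, n_dif=4, dif_shift=0.5,
                        impact=0.0, class_prop=0.5, seed=0):
    """Simulate 2PL responses where a latent class drives DIF on a few items.

    All defaults are illustrative assumptions, not the study's conditions.
    """
    rng = random.Random(seed)
    # Baseline item parameters shared by both classes.
    a = [rng.uniform(0.8, 2.0) for _ in range(n_items)]  # discriminations
    b = [rng.gauss(0.0, 1.0) for _ in range(n_items)]    # difficulties
    classes, responses = [], []
    for _ in range(n_persons):
        # Latent class membership (0 or 1), unknown to the analyst.
        g = 1 if rng.random() < class_prop else 0
        # Group impact: class 1 has its ability mean shifted by `impact`.
        theta = rng.gauss(impact * g, 1.0)
        row = []
        for j in range(n_items):
            # DIF: the first n_dif items are harder for class 1.
            b_j = b[j] + (dif_shift if (g == 1 and j < n_dif) else 0.0)
            p = 1.0 / (1.0 + math.exp(-a[j] * (theta - b_j)))
            row.append(1 if rng.random() < p else 0)
        classes.append(g)
        responses.append(row)
    return classes, responses, a, b


def classification_accuracy(true_classes, est_classes):
    """Proportion correctly classified, allowing for label switching."""
    n = len(true_classes)
    agree = sum(t == e for t, e in zip(true_classes, est_classes))
    # Class labels from a mixture model are arbitrary, so score both labelings.
    return max(agree, n - agree) / n
```

In a full study, the simulated responses would be passed to an IRT mixture model in whatever estimation software is used, and the estimated class assignments, discriminations, and DIF effects would be compared with the generating values across many replications.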