Detection of item parameter drift over multiple test administrations

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Three methods of detecting item parameter drift were compared: the procedure in BILOG-MG for estimating linear trends in item difficulty, the CUSUM procedure that Veerkamp and Glas (2000) used to detect trends in difficulty or discrimination, and a modification of Kim, Cohen, and Park’s (1995) χ² test for multiple-group differential item functioning (DIF), using linear contrasts on the discrimination and difficulty parameters. Data were simulated as if collected over 3, 4, or 5 time points, with parameter drift following a gradual linear pattern, a less linear but still monotonic pattern, or a sudden shift at the third time point. The BILOG-MG procedure and the modification of the Kim et al. procedure were more powerful than the CUSUM procedure, nearly always detecting drift. All three procedures had false alarm rates for nondrift items near the nominal alpha. The procedures were also illustrated on a real data set.
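The three drift patterns and the general idea of a cumulative-sum check can be illustrated with a minimal Python sketch. This is not the simulation design or the exact statistic used in the study: the function names, the total drift of 0.6, the square-root curve chosen for the "less linear but still monotonic" case, the standard error of 0.1, and the reference value of 0.5 are all assumptions made here for illustration, and the CUSUM shown is a generic one-sided version rather than the Veerkamp and Glas (2000) formulation.

```python
def drift_pattern(kind, n_points, total_drift=0.6):
    """Cumulative difficulty drift at each time point (first point is 0).

    All values are illustrative assumptions, not the study's settings.
    """
    if n_points < 3:
        raise ValueError("need at least 3 time points")
    steps = n_points - 1
    if kind == "linear":
        # Equal increments between consecutive administrations.
        return [total_drift * t / steps for t in range(n_points)]
    if kind == "monotonic":
        # Assumed square-root curve: larger early changes, smaller later ones.
        return [total_drift * (t / steps) ** 0.5 for t in range(n_points)]
    if kind == "sudden":
        # No drift until the third time point, then the full shift.
        return [0.0 if t < 2 else total_drift for t in range(n_points)]
    raise ValueError(f"unknown pattern: {kind}")


def one_sided_cusum(estimates, std_error, reference=0.5):
    """Generic upper one-sided CUSUM of standardized deviations.

    Each later estimate is compared with the first one; the running sum
    grows only when the standardized deviation exceeds `reference`.
    """
    baseline = estimates[0]
    running = 0.0
    path = []
    for estimate in estimates[1:]:
        z = (estimate - baseline) / std_error
        running = max(0.0, running + z - reference)
        path.append(running)
    return path


base_difficulty = 0.0  # hypothetical starting difficulty
for kind in ("linear", "monotonic", "sudden"):
    drift = drift_pattern(kind, n_points=5)
    estimates = [base_difficulty + d for d in drift]
    cusum = one_sided_cusum(estimates, std_error=0.1)
    print(kind, [round(d, 2) for d in drift], [round(s, 2) for s in cusum])
```

In this sketch an item would be flagged when the running sum crosses a chosen threshold. For the sudden-shift pattern the sum stays at zero through the second time point and rises only from the third onward, whereas under the two gradual patterns it begins rising at the second time point.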