Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Date of Award
Degree Name
Doctor of Philosophy (PhD)
Department of Graduate Psychology
Advisor
Christine E. DeMars
Over the past decade, educational policy trends have shifted toward examining students’ growth from kindergarten through twelfth grade (K-12). One way states can track students’ growth is with a vertical scale. Presently, every state that uses a vertical scale bases it on a unidimensional IRT model. These models make the strong but implausible assumption that a single construct is measured, in the same way, across grades. Additionally, research has found that variations of psychometric methods within the same model can result in different vertical scales. The purpose of this study was to examine the impact on the resulting vertical scales of three IRT models (a unidimensional model, U3PL; a bifactor model with grade-specific subfactors, BG-M3PL; and a bifactor model with content-specific subfactors, BC-M3PL), three calibration methods (separate, hybrid, and concurrent), and two scoring methods (EAP pattern scoring and EAP summed scoring, EAPSS). Empirical data from a state’s assessment program were used to create vertical scales for Mathematics and Reading for Grades 3-8. Several important results were found. First, the U3PL model always resulted in the worst model-data fit; the BC-M3PL fit the data best in Mathematics, and the BG-M3PL fit the data best in Reading. Second, calibration methods led to minor differences in the resulting vertical scale. Third, examinee proficiency estimates based on the primary factor of each model were generally highly correlated (.97+) across all conditions. Fourth, meaningful classification differences were observed across models, calibration methods, and scoring methods. Overall, I concluded that none of the models was viable for developing operational vertical scales. Multidimensional models are promising for addressing the current limitations of unidimensional models for vertical scaling, but more research is needed to identify the correct model specification within and across grades.
Implications for these results are discussed within the context of research, operational practice, and educational policy.
Recommended Citation
Koepfler, James, "Examining the Bifactor IRT Model for Vertical Scaling in K-12 Assessment" (2012). Dissertations. 69.