Four methods of scoring multiple-choice items were compared: Dichotomous classical (number-correct), polytomous classical (classical optimal scaling – COS), dichotomous IRT (3 parameter logistic – 3PL), and polytomous IRT (nominal response – NR). Data were generated to follow either a nominal response model or a non-parametric model, based on empirical data. The polytomous models, which weighted the distractors differentially, yielded small increases in reliability compared to their dichotomous counterparts. The polytomous IRT estimates were less biased than the dichotomous IRT estimates for lower scores. The classical polytomous scores were as reliable, sometimes more reliable, than the IRT polytomous scores. This was encouraging because the classical scores are easier to calculate and explain to users.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
DeMars, C. E. (2008, March). Scoring multiple choice items: A comparison of IRT and classical polytomous and dichotomous methods. Paper presented at the annual meeting of the National Council on Measurement in Education, New York.