Publication Date


Document Type

Presented Paper


Four methods of scoring multiple-choice items were compared: Dichotomous classical (number-correct), polytomous classical (classical optimal scaling – COS), dichotomous IRT (3 parameter logistic – 3PL), and polytomous IRT (nominal response – NR). Data were generated to follow either a nominal response model or a non-parametric model, based on empirical data. The polytomous models, which weighted the distractors differentially, yielded small increases in reliability compared to their dichotomous counterparts. The polytomous IRT estimates were less biased than the dichotomous IRT estimates for lower scores. The classical polytomous scores were as reliable, sometimes more reliable, than the IRT polytomous scores. This was encouraging because the classical scores are easier to calculate and explain to users.

Creative Commons License

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.