Evaluating rater effects in the context of ethical reasoning essay assessment: An application of the many-facets Rasch measurement model
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Doctor of Philosophy (PhD)
Department of Graduate Psychology
Sonia J. Horst
Performance assessments are often desired because of their potential for close alignment between the assessment and real-world performance. However, because their scoring is rater-mediated (Eckes, 2015), performance assessments pose psychometric challenges that cannot be ignored in testing and assessment work. Specifically, performance assessment scores are prone to rater effects, that is, systematic differences in how raters evaluate performance assessment products (Myford & Wolfe, 2003). The purpose of this project was to evaluate ethical reasoning essay scores for rater effects. The Many-Facets Rasch Measurement (MFRM) model was used to examine these scores for three rater effects: rater leniency/severity, restriction of range, and interactions between rater leniency/severity and rubric element. Individual rater leniency/severity effects were observed in this sample of raters, as was an interaction between rater leniency/severity and rubric element. A restriction of range effect was also observed, with scores concentrated primarily in the lower rubric score categories. As a preliminary explanation for differences in rater leniency/severity, the relationship between raters’ knowledge of ethical reasoning and their leniency/severity was evaluated; no such relationship was observed in this study. Based on these findings, recommendations are made for rater training. Specifically, ethical reasoning program coordinators may consider running MFRM analyses during the rating process to identify individual raters who are exhibiting rater effects, and then work with those raters on additional training and rubric calibration to mitigate their individual effects. Recommendations are also made regarding the statistical adjustment of student scores to mitigate rater leniency/severity effects in the ethical reasoning scores.
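For readers unfamiliar with the MFRM, the model is commonly written in a rating-scale form in which each facet contributes an additive term on the log-odds scale. The abstract does not state which parameterization the study used, so the following is a general sketch rather than the study's exact specification. Here $\theta_n$ is the ability of student $n$, $\delta_i$ the difficulty of rubric element $i$, $\alpha_j$ the severity of rater $j$ (higher values mean harsher ratings), and $\tau_k$ the threshold between rubric categories $k-1$ and $k$. A rater-by-element interaction can be examined by adding a bias term $\varphi_{ij}$.

```latex
% Rating-scale MFRM: log-odds of category k versus k-1
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k

% With a rater-by-rubric-element interaction (bias) term
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \varphi_{ij} - \tau_k
```

Because $\alpha_j$ enters with a negative sign, the same student on the same element receives lower expected scores from a more severe rater; a nonzero $\varphi_{ij}$ indicates that rater $j$ is unusually severe or lenient on element $i$ specifically.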
Although score adjustment is attractive if the goal is to mitigate rater leniency/severity effects, it has implications for the inferences made from those scores. Future research may focus on further identifying the causes of rater effects, as well as on methods for mitigating them.
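To illustrate what adjusting scores for rater leniency/severity can mean, the Python sketch below computes category probabilities under a rating-scale MFRM and compares a student's expected score from a given rater with the expected score from a rater of average severity. The latter is similar in spirit to the "fair average" reported by MFRM software such as Facets. This is not the study's procedure: the function names (`category_probabilities`, `expected_score`, `fair_score`) and all parameter values are hypothetical, and severities are assumed to be centered so that an average rater has severity 0.

```python
import math
from typing import List


def category_probabilities(theta: float, delta: float, alpha: float,
                           taus: List[float]) -> List[float]:
    """P(score = k) for k = 0..len(taus) under a rating-scale MFRM.

    theta: student ability (logits); delta: rubric-element difficulty;
    alpha: rater severity; taus[k-1]: threshold between categories k-1 and k.
    """
    # Category 0 has log-numerator 0; category k adds (theta - delta - alpha - tau_k).
    log_numerators = [0.0]
    running = 0.0
    for tau in taus:
        running += theta - delta - alpha - tau
        log_numerators.append(running)
    # Subtract the max before exponentiating for numerical stability.
    shift = max(log_numerators)
    weights = [math.exp(v - shift) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta: float, delta: float, alpha: float,
                   taus: List[float]) -> float:
    """Model-expected rubric score for one student, element, and rater."""
    probs = category_probabilities(theta, delta, alpha, taus)
    return sum(k * p for k, p in enumerate(probs))


def fair_score(theta: float, delta: float, taus: List[float],
               mean_severity: float = 0.0) -> float:
    """Expected score if the same student were rated by an average-severity rater.

    Assumes rater severities are centered so the average rater sits at mean_severity.
    """
    return expected_score(theta, delta, mean_severity, taus)


if __name__ == "__main__":
    # Hypothetical values: a 0-3 rubric with thresholds at -1, 0, and 1 logits.
    taus = [-1.0, 0.0, 1.0]
    student, element = 0.5, 0.0
    for label, severity in [("severe", 1.0), ("lenient", -1.0)]:
        observed = expected_score(student, element, severity, taus)
        print(label, round(observed, 2), "fair:", round(fair_score(student, element, taus), 2))
```

The design choice here is to hold the student and rubric element fixed and swap only the rater term, which isolates the leniency/severity effect. As the abstract notes, replacing observed scores with such model-based values changes what the reported scores mean, so any real adjustment would need to be justified against the intended inferences.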
Holzman, Madison A., "Evaluating rater effects in the context of ethical reasoning essay assessment: An application of the many-facets Rasch measurement model" (2018). Dissertations, 2014-2019. 192.