Understanding Rater Effects Using Rasch Measurement Models
Faculty Advisor Name
John D. Hathcoat
Department
Department of Graduate Psychology
Description
In the arts and humanities, student work is typically rated by disciplinary experts. Faculty experts are believed to provide accurate judgments of student ability because of their depth of disciplinary knowledge, and judges are usually trained on the scoring rubric so that they can apply its criteria to student work systematically. Even trained judges, however, exhibit tendencies that inadvertently influence scores. For instance, researchers have found that some raters tend to be severe, giving lower scores across the board, while others tend to be lenient, inflating scores for all students. Scores therefore reflect not only student ability but also rater effects.
Ultimately, a student's score should not depend on which rater that student happens to be assigned, so we need to understand each rater's tendency to assign particular scores. Rasch measurement models can be used to quantify rater effects such as severity and leniency, and this information can then be used to account for those effects when assigning scores. Said differently, understanding rater tendencies allows us to assign scores that are a better indication of student ability.
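To make this concrete, a common many-facet Rasch formulation (a sketch following Linacre's model, not necessarily the exact specification used in this study) expresses the log-odds that student n receives rating category k rather than k-1 from rater j on question i as

\[
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k,
\]

where \(\theta_n\) is the student's ability, \(\delta_i\) is the difficulty of the question, \(\alpha_j\) is the severity of the rater, and \(\tau_k\) is the threshold separating category k from k-1. Because rater severity \(\alpha_j\) enters the model as its own term, it can be estimated from the ratings and separated from the ability estimates.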
The present study investigates rater effects in an arts and humanities assessment at James Madison University. Ninety-one students watched a foreign video and responded to five open-ended questions. Five experts, all arts and humanities faculty, completed rater training over the summer to calibrate them to a scoring rubric. Rasch models are applied to investigate variation in rater severity and differences in how raters apply the scoring rubric.
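No analysis code accompanies the abstract; the following Python sketch is purely illustrative. It simulates dichotomous ratings under a simplified many-facet Rasch model with the study's dimensions (91 students, five questions, five raters) and recovers rater severities by joint maximum likelihood. All data and parameter values here are hypothetical.

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # logistic function

# Hypothetical dimensions mirroring the study design (91 students,
# 5 questions, 5 raters); the real study data are not reproduced here.
N_STUDENTS, N_ITEMS, N_RATERS = 91, 5, 5
rng = np.random.default_rng(0)

# Simulate dichotomous scores from a simplified many-facet Rasch model:
# P(score = 1) = sigmoid(ability - question difficulty - rater severity).
true_theta = rng.normal(0.0, 1.0, N_STUDENTS)   # student abilities
true_delta = rng.normal(0.0, 0.5, N_ITEMS)      # question difficulties
true_alpha = rng.normal(0.0, 0.5, N_RATERS)     # rater severities (+ = harsh)
p = expit(true_theta[:, None, None]
          - true_delta[None, :, None]
          - true_alpha[None, None, :])
scores = rng.binomial(1, p)                     # shape (students, items, raters)

def unpack(params):
    theta = params[:N_STUDENTS]
    delta = params[N_STUDENTS:N_STUDENTS + N_ITEMS]
    alpha = params[N_STUDENTS + N_ITEMS:]
    return theta, delta, alpha

def neg_log_lik(params):
    theta, delta, alpha = unpack(params)
    eta = (theta[:, None, None]
           - delta[None, :, None]
           - alpha[None, None, :])
    q = expit(eta)
    eps = 1e-9  # guards against log(0) for extreme response patterns
    return -np.sum(scores * np.log(q + eps) + (1 - scores) * np.log(1 - q + eps))

# Joint maximum likelihood over all three facets at once.
x0 = np.zeros(N_STUDENTS + N_ITEMS + N_RATERS)
fit = minimize(neg_log_lik, x0, method="L-BFGS-B")
_, _, alpha_hat = unpack(fit.x)

# The model is identified only up to location shifts, so compare
# severities after centering each set at zero.
print("Estimated severities:", np.round(alpha_hat - alpha_hat.mean(), 2))
print("True severities:     ", np.round(true_alpha - true_alpha.mean(), 2))

In the sketch, a positive centered severity means a rater scores more harshly than average. An operational analysis of polytomous rubric scores would typically rely on dedicated software such as FACETS or the R package TAM.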