Understanding Rater Effects Using Rasch Measurement Models

Presenter Information

Yelisey Shapovalov

Faculty Advisor Name

John D. Hathcoat

Department

Department of Graduate Psychology

Description

In the arts and humanities, faculty experts often rate the performance of student work. These experts are believed to provide accurate judgements of student ability because of their depth of disciplinary knowledge. Furthermore, judges are usually given training to familiarize them with the scoring rubric so that they can apply its criteria to student work systematically. However, even trained judges exhibit tendencies that inadvertently influence scores. For instance, researchers have found that some raters tend to be more severe, giving lower scores across the board, while others tend to be more lenient, inflating scores for all students. Thus, scores represent not only student ability but also rater effects.

Ultimately, scores should not depend on which rater a student happens to be assigned; we therefore need to understand raters' tendencies to assign particular scores. Rasch measurement models can be used to better understand rater effects such as severity and leniency. Such information may, in turn, allow us to account for these effects when assigning students a score. Said differently, understanding rater tendencies allows us to assign scores that better reflect student ability.
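The abstract does not name a specific model, but the Rasch-family model most commonly used to estimate rater severity is the many-facet Rasch model (Linacre). As a hedged sketch of how the separation works: for student n, rater j, and adjacent rating-scale categories k and k-1, the model can be written as

```latex
\log \frac{P_{njk}}{P_{nj(k-1)}} = \theta_n - \lambda_j - \tau_k
```

where \theta_n is the student's ability, \lambda_j is the rater's severity, and \tau_k is the threshold for category k. Because ability and severity are estimated on the same logit scale, a rater's severity \lambda_j can, in principle, be subtracted out when reporting a student's score.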

The present study investigates rater effects in an arts and humanities assessment at James Madison University. Ninety-one students watched a foreign video and responded to five open-ended questions. Five raters, all arts and humanities faculty, received training over the summer to calibrate them to a scoring rubric. The present study applies Rasch measurement models to investigate variation in rater severity and differences in how raters apply the scoring rubric.
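As a rough illustration of the severity effect described above (a hypothetical simulation, not the study's actual data or estimation method), the sketch below generates scores for 91 students rated by 5 raters with assumed severities, then recovers each rater's severity from how far that rater's mean score sits below the grand mean:

```python
import numpy as np

rng = np.random.default_rng(0)
n_students, n_raters = 91, 5  # sizes taken from the abstract

# Latent student abilities and hypothetical rater severities (logit-like scale)
ability = rng.normal(0.0, 1.0, n_students)
severity = np.array([0.8, 0.3, 0.0, -0.4, -0.7])  # assumed, for illustration only

# Each rater scores every student: severe raters pull scores down
scores = ability[:, None] - severity[None, :] + rng.normal(0.0, 0.3, (n_students, n_raters))

# Crude severity estimate: grand mean minus each rater's column mean.
# A real analysis would fit a many-facet Rasch model instead.
est_severity = scores.mean() - scores.mean(axis=0)
```

Because every rater scores every student here, the simple column-mean comparison recovers the severity ordering; with incomplete rating designs, a full Rasch analysis is needed to place raters on a common scale.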


