Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
ORCID
https://orcid.org/0000-0001-6138-2848
Date of Graduation
5-6-2021
Semester of Graduation
Spring
Degree Name
Master of Arts (MA)
Department
Department of Graduate Psychology
Second Advisor
Christine DeMars
Third Advisor
S. Jeanne Horst
Fourth Advisor
Kathryne McConnell
Abstract
Performance assessments require examinees to carry out a process or produce a product and can be designed with high fidelity to real-world applications of higher-order skills. As such, performance assessments are highly valued in higher education settings. However, performance assessment is vulnerable to psychometric challenges that threaten the validity of scores due to the subjective nature of the scoring process. Specifically, raters must exercise judgement to assign scores to examinee work, and those scores may be impacted by rater effects, or systematic differences in how raters evaluate performance assessment artifacts. Research has indicated that performance assessment may never be fully free from errors in rater judgement. Consequently, additional quality control measures are investigated in the hope of reducing the impact of rater effects by selecting raters who have not exhibited rater effects in previous performance assessment assignments. The purpose of this project was to evaluate VALUE Institute artifact scores for diagnostic information about rater effects. The Many-Facets Rasch Measurement (MFRM) model was used to evaluate VALUE Institute scores for rater leniency/severity effects, halo effects, and restriction-of-range effects. Data for the 2018-2019 academic year were collected by the VALUE Institute of the Association of American Colleges and Universities (AAC&U) on two of their most popular VALUE (Valid Assessment of Learning in Undergraduate Education) Rubrics: Critical Thinking and Written Communication. A series of follow-up evaluations of MFRM indices was conducted to identify which raters exhibited rater effects, in order to create a pool of preferable raters, those free of rater effects, for future selection. Findings showed that only a few raters exhibited rater effects, building confidence in the validity of scores produced by the VALUE Institute using the VALUE Rubrics. Moreover, MFRM methods were successful in initially flagging raters for rater effects. Follow-up frequency procedures intended to confirm how raters assigned scores met with mixed success, suggesting a limitation of relying solely on frequency counts to identify rater effects. Recommendations for future research are made, and the subjectivity of judgement in MFRM interpretation and classification is discussed. Ultimately, preferable raters were identified using MFRM diagnostic information to flag raters exhibiting rater effects.
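For readers unfamiliar with the model named in the abstract, a commonly used formulation of the many-facet Rasch model for a three-facet design (examinees, rubric dimensions, and raters) is sketched below. The notation here (theta for examinee proficiency, delta for dimension difficulty, alpha for rater severity, tau for category thresholds) is illustrative and assumed for this sketch; the thesis's own parameterization may differ.

\[
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
\]

Here \(P_{nijk}\) is the probability that examinee \(n\) receives rating category \(k\) from rater \(j\) on rubric dimension \(i\), and \(P_{nij(k-1)}\) is the probability of the adjacent lower category. Under this formulation, an unusually large \(\alpha_j\) would indicate a severe rater and an unusually small \(\alpha_j\) a lenient rater, which is the kind of diagnostic information the abstract describes using to flag raters.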