ORCID

https://orcid.org/0000-0001-6138-2848

Date of Graduation

5-6-2021

Semester of Graduation

Spring

Degree Name

Master of Arts (MA)

Department

Department of Graduate Psychology

Second Advisor

Christine DeMars

Third Advisor

S. Jeanne Horst

Fourth Advisor

Kathryne McConnell

Abstract

Performance assessments require examinees to carry out a process or produce a product and can be designed to have high fidelity to real-world application of higher-order skills. As such, performance assessments are highly valued in higher education settings. However, performance assessment is vulnerable to psychometric challenges that threaten the validity of scores due to the subjective nature of the scoring process. Specifically, raters must exercise judgement to assign scores to examinee work, and those judgements may be impacted by rater effects, or systematic differences in how raters evaluate performance assessment artifacts. Research has indicated that performance assessment may never be fully free from errors in rater judgement. Consequently, additional quality control measures are investigated in the hope of reducing the impact of rater effects by selecting raters who have not exhibited rater effects in previous performance assessment assignments. The purpose of this project was to evaluate VALUE Institute artifact scores for diagnostic information about rater effects. The Many-Facets Rasch Measurement (MFRM) model was used to evaluate VALUE Institute scores for rater leniency/severity effects, halo effect, and restriction of range effect. Data for the 2018-2019 academic year were collected by the VALUE Institute of the Association of American Colleges and Universities (AAC&U) on two of their most popular VALUE (Valid Assessment of Learning in Undergraduate Education) Rubrics: Critical Thinking and Written Communication. A series of follow-up evaluations of MFRM indices was conducted to identify which raters exhibited rater effects and to create a pool of preferable raters, free of rater effects, for future selection. Findings showed that only a few raters exhibited rater effects, building confidence in the validity of scores produced by the VALUE Institute using the VALUE Rubrics. Moreover, MFRM methods were successful in the initial flagging of raters for rater effects. Follow-up frequency procedures intended to confirm how raters assigned scores met with mixed success, suggesting a limitation of relying solely on frequency counts to identify rater effects. Recommendations for future research are made, and the subjectivity of judgement in MFRM interpretation and classification is discussed. Ultimately, preferable raters were identified by using MFRM diagnostic information to flag raters exhibiting rater effects.
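
For reference, a common rating-scale formulation of the many-facet Rasch model (following Linacre; the facet labels below are illustrative and not taken from this thesis) is:

\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k

where P_{nijk} is the probability that examinee n receives category k rather than k-1 on rubric dimension i from rater j, \theta_n is examinee proficiency, \delta_i is dimension difficulty, \alpha_j is rater severity, and \tau_k is the threshold for category k. Under a formulation like this, rater severity/leniency is read from the spread of the \alpha_j estimates, while halo and restriction of range effects are typically diagnosed from rater fit statistics and the distribution of assigned categories.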
