Context Matters: The Impact of External Events on Low-Stakes Assessment

Faculty Advisor Name

Dr. Christine DeMars

Department

Department of Graduate Psychology

Description

This project investigates validity concerns in assessment results that arise from low effort, which is common in low-stakes assessment, and from additional contextual factors: environmental and personal conditions that affect test-taking effort and, in turn, scores. Low-stakes assessment is a common way universities assess general education programming and student learning outcomes. However, because low-stakes assessment carries few personal consequences for students, they are less motivated to put forth their best effort (Wise, 2019). Any systematic disruption, such as low effort or other contextual factors that affect assessment scores, threatens the validity of those scores. Validity concerns what scores mean and whether they are interpreted in line with that meaning. If we misinterpret scores because of these factors, programmatic and institutional decisions can be affected.

We examined whether student effort and assessment performance varied over time, potentially complicating the interpretation of cross-cohort comparisons. JMU’s Assessment Day results for three different assessments were analyzed to compare results from different semesters and different cohorts of students (first-year and second-year[1] students entering or continuing their time at the university between 2020 and 2022). We investigated the following research questions:

  1. Did students in different semesters differ in how long they spent on the tests?
  2. Did students in Spring 21 and students in Spring 22 differ in their test scores? Did students in different Fall semesters differ in their test scores?
  3. For students who took the same test in Fall 20 and Spring 22, were differences in time spent testing related to score gains from pretest to posttest? (An illustrative analysis sketch follows this list.)
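
For concreteness, the following is a minimal, purely illustrative sketch of how comparisons like these could be run in Python; it is not the project's actual analysis. The data file, the column names (student_id, semester, test_time_minutes, score), and the semester labels are all hypothetical assumptions.

```python
# Illustrative sketch only: assumes a hypothetical file with one row per
# student per administration and columns student_id, semester
# ("Fall20", "Spring21", "Spring22"), test_time_minutes, and score.
import pandas as pd
from scipy import stats

df = pd.read_csv("assessment_day_results.csv")  # hypothetical file name

# RQ1: did time spent on the test differ across semesters? (one-way ANOVA)
time_by_semester = [g["test_time_minutes"].dropna() for _, g in df.groupby("semester")]
print(stats.f_oneway(*time_by_semester))

# RQ2: did Spring 21 and Spring 22 scores differ? (Welch's t-test)
s21 = df.loc[df["semester"] == "Spring21", "score"].dropna()
s22 = df.loc[df["semester"] == "Spring22", "score"].dropna()
print(stats.ttest_ind(s21, s22, equal_var=False))

# RQ3: for students tested in both Fall 20 and Spring 22, is the change in
# time spent related to the pretest-posttest score gain? (correlation)
wide = (df[df["semester"].isin(["Fall20", "Spring22"])]
        .pivot(index="student_id", columns="semester",
               values=["test_time_minutes", "score"])
        .dropna())
time_change = wide[("test_time_minutes", "Spring22")] - wide[("test_time_minutes", "Fall20")]
score_gain = wide[("score", "Spring22")] - wide[("score", "Fall20")]
print(stats.pearsonr(time_change, score_gain))
```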

The selected period of test administrations presented a unique opportunity to explore how changing context may affect large-scale, low-stakes assessment. We paid special attention to the Spring 2022 semester, when students faced news of a nearby fatal campus shooting, more than one suicide on JMU’s campus, and a later-retracted announcement ‘cancelling’ Assessment Day. Students were confused about whether to participate while grieving the loss of fellow community members. In this context, it is likely that the validity of test scores was affected.

We found that time spent on the tests did vary across semesters, and this variation in time was associated with variation in scores. This calls into question comparisons of test scores across cohorts: when one group scores higher than another, it is difficult to know whether the higher-scoring group truly knew the content better or was simply more focused and engaged during the test. It also complicates comparisons of the same students over time; students who gave more effortful responses at pretest than at posttest may appear to regress in their knowledge. These contextual factors must be considered when interpreting assessment results.

Wise, S. L. (2019). Controlling construct-irrelevant factors through computer-based testing: Disengagement, anxiety, & cheating. Education Inquiry, 10(1), 21-33. https://doi.org/10.1080/20004508.2018.1490127

[1] We use the descriptor “second year” for brevity: this group includes students with 45-70 credits before the Spring semester. The group thus includes some first-year students who earned college credit concurrently with high school, as well as some third-year students who did not earn quite enough credits to be tested in their second year.
