Preferred Name
Jack Gilmore
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
ORCID
https://orcid.org/0009-0005-3698-4245
Date of Graduation
8-16-2025
Semester of Graduation
Summer
Degree Name
Master of Arts (MA)
Department
Department of Psychology
First Advisor
Yu Bao
Second Advisor
Dena Pa
Third Advisor
Joseph Kush
Fourth Advisor
Susan Lottridge
Abstract
This study examines the effectiveness of isolation forest for identifying aberrant responses in a low-stakes multiple-choice assessment and compares it with a traditional response time threshold method, the normative threshold method (NT20). In low-stakes contexts, students may not give their best effort, which threatens the validity of test score interpretations. Response time thresholds can identify rapid guesses, but they fail to capture other potentially validity-threatening responses. Identifying aberrant responses can provide an estimated upper limit on the prevalence of validity-threatening responses. Data from 1,763 undergraduate students were analyzed to determine whether isolation forest could identify aberrant responses beyond rapid guesses, what properties these aberrant responses had, and how the data changed when aberrant responses were filtered. Students completed a 30-item low-stakes assessment of information literacy. For each item, rapid guesses were identified using NT20, and aberrant responses were identified using an isolation forest model. Each isolation forest model was trained on 11 attributes, including item-level attributes (e.g., response time, score) and person-level attributes (e.g., test order, self-reported effort). Isolation forest showed promise in detecting aberrant responses, as aberrant responses differed meaningfully from non-aberrant responses in anomaly scores and descriptive characteristics. However, t-distributed Stochastic Neighbor Embedding (t-SNE) visualizations and internal clustering indices (silhouette scores and the Davies-Bouldin Index) revealed that aberrant responses did not form compact clusters, suggesting considerable heterogeneity and no clear subtypes of aberrant responses. To compare the effects of filtering different types of responses, we analyzed three datasets: one unfiltered, one with rapid guesses filtered, and one with aberrant responses filtered. As expected, most rapid guesses were also flagged as aberrant responses.
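Under the NT20 rule, a response counts as a rapid guess when its response time falls below 20% of that item's mean response time. The following is a minimal sketch of that rule applied item by item. It assumes response times in seconds stored as a dict of per-item lists. The function names (`nt20_thresholds`, `nt20_flags`) and the toy values are hypothetical. Variants of the normative threshold method sometimes also impose a maximum threshold, and this sketch omits that.

```python
from statistics import mean


def nt20_thresholds(response_times):
    """Return the NT20 threshold for each item.

    response_times maps an item ID to a list of response times (seconds)
    across examinees; the threshold is 20% of the item's mean response time.
    """
    return {item: 0.20 * mean(times) for item, times in response_times.items()}


def nt20_flags(response_times):
    """Flag each response as a rapid guess (True) if it falls below its item's threshold."""
    thresholds = nt20_thresholds(response_times)
    return {
        item: [t < thresholds[item] for t in times]
        for item, times in response_times.items()
    }


# Toy data (hypothetical): three items, five examinees each, times in seconds.
rts = {
    "item1": [30.0, 25.0, 2.0, 28.0, 35.0],
    "item2": [40.0, 45.0, 50.0, 3.0, 42.0],
    "item3": [20.0, 22.0, 18.0, 21.0, 19.0],
}

flags = nt20_flags(rts)
for item, item_flags in flags.items():
    print(item, item_flags)
```

Here item1's mean is 24 seconds, so its threshold is 4.8 seconds and only the 2-second response is flagged. Because the rule looks only at response time, responses that are unusual in other ways (for example, slow but inconsistent with reported effort) pass through unflagged.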
Notably, aberrant responses were nearly twice as prevalent as rapid guesses. Filtering either rapid guesses or aberrant responses reduced reliability relative to the unfiltered data. Ultimately, the findings suggest that isolation forest may offer a valuable alternative to traditional methods by capturing a broader range of potentially problematic responses in low-stakes assessments. However, future research is needed to better understand the types of aberrant responses detected and to establish clearer guidelines for filtering.
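The following sketch shows how an isolation forest can flag responses that are unusual on several attributes at once rather than on response time alone. It implements the standard algorithm from scratch. Each tree recursively splits a random subsample on a randomly chosen attribute at a random value. Points isolated in few splits get short average path lengths E[h(x)] and anomaly scores s(x) = 2^(-E[h(x)]/c(psi)) near 1.

This is a generic, hypothetical illustration, not the thesis's model. The following are all assumptions:

- the function names
- the two toy attributes
- the settings (100 trees, a subsample of up to 256 points, seed 0)

The thesis's 11 attributes and any score cutoff used to label a response aberrant are not reproduced.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup among n points.

    Normalizes path lengths: c(n) = 2H(n-1) - 2(n-1)/n, with H(i) ~ ln(i) + gamma.
    """
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random attribute at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    lo, hi = min(values), max(values)
    if lo == hi:  # chosen attribute cannot separate these points; treat as a leaf (simplification)
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Splits needed to reach x's leaf, plus c(leaf size) for points left unseparated."""
    if "size" in node:
        return depth + c(node["size"])
    branch = "left" if x[node["feature"]] < node["split"] else "right"
    return path_length(x, node[branch], depth + 1)


def isolation_forest_scores(data, n_trees=100, sample_size=256, seed=0):
    """Return s(x) = 2 ** (-E[h(x)] / c(psi)) for each point; values near 1 suggest anomalies."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for x in data:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / c(psi)))
    return scores


# Toy data (hypothetical): a tight 7x7 grid of typical responses on two
# standardized attributes, plus one far-off point appended last.
data = [(i * 0.1, j * 0.1) for i in range(7) for j in range(7)] + [(5.0, 5.0)]
scores = isolation_forest_scores(data)
print("index of highest-scoring point:", max(range(len(data)), key=scores.__getitem__))
```

Isolation forest does not model what a "normal" response looks like. It only measures how easily a point can be separated from the rest. This fits the abstract's finding that flagged responses were heterogeneous rather than forming compact subtypes.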
