Preferred Name

Sean York

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

ORCID

http://orcid.org/0009-0009-5851-6425

Date of Graduation

5-9-2024

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Department of Graduate Psychology

Advisor(s)

Dena A. Pastor

Yu Bao

Joseph Kush

Susan Lottridge

Abstract

In this study, I use recurrent neural networks (RNNs) and modern natural language processing techniques to analyze clickstream activity data from an automated, online mastery learning platform, the Indiana University Plagiarism Tutorials and Tests (IPTAT). IPTAT offers learning materials and certification tests on recognizing and avoiding plagiarism, and has awarded over 1.17 million certificates since 2016. Learners may use the learning resources or not, as they wish, but must pass a randomized test drawn from a large item pool to obtain a certificate. Most learners pass after one to seven test attempts, but some take as many as 50 randomized tests before passing. Grounded in the literature and methods of Learning Analytics and Educational Data Mining, and with particular attention to temporal and sequential features of IPTAT clickstream data, I model the journeys of tens of thousands of individual learners as they interact with informational, instructional, and assessment resources on the site. I use these data to train RNN models that predict, after each test attempt, the number of test attempts remaining, given a learner’s prior activity on the site. I compare the predictive ability of three RNN models, each trained on different representations of sequence and time. Though the ranking of the models by root mean squared error (RMSE) was as expected, their overall predictive ability was poor. I explore possible reasons for the results and describe avenues for future work.

Available for download on Sunday, March 01, 2026
