Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.


Date of Graduation

Spring 2018

Document Type


Degree Name

Doctor of Philosophy (PhD)


Department of Graduate Psychology


Sonia J. Horst


In applied intervention studies, researchers frequently aim to make inferences about the impact of a treatment program on participants. However, applied researchers often face threats to the internal validity of their studies, that is, the extent to which changes in participants’ outcomes can be attributed to the intervention. When researchers are unable to randomly assign study participants to treatment conditions, changes in the intervention outcome might be confounded with systematic differences in participants’ baseline characteristics. Propensity score matching is one technique that allows researchers to account for such threats. Specifically, using propensity score matching methods, researchers construct a qualitatively similar comparison group based on participants’ characteristics at baseline (i.e., covariates).

In addition to threats to the internal validity of a study, measurement error is a reality with which many applied researchers must contend. However, research on the impact of covariate score measurement error on the quality of matches and the accuracy of treatment effect estimates is sparse in the propensity score matching literature. Consequently, the purpose of the current study was to evaluate how different levels and types of measurement error impacted the quality of propensity score matched groups and the accuracy of treatment effect estimates.

A simulation study was conducted to manipulate both the levels of measurement error (e.g., 10% versus 60% unreliability) and the types of measurement error (e.g., treatment and comparison group scores measured with the same level of reliability versus different levels of reliability). Four common propensity score matching methods were then used to create comparison groups: nearest neighbor matching, nearest neighbor matching with a 0.2 caliper, optimal matching, and Mahalanobis distance matching. Numeric balance diagnostics and the accuracy of treatment effect estimates were then evaluated. When unreliable covariates were included in the model, the final propensity score matched groups appeared balanced on the unreliable covariates. However, propensity score matching was not able to appropriately account for the full influence of the covariates on treatment effect estimates. That is, as the level of measurement error increased, the estimated treatment effect also increased and exceeded the simulated treatment effect.
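
The two ingredients described above, covariate scores measured at a chosen reliability and nearest neighbor matching within a caliper, can be illustrated with a minimal sketch. It is not the simulation code used in the study. The function names, sample size, seed, and toy scores are hypothetical, and the caliper here is an absolute distance on the score scale, which may differ from how the study's 0.2 caliper was defined. The error variance follows the classical test theory definition of reliability as the ratio of true score variance to observed score variance.

```python
import random
import statistics


def add_measurement_error(true_scores, reliability, rng):
    """Return observed scores whose reliability is approximately `reliability`.

    Classical test theory: reliability = var(true) / var(observed),
    so var(error) = var(true) * (1 - reliability) / reliability.
    """
    true_var = statistics.pvariance(true_scores)
    error_sd = (true_var * (1 - reliability) / reliability) ** 0.5
    return [t + rng.gauss(0, error_sd) for t in true_scores]


def greedy_caliper_match(treated, comparison, caliper):
    """Greedy 1:1 nearest neighbor matching without replacement.

    `treated` and `comparison` are lists of scores (e.g., propensity scores).
    Returns (treated_index, comparison_index) pairs. A treated unit whose
    nearest unused comparison unit is farther than `caliper` stays unmatched.
    """
    available = set(range(len(comparison)))
    pairs = []
    for i, score in enumerate(treated):
        if not available:
            break
        j = min(available, key=lambda k: abs(comparison[k] - score))
        if abs(comparison[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical illustration: 60% unreliability corresponds to reliability 0.4.
rng = random.Random(2018)
true_x = [rng.gauss(0, 1) for _ in range(5000)]
observed_x = add_measurement_error(true_x, reliability=0.4, rng=rng)
empirical_reliability = (
    statistics.pvariance(true_x) / statistics.pvariance(observed_x)
)

# Toy scores: the third treated unit has no comparison unit within the caliper.
pairs = greedy_caliper_match(
    [0.30, 0.52, 0.90], [0.10, 0.50, 0.33, 0.55], caliper=0.05
)
# pairs == [(0, 2), (1, 1)]
```

In this sketch the matching step sees only the observed scores, so groups can look balanced on an error-laden covariate while still differing on the underlying true scores.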


