Applying a Multiple Comparison Control to IRT Item-fit Testing
Faculty Advisor Name
Christine DeMars
Description
Many item response theory (IRT) models exist, but the fact that an IRT model can be applied to data does not mean that it represents the data well. Using IRT therefore requires assessing the extent to which the data fit the model. Among item-fit statistics, the S-χ² (Orlando & Thissen, 2000; Orlando & Thissen, 2003) has been shown to perform well and control Type I error. Yet, even though the S-χ² may control Type I error for each item, no research has examined its control of the familywise Type I error rate: the probability of erroneously rejecting at least one null hypothesis in a set of statistical tests. As the number of item-fit tests increases, so does the familywise Type I error rate. The Benjamini-Hochberg method (BH; Benjamini & Hochberg, 1995) is a relatively new method, similar to the Bonferroni correction, for controlling familywise Type I error; it may reduce such errors while maintaining high power to detect misfitting items. The research questions addressed in this paper are: (a) Does the S-χ² control Type I error across varying percentages of misfitting items? (b) What are the familywise Type I error rates with no correction, the Bonferroni procedure, and the BH correction? (c) What power levels are achieved with no correction, the Bonferroni procedure, and the BH correction?
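To make the multiple-comparison problem concrete, the following is a minimal sketch of how the familywise Type I error rate grows with the number of tests, and of how the Bonferroni and BH decision rules differ. It is written in Python for illustration only; the study itself is to be conducted in R. The function names, the p-values, and the assumption of independent tests are illustrative and do not come from the study. Item-fit tests computed on the same data are unlikely to be fully independent, so the growth formula is only an approximation. BH is usually described as controlling the false discovery rate, which coincides with the familywise error rate only when every null hypothesis is true.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false rejection across n_tests independent
    tests when every null hypothesis is true (illustrative assumption)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_reject(p_values, alpha=0.05):
    """Reject each test whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def bh_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up rule: find the largest rank k such that
    the k-th smallest p-value is at most (k / m) * alpha, then reject the
    k tests with the smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


if __name__ == "__main__":
    # Uncorrected familywise error for the three test lengths in the study.
    for n_items in (20, 40, 60):
        print(n_items, round(familywise_error(0.05, n_items), 3))

    # Hypothetical item-fit p-values for a five-item example.
    p_values = [0.001, 0.008, 0.012, 0.041, 0.20]
    print(bonferroni_reject(p_values))
    print(bh_reject(p_values))
```

With these hypothetical p-values, the Bonferroni rule flags two items and the BH rule flags three, which illustrates why BH may retain more power than Bonferroni.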
To address the research questions, a simulation study will be conducted. Test length (20, 40, or 60 items), percentage of misfitting items (0%, 15%, or 30%), and sample size (500, 1000, or 2000) will be varied to determine how effectively the S-χ² coupled with the BH correction identifies misfitting items while controlling familywise Type I error. Item misfit will be simulated to follow several different shapes to determine whether the degree of misfit influences the results. For conditions with no misfitting items, Type I error will be computed for each item, and the overall familywise Type I error will also be reported. When both fitting and misfitting items are simulated, power to detect misfitting items, familywise Type I error, and false hit rates (i.e., Type I errors for fitting items) will be reported. All data simulation and analysis will be conducted in R (R Core Team, 2018). Expected results include adequate control of per-item and familywise Type I error with minimal decreases in power. As the sample size and degree of misfit increase, the ability to correctly identify the misfitting items should also increase.
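The design and outcome measures described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the study's R code. The function names and the dictionary keys are hypothetical. The shape of misfit is omitted as a factor because the abstract does not list the shapes. Familywise Type I error is computed here as the proportion of replications in which at least one fitting item is flagged.

```python
from itertools import product

TEST_LENGTHS = (20, 40, 60)       # number of items
PCT_MISFIT = (0, 15, 30)          # percentage of misfitting items
SAMPLE_SIZES = (500, 1000, 2000)  # number of simulated examinees


def design_conditions():
    """Fully cross the three factors named in the abstract."""
    return [
        {
            "n_items": n_items,
            "pct_misfit": pct,
            "n_misfit": n_items * pct // 100,
            "sample_size": n,
        }
        for n_items, pct, n in product(TEST_LENGTHS, PCT_MISFIT, SAMPLE_SIZES)
    ]


def summarize(flags_by_replication, misfit_items):
    """Summarize item-fit decisions across replications.

    flags_by_replication: one list of booleans per replication,
        True meaning the item was flagged as misfitting.
    misfit_items: set of indices of items simulated to misfit.
    """
    n_reps = len(flags_by_replication)
    n_items = len(flags_by_replication[0])
    fitting = [i for i in range(n_items) if i not in misfit_items]
    misfit = sorted(misfit_items)

    reps_with_false_hit = 0
    false_hits = 0
    true_hits = 0
    for flags in flags_by_replication:
        n_false = sum(flags[i] for i in fitting)
        false_hits += n_false
        reps_with_false_hit += n_false > 0
        true_hits += sum(flags[i] for i in misfit)

    return {
        # Proportion of replications with at least one fitting item flagged.
        "familywise_type1": reps_with_false_hit / n_reps,
        # Proportion of fitting-item tests that were flagged.
        "false_hit_rate": false_hits / (n_reps * len(fitting)),
        # Proportion of misfitting-item tests flagged; undefined with no misfit.
        "power": true_hits / (n_reps * len(misfit)) if misfit else None,
    }


if __name__ == "__main__":
    print(len(design_conditions()))
    # Four hypothetical replications of a four-item test; item 0 misfits.
    flags = [
        [True, False, False, False],
        [True, True, False, False],
        [False, False, False, False],
        [True, False, False, True],
    ]
    print(summarize(flags, {0}))
```

In a full study, the flags for each replication would come from applying no correction, the Bonferroni procedure, or the BH correction to the S-χ² p-values, so the same summary could be computed once per correction method and condition.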