The Impact of Undesirable Distractors on Estimates of Ability

Faculty Advisor: Brian C. Leventhal

Department: Department of Graduate Psychology

Description

Item analysis is a common practice in educational testing, yet test developers continue to neglect the analysis of incorrect options (distractors) in multiple-choice (MC) items (Gierl, Bulut, Guo, & Zhang, 2017; Sideridis, Tsaousis, & Harbi, 2016; Thissen, Steinberg, & Fitzpatrick, 1989). Well-written, high-performing distractors result in high-quality items and provide diagnostic information about examinees (Gierl et al., 2017). To write better distractors, it is important to identify each distractor's type through a distractor analysis.

We have characterized undesirable distractors as those a test developer would not want to see after performing an item analysis, and we have divided them into four categories: implausible, equivalent, upper lure, and lower lure. Implausible distractors are rarely selected by examinees, so they provide no additional information. Equivalent distractors have the same probability of being selected regardless of ability, so they provide no information for distinguishing low-ability from high-ability examinees. Upper lure distractors attract high-ability examinees, which is problematic because we assume high-ability examinees should select the correct answer. Lower lure distractors show a similar pattern: high-ability examinees still have a higher probability of selecting the correct answer, but a large proportion of them nonetheless select the distractor.

Traditional dichotomous item response theory (IRT) models cannot be used to examine distractor functioning. Instead, polytomous IRT models provide a way to evaluate how all options of an MC item are functioning (Drasgow, Levine, Tsien, Williams, & Mead, 1995). Thissen and Steinberg's (1984) multiple-choice model (TSMCM) extended Bock's (1972) nominal response model and Samejima's (1979) multiple-choice model to include guessing that varies across options within each item.
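Under the TSMCM, option k of an item is selected with probability P(k | θ) = [exp(a_k θ + c_k) + d_k exp(a_0 θ + c_0)] / Σ_h exp(a_h θ + c_h), where category 0 is a latent "don't know" class and d_k is the proportion of "don't know" examinees who guess option k. A minimal Python sketch of this calculation follows; the parameter values in the usage lines are hypothetical, chosen only to respect the model's identification constraints.

```python
import numpy as np

def tsmcm_probs(theta, a, c, d):
    """Option-selection probabilities under the TSMCM.

    a, c : length m + 1 slopes/intercepts; index 0 is the latent
           "don't know" category, indices 1..m are the observed options
           (usual identification: each sums to 0 across categories).
    d    : length m guessing proportions for the observed options
           (non-negative, summing to 1).
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))[:, None]  # (n, 1)
    a, c, d = (np.asarray(x, dtype=float) for x in (a, c, d))
    z = np.exp(a * theta + c)               # (n, m + 1) category numerators
    denom = z.sum(axis=1, keepdims=True)
    # Each observed option k absorbs a d_k share of the "don't know" mass.
    return (z[:, 1:] + d * z[:, :1]) / denom

# Hypothetical parameters for a 3-option item; option 1 is the key.
a = [-1.2, 1.0, 0.4, -0.2]   # category 0 = "don't know"
c = [0.0, 0.5, -0.2, -0.3]
d = [1/3, 1/3, 1/3]          # "don't know" examinees guess uniformly
p = tsmcm_probs([-2.0, 0.0, 2.0], a, c, d)
print(p.round(3), p.sum(axis=1))  # each row of probabilities sums to 1
```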

We performed a simulation study to investigate the effects of undesirable distractors on ability estimates. Using the TSMCM to generate data, we investigated the following factors (the crossed design is enumerated in the sketch after the list):

  • Frequency of items containing undesirable distractors [10%, 30%, 50%]
  • Test length [30, 50, 100 items]
  • Type of undesirable distractor [implausible, equivalent, upper lure, lower lure]
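Crossing these factors yields 3 × 3 × 4 = 36 simulation conditions. A minimal sketch of the enumeration (the condition labels here are ours, for illustration):

```python
import itertools

freqs = [0.10, 0.30, 0.50]            # proportion of items with an undesirable distractor
lengths = [30, 50, 100]               # test length in items
types = ["implausible", "equivalent", "upper lure", "lower lure"]

conditions = list(itertools.product(freqs, lengths, types))
assert len(conditions) == 3 * 3 * 4   # 36 fully crossed conditions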

We selected these factors because they align with real-world standardized testing situations. We fixed the sample size (N = 2,000) and the number of options (three). We generated abilities, simulated item responses, and scored them with the three-parameter IRT model across 1,000 replications. We calculated total score bias and standard error as indices of the accuracy of our estimates. These results were then analyzed using a fully crossed 3 × 3 × 4 ANCOVA with ability as a covariate.
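As a rough illustration of how bias and standard error can be tallied within one condition, assuming a matrix of estimated scores with one row per replication (this is a sketch of the general computation, not the study's actual code, and it omits the 3PL estimation step):

```python
import numpy as np

def bias_and_se(true_scores, est_scores):
    """Signed bias and empirical standard error over replications.

    true_scores : (n_examinees,) generating values for one condition
    est_scores  : (n_reps, n_examinees) estimates, one row per replication
    """
    errors = est_scores - true_scores              # broadcasts across replications
    bias = errors.mean()                           # average signed error
    se = est_scores.std(axis=0, ddof=1).mean()     # spread of estimates across reps
    return bias, se
```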

The type of undesirable distractor has practical effects on estimates of ability, and these effects varied across levels of the frequency of items containing undesirable distractors and test length while controlling for ability. We recommend that test developers ensure MC items do not contain undesirable distractors, given their influence on the bias and accuracy of ability estimates.
