A Novel Method for Expediting the Development of Patient-Reported Outcome Measures and an Evaluation Across Several Populations
Garrard, Lili; Price, Larry R.; Bott, Marjorie J.; Gajewski, Byron J.
doi: 10.1177/0146621616652634, pmid: 27667878
Item response theory (IRT) models provide an appropriate alternative to classical ordinal confirmatory factor analysis (CFA) during the development of patient-reported outcome measures (PROMs). Current literature has identified the assessment of IRT model fit as both challenging and underdeveloped. This study evaluates the performance of Ordinal Bayesian Instrument Development (OBID), a Bayesian IRT model with a probit link function, through applications in two breast cancer-related instrument development studies. The primary focus is to investigate an appropriate method for comparing Bayesian IRT models in PROM development. An exact Bayesian leave-one-out cross-validation (LOO-CV) approach is implemented to assess prior selection for the item discrimination parameter in the IRT model and subject content experts’ bias (in a statistical sense, not to be confused with psychometric bias as in differential item functioning) in the estimation of item-to-domain correlations. Results support the use of subject content experts’ information in establishing evidence for construct validity when sample size is small. However, the incorporation of subject content experts’ information in the OBID approach can be sensitive to the level of expertise of the recruited experts. More stringent efforts need to be invested in the appropriate selection of subject content experts to use the OBID approach efficiently and reduce potential bias during PROM development.
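For orientation, the quantity that exact LOO-CV typically targets can be written as follows. The notation is generic and is an assumption here, not taken from the article: y_i is the i-th held-out observation, y_{-i} is the data with that observation removed, and θ collects the model parameters.

```latex
\mathrm{elpd}_{\mathrm{loo}} = \sum_{i=1}^{n} \log p(y_i \mid y_{-i}),
\qquad
p(y_i \mid y_{-i}) = \int p(y_i \mid \theta)\, p(\theta \mid y_{-i})\, d\theta
```

"Exact" in this sense means the posterior p(θ | y_{-i}) is obtained by refitting the model once per held-out observation rather than by an approximation; a larger value indicates better out-of-sample predictive performance for the model and prior being compared.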
Optimal Reassembly of Shadow Tests in CAT
Choi, Seung W.; Moellering, Karin T.; Li, Jie; van der Linden, Wim J.
doi: 10.1177/0146621616654597, pmid: 29881064
Even in the age of abundant and fast computing resources, concurrency requirements for large-scale online testing programs still put the uninterrupted delivery of computer-adaptive tests at risk. In this study, to increase the concurrency for operational programs that use the shadow-test approach to adaptive testing, we explored various strategies aimed at reducing the number of reassembled shadow tests without compromising measurement quality. Three strategies were examined: requiring a fixed interval between reassemblies, requiring a certain minimal change in the interim ability estimate since the last assembly before triggering a reassembly, and a hybrid of the two. All three yielded substantial reductions in the number of reassemblies without degrading measurement accuracy. The strategies effectively prevented unnecessary reassemblies caused by adapting to noise in the early test stages. They also highlighted the practicality of the shadow-test approach by minimizing the computational load involved in its use of mixed-integer programming.
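The three trigger strategies can be sketched as a small decision rule. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and function names, the example thresholds, and the ability path are hypothetical, and the hybrid is assumed to require both conditions at once, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ReassemblyRule:
    """Hypothetical trigger rule; names and thresholds are illustrative only."""
    interval: Optional[int] = None      # fixed-interval strategy: items between reassemblies
    min_change: Optional[float] = None  # minimal-change strategy: required shift in theta

    def should_reassemble(self, items_since_last: int,
                          theta_now: float, theta_at_last: float) -> bool:
        interval_due = self.interval is not None and items_since_last >= self.interval
        change_due = (self.min_change is not None
                      and abs(theta_now - theta_at_last) >= self.min_change)
        if self.interval is not None and self.min_change is not None:
            # Hybrid (assumption): reassemble only when both conditions hold.
            return interval_due and change_due
        return interval_due or change_due


def count_reassemblies(rule: ReassemblyRule, theta_path: Sequence[float]) -> int:
    """Count shadow-test assemblies along a path of interim ability estimates.

    theta_path[0] is the initial estimate; each later entry is the estimate
    after one more administered item.
    """
    count = 1  # the first shadow test is always assembled
    theta_at_last = theta_path[0]
    items_since_last = 0
    for theta in theta_path[1:]:
        items_since_last += 1
        if rule.should_reassemble(items_since_last, theta, theta_at_last):
            count += 1
            theta_at_last = theta
            items_since_last = 0
    return count


# Made-up interim estimates: noisy early on, stable later.
path = [0.0, 0.8, 0.35, 0.5, 0.45, 0.5, 0.48]
every_item = count_reassemblies(ReassemblyRule(interval=1), path)
fixed = count_reassemblies(ReassemblyRule(interval=2), path)
change = count_reassemblies(ReassemblyRule(min_change=0.3), path)
hybrid = count_reassemblies(ReassemblyRule(interval=2, min_change=0.3), path)
```

With `interval=1` the rule reduces to reassembling after every item, which serves as the baseline against which the other rules save solver calls.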
MIMIC Methods for Detecting DIF Among Multiple Groups
Chun, Seokjoon; Stark, Stephen; Kim, Eun Sook; Chernyshenko, Oleksandr S.
doi: 10.1177/0146621616659738, pmid: 29881065
A simulation study was conducted to investigate the efficacy of multiple indicators multiple causes (MIMIC) methods for multi-group uniform and non-uniform differential item functioning (DIF) detection. DIF was simulated to originate from one or more sources involving combinations of two background variables, gender and ethnicity. Three implementations of MIMIC DIF methods were compared: constrained baseline, free baseline, and a new sequential-free baseline. When the MIMIC assumption of equal factor variance across comparison groups was satisfied, the sequential-free baseline method provided excellent Type I error and power, with results similar to an idealized free baseline method that used a designated DIF-free anchor, and results much better than a constrained baseline method, which used all items other than the studied item as an anchor. However, when the equal factor variance assumption was violated, all methods showed inflated Type I error. Finally, despite the efficacy of the two free baseline methods for detecting DIF, identifying the source(s) of DIF was problematic, especially when background variables interacted.
A Dominance Variant Under the Multi-Unidimensional Pairwise-Preference Framework
Morillo, Daniel; Leenen, Iwin; Abad, Francisco J.; Hontangas, Pedro; de la Torre, Jimmy; Ponsoda, Vicente
doi: 10.1177/0146621616662226, pmid: 29881066
Forced-choice questionnaires have been proposed as a way to control some response biases associated with traditional questionnaire formats (e.g., Likert-type scales). Whereas classical scoring methods have issues of ipsativity, item response theory (IRT) methods have been claimed to accurately account for the latent trait structure of these instruments. In this article, the authors propose the multi-unidimensional pairwise preference two-parameter logistic (MUPP-2PL) model, a variant within Stark, Chernyshenko, and Drasgow’s MUPP framework for items that are assumed to fit a dominance model. They also introduce a Markov chain Monte Carlo (MCMC) procedure for estimating the model’s parameters. The authors present the results of a simulation study, which shows adequate parameter recovery in all studied conditions. A comparison of the newly proposed model with Brown and Maydeu’s Thurstonian IRT model leads to the conclusion that both models are theoretically very similar and that the Bayesian estimation procedure of the MUPP-2PL may provide a slightly better recovery of the latent space correlations and a more reliable assessment of the latent trait estimation errors. An application of the model to a real data set shows convergence between the two estimation procedures. However, there is also evidence that the MCMC procedure may be advantageous regarding the item parameters and the latent trait correlations.
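To make the idea of a dominance-type pairwise-preference model concrete, the sketch below computes the probability of preferring the first statement in a pair. The abstract does not give the response function, so the logistic form and the parameter names (`a_1`, `a_2`, `d`) are assumptions chosen for illustration, and only one plausible parameterization.

```python
import math


def pairwise_preference_prob(theta_1: float, theta_2: float,
                             a_1: float, a_2: float, d: float) -> float:
    """Probability of preferring statement 1 over statement 2 (assumed form).

    theta_1, theta_2: respondent's standing on the traits measured by each statement
    a_1, a_2:         discriminations of the two statements (hypothetical names)
    d:                intercept for the pair (hypothetical name)
    """
    z = a_1 * theta_1 - a_2 * theta_2 + d
    return 1.0 / (1.0 + math.exp(-z))


# Equal standing on both traits and a zero intercept give indifference.
p_equal = pairwise_preference_prob(0.0, 0.0, 1.0, 1.0, 0.0)
# A higher standing on the first trait favors the first statement.
p_first = pairwise_preference_prob(1.0, 0.0, 1.0, 1.0, 0.0)
```

Under this form the preference probability rises monotonically with the first trait and falls with the second, which is what distinguishes a dominance formulation from an ideal-point one.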
Unfolding IRT Models for Likert-Type Items With a Don’t Know Option
Liu, Chen-Wei; Wang, Wen-Chung
doi: 10.1177/0146621616664047, pmid: 29881067
Attitude surveys are widely used in the social sciences. It has been argued that the underlying response process to attitude items may be more aligned with the ideal-point (unfolding) process than with the cumulative (dominance) process, and therefore, unfolding item response theory (IRT) models are more appropriate than dominance IRT models for these surveys. Missing data and don’t know (DK) responses are common in attitude surveys, and they may not be ignorable in the likelihood for parameter estimation. Existing unfolding IRT models often treat missing data or DK responses as missing at random. In this study, a new class of unfolding IRT models for nonignorable missing data and DK responses was developed, in which the missingness and DK responses were assumed to measure a hierarchy of latent traits, which may be correlated with the latent attitude that a test is intended to measure. The Bayesian approach with Markov chain Monte Carlo methods was used to estimate the parameters of the new models. Simulation studies demonstrated that the parameters were recovered fairly well and that ignoring nonignorable missingness or DK responses resulted in poor parameter estimates. An empirical example of a religious belief scale about health was given.
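The contrast between the cumulative (dominance) and ideal-point (unfolding) processes can be illustrated with two toy response functions. Both are deliberately simplified assumptions: the logistic curve stands in for a dominance model, and the squared-distance curve is a generic single-peaked form, not the unfolding model used in the study.

```python
import math


def dominance_prob(theta: float, b: float, a: float = 1.0) -> float:
    """Cumulative process: endorsement rises monotonically with theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def ideal_point_prob(theta: float, delta: float) -> float:
    """Ideal-point process (toy form): endorsement peaks where theta equals delta."""
    return math.exp(-((theta - delta) ** 2))


thetas = [-2.0, 0.0, 2.0]
dominance_curve = [dominance_prob(t, b=0.0) for t in thetas]
unfolding_curve = [ideal_point_prob(t, delta=0.0) for t in thetas]
```

For an item located at 0, the dominance curve keeps increasing across the three ability points, whereas the ideal-point curve is highest at 0 and drops symmetrically on either side: respondents far above the item's location disagree just as those far below do.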
Parameter Drift Detection in Multidimensional Computerized Adaptive Testing Based on Informational Distance/Divergence Measures
Kang, Hyeon-Ah; Chang, Hua-Hua
doi: 10.1177/0146621616663676, pmid: 29881068
An informational distance/divergence-based approach is proposed to detect the presence of parameter drift in multidimensional computerized adaptive testing (MCAT). The study presents significance testing procedures for identifying changes in multidimensional item response functions (MIRFs) over time based on informational distance/divergence measures that capture the discrepancy between two probability functions. To approximate the MIRFs from the observed response data, the k-nearest neighbors algorithm is used with the random search method. A simulation study suggests that the distance/divergence-based drift measures perform effectively in identifying the instances of parameter drift in MCAT. They showed moderate power with small samples of 500 examinees and excellent power when the sample size was as large as 1,000. The proposed drift measures also adequately controlled for Type I error at the nominal level under the null hypothesis.
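As an illustration of a divergence that captures the discrepancy between two probability functions, the sketch below averages the Kullback-Leibler divergence between an item's response function before and after a parameter change. Several choices here are assumptions: the abstract does not say which divergences were used, the compensatory multidimensional 2PL form and all parameter values are made up, and the response functions are computed from known parameters rather than approximated from data with k-nearest neighbors as in the study.

```python
import math
from typing import Sequence


def m2pl_prob(theta: Sequence[float], a: Sequence[float], d: float) -> float:
    """Compensatory multidimensional 2PL response probability (assumed item model)."""
    z = sum(a_k * t_k for a_k, t_k in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli_kl(p: float, q: float) -> float:
    """Kullback-Leibler divergence KL(p || q) between two Bernoulli distributions."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def mean_kl_drift(a_old, d_old, a_new, d_new, thetas) -> float:
    """Average KL divergence between old and new response functions over ability points."""
    total = 0.0
    for theta in thetas:
        p = m2pl_prob(theta, a_old, d_old)
        q = m2pl_prob(theta, a_new, d_new)
        total += bernoulli_kl(p, q)
    return total / len(thetas)


# A coarse two-dimensional ability grid (illustrative only).
grid = [(x, y) for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)]
no_drift = mean_kl_drift([1.0, 0.5], 0.0, [1.0, 0.5], 0.0, grid)
with_drift = mean_kl_drift([1.0, 0.5], 0.0, [1.0, 0.5], -0.8, grid)
```

The index is zero when the two response functions coincide and grows as they separate. Turning it into a significance test requires a reference distribution under the null hypothesis, which this sketch does not attempt.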
MCMC Z-G
Wang, Wei; Lee, Philseok; Joo, Seang-Hwane; Stark, Stephen; Louden, Robert
doi: 10.1177/0146621616663682, pmid: 29881069
In recent years, there has been a surge of interest in measuring noncognitive constructs in educational and managerial/organizational settings. For the most part, these noncognitive constructs have been and continue to be measured using Likert-type (ordinal response) scales, which are susceptible to several types of response distortion. To deal with these response biases, researchers have proposed using a forced-choice format, which requires respondents or raters to evaluate cognitive, affective, or behavioral descriptors presented in blocks of two or more. The workhorse for this measurement endeavor is the item response theory (IRT) model developed by Zinnes and Griggs (Z-G), which was first used as the basis for a computerized adaptive rating scale (CARS) and then extended by many organizational scientists. However, applications of the Z-G model outside of organizational contexts have been limited, primarily due to the lack of publicly available software for parameter estimation. This research effort addressed that need by developing a Markov chain Monte Carlo (MCMC) estimation program, called MCMC Z-G, which uses a Metropolis-Hastings-within-Gibbs algorithm to simultaneously estimate Z-G item and person parameters. The publicly available MCMC Z-G program runs on both Mac OS® and Windows® platforms.
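The general shape of a Metropolis-Hastings-within-Gibbs sampler can be sketched on a toy problem. This is not the MCMC Z-G program or the Z-G likelihood: the target below is a standard bivariate normal with correlation 0.5, chosen only because its answer is known, and the function names, step size, and seed are hypothetical.

```python
import math
import random


def log_target(x: float, y: float, rho: float = 0.5) -> float:
    """Log density (up to a constant) of a standard bivariate normal with correlation rho."""
    return -(x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))


def mh_within_gibbs(n_iter: int, step: float = 1.0, seed: int = 0):
    """Cycle through the coordinates, updating each with one random-walk MH step."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    draws = []
    for _ in range(n_iter):
        # Update x while holding y fixed.
        x_prop = x + rng.gauss(0.0, step)
        log_u = math.log(1.0 - rng.random())  # 1 - U lies in (0, 1], so the log is finite
        if log_u < log_target(x_prop, y) - log_target(x, y):
            x = x_prop
        # Update y while holding the (possibly new) x fixed.
        y_prop = y + rng.gauss(0.0, step)
        log_u = math.log(1.0 - rng.random())
        if log_u < log_target(x, y_prop) - log_target(x, y):
            y = y_prop
        draws.append((x, y))
    return draws


draws = mh_within_gibbs(20000, seed=1)
mean_x = sum(x for x, _ in draws) / len(draws)
mean_xy = sum(x * y for x, y in draws) / len(draws)
```

In an IRT setting the two coordinates would be replaced by blocks of item and person parameters, each updated in turn given the current values of the others; the accept/reject step is used because the conditional distributions cannot be sampled directly.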