The Explanatory Generalized Graded Unfolding Model: Incorporating Collateral Information to Improve the Latent Trait Estimation Accuracy
Joo, Seang-Hwane; Lee, Philseok; Stark, Stephen
doi: 10.1177/01466216211051717 pmid: 34898744
Collateral information has been used to address subpopulation heterogeneity and increase estimation accuracy in some large-scale cognitive assessments. Methodology that takes collateral information into account has not been developed and explored in published research with models designed specifically for noncognitive measurement. Because accurate noncognitive measurement is becoming increasingly important, we sought to examine the benefits of using collateral information in latent trait estimation with an item response theory model that has proven valuable for noncognitive testing, namely, the generalized graded unfolding model (GGUM). Our presentation introduces an extension of the GGUM that incorporates collateral information, henceforth called Explanatory GGUM. We then present a simulation study that examined Explanatory GGUM latent trait estimation as a function of sample size, test length, number of background covariates, and correlation between the covariates and the latent trait. Results indicated that the Explanatory GGUM approach provides scoring accuracy and precision superior to traditional expected a posteriori (EAP) and full Bayesian (FB) methods. Implications and recommendations are discussed.
The Lack of Robustness of a Statistic Based on the Neyman–Pearson Lemma to Violations of Its Underlying Assumptions
Sinharay, Sandip
doi: 10.1177/01466216211049209 pmid: 34898745
Drasgow, Levine, and Zickar (1996) suggested a statistic based on the Neyman–Pearson lemma (NPL; e.g., Lehmann & Romano, 2005, p. 60) for detecting preknowledge on a known set of items. The statistic is a special case of the optimal appropriateness indices (OAIs) of Levine and Drasgow (1988) and is the most powerful statistic for detecting item preknowledge when the assumptions underlying the statistic hold for the data (e.g., Belov, 2016; Drasgow et al., 1996). This paper demonstrated, using real data analysis, that one assumption underlying the statistic of Drasgow et al. (1996) is often likely to be violated in practice. This paper also demonstrated, using simulated data, that the statistic is not robust to realistic violations of its underlying assumptions. Together, the results from the real data and the simulations demonstrate that the statistic of Drasgow et al. (1996) may not always be the optimum statistic in practice and occasionally has smaller power than another statistic for detecting preknowledge on a known set of items, especially when the assumptions underlying the former statistic do not hold. The findings of this paper demonstrate the importance of keeping in mind the underlying assumptions and the limitations of any statistic or method.
Quantifying the Distorting Effect of Rapid Guessing on Estimates of Coefficient Alpha
Rios, Joseph A.; Deng, Jiayi
doi: 10.1177/01466216211051719 pmid: 34898746
An underlying threat to the validity of reliability measures is the introduction of systematic variance in examinee scores from unintended constructs that differ from those assessed. One construct-irrelevant behavior that has gained increased attention in the literature is rapid guessing (RG), which occurs when examinees answer quickly with intentional disregard for item content. To examine the degree of distortion in coefficient alpha due to RG, this study compared alpha estimates between conditions in which simulees engaged in full solution behavior (i.e., did not engage in RG) versus partial RG behavior. This was done by conducting a simulation study in which the percentage and ability characteristics of rapid responders as well as the percentage and pattern of RG were manipulated. After controlling for test length and difficulty, the average degree of distortion in estimates of coefficient alpha due to RG ranged from −.04 to .02 across 144 conditions. Although slight differences were noted between conditions differing in RG pattern and RG responder ability, the findings from this study suggest that estimates of coefficient alpha are largely robust to the presence of RG due to cognitive fatigue and a low perceived probability of success.
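As a rough illustration of the quantity being compared, the sketch below computes coefficient alpha with the standard formula, alpha = k/(k − 1) × (1 − Σ item variances / total-score variance), for two response matrices and takes their difference. The function name `coefficient_alpha`, the toy response matrices, and the definition of distortion as a simple difference are assumptions made for illustration; they are not the study's simulation design or data.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient (Cronbach's) alpha for a list of examinee rows.

    Each row is a list of item scores; all rows must have the same length.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two examinees")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data (not from the study): 5 examinees by 4 items.
full_solution = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

# Same examinees, with the last two items of two rows replaced by
# arbitrary "guessed" responses to mimic partial rapid guessing.
partial_rg = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

# Distortion taken here as alpha under RG minus alpha under full solution.
distortion = coefficient_alpha(partial_rg) - coefficient_alpha(full_solution)
print(f"distortion in alpha: {distortion:+.3f}")
```

With realistic data, the same comparison would be repeated over many simulated replications per condition and averaged.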
Examining the Performance of the Trifactor Model for Multiple Raters
Soland, James; Kuhfeld, Megan
doi: 10.1177/01466216211051728 pmid: 34898747
Researchers in the social sciences often obtain ratings of a construct of interest provided by multiple raters. While using multiple raters provides a way to help avoid the subjectivity of any given person’s responses, rater disagreement can be a problem. A variety of models exist to address rater disagreement in both structural equation modeling and item response theory frameworks. Recently, a model referred to as the “trifactor model” was developed by Bauer et al. (2013) to provide applied researchers with a straightforward way of estimating scores that are purged of variance that is idiosyncratic by rater. Although the intent of the model is to be usable and interpretable, little is known about the circumstances under which it performs well and those under which it does not. We conduct simulation studies to examine the performance of the trifactor model under a range of sample sizes and model specifications and then compare model fit, bias, and convergence rates.
DIFSIB: A SIBTEST Package
Weese, James D.
doi: 10.1177/01466216211040498 pmid: 34898748
The R package DIFSIB provides a directly translated version of the SIBTEST, Crossing-SIBTEST, and POLYSIBTEST procedures that were last updated and released in 2005. Having these functions translated directly from Fortran into R code will allow researchers and practitioners to easily access the most recent versions of these procedures when conducting differential item functioning analysis, and to continue improving the software more easily.
autoFC: An R Package for Automatic Item Pairing in Forced-Choice Test Construction
Li, Mengtong; Sun, Tianjun; Zhang, Bo
doi: 10.1177/01466216211051726 pmid: 34898749
Recently, there has been increasing interest in adopting the forced-choice (FC) test format in non-cognitive assessments, as it demonstrates faking resistance when well-designed. However, traditional or manual pairing approaches to FC test construction are time- and effort-intensive and often involve insufficient considerations. To address these issues, we developed the new open-source autoFC R package to facilitate automated and optimized item pairing strategies. The autoFC package is intended as a practical tool for FC test construction. Users can easily obtain automatically optimized FC tests by simply inputting the item characteristics of interest. Customizations are also available for matching rules and the behavior of the optimization process. The autoFC package should be of interest to researchers and practitioners constructing FC scales with potentially many metrics to match on and/or many items to pair, essentially exempting users from the burden of manual item pairing and reducing the computational costs and biases induced by simple ranking methods.