Applied Psychological Measurement


journal article

LitStream Collection

The Explanatory Generalized Graded Unfolding Model: Incorporating Collateral Information to Improve the Latent Trait Estimation Accuracy

Joo, Seang-Hwane; Lee, Philseok; Stark, Stephen

2022 Applied Psychological Measurement

doi: 10.1177/01466216211051717; pmid: 34898744

Collateral information has been used to address subpopulation heterogeneity and increase estimation accuracy in some large-scale cognitive assessments. However, methodology that takes collateral information into account has not been developed and explored in published research with models designed specifically for noncognitive measurement. Because accurate noncognitive measurement is becoming increasingly important, we sought to examine the benefits of using collateral information in latent trait estimation with an item response theory model that has proven valuable for noncognitive testing, namely, the generalized graded unfolding model (GGUM). Our presentation introduces an extension of the GGUM that incorporates collateral information, henceforth called Explanatory GGUM. We then present a simulation study that examined Explanatory GGUM latent trait estimation as a function of sample size, test length, number of background covariates, and correlation between the covariates and the latent trait. Results indicated that the Explanatory GGUM approach provides scoring accuracy and precision superior to traditional expected a posteriori (EAP) and full Bayesian (FB) methods. Implications and recommendations are discussed.

The Lack of Robustness of a Statistic Based on the Neyman–Pearson Lemma to Violations of Its Underlying Assumptions

Sinharay, Sandip

2022 Applied Psychological Measurement

doi: 10.1177/01466216211049209; pmid: 34898745

Drasgow, Levine, and Zickar (1996) suggested a statistic based on the Neyman–Pearson lemma (NPL; e.g., Lehmann & Romano, 2005, p. 60) for detecting preknowledge on a known set of items. The statistic is a special case of the optimal appropriateness indices (OAIs) of Levine and Drasgow (1988) and is the most powerful statistic for detecting item preknowledge when the assumptions underlying the statistic hold for the data (e.g., Belov, 2016; Drasgow et al., 1996). This paper demonstrated using real data analysis that one assumption underlying the statistic of Drasgow et al. (1996) is often likely to be violated in practice. This paper also demonstrated, using simulated data, that the statistic is not robust to realistic violations of its underlying assumptions. Together, the results from the real data and the simulations demonstrate that the statistic of Drasgow et al. (1996) may not always be the optimum statistic in practice and occasionally has smaller power than another statistic for detecting preknowledge on a known set of items, especially when the assumptions underlying the former statistic do not hold. The findings of this paper demonstrate the importance of keeping in mind the assumptions underlying and the limitations of any statistic or method.

Quantifying the Distorting Effect of Rapid Guessing on Estimates of Coefficient Alpha

Rios, Joseph A.; Deng, Jiayi

2022 Applied Psychological Measurement

doi: 10.1177/01466216211051719; pmid: 34898746

An underlying threat to the validity of reliability measures is the introduction of systematic variance in examinee scores from unintended constructs that differ from those assessed. One construct-irrelevant behavior that has gained increased attention in the literature is rapid guessing (RG), which occurs when examinees answer quickly with intentional disregard for item content. To examine the degree of distortion in coefficient alpha due to RG, this study compared alpha estimates between conditions in which simulees engaged in full solution behavior (i.e., did not engage in RG) versus partial RG behavior. This was done by conducting a simulation study in which the percentage and ability characteristics of rapid responders as well as the percentage and pattern of RG were manipulated. After controlling for test length and difficulty, the average degree of distortion in estimates of coefficient alpha due to RG ranged from −.04 to .02 across 144 conditions. Although slight differences were noted between conditions differing in RG pattern and RG responder ability, the findings from this study suggest that estimates of coefficient alpha are largely robust to the presence of RG due to cognitive fatigue and a low perceived probability of success.
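As a rough illustration of the kind of comparison this abstract describes, the sketch below computes coefficient alpha for simulated responses with and without rapid guessing. It is not the authors' design: the Rasch response model, the chance-level success probability of 0.25, and the names `simulate`, `rg_person_rate`, and `rg_item_rate` are all assumptions made for this example.

```python
import math
import random
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a persons-by-items matrix (list of rows)."""
    k = len(scores[0])
    # Sample variance of each item across persons.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total (sum) score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def simulate(n_persons=2000, n_items=40, rg_person_rate=0.0,
             rg_item_rate=0.0, seed=1):
    """Dichotomous responses under a Rasch model (an assumption here).

    A fraction `rg_person_rate` of simulees are rapid responders; each of
    their items is answered at chance (p = 0.25, assumed) with probability
    `rg_item_rate`, and under the Rasch model otherwise.
    """
    rng = random.Random(seed)
    difficulties = [rng.gauss(0, 1) for _ in range(n_items)]
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0, 1)
        rapid_responder = rng.random() < rg_person_rate
        row = []
        for b in difficulties:
            if rapid_responder and rng.random() < rg_item_rate:
                p = 0.25  # rapid guess: success at chance level
            else:
                p = 1 / (1 + math.exp(-(theta - b)))  # effortful response
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


if __name__ == "__main__":
    alpha_full = cronbach_alpha(simulate())
    alpha_rg = cronbach_alpha(simulate(rg_person_rate=0.2, rg_item_rate=0.3))
    print(f"alpha (full effort): {alpha_full:.3f}")
    print(f"alpha (partial RG):  {alpha_rg:.3f}")
    print(f"distortion:          {alpha_rg - alpha_full:+.3f}")
```

The difference `alpha_rg - alpha_full` plays the role of the "distortion" discussed above; the study itself additionally manipulated responder ability and RG pattern, which this sketch does not attempt to reproduce.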

Examining the Performance of the Trifactor Model for Multiple Raters

Soland, James; Kuhfeld, Megan

2022 Applied Psychological Measurement

doi: 10.1177/01466216211051728; pmid: 34898747

Researchers in the social sciences often obtain ratings of a construct of interest provided by multiple raters. While using multiple raters provides a way to help avoid the subjectivity of any given person’s responses, rater disagreement can be a problem. A variety of models exist to address rater disagreement in both structural equation modeling and item response theory frameworks. Recently, a model was developed by Bauer et al. (2013) and referred to as the “trifactor model” to provide applied researchers with a straightforward way of estimating scores that are purged of variance that is idiosyncratic by rater. Although the intent of the model is to be usable and interpretable, little is known about the circumstances under which it performs well and those under which it does not. We conduct simulation studies to examine the performance of the trifactor model under a range of sample sizes and model specifications and then compare model fit, bias, and convergence rates.

DIFSIB: A SIBTEST Package

Weese, James D.

2022 Applied Psychological Measurement

doi: 10.1177/01466216211040498; pmid: 34898748

The R package DIFSIB provides a directly translated version of the SIBTEST, Crossing-SIBTEST, and POLYSIBTEST procedures that were last updated and released in 2005. Having these functions translated directly from Fortran into R code will allow researchers and practitioners to easily access the most recent versions of these procedures when conducting differential item functioning analysis, and to continue improving the software more easily.

autoFC: An R Package for Automatic Item Pairing in Forced-Choice Test Construction

Li, Mengtong; Sun, Tianjun; Zhang, Bo

2022 Applied Psychological Measurement

doi: 10.1177/01466216211051726; pmid: 34898749

Recently, there has been increasing interest in adopting the forced-choice (FC) test format in non-cognitive assessments, as it demonstrates faking resistance when well-designed. However, traditional or manual pairing approaches to FC test construction are time- and effort-intensive and often involve insufficient considerations. To address these issues, we developed the new open-source autoFC R package to facilitate automated and optimized item pairing strategies. The autoFC package is intended as a practical tool for FC test construction. Users can easily obtain automatically optimized FC tests by simply inputting the item characteristics of interest. Customizations are also available for matching rules and the behavior of the optimization process. The autoFC package should be of interest to researchers and practitioners constructing FC scales with potentially many metrics to match on and/or many items to pair, essentially exempting users from the burden of manual item pairing and reducing the computational costs and biases induced by simple ranking methods.

maat: An R Package for Multiple Administrations Adaptive Testing

Choi, Seung W.; Lim, Sangdon; Niu, Luping; Lee, Sooyong; Schneider, Christina M.; Lee, Jay; Gianopulos, Garron J.

2022 Applied Psychological Measurement

doi: 10.1177/01466216211049212; pmid: 34898750

Multiple Administrations Adaptive Testing (MAAT) is an extension of the shadow-test approach to computerized adaptive testing (CAT) for assessment frameworks involving multiple tests administered periodically throughout the year. The maat package utilizes multiple item pools vertically scaled across grades and multiple phases (stages) within each test administration, allowing for transitioning from one item pool to another as deemed necessary to further enhance the quality of assessment.
