Model Selection Indices for Polytomous Items
Kang, Taehoon; Cohen, Allan S.; Sung, Hyun-Jung
doi: 10.1177/0146621608327800
pmid: N/A
This study examines the utility of four indices for use in model selection with nested and nonnested polytomous item response theory (IRT) models: a cross-validation index and three information-based indices. Four commonly used polytomous IRT models are considered: the graded response model, the generalized partial credit model, the partial credit model, and the rating scale model. In a simulation study, comparisons among the four indices suggest that model selection is dependent to some extent on the particular conditions simulated. Overall, the Bayesian information criterion index appears to be most accurate in selecting the correct polytomous IRT model. Results are presented from analysis of a real data set to illustrate the use of the four indices for selecting an appropriate model.
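As a rough illustration of how an information-based index ranks competing models, the sketch below computes the Bayesian information criterion (named in the abstract) alongside the Akaike information criterion (an assumption here; the abstract does not list the other indices). The function names, log-likelihoods, and parameter counts are hypothetical placeholders, not results from the study.

```python
import math

def information_criteria(log_lik, n_params, n_obs):
    """Return AIC and BIC for a fitted model (smaller is better)."""
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
    }

def select_model(fits, n_obs, index="BIC"):
    """Pick the candidate with the smallest value of the chosen index.

    `fits` maps a model label to (log_likelihood, n_params).
    """
    scores = {
        name: information_criteria(ll, k, n_obs)[index]
        for name, (ll, k) in fits.items()
    }
    return min(scores, key=scores.get), scores

# Made-up values for illustration only (NOT from the article).
example_fits = {
    "GRM": (-5120.0, 50),
    "GPCM": (-5125.0, 50),
    "PCM": (-5150.0, 41),
    "RSM": (-5230.0, 14),
}
best, scores = select_model(example_fits, n_obs=1000)
```

Because the BIC penalty grows with the log of the sample size, it tends to favor more parsimonious models than AIC as the number of examinees increases.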
Posterior Predictive Model Checking for Multidimensionality in Item Response Theory
Levy, Roy; Mislevy, Robert J.; Sinharay, Sandip
doi: 10.1177/0146621608329504
pmid: N/A
If data exhibit multidimensionality, key conditional independence assumptions of unidimensional models do not hold. The current work pursues posterior predictive model checking, a flexible family of model-checking procedures, as a tool for criticizing models that fail to account for dimensions in the context of item response theory. Factors hypothesized to influence dimensionality and dimensionality assessment are couched in conditional covariance theory and conveyed via geometric representations of multidimensionality. A simulation study investigates the performance of the model-checking tools for dichotomous observables. Key findings include support for the hypothesized effects of the manipulated factors on dimensionality assessment and the superiority of certain discrepancy measures for conducting posterior predictive model checking for dimensionality assessment.
Testing for Differential Item Functioning With Measures of Partial Association
Woods, Carol M.
doi: 10.1177/0146621608329506
pmid: N/A
Differential item functioning (DIF) occurs when an item on a test or questionnaire has different measurement properties for one group of people versus another, irrespective of mean differences on the construct. There are many methods available for DIF assessment. The present article focuses on indices of partial association. A family of average conditional ordinal association measures proposed by Quade is described and empirically compared to the Mantel-Haenszel (MH) test, the partial Pearson correlation pr, and the Spearman rank correlation prs. Because coefficients of linear association are not meaningful for the binary and ordinal variables usually used in DIF applications, practitioners are urged to seek alternatives to pr and prs. Some of the Quade-family measures are viable alternatives and performed as well as, or better than, the established MH test. Computer code for calculating the average conditional ordinal measures using the free R program is given.
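For orientation, the established MH test mentioned above can be sketched for a dichotomous item as follows. This is a minimal illustration of the textbook statistic, not the article's R code for the Quade-family measures; the function name and the stratum layout (one 2 x 2 table per matched score level) are assumptions made for the example.

```python
def mantel_haenszel(tables):
    """Mantel-Haenszel chi-square (1 df) and common odds ratio.

    `tables` is a list of (a, b, c, d) counts, one per matching stratum
    (e.g., total-score level):
        a = reference group, correct    b = reference group, incorrect
        c = focal group, correct        d = focal group, incorrect
    """
    sum_a = sum_expected = sum_var = 0.0
    or_num = or_den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        sum_a += a
        sum_expected += (a + b) * (a + c) / n
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    if sum_var == 0 or or_den == 0:
        raise ValueError("statistic undefined for these tables")
    delta = abs(sum_a - sum_expected)
    # Continuity correction, applied only when it cannot overshoot zero.
    correction = 0.5 if delta >= 0.5 else 0.0
    chi_square = (delta - correction) ** 2 / sum_var
    return chi_square, or_num / or_den
```

A chi-square near zero and a common odds ratio near 1 indicate no evidence of uniform DIF after matching; the chi-square is referred to a chi-square distribution with one degree of freedom.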
Locally Dependent Linear Logistic Test Model With Person Covariates
Ip, Edward H.; Smits, Dirk J. M.; De Boeck, Paul
doi: 10.1177/0146621608326424
pmid: N/A
The article proposes a family of item-response models that allow the separate and independent specification of three orthogonal components: item attribute, person covariate, and local item dependence. Special interest lies in extending the linear logistic test model, which is commonly used to measure item attributes, to tests with embedded item clusters. The problem of local item dependence arises in item clusters. Existing methods for handling such dependence, however, often fail to satisfy the property of invariant marginal interpretation of the item attribute parameters. Although such a property may not be necessary for applications that focus on predictive analysis, it is critical for linear logistic test models. To achieve the marginal property, we implement an iterative estimation method, which is illustrated using data collected from an inventory on verbal aggressiveness.