Trait Parameter Recovery Using Multidimensional Computerized Adaptive Testing in Reading and Mathematics
Li, Yuan H.; Schafer, William D.
doi: 10.1177/0146621604270667
pmid: N/A
Under a multidimensional item response theory (MIRT) computerized adaptive testing (CAT) scenario, a trait estimate (θ) in one dimension provides clues for subsequently seeking a solution in the other dimensions. This feature may make MIRT CAT's item selection and scoring algorithms more efficient than those of its unidimensional counterpart (UCAT). The present study used existing Reading and Math test data to generate simulated item parameters. A confirmatory item factor analysis model was fitted to the data using NOHARM to produce interpretable MIRT item parameters. Results showed that MIRT CAT, conditional on the constraints, was quite capable of producing accurate estimates on both measures. Compared with UCAT, MIRT CAT slightly increased the accuracy of both trait estimates, especially for examinees at low or high trait levels on both measures, and reduced the rate of unused items in the item pool.
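The abstract does not state which item selection rule was used. Purely as a hedged illustration, the sketch below assumes a compensatory multidimensional two-parameter logistic (M2PL) model and a D-optimality rule (pick the unused item that maximizes the determinant of the accumulated Fisher information), a criterion associated with Segall's Bayesian MIRT CAT. The function names, the N(0, I) prior precision, and the toy two-dimensional pool are assumptions, not the study's settings.

```python
import math

# Hypothetical sketch: compensatory M2PL model with D-optimal item selection.

def m2pl_prob(a, d, theta):
    """P(correct) = 1 / (1 + exp(-(a . theta + d))) under the compensatory M2PL."""
    z = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))

def item_info(a, d, theta):
    """Fisher information matrix of one M2PL item at theta: P(1 - P) a a^T."""
    p = m2pl_prob(a, d, theta)
    w = p * (1.0 - p)
    return [[w * ai * aj for aj in a] for ai in a]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def add2(m, n):
    """Element-wise sum of two 2 x 2 matrices."""
    return [[m[i][j] + n[i][j] for j in range(2)] for i in range(2)]

def select_item(pool, used, theta_hat, info_so_far):
    """Return the index of the unused item maximizing det(info_so_far + item info)."""
    best, best_det = None, -math.inf
    for idx, (a, d) in enumerate(pool):
        if idx in used:
            continue
        cand = det2(add2(info_so_far, item_info(a, d, theta_hat)))
        if cand > best_det:
            best, best_det = idx, cand
    return best

# Toy pool of (discrimination vector, intercept): dimension-1-only, dimension-2-only, mixed.
pool = [((1.5, 0.0), 0.0), ((0.0, 1.5), 0.0), ((1.0, 1.0), 0.0)]
prior_precision = [[1.0, 0.0], [0.0, 1.0]]  # assumed N(0, I) prior on theta
theta_hat = (0.0, 0.0)

first = select_item(pool, set(), theta_hat, prior_precision)
info = add2(prior_precision, item_info(*pool[first], theta_hat))
second = select_item(pool, {first}, theta_hat, info)
print(first, second)  # -> 0 1
```

If the prior precision were instead the inverse of a correlation matrix between the two traits, its off-diagonal terms would let information gathered on one dimension sharpen the other, which is one way to read the abstract's point that an estimate in one dimension offers clues for the others.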
The Effect of Person Misfit on Classification Decisions
Hendrawan, Irene; Glas, Cees A. W.; Meijer, Rob R.
doi: 10.1177/0146621604270902
pmid: N/A
The effect of person misfit to an item response theory model on mastery/nonmastery decisions was investigated, as was whether classification precision can be improved by using person-fit statistics to identify misfitting respondents. A simulation study examined the probability of a correct classification across different cutoff points, estimation methods, person-fit statistics, model violations, test lengths, and sample sizes. The effect of misfitting item score patterns on the item parameter estimates was also taken into account. Results showed that the presence of misfitting item score patterns generally had little effect on the classification of nonaberrant simulees (i.e., classification precision for these simulees did not decrease). Furthermore, for simulees classified as nonaberrant by a person-fit statistic, the classification decisions were comparable to those for actual nonaberrant simulees. These results were comparable across the different person-fit statistics and estimation methods.
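The abstract does not name the person-fit statistics studied. As a hedged example of how a flag can be combined with a mastery decision, the sketch below uses the widely known standardized log-likelihood statistic l_z under a 2PL model. It assumes θ is known (with an estimated θ the null distribution of l_z departs from standard normal, which corrected versions such as Snijders' l_z* address), and the critical value -1.645, the helper names, and the item parameters are all hypothetical.

```python
import math

# Hypothetical sketch: l_z person-fit statistic under a 2PL model, with theta treated as known.

def prob_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def lz_statistic(responses, theta, items):
    """Standardized log-likelihood l_z = (l0 - E[l0]) / sqrt(Var[l0])."""
    l0 = expected = variance = 0.0
    for u, (a, b) in zip(responses, items):
        p = prob_2pl(theta, a, b)
        q = 1.0 - p
        l0 += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (l0 - expected) / math.sqrt(variance)

def classify(theta_hat, cutoff, lz, lz_critical=-1.645):
    """Mastery decision plus an aberrance flag (large negative l_z suggests misfit)."""
    decision = "master" if theta_hat >= cutoff else "nonmaster"
    return decision, lz < lz_critical

items = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]  # (a, b)
consistent = lz_statistic([1, 1, 1, 0, 0], 0.0, items)   # easy items right, hard wrong
misfitting = lz_statistic([0, 0, 1, 1, 1], 0.0, items)   # easy items wrong, hard right
print(round(consistent, 2), round(misfitting, 2))
print(classify(0.0, 0.0, misfitting))
```

Under this sketch, a simulee "classified as nonaberrant" is simply one whose l_z stays above the critical value; the abstract's finding is that decisions for such simulees were comparable to decisions for truly nonaberrant ones.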
Estimating Johnson Curve Population Distributions in MULTILOG
van den Oord, Edwin J. C. G.
doi: 10.1177/0146621604269791
pmid: N/A
The shape of the latent trait distribution can be of considerable theoretical and methodological importance. A simulation study was performed to examine the distribution of the likelihood ratio statistic used to test for normality via Johnson curves, the power to detect deviations from normality, and the estimation properties of the item and latent trait distribution parameters. Except in conditions in which all items had high difficulty parameters or were dichotomous, the distribution of the statistic used to test for normality could be approximated by a chi-square distribution with 2 degrees of freedom. In a variety of situations, the power was sufficient to detect even small deviations from normality. Compared with assuming a normal distribution, allowing for Johnson curve latent trait distributions increased the standard errors of the marginal maximum likelihood item parameter estimates but reduced their bias when the latent trait was nonnormal.
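For readers unfamiliar with Johnson curves: the Johnson system maps a variable x to a standard normal z = γ + δ f((x − ξ)/λ), where f is a log, a logit-type, or an inverse hyperbolic sine depending on the family. The sketch below is a hedged illustration only. It implements the S_U family's density and the likelihood ratio test the abstract describes, using the fact that the chi-square survival function with 2 degrees of freedom is exactly exp(−G²/2). It does not reproduce MULTILOG's estimation; the log-likelihood values and the function names are hypothetical.

```python
import math

# Hypothetical sketch: a Johnson S_U density and a likelihood ratio test of normality.

def johnson_su_pdf(x, gamma, delta, xi, lam):
    """Density of X when z = gamma + delta * asinh((x - xi) / lam) is standard normal."""
    y = (x - xi) / lam
    z = gamma + delta * math.asinh(y)
    jacobian = delta / (lam * math.sqrt(1.0 + y * y))  # dz/dx
    return jacobian * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def lr_normality_test(loglik_normal, loglik_johnson):
    """G^2 = 2 (logL_Johnson - logL_normal); the chi-square(2) survival function is exp(-G^2 / 2)."""
    g2 = max(0.0, 2.0 * (loglik_johnson - loglik_normal))
    return g2, math.exp(-g2 / 2.0)

# Sanity check: the S_U density integrates to about 1 (simple Riemann sum over [-50, 50]).
step = 0.01
area = sum(johnson_su_pdf(-50.0 + i * step, 0.5, 1.5, 0.0, 1.0) for i in range(10001)) * step
print(round(area, 4))

# Hypothetical log-likelihoods from a normal fit and a Johnson-curve fit.
g2, p = lr_normality_test(-1000.0, -997.0)
print(g2, round(p, 4))  # G^2 = 6.0 exceeds the 5% critical value -2 ln(0.05), about 5.99
```

Because the df = 2 survival function has a closed form, the 5% critical value is simply −2 ln 0.05; the abstract notes this reference distribution was not adequate when all items were highly difficult or dichotomous.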
The α and the ω of Congeneric Test Theory: An Extension of Reliability and Internal Consistency to Heterogeneous Tests
Lucke, Joseph F.
doi: 10.1177/0146621604270882
pmid: N/A
Psychometric theory focuses primarily on tests that are homogeneous, measuring only one attribute of a psychosocial entity. However, the complexity of psychosocial behavior often requires tests that are heterogeneous, measuring more than one attribute. In this presentation, reliability and internal consistency are extended to heterogeneous tests under the rubric of congeneric test theory. The extensions show that reliability and internal consistency have very similar properties. Reliability and internal consistency are shown to be unique up to a linear transformation. Whereas internal consistency is a lower bound to reliability in the homogeneous case, it is a strict lower bound in the heterogeneous case. Reliability equals internal consistency if and only if the test is homogeneous and true-score equivalent. An analytic argument shows that neither reliability nor internal consistency can detect whether a test is homogeneous or heterogeneous. The reliability (internal consistency) of a heterogeneous test is the sum of the sub-tests' reliabilities (internal consistencies) plus the sum of reliabilities (internal consistencies) involving the correlations among the sub-tests. Higher-order tests are tests in which items measure first-order attributes, first-order attributes reflect second-order attributes, and so on. Reliability (internal consistency) can be partitioned into the sum of direct reliabilities (direct internal consistencies) for the first-order attributes and indirect reliabilities (indirect internal consistencies) for the second- and higher-order attributes. Simple examples demonstrate that internal consistency may poorly approximate the reliability of congeneric tests. Given that reliability for congeneric tests can now be estimated by modern statistical software, the need for internal consistency is diminished. However, internal consistency may be useful in bounding or approximating reliability in general linear latent variable models.
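To make the α/ω contrast concrete, here is a hedged sketch, not Lucke's derivation. It assumes a linear factor model Σ = ΛΦΛ′ + diag(ψ) with uncorrelated uniquenesses and known population parameters, takes reliability (ω) as the true-score share of the unit-weighted sum score's variance, and computes coefficient α from the same implied covariance matrix. The function name and all loadings are illustrative. The three toy cases mirror the abstract: α equals ω for a homogeneous, true-score-equivalent test, and falls below ω otherwise, including the heterogeneous case.

```python
# Hypothetical sketch: model-based reliability (omega) versus coefficient alpha
# for a congeneric factor model Sigma = Lambda Phi Lambda' + diag(uniq).

def omega_and_alpha(loadings, phi, uniq):
    """
    loadings: k x m list (items x attributes); phi: m x m attribute correlations;
    uniq: k unique variances. The test score is the unweighted sum of the k items.
    """
    k, m = len(loadings), len(phi)

    def cov(i, h):
        common = sum(loadings[i][j] * phi[j][l] * loadings[h][l]
                     for j in range(m) for l in range(m))
        return common + (uniq[i] if i == h else 0.0)

    total_var = sum(cov(i, h) for i in range(k) for h in range(k))
    true_var = total_var - sum(uniq)              # 1' Lambda Phi Lambda' 1
    item_var_sum = sum(cov(i, i) for i in range(k))
    omega = true_var / total_var
    alpha = k / (k - 1) * (1.0 - item_var_sum / total_var)
    return omega, alpha

# Homogeneous and true-score (tau-) equivalent: alpha equals omega.
tau_eq = omega_and_alpha([[0.7]] * 4, [[1.0]], [0.51] * 4)
# Homogeneous congeneric with unequal loadings: alpha < omega.
lams = [0.9, 0.7, 0.5, 0.3]
congeneric = omega_and_alpha([[x] for x in lams], [[1.0]], [1 - x * x for x in lams])
# Heterogeneous: two correlated attributes (r = 0.3), two items each.
hetero = omega_and_alpha([[0.8, 0.0], [0.8, 0.0], [0.0, 0.8], [0.0, 0.8]],
                         [[1.0, 0.3], [0.3, 1.0]], [0.36] * 4)
for name, (om, al) in [("tau-equivalent", tau_eq), ("congeneric", congeneric),
                       ("heterogeneous", hetero)]:
    print(f"{name}: omega={om:.3f} alpha={al:.3f}")
```

In the heterogeneous case, the true-score variance is the sum of each attribute's own contribution plus cross terms weighted by the attribute correlation, which loosely parallels the abstract's decomposition into sub-test reliabilities plus correlation-dependent terms.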