journal article
LitStream Collection
Bell, Richard C.; Pattison, Philippa E.; Withers, Graeme P.
doi: 10.1177/014662168801200103; pmid: N/A
Although the assumption of local independence underlies all latent trait theories in mental testing, it has rarely been examined empirically. In this study of a clustered item test (the Australian Scholastic Aptitude Test), a loglinear modeling approach was used to examine the conditional independence of items both within and between clusters. In general, although relationships between items were usually positive (as required for theories involving monotone item trace lines), conditional independence was not found. Departures from independence were more marked among items within clusters than between clusters, and among items based on mathematical rather than verbal material. Departures from independence also tended to increase with ability (as measured by the score on the other items).
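For readers new to the approach, here is a minimal sketch of the simplest loglinear check of this kind. It assumes two dichotomous items A and B that are cross-classified within K strata of the rest score (the score on the remaining items). It then computes the likelihood-ratio statistic G² for the conditional-independence model [AS][BS], whose fitted count in cell (i, j, k) is n(i+k) · n(+jk) / n(++k). The function name and the nested-list data layout are illustrative assumptions. The abstract does not specify which loglinear models or which stratification the authors used.

```python
import math

def g2_conditional_independence(table):
    """Likelihood-ratio G^2 for conditional independence of two items.

    table[k][i][j] is the count of examinees in rest-score stratum k
    with response i on item A and response j on item B (0 = wrong, 1 = right).
    Returns (G^2, degrees of freedom); compare G^2 to a chi-square
    distribution with that many degrees of freedom.
    """
    g2 = 0.0
    df = 0
    for stratum in table:
        n = sum(sum(row) for row in stratum)
        if n == 0:
            continue  # an empty stratum contributes nothing
        n_rows, n_cols = len(stratum), len(stratum[0])
        row_tot = [sum(row) for row in stratum]
        col_tot = [sum(stratum[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for i in range(n_rows):
            for j in range(n_cols):
                obs = stratum[i][j]
                if obs > 0:  # cells with zero counts add 0 to G^2
                    expected = row_tot[i] * col_tot[j] / n
                    g2 += 2.0 * obs * math.log(obs / expected)
        df += (n_rows - 1) * (n_cols - 1)
    return g2, df
```

A stratum in which the two items are exactly independent, such as [[2, 4], [3, 6]], contributes 0 to G². A large G² relative to its degrees of freedom indicates the kind of local dependence the study reports.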
doi: 10.1177/014662168801200104; pmid: N/A
In validity generalization research, the estimated mean and variance of the true validity distribution are often used to construct a credibility interval, an interval containing a specified proportion of the true validity distribution. The statistical interpretation of this interval in the literature has varied between Bayesian and classical (frequentist) viewpoints. Here, credibility intervals are discussed from the frequentist perspective, in which they are known in the statistical literature as "tolerance intervals." Two new methods for constructing a credibility interval are presented. Unlike the current method of constructing the credibility interval, tolerance intervals have known performance characteristics across repeated applications, which justifies confidence statements. The new methods may be useful in validity generalization research involving a small or moderate number of validation studies. Index terms: Bayesian statistics, Credibility intervals, Meta-analysis, Tolerance intervals, True validity distribution, Validity generalization.
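To make the contrast concrete, the sketch below compares two intervals. The first is the conventional interval, mean ± z·SD, which treats the estimated mean and SD as if they were the true parameters. The second is a two-sided normal tolerance interval whose factor k comes from Howe's approximation, k = z((1 + p)/2) · √((n − 1)(1 + 1/n) / χ²), where χ² is the lower (1 − γ) quantile of chi-square with n − 1 degrees of freedom, p is the coverage and γ the confidence.

This is a standard textbook tolerance interval, not either of the two new methods the abstract mentions, which it does not describe. It also rests on two simplifying assumptions:

- The mean and SD are treated as estimates from n independent draws from a normal distribution. This ignores the sampling-error corrections used in validity generalization.
- The chi-square quantile is approximated with the Wilson-Hilferty formula, so that only the Python standard library is needed.

```python
import math
from statistics import NormalDist

def chi2_quantile(q, df):
    """Wilson-Hilferty approximation to the q-quantile of chi-square(df)."""
    z = NormalDist().inv_cdf(q)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * math.sqrt(h)) ** 3

def tolerance_factor(n, coverage=0.90, confidence=0.95):
    """Howe's approximate two-sided normal tolerance factor k."""
    if n < 3:
        raise ValueError("need at least 3 observations for this approximation")
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    chi2_lower = chi2_quantile(1.0 - confidence, n - 1)
    if chi2_lower <= 0:
        raise ValueError("Wilson-Hilferty approximation breaks down here")
    return z * math.sqrt((n - 1) * (1.0 + 1.0 / n) / chi2_lower)

def credibility_interval(mean, sd, coverage=0.90):
    """Conventional interval: plug-in estimates treated as known parameters."""
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    return mean - z * sd, mean + z * sd

def tolerance_interval(mean, sd, n, coverage=0.90, confidence=0.95):
    """Covers `coverage` of a normal population with probability ~`confidence`."""
    k = tolerance_factor(n, coverage, confidence)
    return mean - k * sd, mean + k * sd
```

With n = 10, for example, k is roughly 2.8 rather than the 1.645 used by the plug-in interval, so the tolerance interval is much wider. As n grows, k shrinks toward 1.645.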
doi: 10.1177/014662168801200105; pmid: N/A
Unfolding data for unidimensional variables constructed from direct responses (e.g., agreement or disagreement) are characterized by single-peaked functions involving the locations of each person and each stimulus. A continuous discriminal process, of the form postulated by Thurstone when he proposed his Law of Comparative Judgment, is suggested. This process is transformed into a qualitative dichotomous response in which the probability of endorsement is governed by the square of the distance between the locations of the person and the stimulus. Maximum likelihood estimates of the parameters are derived, and it is shown that the information associated with any response is a bimodal function of the difference between the person and stimulus locations. The feasibility of parameter estimation is demonstrated with a limited simulation study. The model is applied to a set of statements designed to measure attitudes toward capital punishment and scaled by Thurstone's methods. The responses conformed to the unfolding mechanism, and the scale values of the statements are statistically equivalent to those obtained by Thurstone's methods. Index terms: Attitude measurement, Developmental data, Discriminal process, Item response theory, Person response theory, Thurstone scaling, Unfolding data, Unidimensional scaling.
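The claim that the information is bimodal can be checked with a small sketch. It assumes one simple response function consistent with the description: P(endorse) = 1 / (1 + exp((β − δ)²)), where β is the person location and δ the stimulus location. The abstract does not give the exact form, so treat this as an assumption.

Writing d = β − δ, the derivative is dP/dβ = −2d·P(1 − P). The Fisher information of one dichotomous response is therefore I = (dP/dβ)² / [P(1 − P)] = 4d²·P(1 − P). This is zero at d = 0 and vanishes again as |d| grows, so it peaks on either side of the stimulus.

```python
import math

def p_endorse(beta, delta):
    """Assumed squared-distance model: P = 1 / (1 + exp((beta - delta)^2))."""
    d2 = (beta - delta) ** 2
    e = math.exp(-d2)  # written this way to avoid overflow for large distances
    return e / (1.0 + e)

def item_information(beta, delta):
    """Fisher information of one response: 4 d^2 P (1 - P), with d = beta - delta."""
    p = p_endorse(beta, delta)
    d = beta - delta
    return 4.0 * d * d * p * (1.0 - p)

# Information is zero at the stimulus location and peaks on either side of it.
for beta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"beta={beta:+.1f}  P={p_endorse(beta, 0.0):.3f}  I={item_information(beta, 0.0):.3f}")
```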
Vale, C. David; Gialluca, Kathleen A.
doi: 10.1177/014662168801200106; pmid: N/A
This study compared several IRT calibration procedures to determine which procedure, if any, consistently produced the most accurate item parameter estimates. A new criterion, calibration efficiency, was used to evaluate the calibration procedures; this criterion considers the joint effects of individual item parameter errors as they relate to the accuracy of θ estimation. Four methods of item calibration were evaluated: (1) heuristic estimates obtained from transformations of traditional item statistics; (2) ANCILLES, a program that first fits the c parameter and then transforms traditional item statistics to IRT a and b parameters; (3) LOGIST, a joint maximum likelihood procedure; and (4) ASCAL, a modification of LOGIST's algorithm that applies Bayesian priors to the abilities and item parameters. These were compared with each other and with a baseline condition of constant item parameters. ASCAL and LOGIST produced estimates of essentially equivalent accuracy, although ASCAL's estimates of the c parameters were slightly superior. The heuristic estimates and those from ANCILLES were generally poor in comparison, particularly for smaller sample sizes. Index terms: Calibration efficiency, Item calibration, Item parameter estimation, Item response theory, Latent trait models.
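As background for method (1), the sketch below shows the three-parameter logistic (3PL) item response function in which the a, b, and c parameters are defined. It pairs this with one classical heuristic that converts traditional item statistics to a and b, namely the normal-ogive relations a ≈ r / √(1 − r²) and b ≈ −Φ⁻¹(p) / r. Here p is the proportion correct and r is the item's biserial correlation with the trait.

These relations assume no guessing (c = 0). They are offered only as an assumed example of this family of heuristics, not as the specific transformations evaluated in the study. The scaling constant D = 1.7 is the usual convention for approximating the normal ogive.

```python
import math
from statistics import NormalDist

D = 1.7  # logistic-to-normal-ogive scaling constant

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def heuristic_ab(p_correct, r_biserial):
    """Classical normal-ogive conversions (assume no guessing, 0 < r < 1)."""
    a = r_biserial / math.sqrt(1.0 - r_biserial ** 2)
    b = -NormalDist().inv_cdf(p_correct) / r_biserial
    return a, b

a, b = heuristic_ab(p_correct=0.70, r_biserial=0.50)
print(f"a = {a:.3f}, b = {b:.3f}")  # an easy item gets a negative b
print(f"P(theta = b) = {p_3pl(b, a, b, 0.2):.3f}")  # equals c + (1 - c) / 2
```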
Skaggs, Gary; Lissitz, Robert W.
doi: 10.1177/014662168801200107; pmid: N/A
Previous research on the application of IRT methodology to vertical test equating has produced conflicting results about the degree to which these methods are invariant with respect to examinee ability. The purpose of this study was to examine IRT equating invariance by simulating the vertical equating of two tests under varying conditions. Rasch, three-parameter, and equipercentile equating methods were compared. Six equating cases, using different sets of item parameters, were replicated with examinee samples of low, medium, or high ability, or with ability matched to the difficulty level of the test. The results showed that all three methods were reasonably invariant to examinee ability level under all conditions imposed. This suggests that multidimensionality is likely to be the cause of the lack of invariance found in real datasets. Index terms: Examinee ability; Invariance in item response theory; Item response theory, equating; Item response theory, invariance; Test equating; Vertical equating.
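For comparison with the IRT methods, equipercentile equating maps a score x on test X to the score on test Y that has the same percentile rank. The sketch below is one minimal version. It first continuizes each discrete score distribution by spreading the frequency at integer score s uniformly over [s − 0.5, s + 0.5], which is the assumption behind the usual percentile-rank definition. It then inverts test Y's piecewise-linear CDF.

No presmoothing or postsmoothing is applied. The function names and the frequency-list input are illustrative assumptions, not the procedure used in the study.

```python
from bisect import bisect_left

def continuized_cdf(freqs):
    """Knots (score, cumulative proportion) of a piecewise-linear CDF.

    freqs[s] is the number of examinees with integer score s.
    """
    total = float(sum(freqs))
    xs, ps = [-0.5], [0.0]
    cum = 0
    for s, f in enumerate(freqs):
        cum += f
        xs.append(s + 0.5)
        ps.append(cum / total)
    return xs, ps

def interpolate(x, xs, ys):
    """Linear interpolation on knots; xs must be nondecreasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # first knot with xs[i] >= x, so xs[i - 1] < x
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def equipercentile(x, freqs_x, freqs_y):
    """Score on Y with the same continuized percentile rank as x on X."""
    xs_x, ps_x = continuized_cdf(freqs_x)
    xs_y, ps_y = continuized_cdf(freqs_y)
    p = interpolate(x, xs_x, ps_x)
    return interpolate(p, ps_y, xs_y)  # inverse CDF: swap the roles of the axes
```

As a quick check, equate a distribution to a copy of itself shifted up by one point. The call equipercentile(2, [1, 2, 3, 2, 1], [0, 1, 2, 3, 2, 1]) returns approximately 3.0.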
Cliff, Norman; Collins, Linda M.; Zatkin, Judith; Gallipeau, Dannie; McCormick, Douglas J.
doi: 10.1177/014662168801200108; pmid: N/A
This paper reports the development and application of a method for ordering persons and items (or stimuli) when responses are ordinal. The method applies most directly to data in which responses are dichotomous, indicating agreement, acceptability, or similarity, and can be assumed to reflect proximity rather than dominance. It orders the rows and columns of the response matrix into "parallelogram" form using pairwise interchange procedures, followed by other steps. The method was applied to several sets of questionnaire data and one set of archeological data, with reasonable success. Other applications and extensions are suggested. Index terms: Dichotomous responses, Interchange methods, Ordinal scaling, Parallelogram scaling, Proximity data, Questionnaire responses.
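The abstract does not state the criterion that the interchange procedure optimizes, so the sketch below uses a hypothetical one. It counts the "gaps" (0s lying between the first and last 1) in every row and every column of the reordered 0/1 matrix. This count is zero for a perfect parallelogram pattern (a necessary but not sufficient condition).

The procedure swaps pairs of rows, then pairs of columns, keeping any swap that lowers the count, until no swap helps. Like any interchange heuristic, it can stop at a local optimum. It also omits the "other steps" the authors apply afterward.

```python
def gaps(seq):
    """Number of 0s lying between the first and last 1 of a 0/1 sequence."""
    ones = [i for i, v in enumerate(seq) if v]
    if not ones:
        return 0
    return (ones[-1] - ones[0] + 1) - len(ones)

def criterion(m, rows, cols):
    """Total gaps over rows and columns of m viewed in the given order."""
    row_gaps = sum(gaps([m[i][j] for j in cols]) for i in rows)
    col_gaps = sum(gaps([m[i][j] for i in rows]) for j in cols)
    return row_gaps + col_gaps

def pairwise_interchange(m):
    """Greedy pairwise interchange of rows and columns to reduce gaps."""
    rows = list(range(len(m)))
    cols = list(range(len(m[0])))
    best = criterion(m, rows, cols)
    improved = True
    while improved:
        improved = False
        for order in (rows, cols):  # mutating `order` reorders rows or cols
            for a in range(len(order)):
                for b in range(a + 1, len(order)):
                    order[a], order[b] = order[b], order[a]
                    value = criterion(m, rows, cols)
                    if value < best:
                        best, improved = value, True
                    else:  # undo a swap that does not help
                        order[a], order[b] = order[b], order[a]
    return rows, cols, best
```

For example, pairwise_interchange([[1, 0, 1], [0, 1, 0]]) swaps the first two columns and returns ([0, 1], [1, 0, 2], 0).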