DOES THE RASCH MODEL REALLY WORK FOR MULTIPLE CHOICE ITEMS? NOT IF YOU LOOK CLOSELY
DIVGI, D. R.
doi: 10.1111/j.1745-3984.1986.tb00251.x
pmid: N/A
This paper discusses various issues involved in using the Rasch model with multiple choice tests. By presenting a modified test that is much more powerful, the value of Wright and Panchapakesan's test as evidence of model fit is shown to be questionable. According to the new test, the model failed to fit 68% of the items in the Anchor Test Study. Effects of such misfit on test equating are demonstrated. Results of some past studies purporting to support the Rasch model are shown to be irrelevant, or to yield the conclusion that the Rasch model did not fit the data. Issues like “objectivity” and consistent estimation are shown to be unimportant in selection of a latent trait model. Thus, available evidence shows the Rasch model to be unsuitable for multiple choice items.
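For readers unfamiliar with the model under discussion, the following is a minimal sketch of the standard Rasch item response function, in which the probability of a correct answer depends only on the difference between person ability and item difficulty. The function name and the symbols `theta` and `b` are illustrative choices, not notation taken from the paper.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model.

    theta: person ability (logits); b: item difficulty (logits).
    Both names are illustrative assumptions, not the paper's notation.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


if __name__ == "__main__":
    # The curve has no lower asymptote above zero: a very low-ability
    # examinee is predicted to answer correctly almost never.
    for theta in (-3.0, 0.0, 3.0):
        print(theta, round(rasch_probability(theta, b=0.0), 3))
```

Because the function approaches zero for low abilities and every item shares the same slope, the model leaves no room for items that differ in discrimination or that can be answered correctly by chance, which is one common reason to question it for multiple choice data.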
THE CHOICE OF SCALE FOR EDUCATIONAL MEASUREMENT: AN IRT PERSPECTIVE
YEN, WENDY M.
doi: 10.1111/j.1745-3984.1986.tb00252.x
pmid: N/A
Two methods of constructing equal‐interval scales for educational achievement are discussed: Thurstone's absolute scaling method and Item Response Theory (IRT). Alternative criteria for choosing a scale are contrasted. It is argued that clearer criteria are needed for judging the appropriateness and usefulness of alternative scaling procedures, and more information is needed about the qualities of the different scales that are available. In answer to this second need, some examples are presented of how IRT can be used to examine the properties of scales: It is demonstrated that for observed score scales in common use (i.e., any scores that are influenced by measurement error), (a) systematic errors can be introduced when comparing growth at selected percentiles, and (b) normalizing observed scores will not necessarily produce a scale that is linearly related to an underlying normally distributed true trait.
AN EXAMINATION OF THE ASSUMPTION THAT THE EQUATING OF PARALLEL FORMS IS POPULATION‐INDEPENDENT
ANGOFF, WILLIAM H.; COWELL, WILLIAM R.
doi: 10.1111/j.1745-3984.1986.tb00253.x
pmid: N/A
Linear conversions were developed relating scores on two recent forms of the homogeneous GRE Quantitative Test (GRE‐Q) and the specially constituted heterogeneous GRE Verbal‐plus‐Quantitative Test (GRE‐V+Q), using randomly equivalent groups of about 13,000 taking each form. Specially defined homogeneous subpopulations were then identified, and conversions between scores on the two forms were again calculated, this time based on 1,000‐case samples drawn at random from the subpopulations. Finally, in order to develop empirical measures of equating error, 100 samples of 1,000 cases each were drawn at random from the two total groups and used to calculate 100 conversions between scores on the two forms. The conversions based on the specially selected subpopulations were then compared with the total‐group conversions and evaluated in terms of the empirical standard errors. The results showed that the conversions for the subpopulations agreed with the total‐group conversion quite satisfactorily for the GRE‐Q and almost as well for the GRE‐V+Q. It was concluded that the data clearly support the assumption of population independence for homogeneous tests, but not quite so clearly for heterogeneous tests.
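The resampling design described in this abstract can be sketched roughly as follows. This is an illustration only: it assumes the usual mean-and-standard-deviation form of linear equating, and the function names, default arguments, and seed are hypothetical rather than taken from the study.

```python
import random
import statistics


def linear_conversion(x_scores, y_scores):
    """Return (slope, intercept) mapping Form X scores onto the Form Y scale.

    Assumes standard linear equating: scores with equal z-scores in the
    two randomly equivalent groups are treated as equivalent.
    """
    slope = statistics.pstdev(y_scores) / statistics.pstdev(x_scores)
    intercept = statistics.fmean(y_scores) - slope * statistics.fmean(x_scores)
    return slope, intercept


def empirical_standard_error(x_group, y_group, score,
                             n_reps=100, n_cases=1000, seed=0):
    """Standard deviation of the converted value of `score` over repeated
    random samples drawn without replacement from the two total groups."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    converted = []
    for _ in range(n_reps):
        xs = rng.sample(x_group, n_cases)
        ys = rng.sample(y_group, n_cases)
        slope, intercept = linear_conversion(xs, ys)
        converted.append(slope * score + intercept)
    return statistics.stdev(converted)
```

A subpopulation conversion at a given score point could then be compared with the total-group conversion by expressing their difference in units of this empirical standard error.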
TEACHER EDUCATION AND TEACHER‐PERCEIVED NEEDS IN EDUCATIONAL MEASUREMENT AND EVALUATION
GULLICKSON, ARLEN R.
doi: 10.1111/j.1745-3984.1986.tb00254.x
pmid: N/A
Professors and teachers were compared with respect to their perspectives on preservice educational measurement courses. Twenty‐four professors from different colleges in seven states and 360 teachers from elementary and secondary schools in one midwestern state responded via mailed questionnaire. Professors reported the emphasis given to each of eight topics in preservice educational measurement courses, and teachers reported the emphasis they believed should be given to each topic. In five of the eight content areas, the relative emphases given by professors differed from those recommended by teachers. Major differences emerged in nontest evaluation, statistical analysis, and formative and summative evaluation. Implications of these results are discussed.
THE SUBSET SELECTION TECHNIQUE FOR MULTIPLE‐CHOICE TESTS: AN EMPIRICAL INQUIRY
JARADAT, DERAR; SAWAGED, SARI
doi: 10.1111/j.1745-3984.1986.tb00256.x
pmid: N/A
The impact of the Subset Selection Technique (SST) for administering and scoring multiple‐choice items on certain properties of a test was compared with that of the two other commonly used methods, the Number Right (NR) and the Correction for Guessing Formula (CFG). Under SST, examinees are instructed to select any number of response alternatives, the objective being to include the correct answer in the chosen set. The effects of each scoring method on the psychometric properties of a test and on the performance of examinees with different achievement levels and/or risk‐taking propensities were investigated. Results indicated that SST outperformed the other two methods, producing not only higher reliability and validity coefficients for the test, but doing so without favoring high risk takers. The superiority of SST may be attributed to two interrelated factors: the efficiency of the technique in controlling for guessing and the encouragement provided examinees to use their partial knowledge in responding.
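As a rough illustration of the two comparison methods named in this abstract, the sketch below scores one examinee by Number Right and by the conventional correction-for-guessing formula, right minus wrong divided by (options minus one). The abstract does not state the scoring rule used for SST, so none is shown. Function names and the use of `None` for an omitted item are assumptions made for the example.

```python
def number_right(responses, key):
    """Number Right (NR): count of items answered correctly."""
    return sum(1 for r, k in zip(responses, key) if r == k)


def corrected_for_guessing(responses, key, n_options):
    """Correction for Guessing (CFG): R - W / (n_options - 1).

    Omitted items (None) are neither right nor wrong, so they carry
    no penalty; this convention is an assumption of the sketch.
    """
    right = number_right(responses, key)
    wrong = sum(1 for r, k in zip(responses, key)
                if r is not None and r != k)
    return right - wrong / (n_options - 1)


if __name__ == "__main__":
    key = ["A", "B", "C", "D", "A"]
    responses = ["A", "B", "D", None, "C"]  # 2 right, 2 wrong, 1 omitted
    print(number_right(responses, key))
    print(corrected_for_guessing(responses, key, n_options=4))
```

Under NR a blind guess can only help, whereas under CFG its expected value is zero, which is why the two methods can rank examinees with different risk-taking propensities differently.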
MEASURING THE ORGANIZATIONAL ASPECTS OF WRITING ABILITY
BENTON, STEPHEN L.; KIEWRA, KENNETH A.
doi: 10.1111/j.1745-3984.1986.tb00257.x
pmid: N/A
The present study assessed the relationship among holistic writing ability, the Test of Standard Written English (TSWE), and the following tests of organizational ability: anagram solving, word reordering, sentence reordering, and paragraph assembly. Based upon a sample of 105 undergraduate students, the main findings were that writing ability, as measured by the holistic method of scoring, was significantly correlated with performance on the TSWE and the four tests of organizational ability. A composite score on all four organizational tests was found to have the highest zero‐order correlation with the measure of writing ability. A stepwise regression analysis, with the measure of writing ability as the criterion, also indicated that the composite score explained a significant proportion of the variance beyond that explained by the TSWE. The results are discussed in terms of the Kintsch and van Dijk model of strategic discourse processing, which suggests that different organizational strategies operate at the levels of words, sentences, and paragraphs. It is concluded that tests assessing organizational strategies ought to be included in assessments of writing ability.