Applied Psychological Measurement

journal article

LitStream Collection

The Influence of Test Characteristics on the Detection of Aberrant Response Patterns

1991 Applied Psychological Measurement

doi: 10.1177/014662169101500301

Statistical methods to assess the congruence between an item response pattern and a specified item response theory model have recently proliferated. This "person fit" research has focused on the question: To what extent can person-fit indices identify well-defined forms of aberrant item response? This study extended previous person-fit research in two ways. First, an unexplored model for generating aberrant response patterns was explicated. The data-generation model is based on the theory that aberrant item responses result in less psychometric information for the individual than predicted by the parameters of a specified response model. Second, the proposed response aberrancy generation model was implemented to investigate how the aberrancy detection power of a person-fit statistic is influenced by test properties (e.g., the spread of item difficulties). Results indicated that detecting aberrant response patterns was especially problematic for tests with fewer than 20 items, and for tests with limited ranges of item difficulty. An applied consequence of these results is that certain types of test designs (e.g., peaked tests) and administration procedures (e.g., adaptive tests) potentially act to limit the detection of aberrant item responses.

Expert-System Scores for Complex Constructed-Response Quantitative Items: A Study of Convergent Validity

Bennett, Randy Elliot; Sebrechts, Marc M.; Rock, Donald A.

1991 Applied Psychological Measurement

doi: 10.1177/014662169101500302

This study investigated the convergent validity of expert-system scores for four mathematical constructed-response item formats. A five-factor model comprised of four constructed-response format factors and a Graduate Record Examination (GRE) General Test quantitative factor was posed. Confirmatory factor analysis was used to test the fit of this model and to compare it with several alternatives. The five-factor model fit well, although a solution comprised of two highly correlated dimensions (GRE-quantitative and constructed-response) represented the data almost as well. These results extend the meaning of the expert system's constructed-response scores by relating them to a well-established quantitative measure and by indicating that they signify the same underlying proficiency across item formats.

The Effect of Numbers of Experts and Common Items on Cutting Score Equivalents Based on Expert Judgment

Norcini, John; Shea, Judy; Grosso, Louis

1991 Applied Psychological Measurement

doi: 10.1177/014662169101500303

The effect of different numbers of experts and common items on the scaling of cutting scores derived by experts' judgments was investigated. Four test forms were created from each of two examinations; each form from the first examination shared a block of items with one form from the second examination. Small groups of experts set standards on each using a modification of Angoff's (1971) method. Cutting score equivalents were estimated for the matched forms using different group sizes and numbers of common items; they were compared with cutting score equivalents based on score equating. Results showed that a reduction in error is associated with using more experts or having more items in common between the two forms. For 25 or more common items and five or more judges, the error was about one item on a 100-item test. More than five experts or 25 common items made only a very small difference in error.

Effects of Passage and Item Scrambling on Equating Relationships

Harris, Deborah J.

1991 Applied Psychological Measurement

doi: 10.1177/014662169101500304

This study investigated the effects of passage and item scrambling on equipercentile and item response theory equating using a random groups design. For all four tests and for both scramblings used, differences in item and examinee statistics were found to exist between all three forms used (the base form and the two scrambled forms). Up to 50% of the examinees administered a scrambled form would have received a different scale score if the base form equating, rather than the scrambled form equating, had been used to convert their number-correct scores. It is, therefore, suggested that caution be used when scrambled forms are being administered, because in applications such as that studied here, the effects of applying the equating results obtained using a base form to the number-correct scores obtained on a scrambled form can be quite substantial in terms of the numbers of examinees who would receive different scores.

Appropriate Moderated Regression and Inappropriate Research Strategy: A Demonstration of Information Loss Due to Scale Coarseness

Russell, Craig J.; Pinto, Jeffrey K.; Bobko, Philip

1991 Applied Psychological Measurement

doi: 10.1177/014662169101500305

The Relationship of Power of Statistical Tests to Range of Talent: A Correction and Amplification

Humphreys, Lloyd G.

1991 Applied Psychological Measurement

doi: 10.1177/014662169101500306

A Comparison of Two Area Measures for Detecting Differential Item Functioning

Kim, Seock-Ho; Cohen, Allan S.

1991 Applied Psychological Measurement

doi: 10.1177/014662169101500307

The area between two item response functions is often used as a measure of differential item functioning under item response theory. This area can be measured over either an open interval (i.e., exact) or closed interval. Formulas are presented for computing the closed-interval signed and unsigned areas. Exact and closed-interval measures were estimated on data from a test with embedded items intentionally constructed to favor one group over another. No real differences in detection of these items were found between exact and closed-interval methods.
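
As a rough illustration of the closed-interval idea described in this abstract, the sketch below numerically approximates the signed and unsigned areas between two item response functions over a bounded ability range. It is not the article's method: the abstract's closed-form formulas are not reproduced here, and the two-parameter logistic form, the scaling constant D = 1.7, the interval [-4, 4], the trapezoidal rule, and all function names are assumptions chosen for the example.

```python
import math

def irf_2pl(theta, a, b, D=1.7):
    """Two-parameter logistic item response function (assumed model)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def closed_interval_areas(ref, focal, lo=-4.0, hi=4.0, steps=2000):
    """Approximate signed and unsigned areas between two IRFs on [lo, hi].

    ref and focal are (a, b) tuples for the reference and focal groups.
    Uses the composite trapezoidal rule; the interval bounds are an
    illustrative choice, not values taken from the article.
    """
    h = (hi - lo) / steps
    signed = 0.0
    unsigned = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        diff = irf_2pl(theta, *ref) - irf_2pl(theta, *focal)
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        signed += weight * diff * h
        unsigned += weight * abs(diff) * h
    return signed, unsigned

# Hypothetical item: equal discrimination, harder for the focal group.
signed, unsigned = closed_interval_areas(ref=(1.0, 0.0), focal=(1.0, 0.5))
print(f"signed area = {signed:.4f}, unsigned area = {unsigned:.4f}")
```

When the two functions do not cross, as in this hypothetical item, the signed and unsigned areas coincide; they diverge when discriminations differ and the curves intersect inside the interval.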

An Empirical Study of the Effects of Small Datasets and Varying Prior Variances on Item Parameter Estimation in BILOG

Harwell, Michael R.; Janosky, Janine E.

1991 Applied Psychological Measurement

doi: 10.1177/014662169101500308

Long-standing difficulties in estimating item parameters in item response theory (IRT) have been addressed recently with the application of Bayesian estimation models. The potential of these methods is enhanced by their availability in the BILOG computer program. This study investigated the ability of BILOG to recover known item parameters under varying conditions. Data were simulated for a two-parameter logistic IRT model under conditions of small numbers of examinees and items, and different variances for the prior distributions of discrimination parameters. The results suggest that for samples of at least 250 examinees and 15 items, BILOG accurately recovers known parameters using the default variance. The quality of the estimation suffers for smaller numbers of examinees under the default variance, and for larger prior variances in general. This raises questions about how practitioners select a prior variance for small numbers of examinees and items.

Fuzzy Fit Index Tutoring System (FFITS): An Intelligent System for Interpreting and Integrating Covariance Structure Modeling Solutions

Craiger, J. Philip; Coovert, Michael D.

1991 Applied Psychological Measurement

doi: 10.1177/014662169101500309

On the Efficiency of IRT Models When Applied to Different Sampling Designs

Berger, Martijn P. F.

1991 Applied Psychological Measurement

doi: 10.1177/014662169101500310

The problem of obtaining designs that result in the greatest precision of the parameter estimates is encountered in at least two situations in which item response theory (IRT) models are used. In so-called two-stage testing procedures, certain designs may be specified that match difficulty levels of test items with abilities of examinees. The advantage of such designs is that the variance of the estimated parameters can be controlled. In situations in which IRT models are applied to different groups, efficient multiple-matrix sampling designs are applicable. The choice of matrix sampling designs will also influence the variance of the estimated parameters. Heuristic arguments are given here to formulate the efficiency of a design in terms of an asymptotic generalized variance criterion, and a comparison is made of the efficiencies of several designs. It is shown that some designs may be found to be most efficient for the one- and two-parameter model, but not necessarily for the three-parameter model.
