Quality & Quantity 35: 253–263, 2001.
© 2001 Kluwer Academic Publishers. Printed in the Netherlands.
On the Use and Utility of the Reliability Coefficient
in Social and Behavioral Research
Department of Psychology, Fordham University, Bronx, New York, 10458, U.S.A.
Abstract. This note discusses the use and utility of the reliability concept in social and behavioral
research. The focus is on the meaning and limitations of the classical test theory reliability coefficient.
It examines the conditions under which reliability is a meaningful notion, and how much information
it can then provide. Issues pertaining to limitations of social and behavioral measurement practice,
and their implications for indices of measurement accuracy, are subsequently discussed.
Key words: classical test theory, change scores, latent construct, measurement error, reliability
1. On the Use and Utility of the Reliability Coefficient in Social and Behavioral Research
One of the concepts that have attracted an impressive amount of social and behavioral
research interest over the past few decades is reliability. This is evidenced by the many
recent publications on reliability and its applications, for instance on scale reliability
estimation (e.g., Borchard and Hakstian, 1997; Raykov, 1997) and on change score
reliability (e.g., Rogosa, 1996; Williams and Zimmerman, 1996a). The purpose of this
note is to discuss the conditions under
which the classical test theory (CTT) notion of reliability is meaningful in social
and behavioral measurement and the amount of information it can provide.
2. When is it Meaningful to Talk about Reliability?
Within the framework of CTT (e.g., Lord and Novick, 1968), an observed score X
is decomposed into the sum of a true score T and an error score E, i.e., X = T + E,
where T and E are uncorrelated and are rigorously defined, e.g., in Zimmerman
(1975). The reliability coefficient (referred to in the remainder as ‘reliability’ for
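short) of the measure X, denoted $\rho_X$, is defined as

$$\rho_X = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2},$$

where $\sigma_T^2$, $\sigma_X^2$ and $\sigma_E^2$ designate the variances of true, observed and error score,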
respectively. For the present discussion, the distinction between population and