Quality & Quantity 33: 117–133, 1999.
© 1999 Kluwer Academic Publishers. Printed in the Netherlands.
Inconsistencies Across Three Contextual
Meanings of Reliability
OLIVER C. S. TZENG
and JANET WELCH
Indiana University – Purdue University at Indianapolis, Indiana, U.S.A.
Abstract. The purpose of this article is to assess the nature of reliability and its inconsistent
definitions across three contextual (conceptual, measurement and statistical) levels under the traditional
true score theory. Because of these inconsistencies, the two existing quantitative approaches (using r and
covariance) are not uniformly understood in psychology and other disciplines; consequently, their
application to measurement and testing is limited to ambiguous interpretations at the conceptual
and measurement levels. To examine the extent of this problem, a questionnaire including various
contextual definitions and interpretations of reliability in the literature was distributed in a nationwide
survey. Results from six groups of experts, representing editors, professors and advanced graduate
students in both quantitative and clinical areas, indicate that all subject groups generally agreed that
a reliable instrument is characterized by repeatable responses from all test-takers
at the conceptual level, and by reproducibility, with little or no variation from
the underlying true scores, at the measurement level. However, editors and noneditors showed
obvious discrepancies in their endorsement of the common definition at the measurement level.
Further, at the statistical level, significant differences were found not only between but also within
subject groups in their interpretations of product-moment correlations and Alpha coefficients for the
assessment of reliability at the conceptual and measurement levels. The causes of these inconsistencies
are discussed in terms of the inherent limitations of the two statistical approaches used and their
insufficiency for indexing the conceptual and measurement meanings of reliability. Finally, this
paper calls for the development of new statistical indices that are coherent with the conceptual and
measurement definitions. Until such indices are developed, the capacities of existing reliability indices
should be redefined and their application qualifications correspondingly re-established for educational,
research and clinical purposes.
Key words: reliability, classical true score theory, inherent limitations of r and Alpha, discrepancies
among experts, re-establishment of application qualifications, need for new indices.
This research is supported in part by Indiana University School of Nursing, where the first author
is an adjunct professor and the second author was a doctoral student.
For correspondence, contact the first author at Osgood Laboratory for Cross-Cultural Research,
Department of Psychology, Indiana University – Purdue University at Indianapolis, 402 N. Blackford
Street, LD120B, Indianapolis, Indiana 46202-3275, U.S.A.