psychometrika—vol. 83, no. 1, 203–222
RESAMPLING-BASED INFERENCE METHODS FOR COMPARING TWO
Markus Pauly and Maria Umlauft
TECHNICAL UNIVERSITY OF MUNICH
The two-sample problem for Cronbach’s coefficient α, as an estimate of test or composite score
reliability, has attracted little attention compared to the extensive treatment of the one-sample case. It is
necessary to compare the reliability of a test for different subgroups, for different tests or the short and
long forms of a test. In this paper, we study statistical procedures for comparing two coefficients α,
written here as α_1 and α_2. The null hypothesis of interest is H_0: α_1 = α_2, which we test against one- or two-sided
alternatives. For this purpose, resampling-based permutation and bootstrap tests are proposed for two-group
multivariate non-normal models under the general asymptotically distribution-free (ADF) setting. These
statistical tests ensure better control of the type-I error for finite or very small sample sizes, when the
state-of-the-art ADF large-sample test may fail to properly attain the nominal significance level. By proper
choice of a studentized test statistic, the resampling tests are modified in order to be valid asymptotically
even in non-exchangeable data frameworks. Extensions of this approach to other designs and
reliability measures are also discussed. Finally, the usefulness of the proposed resampling-based testing
strategies is demonstrated in an extensive simulation study and illustrated by real data applications.
Key words: bootstrap, coefficient alpha, Cronbach’s alpha, non-normality, permutation, reliability,
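As a rough illustration of the resampling idea, the following Python sketch runs a plain permutation test on the difference of two sample coefficients alpha. It is not the studentized procedure proposed in this paper: the statistic is the raw difference, so the sketch is only justified when the pooled observations are exchangeable under the null hypothesis. The function names, the simulated data, and the number of permutations are assumptions made for the example.

```python
import numpy as np


def cronbach_alpha(scores):
    """Sample coefficient alpha for an (n_subjects, k_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)        # variance of the sum score
    return k / (k - 1) * (1.0 - item_var_sum / total_var)


def permutation_test_alpha(group1, group2, n_perm=2000, seed=0):
    """Two-sided permutation p-value for H_0: alpha_1 = alpha_2.

    Assumption: unstudentized statistic (raw difference of alphas),
    which requires exchangeability of the pooled rows under H_0.
    """
    rng = np.random.default_rng(seed)
    group1 = np.asarray(group1, dtype=float)
    group2 = np.asarray(group2, dtype=float)
    n1 = group1.shape[0]
    pooled = np.vstack([group1, group2])
    observed = cronbach_alpha(group1) - cronbach_alpha(group2)

    exceed = 0
    for _ in range(n_perm):
        # Randomly reassign subjects (rows) to the two groups.
        idx = rng.permutation(pooled.shape[0])
        diff = cronbach_alpha(pooled[idx[:n1]]) - cronbach_alpha(pooled[idx[n1:]])
        if abs(diff) >= abs(observed):
            exceed += 1

    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical data: 5 items, one common factor, noisier items in group 2.
rng = np.random.default_rng(123)
g1 = rng.normal(size=(40, 1)) + rng.normal(scale=1.0, size=(40, 5))
g2 = rng.normal(size=(50, 1)) + rng.normal(scale=2.0, size=(50, 5))
diff, p_value = permutation_test_alpha(g1, g2, n_perm=500)
print(f"alpha difference = {diff:.3f}, permutation p-value = {p_value:.3f}")
```

Permuting whole rows keeps each subject's item responses together, so the dependence among items is preserved while group membership is randomized.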
Reliability is a cornerstone concept in the classical true-score test theory of psychological or
educational measurement (e.g., Gulliksen 2013; Lord & Novick 1968; Crocker & Algina 1986;
McDonald 1999; Brennan 2001; Rao & Sinharay 2006). In this framework, it is assumed that the
observed test score variable Y can be decomposed into two components, Y = τ + ε, namely the true score τ and
an uncontrolled and always unexplained random measurement error ε with error variance var(ε)
(e.g., Mellenbergh 1996). The reliability Rel(Y) = var(τ) / (var(τ) + var(ε)) is then defined
as a normed measure ranging from zero to one: it is the proportion of the observed total variance
var(τ) + var(ε) that is accounted for by the explained or true-score variance var(τ). Reliability
and methods for quantifying reliability, such as Cronbach’s alpha (discussed below), have been
employed in numerous substantial studies (e.g., Cortina 1993; Peterson 1994; Hogan et al. 2000).
Reliability is an essential quality criterion required for a “good” psychological or educational test,
whereby it represents the extent to which a test in repeated independent measurements under the
same conditions yields comparable test results.
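The definition can be checked numerically with a short Python sketch. It simulates true scores and independent errors, then compares the theoretical variance ratio with two empirical quantities: the sample variance ratio and the correlation between two repeated measurements sharing the same true score. The normal distributions, the variances (4 and 1), and the sample size are assumptions chosen only for illustration.

```python
import numpy as np


def theoretical_reliability(var_true, var_error):
    """Rel(Y) = var(tau) / (var(tau) + var(eps))."""
    return var_true / (var_true + var_error)


rng = np.random.default_rng(42)
n = 100_000
var_true, var_error = 4.0, 1.0  # assumed values for the example

tau = rng.normal(0.0, np.sqrt(var_true), n)        # true scores
y1 = tau + rng.normal(0.0, np.sqrt(var_error), n)  # first measurement
y2 = tau + rng.normal(0.0, np.sqrt(var_error), n)  # independent repetition

rel = theoretical_reliability(var_true, var_error)
variance_ratio = tau.var(ddof=1) / y1.var(ddof=1)  # empirical var(tau) / var(Y)
retest_corr = np.corrcoef(y1, y2)[0, 1]            # correlation of repetitions

print(f"theoretical reliability: {rel:.3f}")
print(f"empirical variance ratio: {variance_ratio:.3f}")
print(f"correlation of repeated measurements: {retest_corr:.3f}")
```

Under these assumptions all three quantities should be close to one another, which reflects the interpretation of reliability as agreement between repeated independent measurements.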
The work of Markus Pauly and Maria Umlauft was supported by the German Research Foundation Project DFG-PA
Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s11336-017-9601-x)
contains supplementary material, which is available to authorized users.
Correspondence should be made to Markus Pauly, Institute of Statistics, Ulm University, Ulm, Germany.
Email: firstname.lastname@example.org; URL: http://www.uni-ulm.de/mawi/statistics/team/professors/prof-dr-markus-
© 2017 The Psychometric Society