Biometrics 74, 40–48 DOI: 10.1111/biom.12732
A Gatekeeping Procedure to Test a Primary and a Secondary
Endpoint in a Group Sequential Design with Multiple Interim Looks
Ajit C. Tamhane,
Jiangtao Gou,
Cyrus R. Mehta,
and Teresa Curto
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston,
Illinois 60208, U.S.A.
Department of Mathematics and Statistics, Hunter College, New York, New York 10065, U.S.A.
Department of Mathematical Sciences, University of Bath, Bath BA2 7AY, U.K.
Cytel Inc., 675 Massachusetts Avenue, Cambridge, Massachusetts 02139, U.S.A.
Summary. Glimm et al. (2010) and Tamhane et al. (2010) studied the problem of testing a primary and a secondary
endpoint, subject to a gatekeeping constraint, using a group sequential design (GSD) with K = 2 looks. In this article, we
greatly extend the previous results to multiple (K>2) looks. If the familywise error rate (FWER) is to be controlled at a
preassigned α level then it is clear that the primary boundary must be of level α. We show under what conditions one α-level
primary boundary is uniformly more powerful than another. Based on this result, we recommend the choice of the O’Brien
and Fleming (1979) boundary over the Pocock (1977) boundary for the primary endpoint. For the secondary endpoint the
choice of the boundary is more complicated since under certain conditions the secondary boundary can be refined to have a
nominal level α′ > α, while still controlling the FWER at level α, thus boosting the secondary power. We carry out secondary
power comparisons via simulation between different choices of primary–secondary boundary combinations. The methodology
is applied to the data from the RALES study (Pitt et al., 1999; Wittes et al., 2001). An R package, gsrsb, that implements
the proposed methodology is available on CRAN.
Key words: Familywise error rate; Gatekeeping; Lan–DeMets error spending function approach; Multiple comparisons;
Multiple endpoints; O’Brien–Fleming boundary; Pocock boundary; Primary power; Secondary power.
Gatekeeping procedures for testing multiple hierarchical
objectives, such as tests on multiple endpoints, have been
studied by many authors in the last 15 years; see,
for example, Dmitrienko and Tamhane (2007, 2009). For
the most part, these studies are restricted to fixed sample
designs. However, group sequential designs (GSDs) have
become increasingly common in clinical trials since
the early works of Pocock (1977) and O’Brien and Fleming
(1979); Jennison and Turnbull (2000) give a
thorough overview of the subject. Jennison and Turnbull
(1993), Tang and Geller (1999), and Maurer and Bretz (2013)
have addressed certain aspects of multiple testing in GSDs.
Still, there is a pressing need to develop procedures at the
interface of gatekeeping and group sequential designs. This
article addresses a practically important problem at this interface.
[Correction added on September 25, 2017, after first online publication:
Edits made to the fifth sentence in Section 5, Power
Hung et al. (2007) were the first to study a gatekeeping test
on a primary and a secondary endpoint using a GSD with two
looks (or stages). They showed that the fixed-sequence testing
strategy of propagating α from a rejected hypothesis to an
unrejected one, used effectively in gatekeeping and graphical
procedures (Bretz et al., 2009), inflates the type I error rate
when used in a GSD. Glimm et al. (2010) and Tamhane et al.
(2010) studied this problem analytically and showed, in particular,
how to determine the critical boundary for the secondary
endpoint so that the familywise type I error rate (FWER) is controlled.
Both these articles focused on the two-look (K = 2) case. In
this article, we study the problem in much greater depth and
extend the previous results to multiple looks (K > 2).
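The inflation described by Hung et al. (2007) can be illustrated with a small Monte Carlo sketch. It is not taken from any of the cited articles, and everything in it is an assumption made for illustration: two equally spaced looks, a one-sided α = 0.025, the standard O’Brien–Fleming critical values (2.796, 1.977) for the primary endpoint, perfectly correlated endpoints (ρ = 1), a true secondary null hypothesis, and a primary drift θ picked to make the effect visible. The function name secondary_error_rate and its arguments are hypothetical. The naive rule tests the secondary endpoint at the unadjusted value 1.96 at whichever look the primary endpoint is rejected; the comparison rule reuses the primary boundary for the secondary endpoint.

```python
import numpy as np

def secondary_error_rate(theta, rho, c_primary, c_secondary,
                         n_sim=400_000, seed=0):
    """Estimate P(reject secondary null) in a two-look gatekeeping GSD.

    theta       : primary drift at the final look (secondary drift is 0)
    rho         : correlation between the two endpoints (assumed known)
    c_primary   : primary critical values at looks 1 and 2
    c_secondary : secondary critical values at looks 1 and 2
    """
    rng = np.random.default_rng(seed)
    # Independent standard normal increments for the two stages.
    z = rng.standard_normal((n_sim, 2))
    e = rng.standard_normal((n_sim, 2))
    # Secondary increments, correlated with the primary ones.
    w = rho * z + np.sqrt(1.0 - rho**2) * e
    t = np.array([0.5, 1.0])          # information fractions (assumed equal spacing)
    scale = np.sqrt([1.0, 2.0])       # standardizes the cumulative sums
    x = np.cumsum(z, axis=1) / scale + theta * np.sqrt(t)   # primary statistics
    y = np.cumsum(w, axis=1) / scale                        # secondary statistics
    # Look at which the primary null is rejected (the trial stops there).
    stop1 = x[:, 0] > c_primary[0]
    stop2 = ~stop1 & (x[:, 1] > c_primary[1])
    # Gatekeeping: the secondary is tested only at the look where the primary rejects.
    reject = (stop1 & (y[:, 0] > c_secondary[0])) | (stop2 & (y[:, 1] > c_secondary[1]))
    return reject.mean()

OBF = (2.796, 1.977)      # O'Brien-Fleming, K = 2, one-sided 0.025
NAIVE = (1.96, 1.96)      # full alpha propagated to the secondary at each look

if __name__ == "__main__":
    naive = secondary_error_rate(theta=1.184, rho=1.0, c_primary=OBF, c_secondary=NAIVE)
    same = secondary_error_rate(theta=1.184, rho=1.0, c_primary=OBF, c_secondary=OBF)
    print(f"naive secondary test at 1.96: {naive:.4f}")
    print(f"secondary on OBF boundary:    {same:.4f}")
```

Under these assumptions the naive rule gives an estimated secondary type I error rate clearly above the nominal 0.025, whereas using an α-level group sequential boundary for the secondary endpoint keeps it at or below 0.025. Other values of θ and ρ give different amounts of inflation.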
The outline of the article is as follows. Section 2 sets
up the notation and gives the statement of the problem.
Section 3 discusses the choice of the primary boundary.
Section 4 discusses the choice of the secondary boundary. This
section is divided into three subsections. The first subsection
reviews the previous results for K = 2, while the second subsection
gives new results for K > 2. Both these subsections
consider the least favorable case (in terms of maximizing the
© 2017, The International Biometric Society