Biometrics 74, 49–57 DOI: 10.1111/biom.12738
Covariate-Adjusted Response-Adaptive Randomization for Multi-Arm
Clinical Trials Using a Modified Forward Looking Gittins Index Rule
Sofía S. Villar
and William F. Rosenberger
MRC Biostatistics Unit, School of Clinical Medicine, University of Cambridge, Cambridge, U.K.
Department of Statistics, George Mason University, U.S.A.
Summary. We introduce a non-myopic, covariate-adjusted response-adaptive (CARA) allocation design for multi-armed
clinical trials. The allocation scheme is a computationally tractable procedure based on the Gittins index solution to the
classic multi-armed bandit problem and extends the procedure recently proposed in Villar et al. (2015). Our proposed CARA
randomization procedure is defined by reformulating the bandit problem with covariates as a classic bandit problem with
multiple combination arms, in which each arm within each covariate category is treated as a distinct treatment arm.
We then apply a heuristically modified Gittins index rule to solve the problem and define allocation probabilities from the
resulting solution. We report the efficiency, balance, and ethical performance of our approach compared to existing CARA
methods using a recently published clinical trial as motivation. The net savings in expected number of treatment
failures are considerably larger under our approach, and probably large enough to make this design attractive for certain
studies where known covariates are expected to be important, stratification is not desired, treatment failures have a high
ethical cost, and the disease under study is rare. In a two-armed context, this patient benefit advantage comes at the
expense of increased variability in the allocation proportions and a reduction in statistical power. However, in a
multi-armed context, simple modifications of the proposed CARA rule can be incorporated so that an ethical advantage can be
offered without sacrificing power in comparison with balanced designs.
Key words: Adaptive designs; CARA randomization; Ethics; Multi-armed bandit; Sequential allocation.
The Gittins index rule (Gittins and Jones, 1979) was developed
as an optimal solution to the classic multi-armed bandit
problem. In the context of a clinical trial to test the
effectiveness of several treatments with an infinite number of
patients, it also provides a deterministic patient allocation
rule that aims to optimize patient benefit on average. To do
so, the rule must dynamically address the ethical conflict
between learning (efficiency/power) and earning (patient
benefit/ethics) after each patient is treated and the outcome
is observed, taking into account the potential outcomes of
future patients given the observed history.
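For reference, the index of a single arm can be written in its standard stopping-time form. The notation below is an assumption made for illustration and is not taken from the text above: a Bernoulli arm whose Beta posterior has parameters s and f (successes and failures, prior counts included), a discount factor 0 < d < 1, and Y_t the binary outcome of the t-th future patient allocated to that arm.

```latex
\[
  G(s, f) \;=\; \sup_{\tau \ge 1}
  \frac{\mathbb{E}\!\left[\sum_{t=0}^{\tau-1} d^{\,t}\, Y_t \,\middle|\, s, f\right]}
       {\mathbb{E}\!\left[\sum_{t=0}^{\tau-1} d^{\,t} \,\middle|\, s, f\right]}
\]
```

Here the supremum is taken over stopping times \tau, and the index rule allocates each patient to the arm whose current index G is largest.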
The multi-armed bandit problem and the Gittins index
are based on a set of assumptions which may be restrictive
from a practical point of view (Villar et al., 2015).
Particularly important assumptions include the infinite
size of the trial, the observability of each patient's outcome
before treating the next patient, and the lack of randomization
in the resulting patient allocation rule. Any extension
of the original model that results from relaxing some (or all)
of these assumptions would, in general, require either finding
an appropriate extension of the Gittins index rule for the
relaxed model (e.g., an index for the finite horizon problem
investigated by Villar et al. (2015)), or otherwise relying on
a computational solution using dynamic programming (e.g.,
as in Cheng and Berry (2007) or Williamson et al. (2017)).
The latter approach requires the problem to be of a tractable
size. An alternative approach was proposed in Villar et al.
(2015), where the Gittins index rule was used to define a
non-myopic response-adaptive randomized procedure for the
design of finite-sized trials, namely the block randomization
procedure referred to as the forward looking Gittins index rule.
Incorporating covariates into the multi-armed bandit model
is one such extension. There are at least two reasons why this
would be relevant. First, including covariate information in
the model would imply relaxing the assumption that observations
of a given treatment are exchangeable (i.e., that subjects
receiving the same treatment arm have the same probability
of success). This would, in turn, allow for the inclusion
of treatment–covariate interactions, and the modified bandit
model with covariates would maximize patient benefit by
assigning more patients to a superior treatment, given their
covariate profile. Second, methods that promote balance on
important known covariates have become a general standard
among practicing clinical trialists. However, there are many
relevant instances in which balance does not lead to efficient
or ethically attractive designs, as shown in Rosenberger and
Sverdlov (2008). A bandit model with covariates would illustrate
this conflict, as balance on covariates would never be
achieved by its optimal solution rule if treatments are
perceived differently among covariate groups.
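The idea of treating each treatment within each covariate category as its own arm can be sketched as simple bookkeeping. The following Python is a minimal illustration under stated assumptions, not the authors' procedure: binary outcomes, an independent Beta(1, 1) prior on each combination arm, and hypothetical function names. In particular, `posterior_mean` is only a placeholder for the index; the modified forward looking Gittins index rule would be plugged in through `index_fn` instead, and the randomization step is omitted.

```python
from itertools import product


def make_combination_arms(treatments, covariate_levels, prior=(1, 1)):
    """Create one [successes, failures] Beta count pair per
    (treatment, covariate level) combination arm (assumed Beta prior)."""
    return {(k, z): list(prior) for k, z in product(treatments, covariate_levels)}


def update(arms, treatment, covariate, success):
    """Record one binary outcome on the combination arm that was used."""
    arms[(treatment, covariate)][0 if success else 1] += 1


def posterior_mean(counts):
    """Placeholder index: posterior mean success probability.
    This is NOT the Gittins index; it only stands in for it here."""
    s, f = counts
    return s / (s + f)


def choose_arm(arms, covariate, index_fn=posterior_mean):
    """For a patient with the given covariate level, compare only the
    combination arms in that category and return the best treatment."""
    eligible = [(k, counts) for (k, z), counts in arms.items() if z == covariate]
    best_treatment, _ = max(eligible, key=lambda item: index_fn(item[1]))
    return best_treatment
```

With two treatments and two covariate levels this yields four combination arms. Because outcomes are only pooled within a covariate category, the preferred treatment can differ between categories, which is how the sketch reflects a treatment–covariate interaction.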
© 2017 The Authors. Biometrics published by Wiley Periodicals, Inc. on behalf of International Biometric Society.
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs
License, which permits use and distribution in any medium, provided the original work is properly cited, the use is
non-commercial and no modifications or adaptations are made.