Problems of Information Transmission, Vol. 41, No. 4, 2005, pp. 368–384. Translated from Problemy Peredachi Informatsii, No. 4, 2005, pp. 78–96.
Original Russian Text Copyright © 2005 by Juditsky, Nazin, Tsybakov, Vayatis.
METHODS OF SIGNAL PROCESSING
Recursive Aggregation of Estimators by the Mirror Descent Algorithm with Averaging
A. B. Juditsky, A. V. Nazin, A. B. Tsybakov, and N. Vayatis
Laboratoire de Modélisation et Calcul, Université Grenoble I, France
Institute of Control Sciences, RAS, Moscow
Laboratoire de Probabilités et Modèles Aléatoires, Université Paris VI, France
Received March 16, 2005; in final form, July 26, 2005
Abstract—We consider a recursive algorithm to construct an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under the $\ell_1$-constraint. It is defined by a stochastic version of the mirror descent algorithm which performs gradient-type descent in the dual space with an additional averaging. The main result of the paper is an upper bound for the expected accuracy of the proposed estimator. This bound is of the order $C\sqrt{(\log M)/t}$ with an explicit and small constant factor $C$, where $M$ is the dimension of the problem and $t$ stands for the sample size. A similar bound is proved for a more general setting, which covers, in particular, the regression model with squared loss.
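To make the scheme concrete, the following is a minimal Python sketch of such an aggregation procedure: stochastic mirror descent with an entropic proximal function on the simplex (a natural way to handle the $\ell_1$-constraint), a gradient-type step applied to a dual variable, and step-size-weighted averaging of the primal iterates. The hinge surrogate loss, the temperature parameter beta, and the $1/\sqrt{i}$ step sizes are illustrative assumptions here, not the paper's exact tuning, which is derived for a general convex loss.

import numpy as np

def mirror_descent_aggregation(base_preds, labels, beta=1.0):
    # base_preds: (t, M) array; base_preds[i, j] = value of the j-th base
    #             decision rule on the i-th observation
    # labels:     (t,) array with values in {-1, +1}
    # Returns the averaged weight vector on the simplex (aggregated estimator).
    t, M = base_preds.shape
    zeta = np.zeros(M)           # dual variable accumulating (negative) gradients
    theta_sum = np.zeros(M)      # accumulator for step-size-weighted averaging
    weight_sum = 0.0
    for i in range(t):
        # Map the dual point back onto the simplex: gradient of the conjugate
        # of the (scaled) entropy, i.e., a numerically stable softmax.
        w = np.exp(beta * zeta - np.max(beta * zeta))
        theta = w / w.sum()
        # Stochastic subgradient of the convex surrogate phi(z) = max(0, 1 - z)
        # at z = Y * f_theta(X), taken with respect to theta.
        margin = labels[i] * (base_preds[i] @ theta)
        u = -labels[i] * base_preds[i] if margin < 1.0 else np.zeros(M)
        gamma = 1.0 / np.sqrt(i + 1)    # illustrative step size
        zeta -= gamma * u               # gradient-type step in the dual space
        theta_sum += gamma * theta      # averaging with step-size weights
        weight_sum += gamma
    return theta_sum / weight_sum

The entropic proximal function is what produces the logarithmic dependence on the dimension $M$ in bounds of this type; a Euclidean proximal function would instead bring in the $\ell_2$ diameter of the feasible set.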
The methods of generalized portrait (i.e., support vector machines, SVM) and boosting have recently become widely used in classification practice (see, e.g., [1–4]). These methods are based on minimization of a convex empirical risk functional with a penalty. Their statistical analysis is given, for instance, in [5–8] (see also references therein). Note that this analysis is only approximate, since numerical boosting and SVM algorithms do not necessarily minimize the empirical risk functional exactly. Moreover, it assumes that the whole data sample is available at once, whereas it is often of interest to consider the on-line setting, where observations arrive one by one and recursive methods must be implemented.
There exists an extensive literature on recursive classification, starting from the Perceptron and its various modifications (see, e.g., [9–11] and references therein, as well as the overviews in [12, 13]). We mention here only methods which use the same loss functions as boosting and SVM and which may thus be viewed as their on-line analogs. Probably the first technique of this kind is the method of potential functions, some versions of which can be considered as on-line analogs of SVM (see [10, 11] and [12, Ch. 10]). Recently, on-line analogs of SVM and boosting-type methods using convex losses have been proposed in . We also point out the paper , where the stochastic gradient algorithm with averaging is studied for a general class of loss functions (cf. ). All these papers use the standard stochastic gradient method, for which the descent takes place in the initial parameter space.
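For contrast with the dual-space method sketched above, here is a minimal sketch of the standard scheme just mentioned: the stochastic (sub)gradient step is applied directly to the parameter vector in the initial (primal) space, followed by uniform averaging of the iterates. The hinge surrogate loss and the constant step size are again illustrative assumptions, not the cited papers' exact settings.

def sgd_with_averaging(base_preds, labels, lr=0.1):
    # Standard stochastic (sub)gradient descent in the primal parameter
    # space, with uniform (Polyak-type) averaging of the iterates.
    t, M = base_preds.shape
    theta = np.zeros(M)
    theta_bar = np.zeros(M)
    for i in range(t):
        margin = labels[i] * (base_preds[i] @ theta)
        grad = -labels[i] * base_preds[i] if margin < 1.0 else np.zeros(M)
        theta -= lr * grad                           # step in the primal space
        theta_bar += (theta - theta_bar) / (i + 1)   # running (uniform) average
    return theta_bar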
This work was carried out within the framework of Projects ACI NIM “BIOCLASSIF” and ACI MD “OPSYC.” The research was done during visits to the Paris VI and Grenoble I Universities (France) in 2004–2005.
© 2005 Pleiades Publishing, Inc.