Mach Learn (2018) 107:767–794
https://doi.org/10.1007/s10994-017-5678-9
Semi-supervised AUC optimization based on positive-unlabeled learning
Tomoya Sakai 1,2 · Gang Niu 1,2 · Masashi Sugiyama 1,2
Received: 29 April 2017 / Accepted: 16 September 2017 / Published online: 26 October 2017
© The Author(s) 2017
Abstract Maximizing the area under the receiver operating characteristic curve (AUC) is a
standard approach to imbalanced classification. So far, various supervised AUC optimization
methods have been developed, and they have also been extended to semi-supervised scenarios to
cope with small sample problems. However, existing semi-supervised AUC optimization
methods rely on strong distributional assumptions, which are rarely satisfied in real-world
problems. In this paper, we propose a novel semi-supervised AUC optimization method that
does not require such restrictive assumptions. We first develop an AUC optimization method
based only on positive and unlabeled (PU) data and then extend it to semi-supervised learning
by combining it with a supervised AUC optimization method. We theoretically prove that,
without the restrictive distributional assumptions, unlabeled data contribute to improving the
generalization performance in PU and semi-supervised AUC optimization methods. Finally,
we demonstrate the practical usefulness of the proposed methods through experiments.
Keywords AUC optimization · Learning from positive and unlabeled data ·
Semi-supervised learning
Editors: Wee Sun Lee and Robert Durrant.
The original version of this article was revised: Corrections made to title, author affiliations, and equations
on pp. 7 and 26.
Tomoya Sakai
sakai@ms.k.u-tokyo.ac.jp
Gang Niu
gang@ms.k.u-tokyo.ac.jp
Masashi Sugiyama
sugi@k.u-tokyo.ac.jp
1 Center for Advanced Intelligence Project, RIKEN, Nihonbashi, Chuo-ku, Tokyo, Japan
2 Graduate School of Frontier Sciences, The University of Tokyo, Kashiwanoha, Kashiwa-shi, Chiba, Japan