Mach Learn (2018) 107:749–766
Crowdsourcing with unsure option
· Zhi-Hua Zhou
Received: 29 April 2017 / Accepted: 16 September 2017 / Published online: 26 October 2017
© The Author(s) 2017
Abstract One of the fundamental issues in crowdsourcing is the trade-off between the num-
ber of workers needed for high-accuracy aggregation and the budget that must be paid. To
save cost, it is important to ensure high quality of the crowdsourced labels, so that fewer
labels are needed and the total cost of label collection is reduced. Since the confidence of the
workers often has a close relationship with their abilities, a possible way for quality control
is to request that workers return labels only when they feel confident, by providing them
with an 'unsure' option. On the other hand, allowing workers to choose the unsure option
can potentially waste part of the budget. In this work, we conduct an analysis towards
understanding when providing the unsure option indeed leads to significant cost reduction,
as well as how the confidence threshold might be set. We also propose an online mechanism,
which is an alternative for threshold selection when the estimation of the crowd ability
distribution is difficult.
Keywords Crowdsourcing · Mechanism design · Unsure option · Cost reduction
Labeled data play a crucial role in machine learning. In recent years, crowdsourcing has
become a popular, cost-saving way to collect labels. The power of crowdsourcing relies on
two conditions. One is the possibility of obtaining a highly accurate estimation of the true
labels by aggregating the collected noisy labels. The other is that the cost paid to the workers
during the label collection process is not large, hence crowdsourcing is much more economical than to recruit
Editors: Wee Sun Lee and Robert Durrant.
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023,
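The two conditions above can be illustrated with a minimal simulation. The sketch below is illustrative only and is not the paper's actual mechanism: worker abilities, the confidence threshold, and all function names here are assumptions for demonstration. Workers whose confidence falls below the threshold take the unsure option and return no label; the remaining noisy labels are aggregated by majority vote.

```python
import random
from collections import Counter


def majority_vote(labels):
    """Aggregate noisy labels by taking the most common one."""
    return Counter(labels).most_common(1)[0][0]


def crowdsource(n_workers=11, true_label=1, threshold=0.6, seed=0):
    """Simulate a binary labeling task with an unsure option.

    Each worker has a random ability in [0, 1], used here both as the
    chance of labeling correctly and as the worker's self-assessed
    confidence.  A worker whose confidence falls below `threshold`
    picks the unsure option and contributes no label (saving budget);
    the rest return a noisy label, and the collected labels are
    aggregated by majority vote.
    """
    rng = random.Random(seed)
    labels = []
    for _ in range(n_workers):
        ability = rng.random()  # worker's chance of labeling correctly
        if ability < threshold:
            continue            # unsure: no label returned, no payment needed
        labels.append(true_label if rng.random() < ability else 1 - true_label)
    return majority_vote(labels) if labels else None
```

Raising `threshold` filters out low-ability workers, so fewer but more reliable labels reach the aggregation step; the paper's analysis concerns when this trade-off actually reduces the total cost.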