TY - JOUR
T1 - Active learning without knowing individual instance labels
T2 - A pairwise label homogeneity query approach
AU - Fu, Yifan
AU - Li, Bin
AU - Zhu, Xingquan
AU - Zhang, Chengqi
N1 - Includes bibliographical references.
PY - 2014/4
Y1 - 2014/4
N2 - Traditional active learning methods require the labeler to provide a class label for each queried instance. To ensure that the provided labels are correct, labelers are normally highly skilled domain experts, which in turn makes labeling expensive. To reduce labeling cost, an alternative solution is to allow nonexpert labelers to carry out the labeling task without explicitly providing the class label of each queried instance. In this paper, we propose a new active learning paradigm in which a nonexpert labeler is only asked “whether a pair of instances belong to the same class”, that is, a pairwise label homogeneity query. Under such circumstances, our active learning goal is twofold: (1) to decide which pair of instances should be selected for query, and (2) to determine how to use the pairwise homogeneity information to improve the active learner. To achieve this goal, we propose a “Pairwise Query on Max-flow Paths” strategy to query pairwise label homogeneity from a nonexpert labeler; the query results are then used to dynamically update a Min-cut model that differentiates instances in different classes. In addition, a “Confidence-based Data Selection” measure is used to evaluate data utility based on the Min-cut model's prediction results. The selected instances, with their inferred class labels, are added to the labeled set to form a closed-loop active learning process. Experimental results and comparisons with state-of-the-art methods demonstrate that the new active learning paradigm can achieve good performance with nonexpert labelers.
AB - Traditional active learning methods require the labeler to provide a class label for each queried instance. To ensure that the provided labels are correct, labelers are normally highly skilled domain experts, which in turn makes labeling expensive. To reduce labeling cost, an alternative solution is to allow nonexpert labelers to carry out the labeling task without explicitly providing the class label of each queried instance. In this paper, we propose a new active learning paradigm in which a nonexpert labeler is only asked “whether a pair of instances belong to the same class”, that is, a pairwise label homogeneity query. Under such circumstances, our active learning goal is twofold: (1) to decide which pair of instances should be selected for query, and (2) to determine how to use the pairwise homogeneity information to improve the active learner. To achieve this goal, we propose a “Pairwise Query on Max-flow Paths” strategy to query pairwise label homogeneity from a nonexpert labeler; the query results are then used to dynamically update a Min-cut model that differentiates instances in different classes. In addition, a “Confidence-based Data Selection” measure is used to evaluate data utility based on the Min-cut model's prediction results. The selected instances, with their inferred class labels, are added to the labeled set to form a closed-loop active learning process. Experimental results and comparisons with state-of-the-art methods demonstrate that the new active learning paradigm can achieve good performance with nonexpert labelers.
KW - Active learning
KW - Data mining
U2 - 10.1109/TKDE.2013.165
DO - 10.1109/TKDE.2013.165
M3 - Article
SN - 1041-4347
VL - 26
SP - 808
EP - 822
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 4
ER -