
主动学习停止准则与评价测度研究

Study of Stopping Criteria and Performance Evaluation Metrics in Active Learning

【Author】 Yang Ju (杨菊)

【Supervisor】 Yu Hualong (于化龙)

【Author Information】 Jiangsu University of Science and Technology (江苏科技大学), Computer Technology (professional degree), 2016, Master's thesis

【Abstract】 Active learning is one of the most active research directions in machine learning. It aims to obtain a high-performing classification model at the lowest possible human labeling cost. A well-chosen stopping criterion is therefore essential for active learning to deliver its full benefit, since it makes little sense to keep querying until every unlabeled sample has been labeled. In addition, evaluating the performance of an active learning algorithm requires quantitative evaluation metrics, an issue largely neglected in previous work. This thesis studies both problems.

First, the thesis reviews several common stopping criteria for active learning. The existing selected-accuracy stopping criterion applies only to batch-mode labeling scenarios, so an improved criterion for the single-instance labeling scenario is proposed. It approximates the selected accuracy by monitoring, over a fixed number of the most recent learning rounds, how well the label predicted for each queried sample matches the true label supplied by the annotator: the higher the match rate, the higher the selected accuracy. A sliding time window tracks this estimate in real time, and the learning procedure stops once the estimate exceeds a preset threshold. Taking active learning based on a support vector machine (SVM) as an example, experiments on 6 benchmark data sets confirm the effectiveness and feasibility of the criterion: with an appropriate threshold, it finds a reasonable moment to stop. The method broadens the applicability of the selected-accuracy stopping criterion and improves its practicality.

Second, although many active learning algorithms exist, they all share a single performance evaluation measure: the learning curve. Because the learning curve shows how the quality of the classification model evolves over the whole iterative learning procedure, most papers use it to compare the performance of different algorithms. For two algorithms with similar performance, however, subtle differences are hard to read from the distribution of the curves alone. To address this, four quantitative performance evaluation metrics are proposed by mining the information hidden in the learning curve: the area under the learning curve (ALC), the logarithmic area under the learning curve (LALC), the average gradient angle (AGA), and the logarithmic average gradient angle (LAGA). When comparing active learning algorithms built on homogeneous classifiers, all four metrics give impartial evaluation results; for heterogeneous classifiers, AGA and LAGA may be more suitable than the other two. In addition, LALC and LAGA place more weight on the rate of performance improvement in the initial stage of learning. Extensive experiments on 9 data sets and multiple baseline active learning algorithms demonstrate the practicality of the four metrics.
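The stopping rule summarized above lends itself to a short sketch. The following is a minimal pool-based example with scikit-learn, assuming a binary classification task, a NumPy-array pool, and an uncertainty-sampling query strategy; the `window_size` and `threshold` values are illustrative assumptions, not the thesis's reported settings.

```python
from collections import deque

import numpy as np
from sklearn.svm import SVC

def svm_active_learning_with_stop(X_labeled, y_labeled, X_pool, y_oracle,
                                  window_size=20, threshold=0.9):
    """Single-instance active learning with a sliding-window
    selected-accuracy stopping criterion (illustrative sketch)."""
    X_labeled, y_labeled = list(X_labeled), list(y_labeled)
    pool = list(range(len(X_pool)))      # indices of still-unlabeled samples
    window = deque(maxlen=window_size)   # recent prediction/label matches
    model = SVC(kernel="rbf")
    while pool:
        model.fit(np.asarray(X_labeled), np.asarray(y_labeled))
        # Uncertainty sampling: query the pool sample closest to the margin.
        margins = np.abs(model.decision_function(X_pool[pool]))
        i = pool[int(np.argmin(margins))]
        guess = model.predict(X_pool[i:i + 1])[0]  # prediction before querying
        truth = y_oracle[i]                        # stands in for the annotator
        window.append(guess == truth)              # record the match
        X_labeled.append(X_pool[i])
        y_labeled.append(truth)
        pool.remove(i)
        # Selected accuracy ~ fraction of matches in the window; once the
        # learner mostly guesses queried labels correctly, further queries
        # carry little information, so stop labeling.
        if len(window) == window_size and np.mean(window) >= threshold:
            break
    return model
```

Only the window length and the threshold need tuning here, which matches the abstract's claim that an appropriate threshold is what makes the criterion stop at a reasonable point.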
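The four metrics can likewise be sketched from their names. The definitions below, a normalized trapezoidal area and a mean segment-inclination angle, each computed on a linear and a log-scaled sample axis, are plausible reconstructions for illustration only; the thesis's exact formulas may differ.

```python
import numpy as np

def learning_curve_metrics(n_labeled, accuracy):
    """ALC, LALC, AGA, and LAGA computed from one learning curve.

    Reconstructed definitions: ALC/LALC are the normalized area under
    the curve on a linear/logarithmic sample axis; AGA/LAGA are the
    mean inclination angle (degrees) of consecutive curve segments on
    the same two axes.
    """
    x = np.asarray(n_labeled, dtype=float)
    y = np.asarray(accuracy, dtype=float)
    log_x = np.log(x)  # a log axis stresses the early learning rounds

    def area(xs):
        # Trapezoidal area under the curve, normalized by the axis span.
        return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(xs))
                     / (xs[-1] - xs[0]))

    def avg_angle(xs):
        # Rescale the axis to [0, 1] so the angle is scale-independent,
        # then average the per-segment slope angles.
        t = (xs - xs[0]) / (xs[-1] - xs[0])
        return float(np.degrees(np.arctan(np.diff(y) / np.diff(t))).mean())

    return {"ALC": area(x), "LALC": area(log_x),
            "AGA": avg_angle(x), "LAGA": avg_angle(log_x)}

# A curve that rises faster in the early rounds scores higher,
# especially on the logarithmic variants:
rounds = [10, 20, 40, 80, 160]
print(learning_curve_metrics(rounds, [0.60, 0.78, 0.85, 0.88, 0.89]))
print(learning_curve_metrics(rounds, [0.55, 0.65, 0.75, 0.83, 0.89]))
```

Under these reconstructions, two curves that end at the same accuracy can still be separated by the metrics, which is exactly the shortcoming of visual learning-curve comparison that the abstract points out.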
