样本的几何信息在半监督学习中的应用研究

Research on the Application of Geometric Information in the Semi-supervised Learning

【Author】 Xu Xue (徐雪)

【Advisor】 Zhou Heqin (周荷琴)

【Author information】 University of Science and Technology of China, Pattern Recognition and Intelligent Systems, 2010, PhD

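【Abstract (translated from the Chinese)】 Semi-supervised learning is the mainstream technique for learning with unlabeled samples and is currently one of the most active research directions in machine learning. This thesis focuses on the use of the geometric information of samples in semi-supervised learning. The main work covers the fusion of geometric information and label information, a graph-based inductive algorithm, affine manifold alignment, multi-resolution random-walk image segmentation, and a comparative analysis of how geometric information affects semi-supervised learning.

Semi-supervised learning improves classifier accuracy mainly by mining the information hidden in unlabeled samples, which usually requires fusing the label information with the geometric information of the samples; the fusion coefficient is generally fixed. By studying the relationship between the two fused parts, this thesis proposes varying the fusion coefficient with the proportion of labeled samples, which effectively improves learning accuracy.

Graph-based learning has been a very active direction within semi-supervised learning in recent years. It describes the sample space with a graph and uses the positions of neighboring points to control the propagation of label information. Because of limitations inherent in the graph, most such algorithms are transductive: their derivation is intuitive and their classification accuracy is high, but they do not give an explicit mapping. This thesis proposes a semi-supervised local linear coordination algorithm, which introduces mixture models and local linear coordination into semi-supervised learning and, through local linear mappings, achieves graph-based inductive learning with an explicit mapping.

Manifold alignment seeks the latent spaces of two or more data sets and, guided by some supervisory information, aligns these latent spaces so that correspondences between points in the data sets can be found. Most manifold alignment algorithms give predictions only on the training set and do not provide a mapping over the whole data space. This thesis proposes an affine manifold alignment algorithm that achieves alignment through linear transformations, so that new data points can be mapped directly.

Because of limits on memory consumption and segmentation time, most semi-supervised image segmentation algorithms cannot be applied directly to large images. This thesis therefore proposes a semi-supervised image segmentation algorithm based on multi-resolution random walks. The algorithm uses the segmentation probabilities of the low-frequency sub-image to approximate those of the original image, quickly locates the regions where the segmentation is ambiguous, and then performs precise segmentation on those regions. It handles large images well, reduces memory consumption, shortens segmentation time, and is reasonably robust, giving good results even on images with complex backgrounds.

To better study the relationship between geometric information and semi-supervised learning performance, this thesis proposes a framework for fusing geometric information with label information, and brings local mixture models into the framework through an intermediate variable. Finally, experiments compare and analyze the effects of several kinds of geometric information on semi-supervised learning.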

【Abstract】 Semi-supervised learning is the primary approach to learning from unlabeled samples and is a very active research field in machine learning. This thesis focuses on the application of geometric information in semi-supervised learning. The main work includes the fusion of geometric information and label information, a graph-based inductive algorithm, affine manifold alignment, image segmentation by multi-resolution random walk, and a comparative analysis of the effects of geometric information on semi-supervised learning.

Semi-supervised learning mainly mines the information hidden in unlabeled data to improve classifier accuracy, and usually involves fusing label information with geometric information. In most algorithms the coefficient that fuses the two parts is fixed. In this thesis, we point out that the weights of the label information and the manifold structure information can be changed with the proportion of labeled points in order to improve learning accuracy effectively.

Graph-based learning has been a very active direction of semi-supervised learning in recent years. It describes the sample space with a graph and uses neighboring points to spread label information through the point cloud. Because of the restrictions imposed by the graph, most of these algorithms are transductive and cannot produce an explicit mapping. By introducing the mixture model and local linear coordination into semi-supervised learning, we propose the semi-supervised local linear coordination algorithm. The algorithm is an inductive graph-based method, and it achieves better performance than linear methods by using local linear transformations.

Manifold alignment finds the latent spaces of two or more data sets and aligns them in a shared coordinate system, where the pairwise correspondences can be found easily. Most manifold alignment algorithms give predictions only on the training set instead of producing a mapping defined everywhere. We present an affine manifold alignment algorithm, which performs the alignment through linear transformations and so allows new data points to be mapped directly.

Because of constraints on memory consumption and segmentation time, most semi-supervised image segmentation algorithms cannot be applied directly to large images. In this thesis, a semi-supervised image segmentation algorithm based on multi-resolution random walk is proposed. The low-frequency sub-image is used to approximate the segmentation probabilities of the original image, the ambiguous regions are identified quickly, and an accurate segmentation is then carried out on those regions. The algorithm offers a better solution to semi-supervised segmentation of large images and is robust on images with complex backgrounds.

To better study the relationship between geometric information and the performance of semi-supervised learning, a framework for fusing geometric information and label information is presented. Methods based on mixture models are also incorporated into the framework through the definition of an intermediate variable. Finally, the impact of several kinds of geometric information on semi-supervised learning is compared and analyzed through experiments.
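
The idea of fusing label information with graph-based geometric information under a coefficient that depends on the labeled proportion can be illustrated with a minimal sketch. This is not the thesis's algorithm: it uses the standard label-spreading closed form F = (I - alpha S)^{-1} Y on a k-nearest-neighbour graph, and the schedule for `alpha` (more labels, less weight on the graph term) is a hypothetical choice made only for illustration. The function names, the constant 0.99, and the parameters `k` and `sigma` are likewise assumptions.

```python
import numpy as np


def knn_affinity(X, k=5, sigma=1.0):
    """Symmetric k-nearest-neighbour graph with Gaussian edge weights."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    # Keep each point's k strongest edges, then symmetrise the edge set.
    idx = np.argsort(-W, axis=1)[:, :k]
    mask = np.zeros(W.shape, dtype=bool)
    mask[np.arange(len(X))[:, None], idx] = True
    mask |= mask.T
    return W * mask


def label_spreading(X, y, n_classes, alpha=None, k=5, sigma=1.0):
    """Spread labels over the graph; y uses -1 for unlabeled points.

    alpha in [0, 1) weights the geometric (graph smoothness) term against
    the label-fitting term.
    """
    n = len(X)
    labeled = y >= 0
    if alpha is None:
        # HYPOTHETICAL schedule, not taken from the thesis: the larger the
        # labeled fraction, the less weight is put on the graph term.
        alpha = 0.99 * (1.0 - labeled.mean())
    W = knn_affinity(X, k, sigma)
    d = np.maximum(W.sum(axis=1), 1e-12)  # guard against isolated points
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    Y = np.zeros((n, n_classes))
    Y[labeled, y[labeled]] = 1.0  # one-hot rows for labeled points only
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1), alpha
```

Calling `label_spreading(X, y, n_classes)` with `alpha=None` applies the illustrative schedule; passing a fixed `alpha` reproduces the conventional fixed-coefficient setting that the abstract contrasts against. The result is transductive: it labels only the points in `X`, which is the limitation the abstract's inductive algorithm is meant to address.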
