Research on Rejectable Classification Algorithms for Non-Training-Class Exceptional Patterns in High-Dimensional Space

Research on Classification Algorithm with Non-Training Pattern Reject Option in High-Dimensional Space

【Author】 Jia Qianwen (贾千文)

【Supervisor】 Hu Zhengping (胡正平)

【Author Information】 Yanshan University (燕山大学), Circuits and Systems, 2010, Master's thesis

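【Chinese Abstract】 Classical classification models always assume that a test sample belongs to one of the training classes. In practical applications, however, exceptional patterns from non-training classes are often presented as input. Because the classifier has no reject capability, it can only produce a wrong recognition result. Designing classification models with a reject option is therefore important. Non-training-class samples are difficult to collect, so reject-option classification usually assumes that none of them take part in the training stage. The key problem then becomes building a reasonable covering model of the distribution of same-class samples in high-dimensional space, and deciding whether a test sample lies inside the covering region. Starting from this point, this thesis studies several new classification models with a reject option. The first design combines discrimination with recognition: a reject-option nearest-neighbor classification algorithm based on SRM (Structural Risk Minimization) self-organizing multi-region covering is proposed. Following the structural risk minimization principle, the algorithm builds a self-organizing, multi-region, multi-sphere covering recognition model for each training class. It then uses a combined k-nearest-neighbor strategy to build the discrimination model. Experimental results verify the effectiveness of the algorithm. The second approach assumes that samples of the same class lie on the same nonlinear manifold, and studies a reject-option classification algorithm that combines sparse representation with manifold subspace covering. Local linear patches found on the nonlinear manifold form a compact covering model of each training class. A sparse representation strategy then builds a discriminative description of the different classes. This method achieves good recognition results. The third approach aims to build a reasonable covering of the sample distribution while strengthening the discriminative description of the training samples. To this end, a reject-option classification algorithm based on discriminative projection combined with minimum L1-ball covering is proposed. The algorithm extracts discriminative projection features of the samples by L1-norm-maximization principal component analysis. In the feature space it then builds a minimum L1-ball covering model that is robust to outliers, which improves classifier performance. When samples are few, statistical reject-option classification methods have difficulty building a compact covering of the sample distribution. For this case, a reject-option classification algorithm based on a minimum spanning tree covering model in high-dimensional space is studied. The algorithm treats the edges of the minimum spanning tree as virtual samples that provide better class-distribution information. It also introduces a covering-radius adjustment strategy to remove the covering redundancy caused by unreasonable virtual samples.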
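
The "cover first, then discriminate" scheme described above can be illustrated with a minimal Python sketch. This is a simplification under stated assumptions, not the thesis's algorithm. Each class is covered by fixed-radius balls found by greedy leader clustering; the SRM-driven self-organizing choice of regions is not reproduced. A test point outside every ball is rejected, and an accepted point is labeled by a plain k-NN majority vote. The names `greedy_ball_cover`, `fit` and `predict` and the parameters `radius` and `k` are all hypothetical.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_ball_cover(points, radius):
    """Cover one class with balls of a fixed radius (greedy leader clustering).

    A point that is not within `radius` of an existing center becomes a new
    center, so every training point ends up inside at least one ball.
    Simplification: the thesis chooses the covering structure by SRM instead.
    """
    centers = []
    for p in points:
        if not any(dist(p, c) <= radius for c in centers):
            centers.append(p)
    return [(c, radius) for c in centers]


def fit(train, radius):
    """train maps label -> list of vectors; returns label -> list of (center, radius)."""
    return {label: greedy_ball_cover(pts, radius) for label, pts in train.items()}


def predict(x, train, covers, k=3):
    """Return a class label for x, or None to reject it as a non-training pattern."""
    # Recognition step: accept x only if it falls inside some ball of some class.
    if not any(dist(x, c) <= r for balls in covers.values() for c, r in balls):
        return None
    # Discrimination step: majority vote among the k nearest training samples.
    neighbors = sorted(
        ((dist(x, p), label) for label, pts in train.items() for p in pts),
        key=lambda t: t[0],
    )[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


if __name__ == "__main__":
    train = {
        "A": [(0, 0), (0, 1), (1, 0), (1, 1)],
        "B": [(5, 5), (5, 6), (6, 5), (6, 6)],
    }
    covers = fit(train, radius=1.5)
    print(predict((0.5, 0.5), train, covers))  # A
    print(predict((5.5, 5.5), train, covers))  # B
    print(predict((3, 3), train, covers))      # None (outside every ball)
```

Consider the point `(3, 3)`, which lies between the two clusters. It is rejected rather than forced into class `A` or `B`. A classifier without a reject option cannot avoid that forced, wrong assignment.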

【Abstract】 In conventional classification problems, a typical assumption made during the design phase is that a new test object always belongs to one of a set of known classes. In many practical applications, however, outliers that were not present during training may appear, which leads to wrong recognition results. It therefore makes good sense to design classification models with a reject option.

Classification with a reject option usually assumes that no outlier samples are available during training, because outliers may occur only occasionally or may be very costly to measure. In this case, the key problem is to find an appropriate covering model for each training class in high-dimensional space, based on the complex geometric distribution of the samples. A test point can then be accepted or rejected by determining whether it lies inside the covered region. Based on this idea, several novel classification models with a reject option are presented in this thesis.

To combine “matter description” with “matter separation” in classifier design, a nearest-neighbor classifier with a reject option is presented. It is based on a structural risk minimization (SRM) self-organizing multiple-region covering model. Following the SRM principle, the algorithm constructs a recognition-based, self-organizing, multiple-region covering model of the training data to reject outlier classes. k-NN discrimination is then applied as a subsequent step to identify the exact class of each accepted pattern. Experimental results demonstrate the effectiveness of the classifier.

Under the assumption that the samples of each class lie on a nonlinear manifold, a novel classifier with a reject option based on a manifold subspace covering model is constructed.
First, a compact coverage of the training samples is built by searching for a collection of local linear models on the nonlinear manifold, each depicted by a subspace, to describe the training class. The SRC (Sparse Representation Classifier) is then used for classification. Experiments show that this method performs well.

To construct a more compact covering model while strengthening the discriminative description of the training samples, a classifier with a reject option is proposed. It is based on a minimum L1-ball covering model and a discriminative feature description, and replaces the L2 norm of the hypersphere covering algorithm with the L1 norm. The algorithm extracts discriminative projection features of the training samples by L1-norm maximization principal component analysis. It then constructs a minimum L1-ball covering model in the feature space, which improves classifier performance.

For small-sample-size problems, conventional statistical classifiers with a reject option cannot construct an appropriate covering decision boundary for the data description. For this case, a novel minimum spanning tree (MST) covering model based classifier with a reject option is proposed, designed around the data distribution in high-dimensional space. The algorithm describes the target class with an MST and treats the edges of the tree as additional basic elements of the classifier. These edges act as virtual training samples that give a better coverage. Some of these virtual training samples may be unreasonable and degrade rejection performance. To reduce this degradation, an adjustable coverage radius strategy is introduced when the coverage is constructed.
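
The MST covering idea, in which tree edges act as virtual training samples, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. The MST is built with Prim's algorithm over Euclidean distances. A test point is accepted when its distance to the nearest training point or tree edge is at most a global `radius`. The hypothetical `max_edge` cutoff only crudely mimics discounting unreasonable virtual samples; the thesis's adjustable coverage-radius strategy is not reproduced. All function and parameter names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prim_mst(points):
    """Return the MST of the complete Euclidean graph as (i, j) index pairs (Prim, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection cost of each vertex to the tree
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def point_segment_distance(x, a, b):
    """Distance from x to segment [a, b]; the segment stands for a line of virtual samples."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    denom = sum(c * c for c in ab)
    if denom == 0.0:
        return dist(x, a)
    # Project x onto the line through a and b, clipping t so the foot stays on the edge.
    t = sum((xi - ai) * c for xi, ai, c in zip(x, a, ab)) / denom
    t = max(0.0, min(1.0, t))
    return dist(x, [ai + t * c for ai, c in zip(a, ab)])


def mst_accepts(x, points, edges, radius, max_edge=math.inf):
    """Accept x if it lies within `radius` of a training point or of a kept MST edge.

    `max_edge` is a hypothetical cutoff: edges longer than it are not used as
    virtual samples. It is a crude stand-in, not the thesis's radius adjustment.
    """
    if any(dist(x, p) <= radius for p in points):
        return True
    return any(
        point_segment_distance(x, points[i], points[j]) <= radius
        for i, j in edges
        if dist(points[i], points[j]) <= max_edge
    )


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
    edges = prim_mst(pts)
    print(edges)                                     # [(0, 1), (1, 2), (2, 3)]
    print(mst_accepts((1.5, 0.2), pts, edges, 0.3))  # True: near an edge, far from vertices
    print(mst_accepts((1.5, 1.0), pts, edges, 0.3))  # False
```

The first query shows why edges help when samples are scarce. The point `(1.5, 0.2)` is more than `0.3` away from every training point. It is only `0.2` from the edge between `(1, 0)` and `(2, 0)`, so it is accepted.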

  • 【Online Publication Submitter】 Yanshan University (燕山大学)
  • 【Online Publication Year/Issue】 2010, Issue 08