Research on Multi-objective Granular Support Vector Machines and Their Applications (多目标粒度支持向量机及其应用研究)
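【Author】 Liu Hongbing (刘宏兵);
【Advisor】 Xiong Shengwu (熊盛武);
【Author information】 Wuhan University of Technology, Computer Application Technology, 2011, PhD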
【Abstract】 Improving the generalization ability, training complexity, and intelligibility of learning machines is a central research topic in machine learning. Granular computing (the granular model) divides a complex object of study into simpler ones, so that a problem can be observed and analyzed at the microscopic level and its complexity, for example its training complexity, is reduced. Conversely, simple microscopic problems can be merged step by step and studied at the macroscopic level, which reduces the complexity of handling many simple problems separately. From the user's point of view, misclassification rate and training complexity are conflicting criteria for evaluating a classification algorithm, so constructing a classifier with both a low misclassification rate and low training complexity is a multi-objective optimization problem. In recent years multi-objective optimization has been brought into machine learning, where it yields a family of learning machines on a given training set, each a different trade-off among the objectives, from which the decision maker selects according to need. For support vector machines (SVMs), training samples contribute unequally to the training process: samples that are easily misclassified contribute more to the separating hyperplane than samples that are not. Building a granular model of the training set from these contributions and training the SVM only on the high-contribution samples is a common way to reduce training complexity.

This dissertation combines granular computing, multi-objective optimization, and support vector machines, and studies three kinds of algorithms: fuzzy lattice-based granular computing, multi-objective granular computing, and granular support vector machines. The innovative work is as follows.

The isomorphic mapping between a lattice and its dual lattice is used to eliminate the inconsistency between the two partial order relations on the vector set and the granule set. A fuzzy inclusion measure between granules is built from this isomorphic mapping and a positive valuation function, and granules are merged conditionally according to the fuzzy inclusion measure and the granularity of the merged granule. The result is a fuzzy lattice-based granular computing classification algorithm, whose feasibility is proved from the viewpoint of algebraic systems. Experimental results show that, compared with SVMs and KNN, the proposed algorithms both speed up training and achieve better generalization ability.

To deal with the redundant granules produced by granular computing, and because the number of granules and the misclassification rate differ in importance, an importance-based Pareto dominance relation (IPareto) is defined on top of Pareto dominance to compare individuals. A multi-objective granular computing model over the number of granules and the misclassification rate is established, and a corresponding evolutionary algorithm is designed. The algorithm represents each individual by a two-level structure of granules, uses a crossover operator between individuals together with union and mutation operators on a single individual, and uses prior information to guide convergence to the IPareto front. Experimental results show that, compared with traditional granular computing, multi-objective granular computing yields a family of classifiers for the user to choose from, each of which is the smallest granule set that attains its misclassification rate.

Because a large number of non-support vectors makes SVM training expensive, the distribution of the support vectors is estimated from their distribution characteristics, a granular model of the training set is built, part of the non-support vectors are discarded, and the samples most likely to become support vectors are used to construct granular support vector machines. Three constructions of granular fuzzy support vector machines are given: selecting the samples that contribute most to training; defining an equivalence relation on the training set through discretization of the attribute values and selecting the rough set boundary, which contains samples of different classes; and reducing or granulating the attribute set according to the significance of the attributes. Experimental results show that, compared with SVMs, these granular support vector machines both reduce training complexity and achieve better generalization ability.

The proposed hyperbox granular computing classification algorithm (HBGrCCA) is applied to node localization in wireless sensor networks. First, a training set is built from the communication measurements between nodes of known position and the anchor (reference) nodes. Second, the localization area is divided into a grid, which turns the localization problem into a classification problem with a corresponding training set. Finally, the classifier is trained and its performance is verified on a test set. Experimental results show that HBGrCCA can be used to estimate the location of blind nodes in a wireless sensor network and achieves acceptable localization precision.
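The gridding step in the localization application can be illustrated with a short sketch. It is not the dissertation's implementation: the rectangular region, the cell size, and the function names `cell_label` and `cell_center` are assumptions made for illustration. Each known position is mapped to the index of the grid cell that contains it, which serves as its class label, and a predicted label is mapped back to the cell center as the position estimate.

```python
import math


def cell_label(x, y, width, height, cell):
    """Map a position inside a width x height region to a grid-cell class label.

    Cells are numbered row by row, starting from the cell at the origin.
    Positions are assumed to lie inside the region (0 <= x <= width, 0 <= y <= height).
    """
    cols = math.ceil(width / cell)
    rows = math.ceil(height / cell)
    # Clamp so that points on the upper or right border fall in the last cell.
    col = min(int(x // cell), cols - 1)
    row = min(int(y // cell), rows - 1)
    return row * cols + col


def cell_center(label, width, height, cell):
    """Map a class label back to the center of its grid cell (the position estimate)."""
    cols = math.ceil(width / cell)
    row, col = divmod(label, cols)
    return ((col + 0.5) * cell, (row + 0.5) * cell)


if __name__ == "__main__":
    # A 100 x 100 area with 10 x 10 cells gives 100 classes.
    label = cell_label(27.0, 31.5, 100, 100, 10)
    print(label, cell_center(label, 100, 100, 10))
```

Under these assumptions, the estimate for a correctly classified node is off by at most half a cell diagonal. The cell size therefore trades localization resolution against the number of classes the classifier has to separate.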
【Key words】 Granular computing; multi-objective optimization; fuzzy lattice; support vector machines;
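The fuzzy inclusion measure mentioned in the abstract can be illustrated for hyperbox granules in the unit cube. The sketch below follows a common formulation from the fuzzy lattice literature and is an assumption, not necessarily the dissertation's exact definition. A hyperbox [lo, hi] is represented by the vector (1 - lo, hi), where x -> 1 - x plays the role of the isomorphism between the lattice and its dual; the positive valuation v is taken to be the sum of the components; and the inclusion degree is sigma(A <= B) = v(B) / v(A v B), where A v B is the smallest hyperbox containing both. All function names are hypothetical.

```python
def to_lattice_vector(box):
    """Represent a hyperbox (lo, hi) in the unit cube as the vector (1 - lo, hi).

    With this representation, box inclusion becomes a componentwise <= on vectors.
    """
    lo, hi = box
    return [1.0 - a for a in lo] + list(hi)


def valuation(vec):
    """Assumed positive valuation function: the sum of the components."""
    return sum(vec)


def join(box_a, box_b):
    """Smallest hyperbox containing both granules (the lattice join)."""
    lo = [min(a, b) for a, b in zip(box_a[0], box_b[0])]
    hi = [max(a, b) for a, b in zip(box_a[1], box_b[1])]
    return (lo, hi)


def inclusion(box_a, box_b):
    """Degree to which granule A is included in granule B: v(B) / v(A v B)."""
    v_b = valuation(to_lattice_vector(box_b))
    v_join = valuation(to_lattice_vector(join(box_a, box_b)))
    return v_b / v_join


if __name__ == "__main__":
    small = ([0.2, 0.2], [0.4, 0.4])
    large = ([0.1, 0.1], [0.5, 0.5])
    print(inclusion(small, large))  # small lies inside large
    print(inclusion(large, small))  # large is only partly covered by small
```

In this formulation the measure equals 1 exactly when the first granule lies inside the second and is smaller otherwise. A conditional merge would additionally bound the granularity of the joined granule by a threshold, which the sketch does not cover.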