支持向量机集成学习算法研究

Research on Support Vector Machine Ensemble Learning Algorithm

【Author】 程丽丽 (Cheng Lili)

【Advisor】 张健沛 (Zhang Jianpei)

【Author information】 Harbin Engineering University, Computer Application Technology, 2009, PhD

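【Abstract (translated from the Chinese)】 Ensemble learning trains multiple individual learners and combines their outputs, which markedly improves the generalization ability of a learning system. Regarded as the fourth major research direction in machine learning, it has attracted growing attention and offers another route to better generalization performance. The support vector machine (SVM), as a "stable" learning algorithm, poses new challenges for ensemble learning, and the search for new SVM ensemble learning algorithms has become an active research topic. Research on SVM ensemble learning started relatively late and remains limited; how to design more effective SVM ensemble algorithms is currently the key problem. This thesis studies both individual generation and conclusion combination in ensemble learning in order to fully exploit the advantages and potential of SVM ensembles. SVMs are sensitive to the kernel type and to perturbations of the kernel parameters, yet existing parameter perturbation methods fix one kernel function in advance and ignore the influence of the kernel type on SVM performance. A more flexible hybrid kernel function is therefore introduced and its parameters are perturbed, which in effect perturbs the SVM model itself, and an SVM ensemble algorithm based on single (model) perturbation is proposed. Simulation results show that, by bringing more parameters into the perturbation, the algorithm increases the diversity and generalization performance of the SVM ensemble. Building on model perturbation, a feature perturbation mechanism is added to study an SVM ensemble algorithm based on double perturbation of model and features. Existing feature perturbation methods all operate in the original feature space and do not consider how correlation between features affects the performance of the individual SVMs and the diversity of the ensemble. Independent component analysis (ICA) is introduced to transform the feature space and remove the correlation between features, and the double-perturbation SVM ensemble algorithm is formulated in the transformed independent-component space. Simulation results show that this method further improves the diversity and generalization performance of the ensemble. Selective ensemble methods choose a subset of the individuals in an ensemble to take part in the combination, with the aim of improving generalization performance. Classical selective ensemble algorithms still suffer from high computational complexity, poor learning efficiency, and low performance. The artificial fish swarm optimization algorithm, which offers global search, insensitivity to initial values, robustness, and fast convergence, is used to optimize the combination weights, giving a new selective SVM ensemble algorithm. Simulation results show improvements in generalization performance, learning efficiency, and ensemble size. An SVM ensemble based on the fuzzy integral can make full use of the measurement-level output of the SVMs and further improve the generalization performance of the ensemble. Existing fuzzy integral ensemble methods determine the fuzzy density values from prior, static information about the training samples; these values are fixed for all test samples and cannot reflect the different confidence that different individual SVMs have when classifying different test samples. An SVM ensemble algorithm based on an adaptive fuzzy integral is proposed: the classification confidence of each individual SVM for a test sample is determined from its measurement-level output, and the fuzzy densities are assigned adaptively on that basis. Simulation results show that this method further improves the generalization performance of the SVM ensemble.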

【Abstract】 Ensemble learning can greatly improve the generalization ability of a learning system by training multiple base learners and combining their results. It has become the fourth major research direction in machine learning and provides another way to improve generalization ability. The support vector machine (SVM), as a "stable" learning algorithm, poses a challenge to ensemble learning, and the search for new SVM ensemble learning algorithms is an active research topic. Research on SVM ensembles started late and has produced few results; how to design more effective SVM ensemble learning algorithms is the key problem. This thesis approaches the problem from two sides, individual generation and conclusion combination, in order to fully exploit the advantages and potential of SVM ensembles.

SVMs are sensitive to the type of kernel function and to the model parameters, but existing parameter perturbation methods fix the kernel in advance and do not consider the influence of the kernel type on SVM performance. A flexible hybrid kernel function is introduced and its parameters are perturbed, which amounts to perturbing the SVM model itself; on this basis, an SVM ensemble algorithm based on model perturbation is proposed. Simulation results show that involving more parameters in the perturbation improves the diversity and generalization performance of the ensemble.

Feature perturbation is then combined with model perturbation to study an SVM ensemble algorithm based on a double perturbation mechanism of model and features. Existing feature perturbation methods do not consider the influence of feature correlation on the performance of the individual SVMs or on the diversity of the ensemble. A feature transformation is introduced: independent component analysis (ICA) is used to transform the feature space and remove the correlation between features, and the SVM ensemble algorithm based on the double perturbation of model and features is proposed in the transformed space. Simulation results show that the diversity and generalization performance are improved further.

Selective ensemble methods select a subset of the individuals in an ensemble in order to improve generalization ability. Classical selective ensemble methods suffer from high computational complexity, low learning efficiency, and low performance. A selective SVM ensemble algorithm is proposed that uses the artificial fish swarm algorithm to optimize the combination weights; this algorithm offers global search, insensitivity to initial values, robustness, and fast convergence. Simulation results show that the method improves generalization ability and learning efficiency and reduces the size of the ensemble.

An SVM ensemble based on the fuzzy integral can make full use of the measurement-level information of the SVMs to improve the generalization performance of the ensemble. Existing fuzzy integral fusion methods use prior information from the training samples to determine the fuzzy density values, which are then the same for every test sample and cannot reflect the different importance of each SVM for different samples. An SVM ensemble based on an adaptive fuzzy integral is presented: the classification confidence of each individual SVM for a test sample is determined from its measurement-level information, and the adaptive fuzzy density is determined from that confidence. Simulation results show that the proposed method further improves performance.
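
The hybrid-kernel model perturbation described in the abstract can be illustrated with a small sketch. The abstract does not say which kernels are mixed or how the parameters are sampled, so everything specific here is an assumption: a convex combination of a Gaussian (RBF) kernel and a polynomial kernel, the parameter names `weight`, `gamma`, and `degree`, the sampling ranges, and the helper names are all hypothetical and are not taken from the thesis.

```python
import math
import random


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, z, degree, coef0=1.0):
    """Polynomial kernel: (<x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + coef0) ** degree


def hybrid_kernel(x, z, weight, gamma, degree):
    """Assumed hybrid kernel: convex mix of an RBF and a polynomial kernel.

    With 0 <= weight <= 1 the mix of two valid kernels is again a valid kernel.
    """
    return weight * rbf_kernel(x, z, gamma) + (1.0 - weight) * poly_kernel(x, z, degree)


def sample_kernel_params(rng):
    """Draw one perturbed parameter set (the ranges are illustrative assumptions)."""
    return {
        "weight": rng.uniform(0.0, 1.0),   # mixing coefficient
        "gamma": 10 ** rng.uniform(-2, 1), # RBF width, log-uniform
        "degree": rng.randint(1, 3),       # polynomial degree
    }


def gram_matrix(points, params):
    """Kernel (Gram) matrix of the points under one parameter set."""
    return [[hybrid_kernel(x, z, **params) for z in points] for x in points]


# One parameter set per ensemble member; each member sees a different kernel.
rng = random.Random(0)
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
member_params = [sample_kernel_params(rng) for _ in range(5)]
gram_matrices = [gram_matrix(points, p) for p in member_params]
```

Each Gram matrix could then be handed to an SVM trainer that accepts a precomputed kernel (for example scikit-learn's `SVC(kernel="precomputed")`), so that every ensemble member is trained under a differently perturbed model. How the thesis actually trains and combines the members is not specified in the abstract.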
