
Module Combination Based on Decision Tree in Min-Max Modular Network with Application to Patent Classification

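【Author】 Wang Yue (王玥)

【Advisor】 Lu Baoliang (吕宝粮)

【Author Information】 Shanghai Jiao Tong University, Computer Software and Theory, 2010, Master's thesis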

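【Summary】 Very large-scale pattern recognition problems are a difficulty that machine learning algorithms increasingly encounter in practical applications. With the arrival of the information age, such large-scale problems have become common in practice; patent classification is one example. Even an efficient learning algorithm such as the support vector machine struggles with very large-scale classification problems. In this setting, exploiting abundant computing resources to parallelize machine learning is an important direction for the field. The min-max modular support vector machine (M3-SVM) is an effective learning algorithm that handles large-scale problems through a "divide and conquer" strategy. It decomposes a large-scale problem into a large number of small-scale problems for learning, then recombines their results into a solution of the original problem through an effective classifier ensemble method, which makes the algorithm naturally suited to parallel execution. To reduce the time complexity of the module combination stage of M3-SVM, we propose a decision-tree-based classifier selection algorithm, building on existing classifier selection methods such as asymmetric classifier selection (ACS) and symmetric classifier selection (SCS). Experiments show that decision tree selection achieves classification performance similar to the original methods, but greatly increases the training complexity. On this basis, we further propose a method for selecting the training data of the decision tree. This method greatly reduces the decision tree training time and also reduces the size of the decision tree. Compared with ACS and SCS, the smaller decision tree has an advantage in the computational complexity of parallel learning and also saves disk storage space. In this thesis we design extensive experiments, including small-scale two-spiral experiments and large-scale patent classification experiments, to verify these claims.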

【Abstract】 Large-scale pattern recognition problems often limit the practical application of many machine learning algorithms. Such problems are common in practice; patent classification is one example. Even for efficient algorithms such as the Support Vector Machine, large-scale problems remain too hard to learn. A feasible approach is to exploit abundant computing resources and solve these large-scale problems in a parallel computing environment. The Min-Max Modular Support Vector Machine (M3-SVM) is a "divide and conquer" based algorithm that can solve large-scale problems effectively. To reduce the time complexity of the module combination step in M3-SVM, we propose a Decision tree Classifier Selection (DCS) algorithm. DCS is an evolution of Symmetric Classifier Selection (SCS). Experimental results show that DCS classifies as well as SCS while reducing the time complexity of the prediction step. We also propose a data selection method for training the decision tree, which greatly reduces both the cost of building the decision tree and the size of the tree. Because the tree is smaller, DCS is better suited to parallel computing than SCS, and it also saves memory. We conducted many experiments, including the two-spiral problem and patent classification, to support our conclusions.
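
The abstract describes the decompose-then-combine idea without stating the combination rule. As a rough illustration only, the sketch below assumes the usual min-max modular formulation for a two-class problem: the positive and negative training sets are each split into subsets, one module is trained per (positive subset, negative subset) pair, and the module outputs are combined by taking the minimum over negative subsets and then the maximum over positive subsets. The function names `partition` and `min_max_combine` are hypothetical, the scores are made-up numbers rather than results from the thesis, and the sketch does not implement ACS, SCS, or DCS.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition(samples: Sequence[T], n_parts: int) -> List[List[T]]:
    """Split samples into n_parts subsets by striding (hypothetical helper)."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    return [list(samples[k::n_parts]) for k in range(n_parts)]


def min_max_combine(scores: Sequence[Sequence[float]]) -> float:
    """Combine module outputs with the assumed min-max rule.

    scores[i][j] is the output of the module trained on positive subset i
    against negative subset j. MIN runs over j (all modules sharing the same
    positive subset), then MAX runs over i.
    """
    if not scores or any(len(row) == 0 for row in scores):
        raise ValueError("scores must be a non-empty matrix")
    return max(min(row) for row in scores)


# Toy decomposition: 2 positive subsets x 3 negative subsets gives 6 modules,
# each of which could be trained independently and in parallel.
positive_parts = partition(list(range(10)), 2)
negative_parts = partition(list(range(100, 115)), 3)
n_modules = len(positive_parts) * len(negative_parts)

# Made-up module outputs for one test sample (rows: positive subsets).
example_scores = [
    [0.9, 0.2, 0.8],
    [0.7, 0.6, 0.9],
]
combined = min_max_combine(example_scores)
```

Under this assumed rule every one of the modules is evaluated for each test sample, which is the prediction-time cost that classifier selection methods such as SCS and DCS aim to reduce.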
