

Research and Application of the Fusion of the AP and SVM Algorithms

【Author】 Zhong Yi (钟毅)

【Advisor】 Zhou Chunguang (周春光)

【Author information】 Jilin University, Computer Application Technology, 2012, Master's degree

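【Abstract (translated from the Chinese)】 Support vector machines emerged in the 1990s and developed rapidly over the following decades. They are now very widely used in machine learning and data mining and have become an indispensable standard tool. Their results, however, depend directly on the selected training samples, so they require a large number of high-quality labeled samples, which to some extent limits the application of SVM. To address this problem, this thesis proposes a new AP-SVM classifier that fuses the AP clustering algorithm with the SVM classifier. It uses the PSOP-AP clustering algorithm to optimize the data set, yielding a small, high-quality training set for the SVM classifier, and addresses the classification-accuracy problem of the SVM classifiers proposed so far. Experimental results show that the AP-SVM classifier achieves higher classification accuracy than the traditional SVM classifier. In particular, the AP-SVM classifier proposed here achieves good results on the problem of heart disease prediction. This provides a new theoretical basis for medical disease research.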
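
The fusion described in the abstract above can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis's implementation: it relies on scikit-learn's `AffinityPropagation` and `SVC`, uses synthetic two-class data from `make_blobs`, keeps the library's default (median) preference instead of one tuned by PSOP, and the helper name `ap_svm_fit` is hypothetical. Only the AP exemplars are given labels and used to train the SVM.

```python
from sklearn.cluster import AffinityPropagation
from sklearn.datasets import make_blobs
from sklearn.svm import SVC


def ap_svm_fit(X, y, preference=None, random_state=0):
    """Cluster X with affinity propagation, then train an SVM on the exemplars only.

    Hypothetical helper: `preference=None` falls back to scikit-learn's default
    (the median similarity), whereas the thesis tunes this value with PSOP.
    """
    ap = AffinityPropagation(preference=preference, damping=0.9,
                             max_iter=1000, random_state=random_state)
    ap.fit(X)
    exemplar_idx = ap.cluster_centers_indices_
    if len(exemplar_idx) == 0:
        # scikit-learn returns no exemplars when AP fails to converge.
        raise RuntimeError("affinity propagation did not converge")
    # Assumption: only the exemplars need labels; they form the small training set.
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(X[exemplar_idx], y[exemplar_idx])
    return clf, exemplar_idx


# Synthetic, well-separated two-class data (an assumption for illustration only).
X, y = make_blobs(n_samples=200, centers=[[-5, -5], [5, 5]],
                  cluster_std=1.0, random_state=0)
clf, exemplar_idx = ap_svm_fit(X, y)
accuracy = clf.score(X, y)
print("training points used:", len(exemplar_idx), "of", len(X))
print("accuracy on all points:", accuracy)
```

On data like this the SVM sees only a handful of exemplars rather than all 200 points, which is the "small, high-quality training set" idea in miniature. Real data sets would need a held-out test split and a tuned preference.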

【Abstract】 In 1992, the support vector machine (SVM) was introduced to the field of machine learning at the Conference on Computational Learning Theory and attracted wide attention. After thorough and comprehensive development in the late 1990s, it has become a standard tool in machine learning and data mining. SVM has achieved good results in handwritten numeral recognition, face recognition, function regression, and density estimation, but when learning from large training sets it suffers from slow learning speed and high storage requirements. Learning speed has therefore become one of the bottlenecks to its wider use.

Cluster analysis, also known as group analysis, is the process of dividing a collection of physical or abstract objects into a number of clusters, each composed of similar objects. Objects in the same cluster are highly similar to one another and dissimilar to objects in other clusters. Although cluster analysis is a branch of taxonomy, clustering and classification are not the same: classification requires the class attributes of the data set to be known in advance, whereas clustering must discover them from the data, and the number of classes need not be specified beforehand. Clustering methods include the AP clustering algorithm, fuzzy clustering, system (hierarchical) clustering, dynamic clustering, graph-theoretic clustering, sequential sample clustering, clustering forecast methods, and so on.

Although the classification accuracy of the traditional SVM is usually high, its results depend directly on the selected training samples, so it requires a large number of high-quality labeled samples, which to some extent limits the application of SVM.
The traditional SVM usually selects its training set at random, which makes it difficult to choose representative data as training samples, so the classification results are unstable. Manual selection costs a great deal of manpower and time and lowers overall classification efficiency. A clustering algorithm needs no training set, only a set of data, in which it finds regularities and clusters automatically; clustering is fast, but its accuracy is usually not high.

In summary, the advantages of the two algorithms should be combined to achieve the desired results. A clustering algorithm is first used to cluster the large set of data objects and to select representative data points from each class as the training set of the SVM classifier, which improves the accuracy of the SVM classifier. The main work of this thesis is as follows:

First, the author read works in several related fields, attended well-known lectures on biochemistry and molecular biology, studied a large body of literature to understand cluster analysis techniques and SVM theory, and wrote code in MATLAB.

Second, this thesis proposes a novel PSOP algorithm based on Particle Swarm Optimization, which uses the In-Group Proportion index as the fitness function to search for the optimal preference of the Affinity Propagation algorithm. This approach preserves AP's characteristic of not needing a pre-given number of clusters, and can obtain the best clustering results.

Then, building on research into the fusion of the AP and SVM algorithms, this thesis proposes a new AP-SVM classifier. Experiments demonstrate the feasibility of the proposed method in both the linearly separable and the linearly nonseparable case.
Compared with traditional SVM classification, the AP-SVM classifier achieves higher accuracy while avoiding the tedious and difficult manual selection of training samples, saving a great deal of time and manpower.

Last, the AP-SVM classifier is applied to heart disease prediction in medical research, where it obtains better results, which has a certain significance.
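
The In-Group Proportion index named in the abstract as the PSOP fitness function can be illustrated with a short Python sketch. It assumes the common definition of IGP (the share of a cluster's points whose nearest neighbour lies in the same cluster) and averages the per-cluster values into one score; how the thesis aggregates the index is not stated in the abstract, so the `fitness` function and both function names are hypothetical. The particle swarm search over the AP preference itself is not shown.

```python
import math


def in_group_proportion(points, labels):
    """Return {cluster: IGP}: the fraction of each cluster's points whose
    nearest neighbour (among all other points) has the same cluster label."""
    hits, sizes = {}, {}
    for i, p in enumerate(points):
        # Index of the nearest neighbour of point i, excluding i itself.
        nn = min((j for j in range(len(points)) if j != i),
                 key=lambda j: math.dist(p, points[j]))
        sizes[labels[i]] = sizes.get(labels[i], 0) + 1
        if labels[nn] == labels[i]:
            hits[labels[i]] = hits.get(labels[i], 0) + 1
    return {c: hits.get(c, 0) / sizes[c] for c in sizes}


def fitness(points, labels):
    """Assumed aggregate: the mean IGP over clusters (higher is better)."""
    igp = in_group_proportion(points, labels)
    return sum(igp.values()) / len(igp)


# Two tight, well-separated groups of three points each.
points = [(0, 0), (1, 0), (0, 3), (10, 10), (11, 10), (10, 13)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the geometry
bad_labels = [0, 1, 0, 1, 0, 1]    # mixes the two groups

print(fitness(points, good_labels))  # 1.0
print(fitness(points, bad_labels))   # one third
```

A search such as PSOP would call a function like `fitness` on the labels that AP produces for each candidate preference and keep the preference with the highest score.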

  • 【Online publication contributor】 Jilin University
  • 【Online publication year and issue】 2012, Issue 10

