基于公理模糊集与支持向量机的知识发现方法与应用研究

Approaches to Knowledge Discovery and Its Application Via Axiomatic Fuzzy Sets and Support Vector Machines

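【Author】 Ren Yan (任艳)

【Supervisor】 Liu Xiaodong (刘晓东)

【Author information】 Dalian University of Technology, Control Theory and Control Engineering, 2011, PhD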

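【Summary】 Axiomatic Fuzzy Sets (AFS) theory is a new semantic approach to handling fuzzy information. In essence, it studies how to convert the intrinsic regularities and patterns contained in training samples, raw data, or databases into fuzzy sets and their logical operations. It has been applied to formal concept analysis, cluster analysis, fuzzy classifiers, and knowledge representation. The Support Vector Machine (SVM) is a supervised pattern recognition method derived from statistical learning theory. SVMs handle small-sample, high-dimensional, and nonlinear problems well, and they offer high fitting accuracy, few parameters to select, strong generalization ability, and globally optimal solutions. SVMs have become a research focus in machine learning. This thesis applies AFS and SVM theory to topical problems in knowledge discovery and knowledge representation. The main work is as follows.

1. Using AFS theory in the unsupervised setting, the thesis first proposes a fuzzy feature selection algorithm and a principal concept selection algorithm, which pick out the important features and simple concepts for knowledge discovery. It then proposes a concept categorization algorithm that groups highly correlated simple concepts into one category. This is an important problem in artificial intelligence, and in practice it reduces the dimensionality of a data set and so avoids the curse of dimensionality. Finally, it proposes a sample characteristic description algorithm that extracts the most salient characteristics of a sample. Such descriptions are very simple and, in recognition problems, more practical and effective than complex fuzzy descriptions.

2. A detailed study of the AFS fuzzy logic clustering algorithm (X. D. Liu, W. Wang and T. Y. Chai. IEEE Transactions on Systems, Man, Cybernetics, 2005) and of its practicality on real data revealed several shortcomings. To address them, the thesis extends the original algorithm in three ways: it proposes an algorithm that controls how coarse the fuzzy descriptions of samples are, adds a step that further refines the clustering result, and improves the original AFS cluster validity index. Tests on the public Iris data set show the effectiveness of the new method.

3. Cluster analysis is a topical problem in knowledge discovery. To evaluate the four techniques developed within the AFS framework (feature selection, principal concept selection, concept categorization, and sample characteristic description), the thesis proposes a new AFS fuzzy clustering algorithm built on them. Its method for obtaining the fuzzy description of each cluster is very simple: each description is just a conjunction (intersection) of simple concepts, which makes it easy to interpret. It also gives a sample a large degree of membership in its own cluster and a very small one, even approaching 0, in the other clusters, so the boundaries between clusters are as crisp as possible. On several UCI data sets, the clustering accuracy is comparable to, and in some cases better than, that of traditional clustering algorithms such as FCM and k-means. The experiments further show that the results are very stable when parameters are chosen within a reasonable range; that is, the algorithm is insensitive to parameter selection.

4. A novel density-based clustering algorithm, DBCAMM, is proposed using the Mahalanobis distance. It has two innovations: it uses the Mahalanobis distance in place of the Euclidean distance commonly used in the classical density-based clustering algorithm DBSCAN, and it gives an effective method for merging leaders and followers. In addition, DBCAMM merges sub-clusters using local sub-cluster density information, which overcomes the global density parameter problem of DBSCAN. Experiments on synthetic data sets show the effectiveness of the algorithm. Segmentation results of DBCAMM and DBSCAN on some typical images show that DBCAMM produces better visual results.

5. A classification algorithm with extremely compact fuzzy rules, PFRAS, is proposed. It first uses an SVM to remove outliers from the training set and then, based on AFS theory, finds fuzzy sets with definite semantic interpretations to describe each class. The algorithm has two further advantages: each rule it produces is just a conjunction of simple concepts, so the rules are simpler, and no parameters need to be tuned to optimize the rules. Compared with other methods, each class in the PFRAS result corresponds to fewer rules (for most data sets, only one rule per class), so the thesis provides a more concise, understandable, and accurate classification model.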
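The density-based clustering contribution described above replaces the Euclidean distance with the Mahalanobis distance. The following is a minimal sketch of that one ingredient only, not of DBCAMM itself. It assumes 2-D points and a covariance estimated from a single sub-cluster; the function names `mean_and_cov`, `mahalanobis`, and `euclidean` are illustrative and do not come from the thesis.

```python
import math

def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def mahalanobis(p, q, cov):
    """Mahalanobis distance between 2-D points p and q under covariance cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = p[0] - q[0], p[1] - q[1]
    # Quadratic form (dx, dy) @ inverse(cov) @ (dx, dy)^T,
    # with the 2x2 inverse written out explicitly.
    q_form = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(q_form)

def euclidean(p, q):
    """Ordinary Euclidean distance between 2-D points p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical elongated sub-cluster, stretched along the x axis.
cluster = [(-4.0, -1.0), (-2.0, 1.0), (0.0, 0.0), (2.0, -1.0), (4.0, 1.0)]
center, cov = mean_and_cov(cluster)
along = (3.0, 0.0)   # lies along the cluster's long axis
across = (0.0, 3.0)  # lies across it

print("Euclidean:  ", euclidean(center, along), euclidean(center, across))
print("Mahalanobis:", mahalanobis(center, along, cov), mahalanobis(center, across, cov))
```

In this example both test points are equally far from the center in Euclidean terms, while the Mahalanobis distance is smaller for the point lying along the cluster's long axis. That shape sensitivity is the general motivation for using this metric with elongated clusters.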

【Abstract】 Axiomatic Fuzzy Sets (AFS) theory is a new method for dealing with fuzzy information. It provides an effective tool for converting the information in training examples and databases into membership functions and their fuzzy logic operations. The Support Vector Machine (SVM) is a supervised pattern recognition method based on Statistical Learning Theory. It offers high classification accuracy, few parameters, global optima, and strong generalization performance, and it has become a new research area in machine learning. This thesis focuses on some popular problems that are often encountered in knowledge discovery and representation, based on AFS and SVM theory. The main topics include:

1. In the framework of AFS theory, the thesis first proposes a fuzzy feature selection algorithm and a principal concept selection algorithm for unsupervised learning, which extract the important features and simple concepts for knowledge discovery. Second, it presents a concept categorization approach, a new and important technique in artificial intelligence, which clusters highly correlated simple concepts into one class. Finally, it gives an algorithm for finding the characteristic description of a sample. The algorithm extracts the salient characteristics of the sample; such a description is very simple, and in pattern recognition it is more effective and practical than a complex description.

2. An exhaustive study of the clustering algorithm proposed by X. Liu et al. in IEEE Transactions on Systems, Man, Cybernetics, 2005, and of its effectiveness on real data sets, revealed some drawbacks. To address them, the thesis first proposes an algorithm to control how coarse the fuzzy descriptions of objects are; second, adds a refinement step, in which the clusters are further refined using the fuzzy description of each cluster; and finally, improves the original AFS cluster validity index. The well-known real-world Iris data set is used to illustrate the effectiveness of the new clustering algorithm.

3. To evaluate the effectiveness of the feature selection, principal concept selection, concept categorization, and characteristic description algorithms proposed in the framework of AFS theory, a new fuzzy clustering algorithm based on AFS theory is built from these techniques. The fuzzy description of each cluster obtained from the proposed algorithm is simply a conjunction of some simple concepts. The membership degree of a sample in its own cluster is large, while its membership degrees in the other clusters are as small as possible, even close to 0, so the boundaries among the clusters are clearer with such fuzzy descriptions. Such a description is very simple and easy for users to understand, since each class corresponds to far fewer rules. Several benchmark data sets are used for this study. The clustering accuracies are comparable with or superior to those of conventional algorithms such as FCM and k-means. The experiments further indicate that the clustering algorithm is not sensitive to its parameters as long as they are chosen within a reasonable range.

4. A new density-based clustering algorithm using the Mahalanobis metric is proposed. The algorithm has two novelties: it adopts the Mahalanobis metric as its distance measure, and it provides an effective approach for merging leaders and followers. To overcome the problem of the single global density parameter in DBSCAN, the sub-clusters are merged using local sub-cluster density information. Extensive experiments on some synthetic data sets show the validity of the proposed algorithm. Segmentation results on some typical images, obtained with the proposed algorithm and with DBSCAN, show that the proposed algorithm produces much better visual results than DBSCAN.

5. A classification method based on easily interpretable fuzzy rules is proposed. It capitalizes on two key techniques: pruning the outliers in the training data with SVMs, and finding a fuzzy set with a definite linguistic interpretation to describe each class based on AFS theory. Compared with other fuzzy rule-based methods, the proposed models are usually more compact and easier for users to understand, since each class is described by far fewer rules. The proposed method has two other advantages: each rule obtained from the algorithm is simply a conjunction of some linguistic terms, and no parameters need to be tuned. The results show that the fuzzy rule-based classifier presented in this thesis offers a compact, understandable, and accurate classification scheme that balances interpretability and accuracy.
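The idea that each class is described by a single rule, itself a conjunction of simple linguistic terms, can be illustrated with a small sketch. It is not the AFS membership function or the classifier from the thesis. It assumes a simplified rank-based membership (the fraction of samples that a value dominates) and the minimum as the fuzzy AND, and the names `rank_membership`, `rule_membership`, and `classify` are hypothetical.

```python
def rank_membership(values, x, larger_is_member=True):
    """Assumed stand-in membership: fraction of samples that x dominates.

    With larger_is_member=True this reads as the concept "large";
    with False it reads as the concept "small".
    """
    if larger_is_member:
        dominated = sum(1 for v in values if v <= x)
    else:
        dominated = sum(1 for v in values if v >= x)
    return dominated / len(values)

def rule_membership(sample, data, rule):
    """Membership of sample in a conjunction of simple concepts (min as AND).

    rule is a list of (feature_index, larger_is_member) pairs, e.g.
    (0, True) reads "feature 0 is large", (1, False) reads "feature 1 is small".
    """
    degrees = []
    for feature, larger in rule:
        column = [row[feature] for row in data]
        degrees.append(rank_membership(column, sample[feature], larger))
    return min(degrees)

def classify(sample, data, rules):
    """Pick the class whose one-rule description gives the highest membership."""
    return max(rules, key=lambda label: rule_membership(sample, data, rules[label]))

# Hypothetical two-feature data set with two well-separated groups.
data = [(1.0, 9.0), (2.0, 8.0), (3.0, 7.0), (8.0, 2.0), (9.0, 1.0), (7.0, 3.0)]

# One rule per class, each a conjunction of two simple concepts.
rules = {
    "A": [(0, False), (1, True)],   # feature 0 is small AND feature 1 is large
    "B": [(0, True), (1, False)],   # feature 0 is large AND feature 1 is small
}

for sample in [(1.5, 8.5), (8.5, 1.5)]:
    degrees = {label: rule_membership(sample, data, rules[label]) for label in rules}
    print(sample, degrees, classify(sample, data, rules))
```

Because each class description is one short conjunction, it can be read aloud as a sentence, which is the interpretability property the abstract emphasizes. A sample that fits one rule well receives a low degree under the other.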
