
构造性知识发现方法研究

Research on Structural Methods of Knowledge Discovery

【Author】 吴涛 (Wu Tao)

【Supervisor】 张铃 (Zhang Ling)

【Author Information】 安徽大学 (Anhui University), Computer Applications, 2003, PhD

【摘要 (Abstract)】 With the continuing development of science and network technology, the volume of data being generated has grown rapidly, and knowledge discovery from massive data sets has become an important research topic in artificial intelligence. Decision trees, neural networks, and Bayesian networks are the main tools of knowledge discovery at present, but these methods suffer from slow processing and from the difficulty of determining network structure, and therefore struggle to meet the timeliness requirements of knowledge discovery. Building on an analysis of BP and related algorithms, Professor Zhang Ling et al. proposed a constructive machine learning method based on covering, which constructs a neural network directly from the characteristics of the samples themselves; the method is intuitive and efficient and handles massive data well. Starting from an analysis of the classification methods commonly used in knowledge discovery, and drawing on rough set theory and SVM, this thesis studies the covering method in depth and obtains the following results:

(1) The constructive learning method based on covering builds a covering network directly from the sample data, overcoming problems of traditional neural network computation such as hard-to-determine network structure, slow running speed, and local minima, and is well suited to multi-class, massive data. This thesis analyzes the method in depth and proposes improvements to the domain construction, the activation function, and the distance function; experiments show that these improvements further raise the performance of the covering algorithm (a sketch of the basic cover construction follows this abstract).

(2) The choice of training samples and the learning order directly affect the structure and performance of a neural network, and covering networks are likewise closely tied to learning order. This thesis presents three ordered covering methods; experiments show that although these orders are not optimal, their accuracy is close to or higher than the average accuracy of random-order learning. On the basis of ordered covering, the thesis gives incremental learning and domain reduction methods for the covering algorithm, which effectively reduce the number of covering domains and improve the recognition accuracy of the covering network.

(3) Because the attributes relevant to describing an object are unknown in advance, existing databases describe objects with large numbers of attributes, and the resulting redundancy prevents classification systems from running effectively. Selecting attribute features sensibly reduces the amount of data while preserving classification ability, thereby speeding up classification. Rough set theory provides an important tool for selecting feature attributes. This thesis uses rough set methods to select attributes and builds a rough-set-based covering algorithm that essentially preserves classification ability while improving classification speed; a weighted covering scheme is also proposed.

(4) The SVM method, built on statistical learning theory, constructs an optimal separating hyperplane by mapping samples into a high-dimensional space and maximizing the classification margin, and has strong generalization ability. This thesis analyzes what SVM and the covering algorithm have in common and the properties of radial basis functions, and proposes a covering algorithm based on radial basis functions. Experiments show that this algorithm greatly reduces the number of covers and the number of rejected samples, and that with suitable parameter choices the feature space indeed becomes linearly separable. Under the guidance of quotient space theory, the thesis introduces the concept of covering-domain fusion and gives concrete algorithms for maximum-value fusion and combinatorial-optimization fusion of domains. Domain fusion smooths the classification boundary of the covering domains, simplifies the complexity of solving the SVM problem, and improves the performance of the covering algorithm; by connecting the covering algorithm with statistical learning theory, it also provides a theoretical basis for the covering algorithm.

(5) Many classification methods exist, but finding the minimum number of separating hyperplanes, that is, hidden-layer units, is generally difficult. Using the duality between sample sets and hyperplanes, this thesis proposes a dual algorithm for classification problems: it projects the sample set and the hyperplanes into their respective extended spaces, gives a connection search method for the partition matrix based on genetic-algorithm ideas, then uses rough set reduction to obtain the solution domain of the classification problem, and finally obtains the optimal (or suboptimal) solution by seeking the maximum-margin solution. This method still needs further refinement, but it offers a completely new approach and perspective for solving classification problems, with broad application prospects and rich research content.
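The cover construction referenced in point (1) is, at its core, a greedy procedure: take a not-yet-covered training sample, grow a sphere around it that stays clear of every sample from other classes, absorb the same-class samples it contains, and repeat until all samples are covered; a new point is then classified by the cover whose boundary it is closest to. The Python sketch below is a minimal illustration of that idea under simplifying assumptions, not the thesis's exact algorithm: it uses a plain Euclidean distance rather than the sphere-projection and inner-product formulation the thesis works with, the radius is set midway between the nearest other-class sample and the furthest same-class sample within that bound, and the function names are illustrative.

```python
import numpy as np

def build_covers(X, y):
    """Greedily construct spherical covering domains as (center, radius, label).

    Each cover is centered on a not-yet-covered training sample; its radius is
    kept below the distance to the nearest sample of a different class, so a
    cover only ever contains samples of its own class.
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    covered = np.zeros(len(X), dtype=bool)
    covers = []
    while not covered.all():
        i = int(np.flatnonzero(~covered)[0])           # next uncovered sample
        center, label = X[i], y[i]
        dists = np.linalg.norm(X - center, axis=1)
        enemy = dists[y != label]
        d1 = enemy.min() if enemy.size else np.inf     # nearest other-class sample
        friend = dists[(y == label) & (dists < d1)]
        d2 = friend.max() if friend.size else 0.0      # furthest same-class sample inside d1
        radius = (d1 + d2) / 2.0 if np.isfinite(d1) else d2
        covers.append((center, radius, label))
        covered |= (y == label) & (dists <= radius)    # samples absorbed by this cover
    return covers

def predict(covers, x):
    """Assign x the label of the cover whose boundary it is closest to (or inside)."""
    x = np.asarray(x, dtype=float)
    gaps = [np.linalg.norm(x - center) - radius for center, radius, _ in covers]
    return covers[int(np.argmin(gaps))][2]
```

Each (center, radius, label) triple plays the role of one hidden unit of the covering network, so the number of covers produced determines the hidden-layer size; the ordering, pruning, and fusion techniques summarized in points (2) and (4) aim to keep that number small.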

【Abstract】 With the development of science and network technology, the capability to generate and collect data has grown rapidly, so knowledge discovery from data sets with huge numbers of samples has become an important task of artificial intelligence. Decision trees, neural networks, and Bayesian networks are the main tools of KDD. Traditional neural networks cannot satisfy KDD's requirement that information be supplied promptly, because of their low processing speed and the difficulty of defining the structure and estimating the parameters. Professor Zhang Ling et al. proposed a structural method for machine learning that designs networks from spherical domains covering the training samples; this kind of classifier is efficient for data sets with huge amounts of data. The author studies this structural method in combination with rough set theory and SVM; the main work and results are the following:

(1) The structural machine learning method based on covering domains designs networks directly from the sample data and, because of its efficiency, is suitable for data sets with many classes and huge numbers of samples. The author analyzes the algorithm and proposes strategies for improving the covering-domain construction, the activation function, and the distance function. Experiments show that these improvements raise the performance of the covering neural networks.

(2) Since the selection of samples and the learning order strongly affect the performance of covering networks, three ordered covering methods are given in the thesis. Although these learning orders are not optimal, in our experiments the accuracy of networks designed with ordered learning is close to or above the average accuracy of random-order learning. Algorithms for incremental covering learning and for covering-domain pruning based on ordered learning are also proposed; they effectively reduce the number of covering domains and improve classification accuracy.

(3) Existing databases employ large numbers of attributes to describe objects whose relevant attributes are unknown, so it is necessary to select attributes for the classifier to make it perform well. Rough set theory provides an important tool for feature selection. An artificial neural network (RCSN) combining rough set theory with the covering design algorithm is introduced; it reduces the condition attributes using rough set theory and designs the network structure with the covering design algorithm. An example shows that the algorithm preserves classification accuracy while cutting down memory usage and the cost of data collection. A framework of feature weighting for the covering algorithm is also proposed.

(4) The support vector machine (SVM), based on statistical learning theory, maps samples into a higher-dimensional space and constructs an optimal hyperplane to separate two classes of samples, and therefore has strong generalization ability. The resemblance between SVM and the covering algorithm is analyzed, and a structural learning algorithm based on sphere covering in the kernel feature space is put forward (see the kernel-distance sketch after this abstract). Experiments show that this kind of network combines the virtues of the covering design algorithm and SVM, and that with suitable parameter choices the samples are indeed linearly separable in the feature space. Guided by quotient space theory, the author puts forward the notion and algorithms of covering-domain fusion, which tie SVM and the covering algorithm together. The fusion algorithm not only simplifies the solution of the SVM problem and improves the performance of the covering algorithm, but also provides a theoretical foundation for the covering algorithm.

(5) A dual representation of classification learning is introduced by mapping the samples and the hyperplanes into their respective version spaces. Using this dual representation, a classification problem posed in the original sample space is transformed into one posed in the dual space, and the classification of a given sample set is recast as a minimal-reduction problem: a connection search method based on genetic-algorithm ideas is used to find the partition matrix, rough set reduction then yields the solution domain of the classification problem, and the optimal (or suboptimal) solution is obtained by seeking the maximum-margin solution.
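The feature-space covering of point (4) rests on the fact that the cover construction only ever needs distances between samples, so it can run in the space induced by a radial basis function kernel without computing the mapping explicitly. The snippet below shows the standard kernel identity involved; it is a general property of the RBF kernel rather than code from the thesis, and the gamma value is an arbitrary illustrative choice.

```python
import numpy as np

def rbf_feature_distance(a, b, gamma=0.5):
    """Distance between phi(a) and phi(b) in the RBF-kernel feature space.

    With k(x, z) = exp(-gamma * ||x - z||^2) we have k(x, x) = 1, so every
    mapped sample lies on the unit sphere, and
        ||phi(a) - phi(b)||^2 = k(a, a) + k(b, b) - 2*k(a, b) = 2 - 2*k(a, b),
    which is computable from the kernel alone (the kernel trick).
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    k = np.exp(-gamma * np.sum((a - b) ** 2))
    return np.sqrt(max(0.0, 2.0 - 2.0 * k))
```

Substituting this feature-space distance for the Euclidean one in the earlier covering sketch yields spherical covers in the kernel feature space; because 2 - 2*exp(-gamma*d^2) increases monotonically with the input-space distance d, each such cover still corresponds to a ball around its center in the input space, only with a rescaled radius.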

  • 【Online Publication Submitter】 安徽大学 (Anhui University)
  • 【Online Publication Issue】 2003, No. 03