A PCA-Based Bayesian Network Construction Algorithm and Its Application

A Study and Application of Learning Bayesian Network from Data Approach Based on PCA

【Author】 Liu Xiaojie (刘晓洁)

【Advisor】 Zhu Qunxiong (朱群雄)

【Author information】 Beijing University of Chemical Technology, Computer Application Technology, 2009, Master's degree

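【Summary】 A Bayesian network is a graphical model for representing the probability distribution over a set of variables. It offers a natural way to represent causal information and to discover latent relationships in data, and it rests on a solid mathematical foundation. It has a graphical model representation, local and distributed learning mechanisms, and intuitive inference; it is suited to expressing and analyzing uncertain and probabilistic phenomena; and it can reason effectively over incomplete, imprecise, or uncertain knowledge or information. These properties have made it one of the most effective models for uncertain knowledge representation and reasoning. How to learn Bayesian networks from real-world data with effective methods and algorithms, and how to express accurately the valuable information contained in the data, is a current research focus and difficulty. This thesis uses an information-theoretic method for Bayesian network structure learning. To address that method's drawback, namely that computational efficiency falls as the node set grows, it applies PCA dimensionality reduction to reduce the number of nodes and improve the algorithm's efficiency. The main work is as follows: 1. Continuous or mixed data are discretized with fuzzy clustering, and the data set is reduced in dimension with principal component analysis (PCA) to reduce the number of nodes. 2. Missing data in the data set are filled in with Gibbs sampling, and the Bayesian network structure is learned with the information-theoretic method. 3. Classification experiments verify the accuracy and efficiency of the PCA-based Bayesian network classifier, and Bayesian data fusion is applied to energy-consumption and material-consumption data from ethylene production at different production scales or with different technologies; the results offer some reference value for evaluating energy and material consumption levels in ethylene production.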

【Abstract】 The Bayesian belief network is a powerful tool for knowledge representation and reasoning under uncertainty. A Bayesian belief network is a directed acyclic graph with a conditional probability distribution for each node, and it has a solid mathematical foundation. Bayesian networks are among the most effective models in the field of uncertain knowledge representation and inference. They have the following characteristics: a graphical model representation, local and distributed learning mechanisms, and intuitive inference; suitability for expressing and analyzing uncertain and probabilistic phenomena; and the ability to reason effectively over incomplete, imprecise, or uncertain knowledge or information. In graphical models and data mining, a central and difficult problem is how to learn Bayesian networks from data with efficient methods and algorithms and how to express accurately the valuable information the data contain. This thesis learns Bayesian networks from data with an information-theoretic algorithm and, to address that algorithm's drawback, uses PCA to reduce the dimensionality of the database, whose missing values are filled in by Gibbs sampling, thereby cutting the number of nodes and improving the efficiency of learning the network. The main work is as follows: 1. Continuous attributes are discretized with fuzzy clustering, and PCA is used to reduce the dimensionality of the database and so cut the number of nodes. 2. Missing data are filled in by Gibbs sampling, and the structure of the Bayesian network is learned from the data with the information-theoretic method. 3. To verify the accuracy and efficiency of the algorithm for learning Bayesian networks from data, a Bayesian network is used to classify data; Bayesian data fusion is also used to fuse data from different ethylene production installations, and the results have some reference value.
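
The abstract names PCA dimensionality reduction and an information-theoretic structure-learning step but gives no formulas. The following is a minimal sketch of those two building blocks, not the thesis's implementation. It assumes NumPy, uses equal-width binning as a simple stand-in for the fuzzy-clustering discretization, omits the Gibbs-sampling imputation, and takes pairwise mutual information as the information-theoretic quantity (a common choice, assumed here). The function names `pca_reduce`, `discretize`, and `mutual_information` are hypothetical.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # center each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def discretize(col, n_bins):
    """Equal-width binning into labels 0..n_bins-1.

    Assumption: a stand-in for the fuzzy clustering named in the abstract.
    """
    interior_edges = np.linspace(col.min(), col.max(), n_bins + 1)[1:-1]
    return np.digitize(col, interior_edges)


def mutual_information(x, y):
    """Empirical mutual information (in nats) of two discrete arrays."""
    _, xi = np.unique(np.ravel(x), return_inverse=True)
    _, yi = np.unique(np.ravel(y), return_inverse=True)
    xi, yi = xi.ravel(), yi.ravel()
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)  # contingency counts
    joint /= joint.sum()             # joint probabilities
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0                   # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))
```

In a pipeline of the kind the abstract describes, one would reduce the data with `pca_reduce`, discretize each retained column, and then score variable pairs with `mutual_information` to decide which candidate edges to test; the threshold and the edge-orientation procedure are left open here because the abstract does not specify them.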
