Research and Application of Bayesian Networks for Knowledge Discovery and Decision-making

(Original Chinese title: 基于贝叶斯网络的知识发现与决策应用研究)

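【Author】 Zhang Shaozhong (张少中)

【Supervisor】 Wang Xiukun (王秀坤)

【Author Information】 Dalian University of Technology, Computer Application Technology, 2003, Ph.D.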

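【Abstract (Chinese, translated)】 A Bayesian network combines probability theory and graph theory. It provides a natural tool for handling two problems that run through applied mathematics and engineering: uncertainty and complexity. In the 1980s, Bayesian networks were used mainly in expert systems, where they became a popular method for representing uncertain knowledge and reasoning under uncertainty. As databases have grown steadily larger in recent years, Bayesian networks have increasingly been applied to data mining and knowledge discovery in large-scale databases, providing a powerful means of decision support. They have become an effective method for knowledge discovery in databases and for decision support systems.

Against the background of the Heilongjiang Province Flood Control Command Decision Support System [Heilongjiang Provincial Government document Hei Xun Zi 2001-8], this dissertation studies the theory of knowledge discovery and decision-making with Bayesian networks. It develops a process framework for Bayesian-network-based knowledge discovery and decision-making and, on that basis, studies the theory of applying Bayesian networks to knowledge discovery and decision support: structure learning, parameter learning, inference and explanation, and the use of Bayesian networks for flood-control knowledge discovery and decision-making. The main results are summarized as follows.

A new maximum mutual information scoring function with restriction (MMI-R) is proposed for structure learning. Building on the Kullback-Leibler (KL) distance and mutual information from information theory, it introduces the maximum mutual information principle into Bayesian network structure learning and uses the dimension of the network model together with the complexity of the network structure as a combined restriction function. Combining the maximum mutual information principle with this complexity restriction yields the MMI-R score. With MMI-R as the evaluation criterion, structure learning becomes an optimization problem. A comparison of MMI-R with the plain maximum mutual information principle on the Cancer data set shows that MMI-R resolves the complete-graph problem of maximum mutual information. Comparative experiments against the Bayesian (BDe) score and the minimum description length (MDL) score on four typical data sets (Cancer, College, Asia, and Alarm) show that MMI-R achieves better structure-learning accuracy.

An improved simulated annealing algorithm (SA-MMI-R) is proposed for structure learning. It uses the MMI-R score as the energy function and improves basic simulated annealing in three ways. First, a new neighbor-generation strategy has three parts: exchanging nodes, adding new nodes, and deleting old nodes. This lets the search traverse as many neighbors as possible while remaining efficient. Second, a new termination condition is introduced: a dynamic iteration-count control strategy that combines the network node state dimension, the temperature-reduction coefficient, and the state acceptance probability, plus an additional stopping condition based on the node state dimension and the number of consecutive ineffective modifications to a parent set, which speeds up convergence. Third, a memory variable is added so that the algorithm can temporarily accept worse solutions and escape local optima, bringing it as close as possible to the global optimum. To satisfy prior knowledge, a correction algorithm is designed to adjust the output of SA-MMI-R. Performance experiments on typical data sets comparing SA-MMI-R with Chickering's simulated annealing algorithm and Larrañaga's genetic algorithm show that SA-MMI-R has a clear advantage in both structure-learning accuracy and computation speed.

An improved EM algorithm (ISA-EM) is proposed for parameter learning. Because traditional EM struggles with large data sets and high-dimensional variables and converges slowly, the dissertation improves the expectation (E) step and the maximization (M) step separately. In the E-step, the large data set is divided into smaller blocks; each block is optimized internally, and the results are combined across blocks. On the one hand, this turns processing a large data set into processing smaller blocks, reducing computation and coping with growing variable dimensionality; on the other hand, combining expected values across blocks avoids repeated computation. In the M-step, an improved simulated annealing algorithm performs the maximization. Beyond the improvements already made for structure learning, two more are added: first, an optimized initial-temperature selection strategy guided by problem information and the state distribution, which accounts for the relative performance of states and assigns suitable jump probabilities to different states; second, neighbors are generated from a Cauchy distribution to bring the algorithm as close as possible to the global optimum. Comparative experiments on the Cancer, College, Asia, and Alarm data sets show that ISA-EM outperforms standard EM in parameter-learning accuracy, computation time, and convergence speed.

Online learning of Bayesian networks is studied. Based on credibility theory, a credibility EM algorithm for online parameter learning is given and compared with standard EM in computation speed.

Methods for Bayesian network inference and model explanation are studied, and a general form of stochastic sampling inference algorithms is given. Model explanation is analyzed from three aspects (evidence explanation, inference explanation, and model explanation), and the explanation of probabilistic dependencies is described in detail.

For flood-control decision-making, Bayesian networks are used to study flood disaster forecasting and flood disaster risk decisions. An analysis and evaluation framework for the flood-control decision system is studied, and Bayesian-network-based models are presented for rainfall-runoff forecasting, river flood forecasting, and flood disaster risk decisions. The forecasting accuracy of the prediction models is analyzed, and the probabilistic dependencies of the flood disaster risk decision model are analyzed and explained. The results show that applying Bayesian networks to flood-control knowledge discovery and decision-making has considerable practical value.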
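The complete-graph problem of plain maximum mutual information follows from a basic fact: adding parents to a node can never decrease the empirical mutual information I(X_i; Pa_i), so an unrestricted mutual-information score never prefers a sparser graph. The Python sketch below illustrates why a restriction term is needed; it is not the dissertation's MMI-R. Because the abstract does not give the exact restriction function, the sketch assumes an MDL-style dimension penalty plus a hypothetical edge-count term weighted by `lam`, and all function names are illustrative.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Empirical entropy (nats) from a Counter of outcome frequencies."""
    return -sum(c / n * math.log(c / n) for c in counts.values() if c > 0)


def mutual_information(data, child, parents):
    """Empirical I(X_child; X_parents) in nats; 0 when the node has no parents."""
    if not parents:
        return 0.0
    n = len(data)
    pa = [tuple(row[p] for p in parents) for row in data]
    h_child = entropy(Counter(row[child] for row in data), n)
    h_pa = entropy(Counter(pa), n)
    h_joint = entropy(Counter(zip((row[child] for row in data), pa)), n)
    return h_child + h_pa - h_joint  # I(X;Y) = H(X) + H(Y) - H(X,Y)


def dimension(parents, card):
    """Free CPT parameters: sum over i of (r_i - 1) * product of parent cardinalities."""
    return sum((card[v] - 1) * math.prod(card[p] for p in pa)
               for v, pa in parents.items())


def mi_sum(data, parents):
    """Unrestricted criterion: total mutual information between nodes and parents."""
    return sum(mutual_information(data, v, pa) for v, pa in parents.items())


def penalized_mi_score(data, parents, lam=1.0):
    """Hypothetical MMI-R-style score: N * MI sum minus an ASSUMED restriction
    (MDL-style dimension penalty plus lam * number of edges)."""
    n = len(data)
    card = {v: len({row[v] for row in data}) for v in parents}
    edges = sum(len(pa) for pa in parents.values())
    return (n * mi_sum(data, parents)
            - 0.5 * math.log(n) * dimension(parents, card) - lam * edges)


# Toy data: B copies A; C is independent of both (each (A, C) pair occurs 10 times).
data = [{"A": i % 2, "B": i % 2, "C": (i // 2) % 2} for i in range(40)]
true_graph = {"A": [], "B": ["A"], "C": []}
complete_graph = {"A": [], "B": ["A"], "C": ["A", "B"]}

print(mi_sum(data, true_graph), mi_sum(data, complete_graph))
print(penalized_mi_score(data, true_graph), penalized_mi_score(data, complete_graph))
```

On this toy data both graphs have the same unpenalized mutual-information sum (ln 2, up to rounding), so the plain criterion cannot reject the extra edges, while the penalized score clearly prefers the true graph A → B.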

【Abstract】 A Bayesian network is a graphical model of the probabilistic relationships among a set of variables. It provides a natural tool for dealing with two problems that arise throughout applied mathematics and engineering: uncertainty and complexity. Over the last two decades, Bayesian networks have become a popular representation for encoding uncertain expert knowledge in expert systems. As databases have grown, Bayesian networks have been applied to data mining and knowledge discovery in large-scale databases and have become a powerful tool for decision-making; they play an increasingly important role in knowledge discovery and decision-making. This paper first presents a framework for knowledge discovery and decision-making based on Bayesian networks, and then discusses the theory of learning Bayesian networks and its applications.

This paper discusses structure learning for Bayesian networks. It proposes a Maximum Mutual Information Metric with Restriction (MMI-R) based on the Kullback-Leibler divergence, the mutual information metric, and the maximum mutual information metric, using the dimension and complexity of the network as a combined restriction function. The MMI-R function combines this restriction with the maximum mutual information criterion. The paper proves that MMI-R is the best choice across all structures and complexities. With MMI-R, the problem of searching for the best structure is turned into the optimization of MMI-R.

The paper then proposes a simulated annealing algorithm (SA-MMI-R) to optimize the metric, with three improvements. First, a neighbor-state generation mechanism is proposed, consisting of three moves: exchanging, adding, and deleting nodes. Second, a new termination condition is proposed; this variable termination condition improves convergence speed. Third, a new variable is used to memorize the best solution found so far.
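The three improvements can be pictured with a small structure-search loop. The Python sketch below is a generic simulated annealing search over parent sets, assuming moves that add, delete, or exchange one parent of one node (one possible reading of the "exchanging, adding, and deleting" mechanism), a best-so-far memory, and a simple stop rule based on temperature and a run of non-improving moves. It is not SA-MMI-R itself: the dissertation's dynamic iteration-count rule is not reproduced, and `toy_score` is a hypothetical stand-in for MMI-R that simply rewards agreement with a known target graph.

```python
import math
import random


def is_acyclic(parents):
    """True if the {node: [parents]} mapping describes a DAG (depth-first search)."""
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(v):
        if state.get(v) == 1:
            return False  # back edge: cycle found
        if state.get(v) == 2:
            return True
        state[v] = 1
        ok = all(visit(p) for p in parents[v])
        state[v] = 2
        return ok

    return all(visit(v) for v in parents)


def neighbor(parents, rng):
    """Propose an acyclic neighbor by adding, deleting, or exchanging one parent."""
    nodes = list(parents)
    for _ in range(100):  # retry until a legal, acyclic move is found
        child = rng.choice(nodes)
        pa = list(parents[child])
        others = [v for v in nodes if v != child and v not in pa]
        move = rng.choice(("add", "delete", "exchange"))
        if move == "add" and others:
            pa.append(rng.choice(others))
        elif move == "delete" and pa:
            pa.remove(rng.choice(pa))
        elif move == "exchange" and pa and others:
            pa[rng.randrange(len(pa))] = rng.choice(others)
        else:
            continue
        cand = dict(parents)  # shallow copy; only `child`'s list is replaced
        cand[child] = pa
        if is_acyclic(cand):
            return cand
    return parents


def anneal(score, start, t0=2.0, alpha=0.95, t_min=1e-3,
           moves_per_t=20, patience=400, seed=0):
    """Maximize `score` over DAGs; the assumed stop rule is t < t_min or `patience`
    consecutive moves without a new best structure."""
    rng = random.Random(seed)
    current, cur_s = start, score(start)
    best, best_s = current, cur_s  # memory of the best structure seen so far
    stale, t = 0, t0
    while t > t_min and stale < patience:
        for _ in range(moves_per_t):
            cand = neighbor(current, rng)
            cand_s = score(cand)
            # Metropolis rule: accept improvements, sometimes accept worse moves
            if cand_s >= cur_s or rng.random() < math.exp((cand_s - cur_s) / t):
                current, cur_s = cand, cand_s
            if cur_s > best_s:
                best, best_s, stale = current, cur_s, 0
            else:
                stale += 1
        t *= alpha  # geometric cooling
    return best, best_s


# Hypothetical stand-in score: negative size of the edge-set difference to a target.
TARGET = {("A", "B"), ("B", "C"), ("B", "D")}


def toy_score(parents):
    edges = {(p, c) for c, pa in parents.items() for p in pa}
    return -len(edges ^ TARGET)  # 0 means the target graph was recovered


start = {v: [] for v in "ABCD"}
best, best_s = anneal(toy_score, start, seed=1)
print(best_s, sorted((p, c) for c, pa in best.items() for p in pa))
```

Because `best` is replaced only by a strictly better structure, the returned score is never worse than that of the starting graph, even though the current state may temporarily accept worse moves in order to leave a local optimum.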
Experiments on standard data sets (Cancer, College, Asia, and Alarm) demonstrate the advantages of SA-MMI-R: the results indicate that simulated annealing with MMI-R (SA-MMI-R) outperforms the compared algorithms in learning accuracy and speed.

This paper also discusses parameter learning for Bayesian networks and proposes an improved EM algorithm. First, in the E-step, the algorithm divides the whole sample database into smaller blocks, processes each block separately, and combines their results across blocks. Second, a simulated annealing algorithm is used to carry out the maximization (M) step, with several improvements such as initial-temperature selection, neighbor generation, and the number of iterations. The paper also discusses online parameter learning for Bayesian networks. Experiments on the Cancer, College, Asia, and Alarm data sets show that the improved EM algorithm outperforms standard EM.

This paper discusses inference and explanation in Bayesian networks. It discusses the basic metrics of stochastic sampling algorithms and proposes a general stochastic sampling algorithm for Bayesian network inference. It describes the explanation of Bayesian networks from three perspectives (evidence explanation, inference explanation, and model explanation) and gives a detailed explanation of probabilistic relationships.

This paper applies Bayesian networks to a flood-control decision support system and proposes three Bayesian-network models. A rainfall-runoff forecasting model forecasts the runoff produced by rainfall; a flood forecasting model forecasts flood water level and discharge; and a flood risk model analyzes risk and potential economic losses. All three models are used for flood-control decision support.
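The block-wise E-step works because EM's E-step only needs expected sufficient statistics (expected counts), and expectations are additive over records: summing per-block expected counts reproduces the full-data counts exactly. The minimal Python sketch below assumes a hypothetical two-node network A → B with binary variables and some missing values of A. For the M-step it uses the standard closed-form maximum-likelihood update, not the simulated-annealing M-step of ISA-EM, and all names are illustrative.

```python
# Hypothetical toy network A -> B, both binary; A is sometimes missing (None).
def expected_counts(block, theta, phi):
    """E-step on one block: expected counts n(A=a) and n(A=a, B=b)."""
    n_a = [0.0, 0.0]
    n_ab = [[0.0, 0.0], [0.0, 0.0]]
    for a, b in block:
        if a is None:  # posterior P(A | B=b) is proportional to P(A) P(B=b | A)
            w = [theta[x] * phi[x][b] for x in (0, 1)]
            post = [wx / sum(w) for wx in w]
        else:          # an observed A contributes a hard count
            post = [1.0 if x == a else 0.0 for x in (0, 1)]
        for x in (0, 1):
            n_a[x] += post[x]
            n_ab[x][b] += post[x]
    return n_a, n_ab


def combine(block_stats):
    """Merge per-block expected counts by summation (exact, since counts add)."""
    n_a, n_ab = [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
    for blk_a, blk_ab in block_stats:
        for x in (0, 1):
            n_a[x] += blk_a[x]
            for y in (0, 1):
                n_ab[x][y] += blk_ab[x][y]
    return n_a, n_ab


def m_step(n_a, n_ab):
    """Closed-form ML update: theta = P(A), phi[a][b] = P(B=b | A=a)."""
    theta = [c / sum(n_a) for c in n_a]
    phi = [[n_ab[x][y] / n_a[x] for y in (0, 1)] for x in (0, 1)]
    return theta, phi


def block_em(data, block_size, iters=50):
    theta, phi = [0.5, 0.5], [[0.6, 0.4], [0.4, 0.6]]  # asymmetric start
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for _ in range(iters):
        stats = combine(expected_counts(blk, theta, phi) for blk in blocks)
        theta, phi = m_step(*stats)
    return theta, phi


data = ([(0, 0)] * 30 + [(0, 1)] * 10 + [(1, 0)] * 10 + [(1, 1)] * 30
        + [(None, 0)] * 10 + [(None, 1)] * 10)
print(block_em(data, block_size=7))
print(block_em(data, block_size=len(data)))
```

Each block returns a fixed-size summary, so blocks can be streamed from disk or processed in parallel; changing `block_size` changes only the floating-point summation order, not the estimates.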
Finally, the paper gives a detailed explanation of the flood risk model for knowledge discovery and decision-making.
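Querying a model such as the flood risk model typically relies on the stochastic sampling inference discussed in the abstract. As one standard instance (not necessarily the general form given in the dissertation), the Python sketch below implements likelihood weighting on a hypothetical three-node chain Rain → Flood → Loss. All probabilities are invented for illustration, and an exact enumeration is included for comparison.

```python
import itertools
import random

# Hypothetical CPTs (invented for illustration); all nodes binary, 1 = "yes/high".
P_RAIN = 0.3                  # P(Rain = 1)
P_FLOOD = {0: 0.05, 1: 0.6}   # P(Flood = 1 | Rain)
P_LOSS = {0: 0.1, 1: 0.8}     # P(Loss = 1 | Flood)

# Nodes in topological order, each with a function giving P(node = 1 | parents in s).
NETWORK = [
    ("Rain", lambda s: P_RAIN),
    ("Flood", lambda s: P_FLOOD[s["Rain"]]),
    ("Loss", lambda s: P_LOSS[s["Flood"]]),
]


def bern(p, x):
    """Probability of binary value x when P(x = 1) = p."""
    return p if x == 1 else 1.0 - p


def likelihood_weighting(evidence, query, n=100_000, seed=0):
    """Estimate P(query = 1 | evidence) by likelihood-weighted sampling."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        s, w = {}, 1.0
        for node, p_of in NETWORK:
            p = p_of(s)
            if node in evidence:   # clamp evidence, weight by its likelihood
                s[node] = evidence[node]
                w *= bern(p, s[node])
            else:                  # sample non-evidence nodes from their CPT
                s[node] = 1 if rng.random() < p else 0
        num += w * s[query]
        den += w
    return num / den


def exact(evidence, query):
    """Exact P(query = 1 | evidence) by enumerating all joint states."""
    names = [node for node, _ in NETWORK]
    num = den = 0.0
    for values in itertools.product((0, 1), repeat=len(names)):
        s = dict(zip(names, values))
        if any(s[k] != v for k, v in evidence.items()):
            continue
        p = 1.0
        for node, p_of in NETWORK:
            p *= bern(p_of(s), s[node])
        num += p * s[query]
        den += p
    return num / den


print(likelihood_weighting({"Loss": 1}, "Flood"), exact({"Loss": 1}, "Flood"))
```

The query P(Flood = 1 | Loss = 1) is diagnostic: the evidence lies downstream of the query node, which likelihood weighting handles by weighting each sample with the probability of the observed value instead of sampling it. Under these invented CPTs the exact answer is 0.172 / 0.2505 ≈ 0.687, and the sampled estimate should land close to it.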
