节点文献

基于MRMR的贝叶斯网络结构学习算法研究

Research of Bayesian Network Structure Learning Algorithms Based on MRMR

【作者】 陈坤

【导师】 姚望舒;

【作者基本信息】 苏州大学 , 计算机软件与理论, 2013, 硕士

【摘要】 贝叶斯网络是一种概率图模型,能够高效表示随机变量之间复杂的独立依赖关系;即使在数据不完整的情况下,仍然具备高效的推理能力,因此越来越广泛的用于决策、诊断和复杂系统的控制等领域。如何从原始数据中学习到贝叶斯网络是其解决问题的前提。贝叶斯网络由结构和参数组成,其中结构学习是核心。本文研究了贝叶斯网络的理论知识和学习贝叶斯网络的相关算法,分别在完整数据集和缺失数据集下,结合最大相关和最小冗余特征选择技术,重点研究贝叶斯网络结构学习算法。其研究内容主要体现在以下几个方面:1)针对完整数据集,改进了基于节点次序的最大相关和最小冗余贪婪贝叶斯结构学习算法(OMRMRG),该算法引入了最大相关和最小冗余特征选择技术,并采用局部贝叶斯增量评分函数,在有限的数据集上提高了算法的精度和准确性。但由于是随机产生初始节点的次序,因此增大了结果的不确定性。本文提出了一种生成优化的节点初始次序的方法,在得到基本有序的节点初始次序后,再结合近邻交换算子进行迭代搜索,能够在较短的时间内得到更加正确的贝叶斯网络结构。实验结果表明了该方法的有效性。2)针对缺失数据集,对于BN-GS算法中的随机初始化缺失数据和随机生成节点次序带来的不确定性,利用节点次序预排序算法产生初始次序;对于每个缺值的节点变量,使用其在整个数据样本中出现次数最多的那个取值作为初始化,加快了收敛的速度,提高了为边确定方向时的正确率。基于初始化后的完整数据,应用克鲁斯卡尔算法建立最大权重生成树,并按照上述的节点次序确定生成树中边的方向。在使用吉布斯算法迭代修正数据的过程中,赋予了局部参数一个更准确的修正幅度。改进后的BN-GS算法能够在较短的时间内收敛到平稳分布,得到了较优的结构。实验结果表明了该算法的有效性和正确性。3)研究了基于评分搜索的方法和基于约束的方法,并通过实验进行了比较与分析。与基于评分的方法相比,基于约束的方法更为直观,且速度也比较快,有时得到的箭头可以理解为因果关系;但是独立性检验没有评分函数准确,而且一旦出错,就会对后面的计算产生很大影响。

【Abstract】 Bayesian networks(BN) are probabilistic graphical models that can efficientlyrepresent complex independences and dependences relationships among random variables.It has still the ability of efficient reasoning even without complete data. Now BN arewidely applied for the decision aid, the diagnosis, the control of complex system, etc. Howto learn BN from native data is a precondition of solving the problem. A BN is a directedacyclic graph with a probability table for each node, and structure learning is the core ofthe BN. This thesis discuss theoretical knowledge and learning algorithm related to BN.Based on the maximum relevance and minimum redundancy feature selection techniques,it focuses on the structure learning algorithm of BN for complete data sets and missingdata sets. Our researches are concluded as follows:1) In view of complete data sets, we improved the Ordering-based Max-Relevanceand Min-Redundancy Greedy algorithm, which introduces the maximum relevance andminimum redundancy feature selection techniques and uses local Bayesian increment scorefunction which improves the precision and accuracy of the algorithm on limited data sets.Because the initial node order is randomly generated, it increases the uncertainty of theresults greatly. In this thesis, a new method, which can find a basic orderly order, isproposed. After getting a basic order, it uses neighbor-swap operator to iteratively search abetter result. Experimental results show that the new approach is effective.2) In view of missing data sets, the BN-GS(Bayesian Network&Gibbs Sampling)algorithm will bring larger uncertainty because of initializing the datasets and producingnode’s order randomly. So in this thesis, a pre-sorted algorithm, which is proposed before,is used to produce initial order of nodes, and the missing value of one node is initialized bythe value which its number is the highest in the entire datasets. This method speeds up theconverge process and increases the accuracy when orientation for the edges. Then using Kruskal algorithm, a maximum weight spanning tree is constructed from complete datasets,and based on initial order of node, the orientation of edges is adjusted. In the process ofmodification of local parameters by Gibbs sampling algorithm, it is assign a more accuratemodification value. The improved BN-GS algorithm can converge to the stationarydistribution in a shorter time. Experimental results show that the new approach is correctand effective.3) In this thesis, the method based on search&score is compared and analyzed withthe method based on constraint from the experiments. Compared to the method based onsearch&score, the method based constraint is more direct, faster in terms of speed, andsometimes the arrows that obtained from data can be interpreted as a causal relationship.The drawback of the method based on constraint is that the independence test is worse thanthe method based on search&score, and if there is error, the later calculations is greatlyaffected.

  • 【网络出版投稿人】 苏州大学
  • 【网络出版年期】2013年 11期
节点文献中: 

本文链接的文献网络图示:

本文的引文网络