

Research on Learning Methods and Application of Discriminative Bayesian Networks

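【Author】 Gao Yanfang (高妍方)

【Supervisor】 Chen Yingwu (陈英武)

【Author Information】 National University of Defense Technology, Management Science and Engineering, 2008, PhD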

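【Abstract (Chinese)】 Bayesian network classifiers are widely used in many fields. To solve classification problems more effectively, two kinds of extended Bayesian network classifiers have emerged. The first extends the network structure; representative examples are the naive Bayes classifier and the TAN classifier. The second extends the learning method, giving Bayesian network classifiers based on discriminative learning. In terms of learning purpose, Bayesian network learning methods comprise generative learning and discriminative learning. Since Bayesian networks were introduced, research has concentrated mainly on generative learning, while discriminative learning has received comparatively little study. Taking the perspective of discriminative learning and focusing on three kinds of data that arise in practice (data with unbalanced classification costs, data with missing attribute values, and data with missing class labels), this dissertation studies several algorithms for Bayesian networks based on discriminative learning. The main contents are as follows. (1) The dissertation summarizes existing generative and discriminative learning algorithms for Bayesian networks and compares generative and discriminative Bayesian networks experimentally from several perspectives. (2) For learning discriminative Bayesian networks from data with unbalanced classification costs, cost-sensitive parameter learning and cost-sensitive structure learning algorithms are proposed on the basis of discriminative Bayesian network learning. For parameter learning, a cost-sensitive loss function is proposed as the objective function and solved with the conjugate gradient method; for structure learning, a cost-sensitive criterion is proposed that scores a network on both classification cost and classification accuracy. (3) For learning discriminative Bayesian networks from data with missing attribute values, the CEM algorithm is studied. A Q function that makes CEM converge is proposed, and the shortcomings of this convergent CEM algorithm in discriminative Bayesian network learning are analyzed. On this basis, CEM is approximated in both the E step and the M step, which reduces its computational complexity and makes CEM effective and feasible for learning discriminative Bayesian networks. (4) For learning discriminative Bayesian networks from data in which many class labels are missing, semi-supervised and active learning algorithms are studied. First, a generative-discriminative hybrid semi-supervised learning algorithm is proposed that uses the log joint likelihood to measure how well the model fits the unlabeled samples and the log conditional likelihood to measure how well it fits the labeled samples. Then, to enable cost-sensitive mining of data with missing class labels, an active learning algorithm based on a cost-sensitive sample selection strategy is proposed. (5) These methods are applied to evaluating the sensory quality of tobacco leaves, which is predicted and evaluated from several angles, including missing chemical-composition values, missing sensory class labels, and classification cost. This provides an intelligent evaluation method for practical cigarette production.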
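The generative-discriminative hybrid objective in item (4) can be made concrete with a small sketch. It assumes a discrete naive Bayes structure and a single weight `lam` combining the two terms as lam * Σ_labeled log P(c | x) + (1 - lam) * Σ_unlabeled log Σ_c P(c, x); the abstract does not state the network structure, the exact weighting form, or how the weight is chosen, so `log_joint`, `hybrid_objective`, and `lam` are hypothetical names. Because the class of an unlabeled instance is unknown, its joint-likelihood term is taken here with the class summed out. The sketch only evaluates the objective; maximizing it over the parameters would need a gradient-based optimizer, which is not shown.

```python
import numpy as np


def log_joint(x, log_prior, log_cpts):
    """Return log P(c, x) for every class c under a discrete naive Bayes model.

    x: sequence of integer attribute values, one per attribute.
    log_prior: (K,) array of log P(c).
    log_cpts: list with one (K, V_j) array of log P(x_j | c) per attribute j.
    """
    scores = np.array(log_prior, dtype=float)
    for j, v in enumerate(x):
        scores = scores + log_cpts[j][:, v]
    return scores


def logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))


def hybrid_objective(labeled, unlabeled, log_prior, log_cpts, lam=0.5):
    """Hypothetical weighted hybrid objective (assumed form, see lead-in).

    labeled: list of (x, c) pairs; unlabeled: list of x.
    """
    cll = 0.0
    for x, c in labeled:
        s = log_joint(x, log_prior, log_cpts)
        cll += s[c] - logsumexp(s)  # log conditional likelihood log P(c | x)
    ll = 0.0
    for x in unlabeled:
        ll += logsumexp(log_joint(x, log_prior, log_cpts))  # class summed out: log P(x)
    return lam * cll + (1.0 - lam) * ll


# Toy model: 2 classes, one binary attribute (numbers chosen only for illustration).
log_prior = np.log([0.75, 0.25])
log_cpts = [np.log([[0.8, 0.2],    # P(x_0 | c = 0)
                    [0.4, 0.6]])]  # P(x_0 | c = 1)
labeled = [([0], 0), ([1], 1)]
unlabeled = [[0], [1]]
print(hybrid_objective(labeled, unlabeled, log_prior, log_cpts, lam=0.7))
```

Setting `lam=1.0` reduces the sketch to purely discriminative (conditional-likelihood) learning on the labeled data, and `lam=0.0` to purely generative learning on the unlabeled data.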

【Abstract】 Bayesian network classifiers have been widely applied in many domains. To handle more complex problems, they have been extended in two ways. The first extends the network structure; representative classifiers of this type include naive Bayes and TAN. The second extends the learning method, yielding discriminative Bayesian network classifiers. Bayesian networks have two learning paradigms: generative learning and discriminative learning. Generative learning of Bayesian networks has been widely studied, whereas discriminative learning has received relatively little attention. This dissertation studies discriminative learning of Bayesian networks from data with unbalanced classification costs, data with missing attribute values, and data with missing class labels. The main contents and contributions are as follows. (1) Generative and discriminative learning methods for Bayesian networks are reviewed and then compared from several points of view. (2) To learn discriminative Bayesian networks from data with unbalanced classification costs, cost-sensitive learning methods are presented; cost-sensitive Bayesian networks take classification cost into account. For cost-sensitive parameter learning, a cost-sensitive loss function is proposed, and for cost-sensitive structure learning, a cost-sensitive criterion is used for model selection. (3) To learn discriminative Bayesian networks from data with missing attribute values, the CEM learning method is studied. A Q function under which the log conditional likelihood increases monotonically and converges is proposed. However, this convergent CEM has shortcomings when used to learn discriminative Bayesian network classifiers, so a simpler Q function is proposed to replace it. In addition, the optimization procedure in the M step of CEM is replaced by a gradient-descent search. These approximate E and M steps make CEM simpler and more efficient than standard CEM. (4) To learn discriminative Bayesian networks from data with missing class labels, semi-supervised and active learning methods are presented. First, a generative-discriminative hybrid method is studied, whose objective function is a weighted combination of the log joint likelihood of the unlabeled data and the log conditional likelihood of the labeled data. Then an active learning method based on cost-sensitive sampling is proposed, with two cost-reduction sampling strategies. (5) Finally, tobacco quality is evaluated with discriminative Bayesian networks, which can serve as an auxiliary tool in tobacco product design.
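The abstract mentions two cost-reduction sampling methods for active learning but does not define them. For illustration only, here is one common cost-sensitive selection rule, which may differ from both of the dissertation's methods: for each unlabeled instance, compute the expected misclassification cost of the cheapest decision under the current classifier's posteriors, and query the instances for which even that best decision is expected to be expensive. The posterior-matrix input, the cost-matrix convention `cost[t, p]`, and the function names `expected_costs` and `select_for_labeling` are assumptions.

```python
import numpy as np


def expected_costs(posteriors, cost):
    """Return (minimum expected cost, cost-minimizing class) for each instance.

    posteriors: (n, K) array; row i holds P(c | x_i) from the current classifier.
    cost: (K, K) array; cost[t, p] is the cost of predicting p when the true class is t.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    cost = np.asarray(cost, dtype=float)
    # risk[i, p] = sum over t of P(t | x_i) * cost[t, p]
    risk = posteriors @ cost
    return risk.min(axis=1), risk.argmin(axis=1)


def select_for_labeling(posteriors, cost, k):
    """Hypothetical rule: pick the k instances whose best decision is still most costly."""
    min_risk, _ = expected_costs(posteriors, cost)
    # Stable sort on the negated risk: descending order, ties broken by index.
    return np.argsort(-min_risk, kind="stable")[:k]


P = [[0.95, 0.05],
     [0.80, 0.20],
     [0.40, 0.60]]
C = [[0, 1],   # true class 0: a false alarm costs 1
     [5, 0]]   # true class 1: a miss costs 5
risk, decision = expected_costs(P, C)
print(risk)                              # approximately [0.25, 0.8, 0.4]
print(decision)                          # [0 1 1]
print(select_for_labeling(P, C, k=1))    # [1]
```

In this toy example the second instance is queried even though the third has the flatter posterior: under the asymmetric cost matrix the cost-optimal decision for the second instance is class 1 despite P(c = 1 | x) = 0.2, and that decision still carries the highest expected cost. A purely accuracy-driven uncertainty rule would have chosen the third instance instead.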


