基于互信息与先验信息的机器学习方法研究

Study of the Machine Learning Methods Based on Mutual Information and Prior Knowledge

【Author】 王泳 (Wang Yong)

【Advisor】 胡包钢 (Hu Baogang)

【Author Information】 Graduate University of Chinese Academy of Sciences (Institute of Automation), Computer Application Technology, 2008, Ph.D.

【Abstract】 This thesis studies machine learning methods based on mutual information and prior knowledge. For pattern recognition, it investigates mutual-information-based model selection for classification, proposing a normalized mutual information (NMI) learning criterion and analyzing its nonlinear relationships with other classification criteria (accuracy, precision, recall, the ROC curve, the P-R curve, etc.) in both binary and multi-class settings; taking kernel selection in support vector machines as an example, the criterion is further examined with statistical methods. For regression analysis, the thesis studies generalized constraint neural networks (GCNN), discussing basic ways of combining a neural network with partially known relationships and using prior knowledge to construct problem-specific network models, so as to increase the "transparency" of neural networks. The main contributions of this thesis are as follows.

① For pattern recognition, the thesis studies mutual-information-based model selection. It proposes the normalized mutual information learning criterion, derives and analyzes its nonlinear relationships with the other classification criteria above for binary and multi-class problems, and gives a preliminary account of its application characteristics and limitations. It points out that learning (classification or clustering) under information-theoretic criteria is the process of transforming disordered data (labels or features) into ordered data (labels or features), with the effect of the transformation measured by entropy. Although uncertainty (entropy) offers classifier designers useful information distinct from traditional performance criteria, the criterion still has limitations in classification: in particular, uncertainty is not a consistent, monotonic function of traditional performance measures, so classifier selection still requires auxiliary computation of traditional criteria.

② Taking kernel selection in support vector machines as an example, the thesis studies the mutual information criterion with statistical methods. Comprehensive experiments, together with targeted experiments on meteorological data, show that different model-evaluation criteria do differ, but statistical methods can uncover regularities in these differences. Different statistical methods also differ among themselves, and this difference affects model evaluation more strongly than the difference among evaluation criteria. As a comprehensive index, the mutual information criterion can to some extent compensate for the shortcomings of single evaluation criteria. Model selection and evaluation should therefore weigh multiple evaluation criteria on the basis of multiple statistical methods.

③ The thesis surveys progress on the "black-box" problem of artificial neural networks and proposes a multi-level classification framework for "transparency" research. For regression analysis, it studies model construction based on prior information, discussing basic ways of combining a generalized constraint neural network with partially known relationships, in particular the two most common cases, the additive model and the multiplicative model. It gives a preliminary analysis of the conditions under which GCNN outperforms conventional neural networks and of its application characteristics.
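The additive and multiplicative combinations mentioned above can be illustrated with a toy example. The sketch below is a hypothetical illustration, not the thesis's formulation: the known partial relationship g(x) = x² is fixed, and only a residual factor (here collapsed to a single trainable parameter standing in for a network) is learned, combined multiplicatively with the known part.

```python
# Hypothetical illustration of a multiplicative generalized-constraint
# model: the target is y = g(x) * f(x), where g(x) = x**2 is known
# prior knowledge and f(x) is unknown.  Only f is learned, so the
# known structure of the final model stays visible ("transparent").

def known_part(x):
    """Prior knowledge: g(x) = x^2."""
    return x * x

def fit_residual_factor(data, lr=0.01, steps=2000):
    """Fit the unknown factor, modeled here as a constant w,
    by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for x, y in data:
            pred = known_part(x) * w        # multiplicative combination
            grad += 2 * (pred - y) * known_part(x)
        w -= lr * grad / len(data)
    return w

# Synthetic data generated with true factor f(x) = 3.
data = [(x, 3.0 * known_part(x)) for x in [0.5, 1.0, 1.5, 2.0]]
w = fit_residual_factor(data)
print(round(w, 3))  # → 3.0
```

An additive model would instead predict `known_part(x) + residual(x)`; in both cases the learned component only has to account for what the prior knowledge does not already explain.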

【Abstract】 This work studies machine learning methods based on mutual information and prior knowledge. For pattern recognition, it studies the problem of classifier evaluation based on mutual information and proposes the normalized mutual information criterion (NI). By analysis and deduction we show that NI is a nonlinear function of other classification criteria (accuracy, precision, recall, ROC curve, P-R curve), and we study its statistical characteristics by applying it to the problem of kernel selection in support vector machines. For regression problems, this work studies generalized constraint neural networks (GCNN), aimed at associating neural networks with partially known relationships, and reviews methods of incorporating domain knowledge into neural networks to increase their "transparency". The main contributions of this work are the following. ① For pattern recognition, this work studies classifier evaluation based on mutual information and proposes the normalized mutual information criterion (NI). By analysis and deduction we show that NI is a nonlinear function of other classification criteria (accuracy, precision, recall, ROC curve, P-R curve), and we give a preliminary discussion of its application characteristics and limitations. We point out that classification (or clustering) under information-based criteria is the process of transforming disordered data (labels or features) into ordered data (labels or features), with the transformation effect measured by entropy. Although uncertainty (entropy) provides classifier designers with a distinctive measurement unlike traditional ones, it has limitations in practical applications; in particular, it is not monotonic in the traditional classification criteria, and it needs the aid of traditional criteria for model evaluation. ② This work studies the statistical characteristics of NI by applying it to kernel selection in support vector machines. Comprehensive experiments, together with dedicated experiments on weather data, show that differences exist among model-evaluation criteria, but statistical methods can extract regularities from these differences. Differences also exist among the statistical methods themselves, and these affect the final results more strongly than differences among criteria. As a comprehensive criterion, NI shows a degree of statistical superiority over traditional criteria. For model selection or evaluation, therefore, multiple statistical methods should be applied and multiple evaluation criteria analyzed. ③ For the "black-box" problem of artificial neural networks, this work reviews commonly used methods of applying domain knowledge to neural networks to increase their "transparency", and proposes a new framework for classifying these methods. For regression problems, it studies model construction with prior knowledge; in particular, it discusses generalized constraint neural networks for associating neural networks with partially known relationships and studies two typical cases, superposition and multiplication. We derive conditions under which GCNN is superior to traditional neural networks and give a preliminary analysis of its application characteristics.
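The NI criterion above scores a classifier by the mutual information between true and predicted labels. As a rough illustration (the thesis's exact normalization convention is not reproduced here; normalizing by the entropy of the true labels is one common choice), a minimal sketch:

```python
import math
from collections import Counter

def normalized_mutual_information(y_true, y_pred):
    """Mutual information between true and predicted labels,
    normalized by the entropy of the true labels.  This is one
    common convention and may differ from the thesis's definition."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))   # confusion counts
    p_t = Counter(y_true)
    p_p = Counter(y_pred)
    mi = 0.0
    for (t, p), c in joint.items():
        # p(t,p) * log2( p(t,p) / (p(t) * p(p)) ), in integer-safe form
        mi += (c / n) * math.log2(c * n / (p_t[t] * p_p[p]))
    h_t = -sum((c / n) * math.log2(c / n) for c in p_t.values())
    return mi / h_t if h_t > 0 else 0.0

# A perfect classifier reaches 1; predictions independent of the
# labels score 0 -- unlike accuracy, which never drops below the
# majority-class rate.
print(normalized_mutual_information([0, 0, 1, 1, 2, 2],
                                    [0, 0, 1, 1, 2, 2]))  # → 1.0
```

Because the score depends only on the joint label distribution, permuting class names leaves it unchanged, which is one reason it behaves nonlinearly with respect to accuracy, precision, and recall.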
