

Localized Generalization Error Model of Multilayer Perceptron Neural Networks

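【Author】 Yang Fei (杨飞)

【Supervisor】 Yang Su (杨苏)

【Author Information】 Harbin Institute of Technology, Computer Science and Technology, 2008, Master's degree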

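【Chinese Abstract】 Multilayer perceptron neural networks are widely used in pattern recognition, function approximation, risk prediction, control, and other fields, and generalization capability is an important criterion for judging whether the training of a multilayer perceptron neural network has succeeded. A multilayer perceptron neural network extracts "knowledge" from training samples to realize a mapping from the input space to the output space, and the trained classifier is then used to classify newly arriving unseen samples. Existing methods for evaluating the generalization capability of multilayer perceptron neural networks fall into two main classes: analytical models and cross-validation methods. Analytical models evaluate generalization capability mathematically, whereas cross-validation methods evaluate it experimentally. These methods have the following drawbacks: they cannot distinguish the generalization capability of networks that have the same number of hidden neurons but different weights, they ignore how large the difference is between unseen samples and training samples, and their time complexity is high on large datasets. In practical applications, for a specific classification problem, it is unreasonable to expect a trained multilayer perceptron neural network classifier to correctly recognize unseen samples that differ greatly from the training samples. This is the motivation for studying the localized generalization error model of multilayer perceptron neural networks in this thesis. The localized generalization error model uses unseen samples that are "similar" to the training samples to determine an upper bound on the generalization error of the trained network. An unseen sample is "similar" to a training sample if the difference between its feature values and those of the training sample is smaller than a given real value Q. The localized generalization error model comprises the training-set error, the stochastic sensitivity measure, and constants determined by the given training set. In this model, the localized generalization error is minimized when the best trade-off between the training error and the stochastic sensitivity measure is reached. In this thesis, the localized generalization error model is used for architecture selection of multilayer perceptron neural networks; that is, for a given classification problem, the number of hidden neurons is chosen so that the network has the best generalization capability. Simulation experiments on 15 UCI datasets show that architecture selection with the localized generalization error model gives better results than several other existing methods. Finally, the localized generalization error model of multilayer perceptron neural networks is applied to an image annotation problem, and the experimental results show that the method has good application prospects.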

【Abstract】 Multilayer Perceptron Neural Networks (MLPNNs) have a wide range of applications, such as pattern recognition, function approximation, risk prediction, and control. The generalization capability of an MLPNN is the major criterion for judging whether its training has succeeded, because the ultimate goal of MLPNN learning is to extract the input-output mapping of a given classification problem from a set of training samples and then to recognize future unseen samples correctly. There are two major approaches to estimating the generalization capability of an MLPNN: analytical models and cross-validation (CV) methods. Analytical models provide a mathematical tool for analyzing the generalization capability of an MLPNN, while CV methods estimate it empirically. The major drawbacks of existing methods are that they cannot distinguish between MLPNNs that have the same number of hidden neurons but different weight values, they ignore how much unseen samples differ from training samples, and they are time-consuming on large datasets. In practice, one trains a particular MLPNN for a given classification problem and should not expect it to correctly recognize unseen samples that are very different from the training samples. This motivates the Localized Generalization Error Model (L-GEM) for MLPNNs proposed in this thesis. The L-GEM provides an analytical upper bound on the generalization error for unseen samples that are similar to the training samples. An unseen sample is considered similar to a training sample if the difference between its input feature values and those of the training sample is smaller than a pre-selected real value Q (i.e., it is local in the input space). The L-GEM consists of the training error, the stochastic sensitivity measure, and constants computed from the given training dataset. The L-GEM shows that an MLPNN with minimum generalization error must have the best balance between its training error and its stochastic sensitivity measure. In this thesis, the L-GEM is applied to architecture selection for MLPNNs: we find the best number of hidden neurons of an MLPNN for a given classification problem. Experimental results on 15 UCI datasets show that MLPNNs built using the L-GEM outperform those built with several existing architecture selection methods. The L-GEM architecture selection method is also applied to an image annotation problem, and the experimental results show that the method has good application prospects.
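
The ideas in the abstract (similarity within Q, the stochastic sensitivity measure, and selection of the number of hidden neurons) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the thesis's implementation: the function names are hypothetical, "difference smaller than Q" is read as a per-feature absolute difference, the sensitivity is estimated by Monte Carlo sampling with uniform perturbations in [-Q, Q], and `bound_score` is an assumed way of combining training error and sensitivity, since the abstract does not state the exact bound or its constants.

```python
import math
import random


def is_similar(x, training_set, q):
    """Return True if x lies within Q of at least one training sample.

    ASSUMPTION: "within Q" means every feature differs by less than q.
    """
    return any(
        all(abs(a - b) < q for a, b in zip(x, t))
        for t in training_set
    )


def stochastic_sensitivity(model, training_set, q, n_draws=200, seed=0):
    """Monte Carlo estimate of the mean squared output change when each
    training input is perturbed uniformly within [-q, q] per feature.

    ASSUMPTION: the thesis may compute this measure analytically.
    """
    rng = random.Random(seed)
    total = 0.0
    for t in training_set:
        y = model(t)
        for _ in range(n_draws):
            perturbed = [v + rng.uniform(-q, q) for v in t]
            total += (model(perturbed) - y) ** 2
    return total / (len(training_set) * n_draws)


def training_mse(model, training_set, targets):
    """Mean squared error of the model on the training set."""
    errors = [(model(t) - y) ** 2 for t, y in zip(training_set, targets)]
    return sum(errors) / len(errors)


def bound_score(mse, sensitivity):
    """ASSUMED combination of the two terms; the constants that the
    abstract says are computed from the training set are omitted."""
    return (math.sqrt(mse) + math.sqrt(sensitivity)) ** 2


def select_hidden_neurons(candidates, training_set, targets, q):
    """candidates maps a hidden-neuron count to an already trained model
    (any callable taking a feature list and returning a number).
    Returns the count with the smallest score, plus all scores."""
    scores = {
        h: bound_score(
            training_mse(m, training_set, targets),
            stochastic_sensitivity(m, training_set, q),
        )
        for h, m in candidates.items()
    }
    return min(scores, key=scores.get), scores
```

In use, `candidates` would hold one trained network per candidate hidden-layer size, and `select_hidden_neurons(candidates, X, y, q)` would return the size whose trade-off between training error and sensitivity is best under these assumptions. A model that fits the training set but reacts strongly to small input perturbations is penalized through the sensitivity term.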
