Convergence of Gradient Method for Recurrent Neural Networks (递归神经网络梯度学习算法的收敛性)

【Author】 Xu Dongpo (徐东坡)

【Supervisors】 Wu Wei (吴微); Li Zhengxue (李正学)

【Author Information】 Dalian University of Technology, Computational Mathematics, 2009, Doctoral dissertation

【Abstract】 An artificial neural network (ANN), often simply called a neural network (NN), is a mathematical model that processes information by imitating the structure of biological neural networks. By architecture, neural networks fall into two classes: feedforward neural networks (FNNs) and recurrent neural networks (RNNs). In an FNN the output of one layer is the input of the next, so information flows layer by layer in one direction and there are no feedback loops. An FNN realizes a mapping from an input vector x to an output vector y, usually called a static mapping, and is suited to time-independent tasks such as character recognition and curve approximation. In many fields, however, such as nonlinear dynamic system modeling, identification, control, fault diagnosis and time series prediction, one needs the mapping between two discrete time series x(t) and y(t), where y(t) depends not only on x(t) but also on x(t-1), x(t-2), …, and on y(t-1), y(t-2), …; such a mapping is called a dynamic mapping. A network that handles problems of this kind must itself be a dynamic system, so a memory mechanism has to be introduced into the network. Recurrent neural networks can process time-varying inputs and outputs through their internal feedback; they realize dynamic mappings and are therefore better suited than feedforward networks to problems involving dynamic systems. As with feedforward networks, simple gradient search algorithms are commonly used to train recurrent networks, but the recurrent structure makes the gradient computation itself recursive, so learning is considerably more complicated than in the feedforward case. The convergence theory of gradient learning algorithms for recurrent neural networks is therefore one of the important research topics: studying it not only helps us understand the nature and characteristics of these methods, but also provides significant guidance for their many practical applications.

Chapter 1 reviews background material on neural networks. Chapter 2 discusses the convergence of the gradient descent learning algorithm for fully recurrent neural networks; monotonicity of the error function and a convergence theorem are established, and numerical experiments are reported. Chapter 3 considers the deterministic convergence of the gradient method for training Elman networks on a finite training sample set. Monotone decrease of the error function during the iteration is proved, and on this basis a weak convergence result and a strong convergence result are obtained: the gradient of the error function tends to zero and the weight sequence converges to a fixed point, respectively; a numerical experiment supports the theoretical findings. Chapter 4 studies the effect of partially removing the feedback terms from the gradient of the error function of an Elman network, the main aim being to reduce the heavy computational cost. The convergence of this approximated gradient method is analyzed, and it is shown that the error function decreases monotonically during learning and that the approximated gradient tends to zero. Chapter 5 establishes the equivalence of gradient learning algorithms for recurrent neural networks: the two classical algorithms, Real-Time Recurrent Learning (RTRL) and Back-Propagation Through Time (BPTT), are proved to be equivalent when the weights are updated in batch mode, and they produce identical weight increments. Chapter 6 gives convergence results for some improved learning algorithms for recurrent neural networks.
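For concreteness, the setting sketched in the abstract can be written out as follows. This is a minimal formalization under assumed notation (the sample index j, weight vector w, learning rate η and iteration index k are not specified by the abstract itself); the displayed statements mirror the monotonicity, gradient-convergence and batch RTRL/BPTT equivalence claims above, not the thesis's exact theorems or their conditions.

```latex
% Dynamic mapping realized by a recurrent network: the current output depends
% on present and past inputs as well as on past outputs.
y(t) = f\bigl(x(t),\,x(t-1),\,\dots,\;y(t-1),\,y(t-2),\,\dots\bigr)

% Batch error function over a finite sample set and the gradient descent update
% (w: weight vector, d^j: desired output, \eta: learning rate -- assumed notation).
E(w) = \tfrac{1}{2}\sum_{j=1}^{J}\bigl\|y^{j}(w)-d^{j}\bigr\|^{2},
\qquad
w^{k+1} = w^{k} - \eta\,\nabla E(w^{k})

% Convergence statements of the type established in Chapters 2--3:
% monotone decrease, vanishing gradient, and convergence of the weights.
E(w^{k+1}) \le E(w^{k}),
\qquad
\lim_{k\to\infty}\bigl\|\nabla E(w^{k})\bigr\| = 0,
\qquad
w^{k} \to w^{\ast}

% Batch-mode equivalence of the two classical algorithms (Chapter 5): both
% compute the exact gradient of the same E(w), hence the same weight increment.
\Delta w_{\mathrm{RTRL}} = -\eta\,\nabla E(w) = \Delta w_{\mathrm{BPTT}}
```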
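Below is a small runnable sketch (Python/NumPy) of the kind of training scheme the abstract refers to: an Elman (simple recurrent) network trained with the truncated, i.e. approximated, gradient in the spirit of Chapter 4, where the recursion through the previous hidden state is cut off instead of being differentiated. The network sizes, toy task, learning rate and variable names are illustrative assumptions, not the thesis's own code or experiments.

```python
import numpy as np

# Minimal Elman (simple recurrent) network trained with the truncated
# ("approximated") gradient: the dependence of the previous hidden state on the
# weights is ignored, so no back-propagation through time is performed.
# Sizes, task and learning rate below are illustrative assumptions.

rng = np.random.default_rng(0)

n_in, n_hid, n_out = 1, 8, 1
W_ih = rng.normal(scale=0.5, size=(n_hid, n_in))   # input   -> hidden
W_hh = rng.normal(scale=0.5, size=(n_hid, n_hid))  # context -> hidden (recurrent)
W_ho = rng.normal(scale=0.5, size=(n_out, n_hid))  # hidden  -> output
eta = 0.05                                         # learning rate

# Toy dynamic mapping: predict x(t+1) from x(t) for a sine wave.
xs = np.sin(0.3 * np.arange(101)).reshape(-1, 1)
inputs, targets = xs[:-1], xs[1:]

for epoch in range(201):
    h_prev = np.zeros((n_hid, 1))                  # context units reset each epoch
    total_err = 0.0
    for x_t, d_t in zip(inputs, targets):
        x = x_t.reshape(-1, 1)
        d = d_t.reshape(-1, 1)
        h = np.tanh(W_ih @ x + W_hh @ h_prev)      # hidden state with feedback
        y = W_ho @ h                               # linear output layer
        e = y - d                                  # output error
        total_err += 0.5 * float(e.T @ e)

        # Truncated gradient: h_prev is treated as a constant input, so the
        # recursive part of the exact gradient is dropped (cf. Chapter 4).
        delta_h = (W_ho.T @ e) * (1.0 - h ** 2)    # back-propagated hidden error
        W_ho -= eta * (e @ h.T)
        W_ih -= eta * (delta_h @ x.T)
        W_hh -= eta * (delta_h @ h_prev.T)

        h_prev = h                                 # feed hidden state back as context
    if epoch % 50 == 0:
        print(f"epoch {epoch:3d}  squared error {total_err:.4f}")
```

With the exact gradient (computed by BPTT or RTRL), the dependence of h_prev on the weights would also be differentiated; dropping it trades some gradient accuracy for a much cheaper update, which is the trade-off whose convergence Chapter 4 analyzes.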
