Research on an RBF Neural Network Model and Learning Algorithm Based on Semi-Supervised Multiple-Instance Learning

【Author】 Yu Wentao

【Supervisor】 Xu Shaohua

【Author Information】 Northeast Petroleum University, Petroleum Engineering Computing Technology, 2011, Master's thesis

【Abstract】 Real-world data sets exhibit great diversity and uncertainty, and handling massive, heterogeneous data has become a central task of machine learning. Models and algorithms for semi-supervised multiple-instance learning are therefore an active research direction.

First, for non-temporal sample spaces, this thesis proposes a training algorithm for RBF networks based on semi-supervised multiple-instance learning, built on the RBF network and a clustering algorithm, and analyzes the outlier problem in the sample space. The basic idea is to define a Hausdorff distance that measures the distance between two sets, and on that basis to construct a semi-supervised multiple-instance clustering algorithm. The algorithm exploits the prior knowledge carried by the labeled multi-instance samples to label the unlabeled samples, revealing the distribution of the sample space under the cluster assumption. The Hausdorff distance then serves as the norm inside the RBF kernel function, and an RBF network is trained on the whole sample set, improving the network's training ability. Simulation experiments demonstrate the practicality of the algorithm.

Second, for the more general case of temporal sample spaces, a semi-supervised multiple-instance RBF process neural network algorithm is proposed, built on the RBF process neural network, a temporal clustering algorithm, and a genetic algorithm. The Hausdorff distance is extended along the time dimension to a generalized temporal Hausdorff distance, which yields a temporal semi-supervised multiple-instance clustering algorithm; an RBF process neural network is then trained on the sample set. During training, the coefficients of the kernel center functions must be adjusted; because the min{·} function is not differentiable, a genetic algorithm is introduced, and since the genetic algorithm can locate a global optimum, it also reduces the number of iterations the network training requires. Simulation experiments confirm the algorithm's effectiveness.

Finally, to address the generally low efficiency of training neural networks on large sample sets, a parallel training algorithm for the semi-supervised multiple-instance RBF process neural network is proposed, based on hybrid MPI and OpenMP programming. The algorithm parallelizes the clustering operator and the genetic operators of the network. Comparative experiments across training sample sets and compute-node counts of different scales show that choosing the parallel granularity appropriately for the network and sample size effectively reduces training time and improves network performance.
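The first contribution rests on a set-to-set distance used as the kernel norm. A minimal sketch of the two-sided Hausdorff distance, a Gaussian RBF unit built on it, and a nearest-labeled-bag version of the semi-supervised labeling step is given below; the function names, the Gaussian kernel form, and the nearest-neighbour labeling rule are illustrative assumptions, not the thesis code.

```python
import math

def hausdorff(A, B):
    """Classical two-sided Hausdorff distance between finite point
    sets A and B (lists of equal-length coordinate tuples)."""
    def directed(X, Y):
        # worst-case nearest-neighbour distance from X into Y
        return max(min(math.dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))

def rbf_activation(bag, center_bag, sigma=1.0):
    """Gaussian RBF unit with the Hausdorff distance standing in for
    the usual vector norm, as the thesis proposes (sigma assumed)."""
    h = hausdorff(bag, center_bag)
    return math.exp(-h * h / (2.0 * sigma ** 2))

def label_unlabeled(labeled, unlabeled):
    """Sketch of the semi-supervised step: each unlabeled bag inherits
    the label of the closest labeled bag under the Hausdorff distance.
    `labeled` is a list of (bag, label) pairs."""
    return [(bag, min(labeled, key=lambda lb: hausdorff(bag, lb[0]))[1])
            for bag in unlabeled]
```

Because the distance is defined between whole sets, each multi-instance bag can be treated as a single sample point by both the clustering step and the RBF layer.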
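The second contribution introduces a genetic algorithm because objectives containing min{·} are not differentiable, so gradient-based coefficient adjustment fails. The sketch below shows a minimal real-coded GA minimizing a toy min-based objective; the population sizes, operators, and the objective itself are illustrative assumptions, not the thesis configuration.

```python
import random

def genetic_minimize(objective, dim, pop_size=30, gens=60,
                     bounds=(-5.0, 5.0), seed=0):
    """Minimal real-coded genetic algorithm: truncation selection,
    uniform crossover, Gaussian mutation. Being gradient-free, it can
    minimize objectives containing min{...}, the role the GA plays in
    adjusting the kernel center coefficients in the thesis."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=objective)
        elite = pop[: pop_size // 2]   # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # uniform crossover plus small Gaussian mutation
            child = [rng.choice(pair) + rng.gauss(0.0, 0.1)
                     for pair in zip(a, b)]
            children.append(child)
        pop = elite + children
    return min(pop, key=objective)

# Toy non-differentiable objective: distance from x to the nearest of
# two targets; the min makes it non-smooth at the midpoint.
def objective(x):
    return min(abs(x[0] - 2.0), abs(x[0] + 3.0))
```

Because the elite half survives each generation untouched, the best solution found never degrades, which is also why the GA can cut the iteration count compared with restarting a local search.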
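The third contribution parallelizes the clustering and genetic operators with hybrid MPI/OpenMP. The sketch below shows only the data decomposition of the clustering operator's dominant cost, the bag-to-center distance matrix, using Python threads as a stand-in; it is not the MPI/OpenMP implementation, and all names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def hausdorff(A, B):
    """Two-sided Hausdorff distance between finite point sets."""
    def directed(X, Y):
        return max(min(math.dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))

def distance_matrix_parallel(bags, centers, workers=4):
    """Row-parallel computation of the bag-to-center distance matrix:
    each worker computes the distances from one bag to every cluster
    center. The row-wise split mirrors how the clustering operator can
    be distributed, with `workers` playing the role of the parallel
    granularity discussed in the thesis."""
    def row(bag):
        return [hausdorff(bag, c) for c in centers]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row, bags))
```

Note that CPython threads give no speedup for CPU-bound work because of the GIL; the sketch only illustrates the decomposition, which in the thesis is realized across MPI processes and OpenMP threads.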
