The Research of Web Pages Classification Based on RBF Neural Network

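【Author】 Shi Guoqiang (史国强)

【Advisor】 Li Cunhe (李村合)

【Author Information】 China University of Petroleum, Computer Science and Technology, 2011, Master's thesis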

【Abstract】 With the spread of the Internet, the Web has become the main way people obtain information. Automatic web page classification emerged to help people find useful information among massive numbers of web pages: it uses machine learning to assign category labels to pages automatically, so that large collections of pages can be analyzed and organized quickly and effectively. Among the many web page classification algorithms, the RBF (radial basis function) neural network has become a research focus in machine learning because of its strong classification ability. This thesis describes the web page classification process, analyzes the development, principles, and related techniques of RBF neural networks, and discusses their role in web page classification. It also reviews the training algorithms commonly used for RBF networks and studies MIMLRBF, the RBF network model derived under the multi-instance multi-label (MIML) framework.

MIMLRBF performs poorly on imbalanced datasets. To address this, the thesis proposes an improved training algorithm that takes the overall distribution of the samples into account, so that the numbers of hidden-layer neurons generated for the different classes tend toward balance, reducing the influence of class imbalance on the network. In addition, when the training data are noisy or hard to distinguish, solving for the weight matrix with SVD increases the overall error of the network. For this problem the thesis proposes a weight training algorithm based on the steepest descent method: the weight matrix is first initialized by SVD and then optimized by steepest descent, with the learning-rate matrix computed by minimizing the sum-of-squared-error function of the updated weight matrix. This improves the classification accuracy of MIMLRBF on noisy training sets. Finally, the improved training algorithms are applied to a web page classification system, and their performance is compared and analyzed experimentally. The experimental data show that the proposed algorithms achieve higher classification efficiency and accuracy.
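The abstract does not state the rule used to balance hidden-layer neurons across classes, so the following is a purely hypothetical illustration of the idea. It assumes the MIMLRBF setting in which each class (label) l receives k_l hidden neurons obtained by clustering the training bags U_l that carry that label, and compares the proportional rule commonly described for MIMLRBF, k_l = ⌈α·|U_l|⌉, with an invented variant that first shrinks each class size toward the mean class size by a factor β. The function names `proportional_centers` and `balanced_centers` and the parameters `alpha` and `beta` are illustrative assumptions, not the thesis's algorithm.

```python
import math

def proportional_centers(class_sizes, alpha):
    """Proportional allocation: k_l = ceil(alpha * |U_l|), at least one neuron per class."""
    return {label: max(1, math.ceil(alpha * n)) for label, n in class_sizes.items()}

def balanced_centers(class_sizes, alpha, beta=0.5):
    """HYPOTHETICAL balancing rule (illustration only, not the thesis's method).

    Each class size is first shrunk toward the mean class size:
        n'_l = (1 - beta) * |U_l| + beta * mean(|U|)
    then k_l = ceil(alpha * n'_l), clipped to [1, |U_l|] because a class
    cannot supply more cluster centers than it has bags.
    beta = 0 reproduces the proportional rule; larger beta balances more.
    """
    mean_size = sum(class_sizes.values()) / len(class_sizes)
    counts = {}
    for label, n in class_sizes.items():
        shrunk = (1 - beta) * n + beta * mean_size
        counts[label] = min(n, max(1, math.ceil(alpha * shrunk)))
    return counts

if __name__ == "__main__":
    sizes = {"news": 400, "sports": 100, "finance": 20}  # hypothetical class sizes (bags per label)
    print(proportional_centers(sizes, alpha=0.25))
    print(balanced_centers(sizes, alpha=0.25, beta=0.5))
```

With three classes of 400, 100, and 20 bags and α = 0.25, the proportional rule gives 100, 25, and 5 centers (a 20:1 spread), while the hypothetical shrinkage with β = 0.5 gives 72, 35, and 20 (capped at the 20 bags available), a 3.6:1 spread. Setting β = 0 recovers the proportional rule.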

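The SVD-then-steepest-descent weight training summarized in the abstract can be sketched for the linear output layer of an RBF network. Let Φ be the N×M matrix of hidden-layer outputs and T the N×Q target matrix, with E(W) = ½‖ΦW − T‖²_F. This sketch assumes that the SVD initialization is a truncated pseudoinverse (singular values below `rtol` times the largest are discarded) and reads the "learning-rate matrix" as a diagonal matrix holding one step size per output column, chosen by exact line search: because E is quadratic and separates by column, minimizing E(w_j − η_j g_j) gives η_j = ‖g_j‖² / ‖Φg_j‖², where g_j is column j of the gradient G = Φᵀ(ΦW − T). These readings, and the names `svd_init`, `sse`, `steepest_descent`, and `rtol`, are assumptions; the abstract does not give the exact formulas.

```python
import numpy as np

def svd_init(phi, targets, rtol=1e-3):
    """Least-squares weights for phi @ W ~= targets via truncated SVD.

    Assumption: singular values below rtol * s_max are treated as noise and dropped.
    """
    u, s, vh = np.linalg.svd(phi, full_matrices=False)
    keep = s > rtol * s[0]
    # W0 = V_r diag(1 / s_r) U_r^T T, using only the kept singular triplets
    return vh[keep].T @ ((u[:, keep].T @ targets) / s[keep][:, None])

def sse(phi, w, targets):
    """E(W) = 0.5 * ||phi @ W - T||_F^2."""
    r = phi @ w - targets
    return 0.5 * float(np.sum(r * r))

def steepest_descent(phi, w, targets, iters=50):
    """Refine W by steepest descent with one exact-line-search step size per output column."""
    w = w.copy()
    for _ in range(iters):
        g = phi.T @ (phi @ w - targets)      # gradient dE/dW, shape (M, Q)
        pg = phi @ g                          # phi @ g_j for every column j
        num = np.sum(g * g, axis=0)           # ||g_j||^2
        den = np.sum(pg * pg, axis=0)         # ||phi g_j||^2 (zero only when g_j == 0)
        eta = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
        w -= g * eta                          # equals g @ diag(eta): the diagonal learning-rate matrix
    return w
```

Because E is quadratic and the gradient at the truncated-SVD solution lies only in the discarded singular directions, these steps move W back toward the untruncated least-squares solution; under this reading, the number of iterations controls how much of what the truncation removed is recovered.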
  • 【CLC Classification Number】 TP183; TP393.092
  • 【Times Cited】 2
  • 【Downloads】 108