Research on the Parallelization of the BP Algorithm and Its Application in Data Mining

The Parallelism and Application in Data Mining of BP Algorithm

【Author】 Hu Yue

【Supervisor】 Xiong Zhongyang

【Author Information】 Chongqing University, Computer System Architecture, 2003, Master's thesis

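【Chinese Abstract】 Data mining is a tool that helps people discover information and knowledge in massive amounts of data. In recent years it has become a core technology of business intelligence, has been widely applied in many fields, and has attracted great attention from academia. Data mining is a decision-support process whose technical foundation is artificial intelligence. Current data mining relies mainly on AI algorithms and techniques, including artificial neural networks, to perform prediction, pattern recognition, classification, and cluster analysis. This thesis studies neural networks as a data-mining tool, applied to forecasting trends in commercial activity.

The BP (Back Propagation) algorithm, i.e., the error back-propagation training algorithm, has become the most widely used training algorithm for artificial neural networks because of its good nonlinear mapping and approximation ability, its generalization ability, and its ease of implementation. However, BP also has clear drawbacks: training is slow and it easily gets trapped in local minima. Repeated experiments and analysis led to the following observation. Initial weights are usually chosen as small random numbers, because weights that are too large put the network in the saturation region of the sigmoid function from the very start of training and trap it in a local minimum. If the chosen weight range lies far from the target minimum region, then the larger the search space and the narrower the target region, the longer the search and the slower the training. To address this, the thesis proposes a two-stage parallel search strategy: first, the weight search space is divided into unequal parts to locate the region containing the global minimum; then the training sample set is distributed evenly across nodes for parallel training. Experiments show that this new parallel algorithm finds the global minimum quickly, greatly improves convergence speed, and achieves a better speedup than ordinary parallel algorithms. The algorithm is simple and effective to implement and is well suited to practical problems.

The parallel computing platform is a cluster built from PCs connected by a commodity network, running the Parallel Virtual Machine (PVM) on the Linux operating system. The parallel program follows the Master/Slave model, and the algorithm is parallelized by data parallelism: the training data are divided evenly among the node machines.

Finally, the thesis discusses the application of the BP algorithm in data mining. The strategy is applied to sales forecasting in a pharmaceutical logistics system, and a logistics sales-forecasting model based on the parallel BP algorithm is built. The thesis discusses in detail how to select and preprocess samples for the forecasting model, how to choose the network topology (the input and output layers, the number of hidden layers, and the number of hidden-layer nodes), and how to choose network parameters. Finally, a visual forecasting system is implemented in which different training sets can conveniently be selected to retrain the network, and the trained network is used for real sales-trend forecasting, with satisfactory results.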
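The abstract names the two stages of the strategy but does not say how the weight space is divided unequally or how PVM messages are organized. The following is a minimal sequential simulation in Python under stated assumptions, not the thesis's implementation: a single-hidden-layer sigmoid network trained by batch BP on squared error; weight magnitudes partitioned into bands whose widths double (a hypothetical unequal scheme); and slave work simulated by ordinary function calls instead of PVM message passing. The names `unequal_bands`, `stage1_select_band`, `shard_gradient`, and `train_data_parallel` are illustrative.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def _draw(rng, lo, hi):
    # Weight magnitude uniform in [lo, hi), sign chosen at random.
    return rng.choice((-1.0, 1.0)) * rng.uniform(lo, hi)

class Net:
    """n_in -> n_hid -> 1 network with sigmoid units; last weight of each row is a bias."""
    def __init__(self, n_in, n_hid, lo, hi, rng):
        self.w1 = [[_draw(rng, lo, hi) for _ in range(n_in + 1)] for _ in range(n_hid)]
        self.w2 = [_draw(rng, lo, hi) for _ in range(n_hid + 1)]

    def forward(self, x):
        xb = x + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

def mse(net, data):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)

def shard_gradient(net, shard):
    """Slave task: BP gradient of 0.5 * sum (y - t)^2 over one data shard."""
    g1 = [[0.0] * len(row) for row in net.w1]
    g2 = [0.0] * len(net.w2)
    for x, t in shard:
        h, y = net.forward(x)
        d_out = (y - t) * y * (1.0 - y)  # output-layer delta
        for j, v in enumerate(h + [1.0]):
            g2[j] += d_out * v
        xb = x + [1.0]
        for j in range(len(net.w1)):
            d_hid = d_out * net.w2[j] * h[j] * (1.0 - h[j])  # hidden-layer delta
            for i, v in enumerate(xb):
                g1[j][i] += d_hid * v
    return g1, g2

def train_data_parallel(net, data, n_slaves, epochs, lr):
    """Stage 2 (master): split data evenly, gather slave gradients, update weights."""
    shards = [data[k::n_slaves] for k in range(n_slaves)]
    for _ in range(epochs):
        # On a cluster each call would run on its own slave; all see the same weights.
        results = [shard_gradient(net, s) for s in shards]
        for g1, g2 in results:  # applying every shard's step equals one full-batch step
            for j, row in enumerate(net.w1):
                for i in range(len(row)):
                    row[i] -= lr * g1[j][i]
            for j in range(len(net.w2)):
                net.w2[j] -= lr * g2[j]
    return net

def unequal_bands(r_max, n):
    """Hypothetical unequal split of weight magnitudes [0, r_max]: band widths double."""
    unit = r_max / (2 ** n - 1)
    edges = [unit * (2 ** k - 1) for k in range(n + 1)]
    return list(zip(edges[:-1], edges[1:]))

def stage1_select_band(data, bands, n_in, n_hid, probe_epochs, lr, seed):
    """Stage 1: probe each band briefly (one slave per band); keep the lowest-error net."""
    best = None
    for k, (lo, hi) in enumerate(bands):
        net = Net(n_in, n_hid, lo, hi, random.Random(seed + k))
        train_data_parallel(net, data, 1, probe_epochs, lr)
        err = mse(net, data)
        if best is None or err < best[0]:
            best = (err, k, net)
    return best

if __name__ == "__main__":
    xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
    err, band, net = stage1_select_band(xor, unequal_bands(3.75, 4), 2, 4, 200, 0.5, seed=1)
    train_data_parallel(net, xor, n_slaves=2, epochs=2000, lr=0.5)
    print("chosen band:", band, "final MSE:", round(mse(net, xor), 4))
```

Because every slave computes its gradient from the same weights before the master updates them, the summed shard gradients equal the full-batch gradient. In this sketch, data parallelism therefore changes only where batch BP is computed, not what it computes; any speedup comes from spreading the per-sample work across nodes.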

【English Abstract】 Data mining technology helps people find information and knowledge in data. It has become a core technology of business intelligence, has been widely used in many areas, and has drawn wide attention from academia. Algorithms and techniques from artificial intelligence, including neural networks, have been applied in data mining for prediction, pattern recognition, classification, and clustering. One important application of neural networks in data mining is sales trend prediction. The BP (Back Propagation) algorithm is the most popular training algorithm in applications because of its nonlinear mapping and approximation capability, its generalization ability, and its ease of implementation. However, it is known to have defects such as slow convergence and a tendency to become trapped in local minima. Small random initial weights are usually chosen to keep training from being trapped in a local minimum at the outset. If the chosen weight range is far from the goal region, then the wider the search space and the narrower the goal region, the longer the search and the slower the training. To solve this problem, this thesis proposes a two-stage parallel search strategy: the weight space is first divided unequally to locate the region of the global minimum, and the network is then trained using data parallelism. Experimental results show that the strategy reaches the global minimum quickly and converges fast, especially on large training sets. The hardware platform consists of PCs connected by a LAN, and the software platform consists of PVM and Linux; together they form a PC cluster. The parallel program follows the master/slave model, and data parallelism is achieved by assigning a portion of the data set to each node. The thesis also discusses the application of the BP algorithm in data mining: the proposed strategy is applied to sales prediction in a pharmaceutical logistics system, and a sales prediction model based on the parallel algorithm is established.
The selection and preprocessing of the training set and the choice of network topology are discussed in detail. Finally, a visual prediction system is implemented that makes prediction work easy.
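The abstract says the selection and preprocessing of the training set are treated in detail but gives no specifics here. As a minimal sketch under assumptions not taken from the thesis (past values in a sliding window as network inputs, and min-max scaling into [0.1, 0.9]), a sales series could be turned into BP training samples as follows. The window length, the [0.1, 0.9] range, the function names, and the sales figures in the usage block are all hypothetical.

```python
def minmax_scale(series, lo=0.1, hi=0.9):
    """Map values linearly into [lo, hi]; also return the parameters needed to invert."""
    s_min, s_max = min(series), max(series)
    span = s_max - s_min
    if span == 0:
        raise ValueError("a constant series cannot be min-max scaled")
    scaled = [lo + (hi - lo) * (v - s_min) / span for v in series]
    return scaled, (s_min, s_max, lo, hi)

def inverse_scale(value, params):
    """Map a network output in [lo, hi] back to the original sales units."""
    s_min, s_max, lo, hi = params
    return s_min + (value - lo) * (s_max - s_min) / (hi - lo)

def sliding_windows(series, window):
    """Build (inputs, target) samples: the previous `window` values predict the next one."""
    return [(series[i:i + window], series[i + window]) for i in range(len(series) - window)]

if __name__ == "__main__":
    # Made-up monthly sales figures for illustration only (not data from the thesis).
    sales = [120.0, 135.0, 150.0, 128.0, 160.0, 175.0]
    scaled, params = minmax_scale(sales)
    for inputs, target in sliding_windows(scaled, 3):
        print([round(v, 3) for v in inputs], "->", round(target, 3))
    print("back to sales units:", inverse_scale(scaled[3], params))
```

Scaling into [0.1, 0.9] rather than [0, 1] is a common choice with sigmoid output units: a target of exactly 0 or 1 can only be approached by driving the output unit into its flat, saturated region, the same region the abstract warns about for large initial weights.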

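  • 【Online Publication Contributor】 Chongqing University
  • 【Online Publication Year/Issue】 2004, Issue 04
  • 【Classification Number】 TP18
  • 【Times Cited】 1
  • 【Downloads】 360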