Research and Implementation of a Distributed Data Mining Model Based on Globus

(Original title: 基于GLOBUS的分布式数据挖掘模型研究与实现)

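【Author】 Tao Shuliang (陶舒亮)

【Advisor】 Wang Guangming (王光明)

【Author information】 Zhejiang Gongshang University, Management Science and Engineering, 2008, Master's thesis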

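【Abstract (translated from the Chinese)】 Everything in the world changes and develops continuously, and computer application models have likewise kept evolving along with enterprise applications. Over nearly 50 years of development, computer application models have moved from centralized to distributed. The emergence of grid technology has pushed computer application models toward distribution once again. As information technology develops, the volume of data generated within departments and enterprises is growing sharply. This explosive growth brings enterprises both opportunities and challenges: how to discover knowledge from such massive data, and how to do so efficiently, is a major challenge for today's information society. Traditional centralized data mining can, to some extent, solve some of the problems caused by data distribution, but when faced with massive data its mining performance increasingly fails to meet users' needs. The emergence of the grid application model offers a new opportunity for distributed data mining. This thesis focuses on distributed data mining models in the Globus environment. The foremost problem that distributed data mining must solve is the sensible matching of data resources with computing resources so as to optimize mining performance. The two traditional distributed data mining models, the mobile-code model and the mobile-data model, each have their merits, but neither solves the matching of data resources with computing resources, and neither can optimize the performance of distributed data mining tasks. The PDS model proposed in this thesis combines the advantages of the mobile-code and mobile-data models and uses minimum response time as the task allocation strategy, optimizing the allocation of distributed data mining tasks that span multiple data sets. The thesis also gives prediction methods for each component of the minimum response time model, together with experimental results. The GS model is a distributed data mining model based on Globus grid services and is a simplified version of the PDS model. Following SOA architectural ideas, the GS model encapsulates distributed data mining functionality as grid services, and clients complete data mining tasks by invoking these services. In Chapter 5 the author develops a server-side program for the GS model.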

【Abstract】 All things change and develop constantly, and computer application models have changed along with enterprise applications. Over nearly 50 years of development, computer application models have moved from centralized to distributed. With the advent of grid technology, computer application models have become distributed again. With the development of information technology, the data produced daily by the various departments of an enterprise is increasing dramatically. This explosive growth of enterprise data brings challenges as well as opportunities: how to discover knowledge in such massive data, and how to do so efficiently, is a major challenge in today's information society. The traditional centralized data mining approach can, to some extent, solve a number of the issues caused by data distribution, but when faced with massive data it is increasingly unable to meet people's performance needs. Grid technology brings new opportunities to distributed data mining (DDM). This thesis focuses on DDM in the Globus environment. The first problem DDM has to solve is the rational matching of data resources with computation resources in order to achieve good performance. The traditional DDM models, the data transfer model and the code transfer model, each have their advantages, but neither solves the matching of data resources with computation resources, so they cannot optimize task performance. This thesis presents the PDS model (Policy, task Dispatching and Scheduling based DDM model), which combines the advantages of the data transfer and code transfer models and applies minimum response time as the task allocation strategy. The PDS model can optimize task assignment for DDM over multiple data sets. The thesis also presents a prediction method for the DDM minimum response time model. The GS model is based on Globus Grid Services and is a simplified version of the PDS model. Following the SOA approach, the GS model packages the functions of distributed data mining as Grid Services and lets clients call these services. In Chapter 5, the author develops a server-side program for the GS model.
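The minimum-response-time allocation idea described in the abstract can be illustrated with a small sketch. The abstract does not give the components of the thesis's response time model or its prediction methods, so everything here is an assumption made for illustration: the class and function names, the decomposition of response time into queue wait, data transfer, and mining time, and the single shared bandwidth figure. Running a task on the node that already hosts the data corresponds to the code transfer model; running it elsewhere corresponds to the data transfer model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical compute resource (not taken from the thesis)."""
    name: str
    speed_mb_per_s: float  # assumed mining throughput
    queue_wait_s: float    # assumed wait before the task can start


@dataclass(frozen=True)
class Dataset:
    """Hypothetical data resource located on one node."""
    name: str
    size_mb: float
    host: str  # name of the node that stores the data


def predicted_response_time(ds, node, bandwidth_mb_per_s):
    """Assumed model: queue wait + data transfer time + mining time."""
    # No transfer cost when the task runs where the data already lives.
    transfer = 0.0 if node.name == ds.host else ds.size_mb / bandwidth_mb_per_s
    compute = ds.size_mb / node.speed_mb_per_s
    return node.queue_wait_s + transfer + compute


def assign(datasets, nodes, bandwidth_mb_per_s):
    """Pick, for each data set independently, the node with the smallest
    predicted response time. Load added by earlier assignments is ignored."""
    plan = {}
    for ds in datasets:
        best = min(
            nodes,
            key=lambda n: predicted_response_time(ds, n, bandwidth_mb_per_s),
        )
        plan[ds.name] = (
            best.name,
            predicted_response_time(ds, best, bandwidth_mb_per_s),
        )
    return plan


nodes = [Node("A", 10.0, 0.0), Node("B", 50.0, 0.0)]
d1 = Dataset("d1", 1000.0, host="A")

fast_link = assign([d1], nodes, bandwidth_mb_per_s=20.0)  # moving data pays off
slow_link = assign([d1], nodes, bandwidth_mb_per_s=5.0)   # mining in place pays off
print(fast_link, slow_link)
```

With these made-up numbers, a fast link sends the data to the faster node B, while a slow link keeps the task on node A where the data is stored. A fuller scheduler would also update each node's queue wait as tasks are assigned.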

【Key words】 grid; distributed data mining (DDM); GT4; PDS model; GS model