Research on the MapReduce Parallel Programming Model in Cloud Computing
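【Author】 Wu Guixin (吴贵鑫);
【Advisor】 Xu Heli (许合利);
【Author information】 Henan Polytechnic University, Computer Application Technology, 2010, Master's degree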
【Abstract】 Cloud computing is a development of parallel computing, distributed computing, and grid computing, and it has brought parallel technology into everyday life. The continued advance of cloud computing, personal high-performance computers (PHPC), and related technologies has led many engineers to move from a single-machine working mode to a parallel computing mode. As cloud computing spreads, parallel program design has become a key problem that many programmers must face and solve. The MapReduce parallel programming model proposed by Google greatly reduces the difficulty of developing parallel programs. Compared with traditional distributed program design, MapReduce encapsulates details such as parallel processing, fault tolerance, data-local computation, and load balancing, and it provides a simple yet powerful programming interface, which greatly simplifies parallel program design.
This thesis first introduces the concept, basic theory, and research status of cloud computing, describes several traditional parallel programming models, and analyzes their principles and development. It briefly introduces the Google and Hadoop cloud computing architectures and compares MapReduce with MPI, examining their differences and respective advantages. It then explains the MapReduce programming idea in detail and analyzes how MapReduce solves problems: its working principle, specific steps, and methods. The fault-tolerance mechanism of MapReduce is introduced, and the scheduling algorithms for MapReduce jobs are analyzed in detail. The thesis also studies how MapReduce performance differs in a heterogeneous Hadoop cluster and analyzes the effect of heterogeneity on that performance.
The thesis proposes a new data distribution mechanism, HDDM, which allocates input files according to the computing ratio of the heterogeneous nodes in the cluster and thereby improves MapReduce performance in heterogeneous Hadoop clusters. Finally, experiments show that the proposed HDDM mechanism can greatly improve the execution efficiency of MapReduce programs.
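The abstract states only that HDDM allocates input data according to the computing ratio of each heterogeneous node; it does not give the algorithm. As a rough illustration of that idea, the following minimal sketch splits a number of input blocks among nodes in proportion to their ratios. The function name `allocate_blocks`, the node names, and the largest-remainder rounding rule are all assumptions made for this example and are not taken from the thesis.

```python
from fractions import Fraction


def allocate_blocks(total_blocks, compute_ratios):
    """Split total_blocks among nodes in proportion to compute_ratios.

    Hypothetical helper, not the thesis's HDDM implementation.
    compute_ratios maps a node name to a positive relative speed.
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    if not compute_ratios or any(r <= 0 for r in compute_ratios.values()):
        raise ValueError("compute_ratios must be non-empty and positive")

    ratio_sum = sum(Fraction(r) for r in compute_ratios.values())

    # Exact (possibly fractional) share of blocks for every node.
    exact = {
        node: Fraction(r) * total_blocks / ratio_sum
        for node, r in compute_ratios.items()
    }

    # Start from the integer part of each share.
    shares = {node: int(quota) for node, quota in exact.items()}

    # Hand the remaining blocks to the nodes with the largest fractional
    # remainders (assumed rounding rule, chosen for this sketch).
    leftover = total_blocks - sum(shares.values())
    by_remainder = sorted(
        exact, key=lambda node: exact[node] - shares[node], reverse=True
    )
    for node in by_remainder[:leftover]:
        shares[node] += 1

    return shares


if __name__ == "__main__":
    # A node rated twice as fast receives roughly twice as many blocks.
    print(allocate_blocks(7, {"slow-node": 1, "fast-node": 2}))
```

Under this sketch a faster node receives proportionally more input, so all nodes would be expected to finish their local map work at about the same time. How the thesis measures the computing ratio and how it places blocks in HDFS are not described in the abstract.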
【Key words】 Cloud Computing; MapReduce; Parallel Programming; Data Distribution; Hadoop; Heterogeneous Cluster;