Research on Key Technologies of the MPI-based High Performance Cloud Computing Platform (MPI高性能云计算平台关键技术研究)

【Author】 Guo Yucheng (郭羽成)

【Advisor】 Yang Kejian (杨克俭)

【Author Information】 Wuhan University of Technology, Computer Application Technology, 2013, Ph.D.

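【Chinese Abstract】 Cloud computing is a recently popular computing paradigm and currently the main technical means of processing massive data. Although existing cloud computing platforms are fairly mature, they still fall short, and run relatively inefficiently, on the class of problems that are both data-intensive and compute-intensive. On the one hand, mainstream platforms generally use virtualization at the bottom layer, so all software and applications run on virtual hardware, which costs performance; one study reports a loss of around 20%. This dissertation builds a simulated environment of the platform's bottom layer using a non-virtualized approach and tests its performance there; the experiments show that virtualization achieves only 50-70% of the efficiency of the non-virtualized approach. On the other hand, the existing cloud MapReduce model handles intermediate data with a store-then-forward strategy; as the intermediate data grows, this produces a large number of useless remote I/O operations, and the resulting efficiency cannot meet the needs of high performance computing applications.

Based on an in-depth analysis of the shortcomings of existing cloud computing platforms and of the fault-tolerance and disaster-recovery capability of MPI (Message Passing Interface), this dissertation independently develops a prototype MPI-based high performance cloud computing platform (MPI-based HPCCP). The platform builds its bottom layer directly from heterogeneous compute nodes without virtualization, and rewrites the MapReduce programming model using MPI extended with multi-level fault tolerance and disaster recovery, together with multithreading. This avoids a large amount of useless I/O and improves cloud computing efficiency, meeting the needs of massive-data high performance computing problems that are both data-intensive and compute-intensive.

The main innovations of the proposed MPI high performance cloud computing platform are as follows:

1. A non-virtualized method for building the bottom layer of a cloud platform, addressing the large performance loss at the bottom layer of existing platforms. In a heterogeneous-node environment, instead of the currently popular virtualization technology, the platform uses MPI's strong support for development on heterogeneous systems to build the infrastructure-as-a-service layer directly on heterogeneous hardware. This reduces the impact of virtualization on the performance of the underlying hardware and thereby improves platform efficiency; experiments show that virtualization delivers only 50%-70% of the performance of the non-virtualized approach. This is an important innovation of the dissertation.

2. A multi-level disaster-recovery scheme for MPI, addressing MPI's insufficient disaster-recovery capability. Although MPI excels at high performance computing, its disaster-recovery capability has long been a significant weakness that limits its use in massive data processing; unless this problem is solved, MPI cannot be applied to cloud computing. The dissertation focuses on MPI fault tolerance and disaster recovery and implements recovery schemes at three different levels, "job rescheduling", "job/task recovery" and "dynamic task migration", making up for MPI's weakness in this area. This is an important feature and innovation of the dissertation.

3. A MapReduce model based on multi-level fault-tolerant MPI, addressing the insufficient computational efficiency of the existing MapReduce programming model. In the existing MapReduce model, data transfer is encapsulated by the distributed file system, so computation requires repeated I/O against the distributed file system, which hurts efficiency. The dissertation rewrites the MapReduce programming model with multi-level fault-tolerant MPI, processing intermediate results directly and reducing unnecessary I/O, which improves cloud computing speed and efficiency: execution time is 25% of that of Hadoop, the current mainstream cloud platform. This is an important feature and innovation of the dissertation.

The dissertation tests and analyzes in detail the effect of data split size on system performance, the robustness and efficiency of multi-level disaster recovery, and the overall performance of the new cloud platform, and compares the new platform with the traditional Hadoop platform. The experiments show that the execution efficiency of the proposed MPI high performance cloud computing platform is more than four times that of traditional Hadoop. Finally, the dissertation summarizes the research work and briefly describes open problems and future research plans. Next steps will address computations in which nodes depend on one another, and how to combine MPI parallelism across nodes with CPU+GPU parallelism within nodes on this platform.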
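The key MapReduce claim above is that intermediate results go directly from mappers to reducers over MPI instead of being stored and re-read through a distributed file system. The following is a minimal single-process illustration of that data flow, not the dissertation's implementation. Its assumptions are: word count stands in for the real workload, there is one simulated rank per input split, and `zlib.crc32` is a hypothetical partitioner choice. The hypothetical `alltoall` helper only simulates the exchange; a real MPI program would perform it with `MPI_Alltoallv` in C, or with mpi4py's `Comm.alltoall`.

```python
"""Single-process simulation of an MPI-style MapReduce shuffle (hypothetical sketch).

Each simulated rank maps its input split, hash-partitions the intermediate
(key, value) pairs into one in-memory bucket per destination rank, and the
buckets are exchanged all-to-all. No intermediate data touches a file system.
"""
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_words(split: str) -> Iterable[Pair]:
    # Example map function (assumed workload): emit (word, 1) for every word.
    for word in split.split():
        yield word, 1


def partition(key: str, nranks: int) -> int:
    # Deterministic partitioner; crc32 avoids Python's per-process string hash salt.
    return zlib.crc32(key.encode("utf-8")) % nranks


def map_phase(split: str, nranks: int) -> List[List[Pair]]:
    # One send bucket per destination rank: the send buffer of an all-to-all.
    buckets: List[List[Pair]] = [[] for _ in range(nranks)]
    for key, value in map_words(split):
        buckets[partition(key, nranks)].append((key, value))
    return buckets


def alltoall(send: List[List[List[Pair]]]) -> List[List[List[Pair]]]:
    # Simulated all-to-all: rank j receives bucket j from every rank i.
    n = len(send)
    return [[send[i][j] for i in range(n)] for j in range(n)]


def reduce_phase(received: List[List[Pair]]) -> Dict[str, int]:
    # Sum values per key straight from the received in-memory buckets.
    totals: Dict[str, int] = defaultdict(int)
    for bucket in received:
        for key, value in bucket:
            totals[key] += value
    return dict(totals)


def mapreduce(splits: List[str]) -> Dict[str, int]:
    nranks = len(splits)  # assumption: one input split per simulated rank
    send = [map_phase(s, nranks) for s in splits]
    recv = alltoall(send)
    result: Dict[str, int] = {}
    for rank_recv in recv:
        # Each key is routed to exactly one rank, so the per-rank results are disjoint.
        result.update(reduce_phase(rank_recv))
    return result


if __name__ == "__main__":
    print(sorted(mapreduce(["a b a", "b c", "a c c"]).items()))
```

Running the script prints `[('a', 3), ('b', 2), ('c', 3)]`. Because the partitioner is a pure function of the key, every occurrence of a key lands on the same reducer, so each rank can reduce its received buckets locally without a second pass over shared storage.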

【Abstract】 Cloud computing is a main technique for massive data processing; however, it is inefficient for problems that are both data-intensive and compute-intensive. The lower layer of cloud computing uses virtualization, so all system and application software executes on virtual hardware, which, according to one study, reduces performance by up to about 20 percent. On the other hand, the MapReduce paradigm of cloud computing adopts a store-and-forward strategy for intermediate data, which creates a large number of I/O operations for big data and cannot be applied efficiently to high performance scientific computing. Based on these considerations, and in view of MPI's weak fault tolerance, the dissertation focuses on developing an MPI-based high performance cloud computing platform (HPCCP). The platform builds its lower layer directly on heterogeneous computing nodes without virtualization, and reprograms the MapReduce paradigm by integrating multi-layer fault-tolerant MPI techniques and multithreading, avoiding a large number of unnecessary I/O operations and increasing efficiency. The proposed and implemented MPI-based HPCCP prototype can efficiently handle problems that are both data-intensive and compute-intensive, satisfying high performance cloud computing requirements.

The main contributions of the proposed MPI-based HPCCP platform are as follows:

1. A method that builds the lower layer of the cloud computing platform directly on heterogeneous computing nodes without virtualization. Instead of adopting the popular virtualization techniques, the MPI-based HPCCP platform takes full advantage of MPI's ability to explore and adapt to heterogeneous computing nodes to construct the IaaS layer of the cloud computing platform directly. This is an important contribution that increases the productivity of the cloud platform by reducing the harmful influence of virtualization on hardware capability in the IaaS layer.

2. Improvement and implementation of multi-layer fault-tolerance techniques for MPI. Compared with its excellent high performance computing ability, MPI's weak fault tolerance is a crucial defect that limits its application in big data processing; unless this defect is solved, MPI cannot be adopted in cloud computing. The dissertation comprehensively studies MPI fault-tolerance techniques and proposes and implements three fault-tolerance techniques allocated to three different layers: job rescheduling, job/task recovery, and dynamic task migration. This contribution remedies MPI's defect in fault tolerance and is another distinguishing feature of the dissertation.

3. An efficient MapReduce prototype on the MPI-based HPCCP platform. In current MapReduce implementations, data transfer is encapsulated by the distributed file system (DFS), so repeated I/O operations against the DFS take place during data processing, which seriously reduces system efficiency. The dissertation reprograms the MapReduce paradigm on a redesigned multi-layer fault-tolerant MPI platform, which processes intermediate results directly, reducing unnecessary I/O operations, speeding up cloud computing and achieving higher efficiency. Compared with Hadoop, the currently popular MapReduce implementation, the MPI-based HPCCP reduces the processing time of a big-data fingerprint recognition task to 25 percent.

The dissertation has carried out extensive tests and case studies on the MPI-based HPCCP platform, including the influence of data block size on processing performance, the robustness and efficiency of the multi-layer fault tolerance, and the general performance of the MPI-based HPCCP platform. Finally, Hadoop and the MPI-based HPCCP platform are compared. The experiments show that the proposed and implemented cloud computing platform runs more than four times faster than the traditional Hadoop platform. The last section presents conclusions, lists some problems still to be solved, and briefly describes the near-future research plan.
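The 25 percent processing time and the "four times faster" figures are the same result stated two ways. The check below assumes that execution efficiency means the inverse of wall-clock time for the same job; the abstract does not define it. The notation is introduced here only for illustration: $T_{\text{Hadoop}}$ and $T_{\text{HPCCP}}$ are run times of the same job on the two platforms, and $E_{\text{virt}}$ is the measured efficiency of virtualized relative to non-virtualized hardware (reported as 50-70%). Under that assumption, the speedup is the reciprocal of the time ratio, and the virtualization efficiency converts into a run-time penalty the same way:

```latex
S = \frac{T_{\text{Hadoop}}}{T_{\text{HPCCP}}} = \frac{1}{0.25} = 4,
\qquad
\frac{T_{\text{virt}}}{T_{\text{bare}}} = \frac{1}{E_{\text{virt}}}
\in \left[\frac{1}{0.7},\ \frac{1}{0.5}\right] \approx [1.43,\ 2.00].
```

A time ratio below 0.25 gives the reported "more than four times". By the same reasoning, a job would take roughly 1.4 to 2 times as long on virtualized hardware.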
