Design and Realization of Parallel File IO Based on Hadoop Distributed File System (基于HDFS的多用户并行文件IO的设计与实现)

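【Author】 Jin Songchang (金松昌)

【Advisors】 Fang Binxing (方滨兴); Yang Shuqiang (杨树强)

【Author Information】 National University of Defense Technology, Computer Science and Technology, 2010, Master's thesis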

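【Abstract (translated from the Chinese)】 With the rapid development of computer networks and their applications, and especially since Google proposed Internet-based massive data storage and the MapReduce parallel computing model, networked data storage management and parallel analysis and processing have become a research focus in both academia and industry. Hadoop, as one of the reference implementations of these ideas, has attracted wide attention. HDFS, the distributed file system at the core of Hadoop, uses a lock mechanism to control parallel file IO and does not support parallel reads and writes by multiple users on the same file, which limits the performance of multi-user parallel file operations. To address this, this thesis targets the characteristics of massive log-type data, proposes a parallel file IO model that is not based on locks, and verifies the model's effectiveness through experiments. The main contributions are: (1) An in-depth analysis of work related to Hadoop, in particular its distributed file system HDFS; on that basis, and in view of HDFS's lack of support for multi-user parallel file reads and writes, an improvement approach is proposed that enables such support. (2) By analyzing the concurrency control model of HDFS and the characteristics of massive log-type data, a multi-user parallel IO model for distributed file systems that uses no mutual exclusion mechanism is proposed; based on this model, at the cost of an appropriate reduction in data read integrity, read-write and read-read parallelism by multiple users on the same file can be achieved. (3) By modifying the original HDFS implementation, a distributed file system supporting multi-user parallel IO is designed and implemented. Experiments show that the modification effectively improves the performance of multi-user parallel file IO.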

【Abstract】 With the rapid development of computer networks and their applications, and especially since Google proposed Internet-based mass data storage and the MapReduce parallel computing model, network-based data storage management and parallel analysis and processing have become a focus of academia and industry. As one of the reference implementations of these ideas, Hadoop has attracted widespread attention. The core of Hadoop, the Hadoop Distributed File System (HDFS), uses a lock mechanism to control parallel file IO and does not support multiple users reading and writing the same file in parallel. This thesis therefore proposes a parallel file IO model based on Block granularity and verifies the effectiveness of the model through experiments. The main work of this thesis is as follows: (1) Related work on Hadoop is analyzed in depth, particularly the Hadoop Distributed File System (HDFS); in view of Hadoop's deficiency in multi-user parallel file IO, ideas for improvement are put forward. (2) By analyzing the implementation of Hadoop, a multi-user parallel IO model without a mutual exclusion mechanism is proposed for distributed file systems; based on this model, at the cost of an appropriate reduction in the integrity of data reads, multiple users can read and write the same file in parallel. (3) By modifying the source code, the functionality described in the model is implemented, and experiments are carried out to verify the function and performance of the model.
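
The abstract does not give the mechanism behind the lock-free model, so the following is only an illustrative sketch of the general idea it describes: readers of an append-only, log-type file take no lock and accept that a read may miss the newest records. Everything here is an assumption made for illustration, not the thesis's design: the class `AppendOnlyLog`, the helper `demo`, the single-writer restriction, and the use of an in-memory list in place of HDFS blocks. The sketch also relies on the atomicity of list appends and attribute assignment in CPython.

```python
import threading


class AppendOnlyLog:
    """Toy in-memory stand-in for one append-only, log-type file.

    Hypothetical illustration only; this is not HDFS code.
    Assumes a single writer and any number of readers.
    """

    def __init__(self):
        self._records = []   # existing entries are never modified or removed
        self._visible = 0    # number of records published to readers

    def append(self, record):
        # Store the record first, then publish the new length.
        self._records.append(record)
        self._visible = len(self._records)

    def read(self, start=0):
        # No lock: snapshot the published length and return records below it.
        # Records appended after the snapshot are simply not seen by this
        # read, which is the relaxed read completeness being illustrated.
        end = self._visible
        return self._records[start:end]


def demo(n_records=1000, n_readers=4):
    """Run one writer and several readers concurrently on the same log."""
    log = AppendOnlyLog()
    results = []  # one consistency flag per reader

    def writer():
        for i in range(n_records):
            log.append(f"event-{i}")

    def reader():
        last = 0
        consistent = True
        while last < n_records:
            snap = log.read()
            # Each snapshot must be a prefix of the final log and never shrink.
            if len(snap) < last or any(
                r != f"event-{i}" for i, r in enumerate(snap)
            ):
                consistent = False
            last = len(snap)
        results.append(consistent)

    threads = [threading.Thread(target=writer)]
    threads += [threading.Thread(target=reader) for _ in range(n_readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log, results
```

Calling `demo()` runs the writer and the readers at the same time. Each reader only ever observes a prefix of the final log, which is a trade-off that may be acceptable for log-type data where a slightly stale view is tolerable.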