面向海量邮件存储的分布式文件系统研究

Research of Distributed File System Dedicated to Massive E-Mails’ Storage

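【Author】 Wang Ruiheng (王瑞珩)

【Advisor】 Liu Bingquan (刘秉权)

【Author information】 Harbin Institute of Technology, Computer Science and Technology, 2008, Master's thesis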

【Abstract】 With the rapid development of Internet technology and the pressing need of network users to communicate, e-mail has become an increasingly important channel for office work and communication, and the volume of e-mail data is growing rapidly. Traditional file systems struggle to meet the performance requirements of storing and reading massive data, while existing distributed file systems do not support massive e-mail storage well. Against this background, this thesis studies a distributed file system dedicated to massive e-mail storage.

A distributed file system uses the network to combine multiple machines into one virtual file system. This thesis studies and implements such a system for massive e-mail storage. Besides strong fault tolerance, availability, and scalability, the system must also deliver high I/O performance. Because e-mail arrives from diverse sources, the system must support direct writes from several kinds of data source. To this end, the thesis focuses on the following problems and implements the system accordingly.

First, based on the project's requirements for the file system and an analysis of existing distributed architectures, the thesis designs the architecture of the distributed file system, then designs and implements each of its components.

Second, read-write locks and leases are introduced from the start of the design of the system's internal write and read algorithms. For the read and write paths, the thesis studies multi-strategy load balancing across the different components of the system. Block replica redundancy is the core fault-tolerance mechanism, and a fault-tolerance scheme is designed for each component of the system.

Third, e-mail comes from different sources: the general data sources FTP, HTTP, and FILE (the local file system), and the dedicated e-mail sources SMTP, IMAP, and POP3. The thesis studies a common interface for these data sources and implements writes into the distributed file system through that interface. To improve I/O performance and data integrity, compression and synchronization information are added to the stored file format.

Finally, the thesis tests the I/O performance of the distributed file system. Because only a limited number of machines was available, it proposes a speed stability test so that the I/O results measured on the existing system also hold on larger clusters. The measured write speed is above 20 MB/s and the measured read speed is about 40 MB/s, which demonstrates that the system has high I/O performance.
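The abstract states that compression and synchronization information are added to the stored file format but does not describe the layout. The Python sketch below is only an illustration of how such a format can work, not the format used in the thesis. It assumes each message is stored as a fixed sync marker, a small header, and a zlib-compressed body. The names `SYNC`, `HEADER`, `pack_record`, and `read_records` are hypothetical, and the CRC32 field is an added assumption used here to detect damaged records.

```python
import hashlib
import struct
import zlib

# Hypothetical 16-byte sync marker; a real format would likely pick one per file.
SYNC = hashlib.md5(b"hypothetical-mail-store").digest()
# Assumed header layout: compressed length and CRC32 of the compressed bytes.
HEADER = struct.Struct(">II")


def pack_record(message: bytes) -> bytes:
    """Serialize one message as sync marker + header + compressed body."""
    body = zlib.compress(message)
    return SYNC + HEADER.pack(len(body), zlib.crc32(body)) + body


def read_records(data: bytes):
    """Yield every intact message, skipping damaged records."""
    pos = 0
    while True:
        # Jump to the next sync marker at or after the current position.
        pos = data.find(SYNC, pos)
        if pos < 0:
            return
        start = pos + len(SYNC)
        if start + HEADER.size > len(data):
            return
        length, crc = HEADER.unpack_from(data, start)
        body_start = start + HEADER.size
        body = data[body_start:body_start + length]
        if len(body) == length and zlib.crc32(body) == crc:
            yield zlib.decompress(body)
            pos = body_start + length
        else:
            # Damaged record: move on and resynchronize at a later marker.
            pos += 1
```

In this sketch the marker lets a reader resume at the next record boundary after corruption, or start reading from an arbitrary offset inside a block, while compression reduces the bytes moved over the network and disk.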

  • 【Classification No.】 TP393.098
  • 【Downloads】 102