Research on Distributed Storage Technology Based on Data De-Duplication and the Chord Protocol
【Author】 Jin Xuejiao;
【Advisor】 Li Dong;
【Author information】 Harbin Institute of Technology, Computer Science and Technology, 2010, Master's thesis
【Abstract】 As data volumes grow rapidly in the information age, data has become a valuable asset, and its value now far exceeds that of the computer systems that hold it. At the same time, various unpredictable factors make data easy to lose, which can cause users heavy losses. Efficient data storage technology has therefore attracted wide attention as storage systems face the demands of massive data. To meet these demands, this thesis first studies existing block-level data de-duplication techniques, compares the advantages and disadvantages of fixed-length and variable-length block de-duplication, and analyzes the factors that affect de-duplication effectiveness. It then focuses on the Rabin fingerprint-based variable-length chunking algorithm and proposes a new algorithm for finding file cut points. Based on the characteristics of block-based de-duplication and Chord-based distributed storage, the thesis designs the file resource location scheme and a duplicate-data filtering strategy. Following the Chord protocol, the index information for file blocks is distributed by key interval across different nodes, and this two-level index solves the problem of centralized block indexing. Finally, the thesis proposes a distributed storage system architecture that combines Chord-based distributed storage with Rabin fingerprint-based variable-length block de-duplication. Experimental results show that introducing de-duplication into a Chord-based distributed storage system reduces the storage burden of the whole system. In addition, the reduced volume of transmitted data helps improve the efficiency of data backup and recovery over low-speed networks.
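The pipeline summarized in the abstract (variable-length chunking, duplicate filtering, and a block index spread over a Chord ring) can be illustrated with a minimal sketch. It is not the thesis's algorithm: a simple polynomial rolling hash stands in for a true Rabin fingerprint, the ring is a sorted list of node identifiers without finger tables, and every name and constant (`chunk`, `ChordRing`, `store`, `WINDOW`, `MASK`, the chunk size limits) is a hypothetical choice made for illustration.

```python
import hashlib
from bisect import bisect_left

# All constants below are illustrative assumptions, not values from the thesis.
WINDOW = 16                      # sliding-window width in bytes
BASE = 257                       # polynomial base of the rolling hash
MOD = (1 << 31) - 1              # modulus of the rolling hash
MASK = (1 << 6) - 1              # cut when the low 6 bits of the hash are zero
MIN_CHUNK, MAX_CHUNK = 32, 512   # chunk size limits in bytes


def chunk(data: bytes) -> list:
    """Split data at content-defined cut points (rolling-hash stand-in for Rabin)."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window
    for i, b in enumerate(data):
        pos = i - start               # offset of this byte inside the current chunk
        if pos >= WINDOW:             # window is full: drop its oldest byte
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD      # shift the new byte into the window
        length = pos + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0       # restart the hash for the next chunk
    if start < len(data):
        chunks.append(data[start:])   # trailing chunk with no cut point
    return chunks


class ChordRing:
    """Identifier ring only: maps a key to its successor node (no finger tables)."""

    def __init__(self, node_names, bits=32):
        self.bits = bits
        self.ids = sorted(self.key(name.encode()) for name in node_names)

    def key(self, raw: bytes) -> int:
        # Hash arbitrary bytes onto the identifier circle [0, 2**bits).
        return int.from_bytes(hashlib.sha1(raw).digest(), "big") % (1 << self.bits)

    def successor(self, key: int) -> int:
        # First node identifier >= key, wrapping around to the smallest one.
        i = bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]


def store(data: bytes, ring: ChordRing, index: dict):
    """Chunk data, skip chunks already indexed, return (recipe, new_bytes).

    index maps node id -> set of chunk fingerprints held by that node, so each
    node indexes only the fingerprints that fall in its own key interval.
    """
    recipe, new_bytes = [], 0
    for piece in chunk(data):
        fp = hashlib.sha1(piece).digest()       # chunk fingerprint
        node = ring.successor(ring.key(fp))     # node responsible for this fingerprint
        seen = index.setdefault(node, set())
        if fp not in seen:                      # new chunk: would be stored and sent
            seen.add(fp)
            new_bytes += len(piece)
        recipe.append(fp)                       # ordered fingerprints rebuild the file
    return recipe, new_bytes
```

In this sketch, calling `store` a second time on the same bytes reports zero new bytes, because every fingerprint is already present on its responsible node. Since `MIN_CHUNK` is at least `WINDOW`, each cut decision depends only on the last `WINDOW` bytes, which is why cut points tend to realign after a local insertion, whereas fixed-length blocks would all shift.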
【Key words】 distributed storage; Chord protocol; de-duplication; Rabin fingerprint;