Research on Data Redundancy and Maintenance Technology in Distributed Storage System (分布式存储系统中的数据冗余与维护技术研究)

【Author】 Wang Yu (王禹)

【Supervisor】 Zhao Yuelong (赵跃龙)

【Author Information】 South China University of Technology, Computer Application Technology, 2011, Ph.D.

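【Abstract (Chinese version)】 Distributed storage systems are one of the effective means of solving the problem of massive data storage. Using redundant-data maintenance techniques and the cooperation of a large number of storage nodes spread across a network, they can provide long-term, reliable data storage services; existing large-scale data centers, P2P network storage and wireless network storage all fall within the scope of distributed storage systems. However, because some storage nodes may fail temporarily or permanently, a distributed storage system generally guarantees its reliability and availability by adding redundant data. Data redundancy and maintenance techniques are therefore a very important research topic for distributed storage systems. The main problems they currently face are:
1) Whenever a particular data redundancy strategy is adopted, its data reliability must be studied so that the system's failure probability, the required amount of redundant data and the system's lifetime can be predicted.
2) For each data redundancy strategy, more effective storage codes need to be developed.
3) In distributed storage systems that use erasure-code redundancy, data repair often consumes a large amount of network bandwidth, which some low-speed storage networks may not tolerate, so repair methods for erasure-coded redundancy must be improved.
4) Some new applications may turn stored data from traditional static file sharing into dynamic file interaction, in which file replicas must be updated frequently, so the consistency of redundant replicas must be maintained.

Research on data redundancy and maintenance in distributed storage systems therefore has important theoretical and practical significance. On this basis, this dissertation analyzes the problem in depth from four aspects: the reliability of data redundancy, redundancy codes achieving minimum storage and minimum bandwidth, redundant-data maintenance based on interference alignment, and consistency maintenance of redundant data, and obtains several innovative results. The main research work and contributions are as follows:

1. A mathematical model for predicting the reliability of data redundancy systems (DRSRM, Data Redundancy System Reliability Model) is proposed. Given the instability of storage nodes in distributed storage systems, the availability of data files maintained by replication and by erasure-code redundancy is analyzed, and the mathematical distributions of storage-node failure and repair are given, from which a reliability model of the storage nodes is derived. On this basis, a reliability prediction model (DRSRM) for replication-based redundant storage systems is proposed. The model simulates the system's redundant-data maintenance process and from it computes the system failure rate, the time periods the system passes through, the system lifetime, and so on.

2. Two new data redundancy codes are proposed: the Minimum Storage Redundancy Regenerating Code (MSRRC) and the Minimum Bandwidth Redundancy Regenerating Code (MBRRC). Starting from the two theoretical extreme points for repairing failed data in erasure-coded redundancy, the minimum bandwidth regeneration point (MBR, Minimum Bandwidth Regeneration) and the minimum storage regeneration point (MSR, Minimum Storage Regeneration), the dissertation introduces the concepts of MSRRC and MBRRC, gives the data distribution, failed-data repair and data reconstruction procedures of each code, proves the correctness of the underlying principles, presents detailed worked examples of both codes, and finally demonstrates their effectiveness experimentally.

3. A method for maintaining redundant data in distributed storage systems using interference alignment (RDMIA, Redundancy Data Maintenance based on Interference Alignment) is proposed. Its main advantages are: 1) a lost coded block can be repaired directly from a subset of the other coded blocks, without reconstructing the original data; 2) a failed block can be repaired from a fixed number of surviving coded blocks, a number that depends only on how many coded fragments are lost, not on which ones. The technique can greatly reduce network overhead during redundant-data maintenance.

4. A method for maintaining the consistency of redundant data using a Replica information Broadcast Tree (RBT) is proposed (DCMRBT, Data Consistency Maintenance based on RBT). Its main design idea is to build an RBT for the keyword of each data-replica node, so that the system can track replica locations and propagate replica update information. The strategy effectively avoids hot spots and node-failure problems, and because replicas are stored without explicitly recording node IDs and IP addresses, it effectively protects node privacy.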
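The MSR and MBR points named in item 2 are the two ends of the storage versus repair-bandwidth trade-off for regenerating codes. As a hedged illustration (it is not the MSRRC/MBRRC construction itself), the sketch below evaluates the standard closed-form values from the regenerating-code literature. A file of B symbols is spread over n nodes, any k of which can rebuild it; each node stores α symbols, and a failed node is regenerated by downloading β symbols from each of d helpers, so the repair bandwidth is γ = dβ. The function names are hypothetical, and the sketch assumes 1 ≤ k ≤ d.

```python
from fractions import Fraction

def msr_point(B, k, d):
    """(alpha, gamma) at the minimum-storage regenerating (MSR) point.

    alpha = B/k is the least possible storage per node;
    gamma = d*B / (k*(d-k+1)) is the repair download at that storage.
    """
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    alpha = Fraction(B, k)
    gamma = Fraction(B * d, k * (d - k + 1))
    return alpha, gamma

def mbr_point(B, k, d):
    """(alpha, gamma) at the minimum-bandwidth regenerating (MBR) point.

    Each node stores exactly what it downloads during repair, so alpha == gamma
    = 2*d*B / (k*(2d-k+1)).
    """
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    gamma = Fraction(2 * B * d, k * (2 * d - k + 1))
    return gamma, gamma

if __name__ == "__main__":
    B, k, d = 1, 2, 3
    print("conventional repair download:", B)  # decode the whole file first
    print("MSR (alpha, gamma):", msr_point(B, k, d))
    print("MBR (alpha, gamma):", mbr_point(B, k, d))
```

For B = 1, k = 2, d = 3, the MSR point stores 1/2 of the file per node and downloads 3/4 of it to repair a node, while the MBR point stores and downloads 3/5; conventional erasure-code repair downloads the whole file. Exact fractions are used so the two points can be compared without rounding.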

【Abstract】 Distributed storage systems are one of the effective means of solving the problem of massive data storage. Using redundant-data maintenance techniques and the collaboration of a large number of nodes distributed across the network, they provide long-term, reliable data storage services. Existing large-scale data centers, P2P network storage and wireless network storage all belong to the category of distributed storage systems. However, some nodes in such a system may fail temporarily or permanently, so to ensure reliability and availability the storage system generally stores redundant data. Data redundancy and maintenance technology has therefore become an important research issue in distributed storage systems.

At present, the major problems facing data redundancy and maintenance in distributed storage systems are:
1) When a particular data redundancy strategy is adopted, we must understand the data reliability of that strategy in order to predict the probability of system failure, the amount of redundant data required, the system lifetime, and so on.
2) For different data redundancy strategies, more effective storage codes need to be studied.
3) In distributed storage systems that use erasure-code redundancy, data repair consumes a lot of network bandwidth, which some low-speed storage networks may be unable to tolerate, so data recovery methods for erasure-coded redundancy must be improved.
4) New distributed storage applications have shifted from traditional static file sharing to dynamic file interaction, in which replicated files are updated frequently, so maintaining replica consistency must be taken into account.

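Problem 1) asks how the choice of redundancy strategy affects the probability of failure and the amount of redundancy needed. A minimal static sketch, assuming every node is online independently with the same probability p (a simplification; it is not the DRSRM model, which also models failure and repair over time), compares r-way replication with an (n, k) MDS erasure code that needs any k of its n blocks. The function names are hypothetical.

```python
from math import comb

def replication_availability(p: float, r: int) -> float:
    """r-way replication: the file is available if at least one replica is online.

    Assumes each node is online independently with probability p.
    """
    return 1 - (1 - p) ** r

def erasure_availability(p: float, n: int, k: int) -> float:
    """(n, k) MDS erasure code: the file is available if at least k of n blocks are online.

    Assumes each node is online independently with probability p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

if __name__ == "__main__":
    p = 0.9
    # Both layouts below cost 3x the original data in storage.
    print(f"3 replicas : {replication_availability(p, 3):.6f}")
    print(f"(6,2) code : {erasure_availability(p, 6, 2):.6f}")
```

With p = 0.9, three replicas give availability 0.999, while a (6, 2) code with the same 3x storage overhead gives about 0.999945. Note that an (r, 1) code is just r-way replication, so the two functions agree in that case.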
In short, research on data redundancy and maintenance in distributed storage systems has important theoretical and practical significance. To address these problems, this dissertation conducts in-depth research from four aspects: the reliability of different redundancy strategies, the realization of minimum-storage and minimum-bandwidth redundancy codes, the use of interference alignment to repair redundant data, and consistency maintenance of redundant data. It makes several innovative contributions to data redundancy and maintenance in distributed storage systems. The main research work and contributions are as follows:

1. A mathematical model (DRSRM, Data Redundancy System Reliability Model) is proposed to help predict the reliability of redundant distributed storage systems. The dissertation analyzes the data availability of replication-based and erasure-code-based redundancy maintenance, proposes a mathematical model of storage-node failure and repair, and analyzes storage-node reliability. On this basis, it puts forward a reliability forecast model for replication-based redundant storage systems, which simulates the maintenance process of redundant data and then calculates the system failure rate, the time periods, the system lifetime, and so on.

2. The Minimum Storage Redundancy Regenerating Code (MSRRC) and the Minimum Bandwidth Redundancy Regenerating Code (MBRRC) are presented. According to the literature, data regeneration has two extreme points, the minimum bandwidth regeneration point (MBR) and the minimum storage regeneration point (MSR), on which MBRRC and MSRRC are based. We analyze the reconstruction and regeneration principles of the two types of code, prove the correctness of the realization principles, and describe the implementation process in detail with examples. Finally, experimental results demonstrate their effectiveness.

3. An interference-alignment-based method (RDMIA, Redundancy Data Maintenance based on Interference Alignment) is proposed to reduce network overhead when redundant data must be repaired. Its prominent advantages are: 1) a lost block can be repaired directly from a subset of the other encoded blocks, with no need to reconstruct the original data; 2) a failed block can be repaired from a fixed number of surviving encoded blocks, where the number depends only on how many coded fragments are missing, not on which ones are lost. Applying this technique can greatly reduce network costs when redundant data is maintained in distributed storage systems.

4. A replica broadcast tree (RBT, Replica Broadcast Tree) is proposed to maintain the consistency of redundant data. By constructing an RBT for every replica's key, the system can trace replica locations and propagate replica update information. The strategy effectively avoids the problems of hot spots and node failures. At the same time, because replicas are stored without explicitly recording node IDs and IP addresses, node privacy is effectively protected.
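Item 3 rests on the idea that a lost block can be rebuilt from a few symbols of the surviving blocks without first decoding the file. The toy below is a hand-built (n = 4, k = 2) code over GF(2), with symbols combined by XOR, chosen purely for illustration; it is not the RDMIA construction. Each node stores α = 2 symbols of a file (a1, a2, b1, b2), and any two nodes suffice to recover the file. When node 1 fails, each of the other three nodes sends a single symbol, and the unwanted b-terms from all three helpers are aligned along b2, so one downloaded symbol cancels them. Repair therefore costs 3 symbols instead of the 4 needed to decode the whole file, which equals the MSR repair bandwidth dB/(k(d-k+1)) = 3 for B = 4, k = 2, d = 3. Function names are hypothetical, and only the repair of node 1 is shown.

```python
def encode(a1, a2, b1, b2):
    """Spread the file (a1, a2, b1, b2) over 4 nodes, 2 symbols per node."""
    return [
        (a1, a2),                 # node 1: systematic
        (b1, b2),                 # node 2: systematic
        (a1 ^ b1, a2 ^ b2),       # node 3: parity
        (a2 ^ b1, a1 ^ a2 ^ b2),  # node 4: parity
    ]

def repair_node1(nodes):
    """Regenerate node 1 from ONE symbol sent by each of nodes 2, 3 and 4."""
    from_n2 = nodes[1][1]  # b2
    from_n3 = nodes[2][1]  # a2 ^ b2
    from_n4 = nodes[3][1]  # a1 ^ a2 ^ b2
    # The only interfering term in every received symbol is b2,
    # so XOR-ing with from_n2 cancels it.
    a2 = from_n3 ^ from_n2
    a1 = from_n4 ^ a2 ^ from_n2
    return (a1, a2)

def decode_from_parity(n3, n4):
    """Recover the whole file from the two parity nodes alone."""
    x, y = n3  # a1 ^ b1, a2 ^ b2
    z, w = n4  # a2 ^ b1, a1 ^ a2 ^ b2
    a1 = w ^ y
    a2 = x ^ z ^ a1
    return (a1, a2, x ^ a1, y ^ a2)

if __name__ == "__main__":
    nodes = encode(0x12, 0x34, 0x56, 0x78)  # one byte per symbol
    assert repair_node1(nodes) == nodes[0]
    assert decode_from_parity(nodes[2], nodes[3]) == (0x12, 0x34, 0x56, 0x78)
    print("node 1 repaired from 3 symbols:", repair_node1(nodes))
```

Because XOR acts bit by bit, the same code works for symbols of any width; real constructions use larger finite fields so that alignment can be achieved for more general (n, k, d).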
