节点文献

分布式存储系统中的节点自主性问题研究

Research on Node Autonomy Issue in Distributed Storage System

【作者】 宋玮

【导师】 赵跃龙;

【作者基本信息】 华南理工大学 , 计算机应用技术, 2010, 博士

【摘要】 当今信息化的快速发展已经使得个人、企业和政府部门的数据信息量呈几何曲线增长,相应地对信息存储服务需求的期望值也越来越高。此外,希望信息存储形式的多样化和随时随地、灵活地获得各种高质量的数据信息访问服务也已经逐渐成为许多部门一种新的信息服务需求。为适应这些新的信息服务需求,目前已经出现了一些新的技术解决方案,如:建立在网格计算和云计算体系结构上的网格存储与云存储为不同层次的网络实体提供了广阔的存储应用空间,使存储服务无处不在成为可能;而建立在P2P自组织覆盖网络上的P2P存储则充分发挥了不同层次网络实体的能力,使存储的提供不再局限于专业的存储设施,将共享的精神发挥的淋漓尽致。另外,随着桌面设备性能的提高,一种系统作为另一种系统的某个节点来达到结构上的融合,尤其是在分布式存储环境下的技术融合也已经成为一种新的发展趋势。因为在融合后的分布式存储环境中每个节点都具有很强的自主意愿和理性行为,并且又由于各个节点自身分别属于不同的组织和个人,所以每个节点均具有进行局部控制的能力和追求局部利益的期望,这为融合后的分布式系统中的整体控制和组织管理带来了很大的技术挑战,也就是说存储节点的自主性问题已经成为融合后的分布式存储环境中一个必须解决的重要问题。因此,展开对分布式存储环境下的节点自主性问题的研究有非常重要的理论意义和实际意义。本文在融合的分布式存储环境下研究了存储节点的自主性问题,分别从体系结构参考模型、底层覆盖网络、激励相容的资源选择机制和副本放置技术四个方面进行了系统的研究,取得了一些有意义的创新性研究成果。本文的主要研究工作和创新性成果体现在以下几个方面:1、提出了一种基于对等网络的自主管理的分布式存储系统的体系结构参考模型(Self-managed Distributed Storage Architecture Reference Model based on P2P ,SM-DSARM)。SM-DSARM是一种结合P2P覆盖网络与面向服务思想而建立的分层参考模型,首先从分层的角度描述了SM-DSARM的层次功能,其中异构的物理节点以统一的形式抽象成独立的存储服务实体(Storage Service Entitiy,SSE),SSE是分布式存储系统中的活动主体,它具有管理者、资源使用者和资源提供者三种身份,各SSE以P2P覆盖网络的形式进行组织实现存储资源服务管理的去中心化。其次,采用形式化的方法描述了模型的静态概念与动态行为,重点给出了一种SSE的功能部署结构,并采用Petri网进行了动态行为建模。SM-DSARM与其它面向服务的体系结构相比,它将虚拟的SSE作为系统的主体,使得物理节点能以不同的性能和方式参与到系统活动中,增强了灵活性和自主性,并且依然保留了SOA与P2P的分散控制、扩展性、自组织等优良特性。2、提出了一种适应自主节点的具有加速收敛和可用性改善的P-Grid覆盖网络。在充分研究P-Grid覆盖网络的基础上,首先从收敛性和可用性两个方面对P-Grid进行了改进,为SM-DSARM提供了P2P覆盖网络层。P-Grid覆盖网络中随机漫步形成树的速度是一个关键,基于此提出了针对节点无初始数据负载量(Ignore-of-Load)及有初始数据负载量(Care-of-Load)两种情况的改进构建算法。Ignore-of-Load算法可从加大路径延长的程度以及推荐成功率两方面提高收敛速度,实验表明Ignore-of-Load算法在收敛速度上的提高超过50%。其次,考虑到自主节点对索引存放的意愿,又提出和比较了以路径为主导、以数据为主导和具有符合度调整的3种Care-of-Load算法。具有符合度调整的Care-of-Load算法在收敛速度上表现良好,并且对数据索引的查找能保证90%左右的成功率。另外,完全分散控制的P-Grid覆盖网络通过大量冗余将低在线率的节点构建成高可用性的系统,根据融合分布式环境下节点具有周期性的特点对P-Grid覆盖网络的可用性进行改善。它以P-Grid构建算法和路由算法为基础,形成以长期节点为主体二叉树的虚拟多叉树的周期性组织方式,设计适当的信息表结构建立与周期节点和普通节点的关系。数值分析表明在相同的节点规模下,周期性组织方式可以达到更高的可用性而不影响维护消耗。改进的P-Grid在保留了原有的完全分散管理、自组织和分布式负载平衡等适应自主节点特点的同时也通过结合节点的意愿增强了对自主节点的适应性能。3、在研究现有侧重公平性和侧重真实性的激励机制基础上,提出了适应理性而自私的自主节点的激励相容的单向存储资源选择(1-M )机制和有服务差别的激励相容的双向选择( S-N-M )机制,为SM-DSARM的存储服务实体管理控制层提供了存储资源的选择机制。首先,研究了单独的真实性激励问题,提出了一种激励相容的单向存储资源选择(1-M)机制。该机制从单个用户角度看待存储资源的综合性能,通过设计合适的支付函数和效用函数来保证自主资源节点报告真实综合性能值。其次,通过在真实性激励机制中引入公平性,提出了一种有服务差别的激励相容的双向选择( S-N-M )机制,该机制使用历史贡献量和用户需求紧迫性参数,使得单位紧迫性对应的高历史贡献量的节点具有优先及获得较多资源的权利,同时真实的紧迫性及历史贡献量的提供仍然依赖于机制中支付函数与效用函数的合理设计。最后,理论分析证明了这两种机制是激励相容的,模拟实验也表明两种机制能达到自主节点真实性的激励,从而最大化节点的个体效用,并且后者在保证真实性的基础上较好体现了存储资源享用的公平性。4、分析了现有副本放置技术的研究现状及存在的问题,提出了一种兼顾自主节点利益的多数据对象、多节点的副本放置模型和算法。这是一种SM-DSARM中存储服务实体管理控制层使用资源选择机制的结果进行存储服务组合的策略。首先,建立了一种多数据对象、多节点有容量限制的副本放置模型,实现向博弈模型的映射,分析博弈模型中占优战略均衡及纳什均衡在不同容量状态下的存在性,同时讨论纳什均衡的效率PoA。其次,提出了副本放置纳什均衡的获取算法及分析该算法存在纳什均衡的条件,针对条件不满足的情况提出“删除受限纳什均衡”。最后,设计了一个节点交互控制方式解决信息获取、博弈发起及维护的问题,用以支持均衡获取算法能适应具有自主节点的分布式环境。模拟实验显示了系统平均副本数及系统总代价分别与容量和放置代价的关系;同时小规模情况下的实验表明采用纳什均衡获取算法产生的系统总代价与最优解决方案下的总代价不会有大的差异,也可以保证自主节点个体效益的最大化。

【Abstract】 Data of individual, enterprise and government department have greatly increased with the rapid development of informatization , which makes high expectations in storage service. What’s more, varieties of storage style and flexible access to high quality data anytime anywhere have being new information service demands. Adapting to these new demands, fresh technical solutions have been brought up, such as Grid storage, Cloud storage and P2P storage. Grid storage and Cloud storage are built on Grid computing and Cloud computing architecture which provides broad storage applications for different level entities in network making storage service anywhere possible.P2P storage is built on P2P self-organized overlay network which can fully exert different level entities’abilities. With P2P storage, supply of storage has not yet been limited to professional storage facilities, which shows the spirit of sharing incisively and vividly. Further more, with the performance increasing in desktop facilities, integration among those three storage systems has being a trend. In structure integration, one storage system will be an apart of the other storage system. In technology integration, the similar issue will take the similar solution.Because nodes in integrated distributed storage environment have strong autonomy in making decisions and have rational actions, and also because each node is belong to different organizations and individuals, they all hope to be controlled by themselves and pursue to local interests. Those features of nodes bring great challenge to global control and organization, which means that autonomy of storage node has been an important issue in integrated distributed storage environment. For above reasons, studying on node autonomy is very important in theoretical significance and in practical significance for integrated distributed storage environment.This dissertation systematically studies on node autonomy in integrated distributed storage environment from four aspects, including system architecture reference model, overlay network, incentive resource selection mechanism and replica placement technology. Some meaningful and innovative achievements have been got. The main contributions of this dissertation are as below:1. A self-managed distributed storage architecture reference model based on P2P (SM-DSARM) is presented. It is a layer model with combination of P2P overlay network and service-oriented idea. First, functions of each layer are described. In SM-DSARM, heterogeneous physical nodes are abstracted to independent storage service entities (SSEs) in unified form. SSE is a main subject which plays three roles, including manager, user and provider. SSEs are organized by P2P structure overlay network, which realize decentralization of storage resource management with P2P self-organized feature. Second, formal method is used to describe the static concepts and dynamic behavior of SM-DSARM. At last, functions deployment structure of SSE is presented and Petri net is using to describe dynamic behavior. Compared to other service-oriented architectures, using virtual SSE as main subject can enable physical node to join in system activities with different performance and style. By this way, flexibility and autonomy can be strengthened and features of SOA and P2P are still kept such as decentralized control, scalability and self-organization.2. After fully research of P-Grid overlay network, improved P-Grid with fast convergence and higher availability is presented, which provides overlay layer to SM-DSARM. On the one hand, The rate of forming tree by randomly meeting is key point which greatly effects system performance on it. Improved algorithms are proposed focusing on two situations which are node without initial data load (Ignore-of-Load) and node with initial data load (Care-of-Load).Ignore-of-Load algorithms improve convergence rate by two aspects which are extending path with more bits and increasing success of recommendation. Experiments show that convergence rate has been stepped up 50% by Ignore-of-Load algorithms. Then, considering node’s willing of index placement three kinds of Care-of-Load algorithms are designed and compared, that is algorithm focusing on path, algorithm focusing on data and algorithm with satisfy adjustment. Experiments show that algorithm with satisfy adjustment is also do better in convergence rate and successful search rate is up to 90%. On the other hand, in totally decentralized P-Grid, a large number of nodes are organized to form high availability system. Considering nodes have periodical feature, we improve system availability by forming virtual multi-branch tree with long term peer as main body binary tree and suitable information tables are designed to establish relations among long term peer, periodical peer and normal peer. Numerical analyses show that in the same number of nodes, periodical organization can reach higher availability and not affect maintenance cost. By considering nodes willing of index placement, adaption to autonomy nodes in P-Grid is strengthed and autonomy features of P-Grid are still kept such as complete decentralization, self-organization and decentralized load balancing.3. After research of incentive mechanisms focusing on fair and focusing on truth, incentive compatible one-way storage resource selection (1-M) mechanism and incentive compatible two-way selection (S-N-M) mechanism with service differentiation are presented, which provides storage resource selection to SM-DSARM SSE management layer and adapts to rational and selfish autonomy nodes. First, in 1-M mechanism, storage resource performance is handled from user’s sight. Then suitable payment function and utility function are designed to guarantee resource node to tell the truth. Second, S-N-M mechanism renders incentive mechanisms focusing on fair being introduced in incentive mechanisms focusing on truth. Nodes with high history contributions per urgent demand in S-N-M mechanism will have priority to obtain service and get more resource. Also, true reported values of urgent demand and history contributions depend on payment function and utility function. At last, theoretical analysis proves that both mechanisms are incentive compatible. Experiments show both mechanisms can stimulate nodes to tell the truth, and the second mechanism can embody fair without loss of truth.4. After analyzing related work in replica placement issue, a multi-object and multi-node replica placement technology focusing on maximizing node self-utility is presented, which is a way using resource selection in SM-DSARM SSE management layer to compose storage service. First, a model considering multi data objects, multi nodes and capacity restriction is established, and is reflected to Game model. Existence of dominant strategy equilibrium and Nash equilibrium are analyzed in different capacity status. Then, efficiency of Nash equilibrium is also analyzed. Second, a Nash equilibrium achieving algorithm is designed and condition which brings Nash equilibrium is analyzed. Meanwhile delete-restriction Nash equilibrium is defined to solve unsatisfied condition. At last, a node interaction method is proposed to solve information obtaining, Game initiating and maintenance, which makes Nash equilibrium achieving algorithm to fit for distributed environment with autonomy. Experiments show the relations among capacity, placement cost, system average number of replica and system total cost. Meanwhile, in small scale, total cost caused by Nash equilibrium achieving algorithm is not have great difference with total cost caused by optimal solution.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络