节点文献

结构化P2P网络性能分析与搜索算法研究

Performance Analysis and Search Algorithm Improvement of Structured P2P Network

【作者】 黄庆凤

【导师】 李之棠;

【作者基本信息】 华中科技大学 , 计算机系统结构, 2008, 博士

【摘要】 近年来,对等网络(P2P)日益成为Internet上的一个重要应用。P2P网络中的每个节点既是内容提供者,又是内容消费者。结构化对等网络是扩展性高、容错性好的P2P网络,是当前的研究热点。基于分布式哈希表的结构化P2P网络利用DHT进行资源定位,它将网络中每个资源和节点哈希到同一个值空间,每个共享资源被发布到和自己资源标识符最接近的节点上。这种定位机制虽然有效地解决了非结构化对等网络中洪泛机制所带来的不可扩展问题,却不可避免地带来了另外的一些问题。这些问题主要是:(1)由于节点的频繁加入和离开所引起的路由表更新代价非常大;(2)哈希过程丢失了节点的物理位置信息,使得重叠网络中的邻居物理上可能相距很远,从而使得结构化P2P网络中的查找延迟增大;(3)在抖动率很高时,某些结构化P2P网络随着网络规模的扩大而性能急剧下降;(4)只支持从key到value的精确匹配,不支持多关键字搜索,搜索效率不高。当前,结构化P2P网络的研究主要集中在对现有的DHT协议的直接改进上,很少通过对网络的性能进行分析而提出改进方案。这种方法存在着两个缺点,一是这种改进只针对某种DHT协议,如Tapestry,没有普遍意义;二是容易造成顾此失彼,如常量度数的DHT算法虽然降低了查询跳数,但对于节点的异常离开的处理能力较差,容错性能大大降低。为了更有效解决上述问题,有必要从路由表更新代价、查找延迟和抖动环境下的可扩展性等网络性能指标对典型结构化P2P网络的性能进行整体分析,进而提出改进算法和方案。从而提高结构化P2P网络的整体性能。为了找出影响网络抖动的关键因素,首次提出了反向邻居节点的概念,即把自己作为邻居节点的节点,在此基础上,计算了六种DHT网络的反向邻居节点数。分析了影响抖动的路由方式、邻居选择、节点加入和节点离开以及并行查找等策略因素。发现对任意两种DHT网络,它们分别采用的五种策略都至少有两种不同,于是,对两种DHT网络直接进行比较就很难确定哪些策略能更有效地降低抖动。因此,提出在同一网络内,用不同的单个策略对网络抖动进行比较和分析的方法,称之为CSP。对现有的Pastry算法进行改进,构造了使用快速加入策略的F_Pastry算法和周期性恢复策略的P_Pastry算法。并分别把F_Pastry、P_Pastry和原有版本的慢速加入、反应性恢复算法进行了比较。还把使用迭代路由的Chord协议和使用递归路由的Chord协议进行了比较,得出以下结论:迭代路由、快速加入和周期性恢复策略和有效的邻居选择算法能更有效地降低网络的抖动。研究发现,数据存放机制、路由方式、邻居选择和服务选择方式等,是影响结构化P2P网络查找延迟的关键因素。通过对一些典型结构化P2P网络查找延迟的比较和分析,提出了降低内容寻址网络节点间延迟的DCAN算法。该算法将内容寻址网络中的节点抽象成一个无向带权图,在此基础上,把Dijkstra算法求得的源节点到目的节点的最短路径作为路由。仿真实验表明,和路由过程中每次选择最小的下一跳的算法相比,DCAN算法能更有效地降低从源节点到目的节点的总延迟。为了研究抖动环境下各种结构化P2P网络的可扩展性,提出了LBE评价方法。该方法把成功查找的时间延迟和查找失败率随网络规模的变化作为衡量可扩展性好坏的标准。随机改变DHT的各种参数,得到一系列查找延迟时间,把它们形成的最低线作为网络的整体性能指标。固定某一参数,但随机改变其它参数再进行仿真,从而得到最佳网络整体性能条件下的参数值。用上述方法对三种DHT协议的可扩展性进行了综合分析,并得出了相应的结论。通过分析现有多关键字搜索算法的缺点,提出了ICCCS搜索算法,该算法在DHT协议层之上建立逻辑的关键字搜索层,搜索层采用改进的超立方体互联圈(Improved Cube-Connected Cycle)结构。通过向量空间模型选择每个对象的重要关键字,并将其映射到ICCC节点的环号,然后将描述对象的整个关键字集映射到ICCC节点的立方体标号。基于反向文档索引技术建立了索引算法,并使用生成树建立了搜索算法。实验结果表明,该算法对于关键字较少的查询有较高的查找精度和较低的查找延迟。

【Abstract】 In recent years,peer-to-peer network has been widely used in the Internet.Each node on the peer-to-peer network is both a content provider and a content consumer. Because of high scalability and good fault-tolerance capabilitysstructured peer-to-peer network becomes current research hotspot.Which identifies and locates resources based on Distributed Hash Table(DHT),hashing resources and nodes to the same key space so that each resource is mapped to the node whose identifier is closest to its resource identifier.While this resource location mechanism effectively addresses the unscalability issue caused by flooding mechanism of unstructured P2P network,new problems emerge and remain unresolved as follows:1) the cost of updating the routing table increases significantly due to the frequent entering and exiting of the nodes;2)Since the information of the node physical location is lost during the hashing process,the physical distance between two neighboring nodes in the overlay network could be very large, thereby causing long query delay in structured P2P network.3)When the churn rate is high,some structured P2P networks’s preference decrease sharply.4) the network only supports searches of precise key-value matching but does not support multi-key-words searches.Prior researches on structured P2P network mainly focus on the improvement of DHT protocol in order to solve a specific problem,There are little to resolution based on performance analysis.The method has two weaknesses:At first,the improvement is aimed at a specific DHT protocol e.g.Tapestry,and may not be generalized.The second,the method fails to take a balanced approach,e.g.while the constant-degree DHT algorithm reduces the query hops,its poor ability of handling the abnormal departure of the nodes significantly decreases the fault-tolerance capability of the system.Therefore, to address the problems,the algorithm should be improved based on insights from a detailed analysis of typical structured P2P network focusing on the cost of routing table update,query delay and the scalability in a churn environment.With regard to the churn issue,The concept of inverse-neighbourhood nodes is proposed.Which means the node is in their routing table.The number of inverse-neighbourhood nodes for six DHT netword is computed.we find that the most significant factors affecting churn is routing,neighboring nodes selection,bootstrapping,lookup parallely and recovery policy.In any two existing DHT,there are at least two kind of policy different.Therefore,the method that compares the routing tables update costs of two structured P2P network directly cann’t decide which policy could deal with churn,so we proposes a new method of analysis:CSP(A comparation based on single policy),this method analyzed the relationship between routing tables update costs and each single policy.We create F Pastry which uses fast bootstrapping policy and P_Pasty which uses periodic recovery policy.F_Pastry and P_Pastry are compared with the original version Pastry separately.At the same time,I_Chord which uses iterative routing and R_Chord which use recursive routing are compared.It’s concluded that iterative routing,fast bootstrapping,periodly recovery and effective neighboring nodes selection algorithm could decrease the consts of updating routing table in high churn.The mechanisms of storing data,routing policy,neighboring nodes selection and sever-selection is the key factor which affect the query delay of tructured peer-to-peer networkBased on the analysis of the query delay and the weakness of the algorithm aimed at reducing the delay,the paper proposes a DCAN algorithm in order to reduce delay between the nodes in the Content-Addressable Network(CAN).This algorithm abstracts nodes of CAN network into a undirected weighted graph.By dijkstra algorithm,the shortest path from source node to destination node is obtained as the query routing.Simulation experiments suggest that this algorithm can reduce the delay between the source nodes and the target nodes in CAN effectively.In order to address the scalability problem of the P2P network underthe condition of churn,a low-bandwidth evaluation(LBE) method is proposed.Which use the variation of delay for success query and percentage for failed query as the standard to evaluate scalability.A series of delay times are getted by changing DHT parameters randomly,The lowest line could be formed by so much points as the overall performance of DHT network.By fixing some parameter and changing other parameters,the optimal design parameter value could be obstained to get the best DHT network performance.We analyze the scalability of three DHT protocols from two respects:overall preference of the network and the change of optimal design parameters relative to the network scale,and get corresponding conclution.Based on the analysis of the weakness of current multi-key-word search algorithm, the paper proposes an ICCCS(Improved Cube-Connected Cycle) search algorithm, which establishes a logical key-word search layer on top of the DHT protocol layer. The search layer is structured using the Improved Cube-Connected Cycle.The top keywords for each object is choosed by Vector Space Model and is mapped to the cyclic index of ICCC nodes.The whole keyword set for each object is mapped to the cubic index of ICCC nodes.The index scheme is build on Inverse Document Index and the search algorithm is created on Spanning Binomial Tree.Experiments suggest that the algorithm results in higher search precision and lower query delay for searches with fewer key words.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络