节点文献

DHT覆盖网若干基础性问题研究

Research on Several Fundamental Problems of DHT Overlay

【作者】 聂晓文

【导师】 卢显良;

【作者基本信息】 电子科技大学 , 计算机系统结构, 2009, 博士

【摘要】 分布式哈希表(DHT)是一种新的P2P网络组网方式,具有高效、易扩展、低成本等优点,广泛应用于各种大规模分布式系统,如文件共享、内容分发、流媒体、VoIP等。在这些系统中,由DHT形成的覆盖网扮演着重要角色,支撑着上层各种应用。因此,它引起了研究人员的广泛关注,已成为当前的研究热点。论文对DHT覆盖网在应用中亟待解决的一些基础性问题进行了深入的研究,从分析DHT网络的基本模型入手,研究了DHT网络的管理范围(Zone)均衡问题,深入探讨了如何在DHT网络中防御Sybil攻击,并对网络规模估计问题进行了建模分析,提出了高效的随机节点选择算法。论文的创新点及其贡献在于:1.研究了线段随机分割问题,得出两个基本结论:在Chord网络中,节点间距近似服从指数分布,一段地址空间上出现的节点个数近似服从泊松分布。理论分析和仿真数据表明,这两个结论的近似程度非常高,误差很小。这两个结论不仅适用于Chord网络,还适用于所有满足节点ID在网络中均匀分布假设的DHT网络。它们是全文分析与建模工作的基础。2.提出基于静态副本Zone均衡策略,推导出Chord和Pastry网络、虚拟服务器(VS)和基于静态副本的Zone均衡策略下节点的负载分布。分析表明:Chord、Pastry和VS的Zone负载分布服从参数形式相似的伽马分布;在Chord网络中k个后继节点上放置副本,节点Zone负载服从参数为(k,n)的伽马分布。与VS和基于平衡树的Zone均衡策略相比,基于静态副本的均衡策略除了使节点Zone负载均衡外,还具有使系统更鲁棒的优势。3.提出一种ID自检验的安全框架(ISV),并结合洗牌策略高效抵御Sybil攻击(ICS)。ISV引入显式证书分发服务器(CD)对ID申请进行审计,分发节点签名;而身份验证工作由节点根据签名自行完成;有效降低了CD服务器的开销。ICS利用CD签发的票据记录节点加入过程,保证三轮替换规则强制实施;利用票据的替换区间和发布时戳来判定ID是否过期,防止敌手积累ID。论文对节点需要保存的票据数量进行了定量分析,得出问题的近似闭合解;理论分析表明平均每个节点上保存的票据数是O(log~2n);仿真数据表明,该近似解具有很高的精度,说明了理论分析的正确性。4.指出在DHT网络中估计网络规模等同于指数分布或泊松分布的参数估计问题,提出基于平均间距(AIE)和基于节点密度(NDE)两种网络规模估计算法。论文推导出AIE和NDE算法中网络规模估计值的概率分布,讨论了如何选择NDE测量范围和测量位置等问题。分析表明:AIE的测量精度只与测量间距个数相关,与网络规模本身无关,具有自适应网络规模变化的特点。仿真实验证明AIE算法的测量误差完全符合理论分析结果。5.提出一种基于取舍原则的DHT网络随机节点抽样算法(RPS),分析了单点启发式算法(HUR)和多点启发式算法(HURk)的抽样概率以及抽样间距的概率分布,构造了一种服从倒数分布的取舍算法(RDRP)。分析表明:RPS以等概率抽样在线节点,抽样间距服从指数分布;RPS的时间复杂度与网络规模无关,其复杂度不随网络规模的增加而增加;网络规模估计误差给RPS造成的影响与网络规模本身无关;当网络规模估计值偏小时,会在一定程度上削弱RPS的随机性。论文采用统计检验和经验检验两类方法对相关分析做出了严格检验,结果表明RPS对于网络规模估计误差具有很好的容忍性,同时也证实RPS是一种高效的随机节点抽样算法。

【Abstract】 Distributed hash table (DHT) is a new Peer-to-Peer (P2P) networking pattern,which has many advantages, such as efficiency, scalability and low cost, and has beenwidely deployed in various large-scaled distributed systems, ranging from file sharing,contents distribution, streaming media to VoIP. In such systems, the overlay formed withDHT plays an important role in supporting the upper layer applications. As a result, ithas drawn the attention of many researchers, and becomes the current research hotspot.This dissertation mainly focuses on some fundamental problems to be solved in theapplication of DHT. Starting from analyzing the basic model of DHT network, thedissertation investigates the zone balance in DHT overlay, explores how to resist theSybil attacks deeply, models the problem of estimating network size, and proposes anefficient random nodes sampling algorithm of DHT overlay.The major contributions of this dissertation are as below:1. The dissertation investigates the problem of dividing a line randomly, anddeduces two basic conclusions: in Chord network, the interval between nodes obeys theexponential distribution asymptotically, and the number of nodes appearing in a sectionof address space obeys the Poisson distribution asymptotically. The analysis andsimulation shows that the asymptotic accuracy of these two basic conclusions is veryhigh, and the error is very little. The conclusions are applicable not only to Chord, butalso to other types of DHT which satisfies the assumed condition that the identifiers aredistributed uniformly in the address space. They are the basis of analysis and modelingof the total dissertation.2. The dissertation proposes the zone balancing scheme based on static replica, anddeduces the zone load distribution of nodes in Chord and Pastry network, the virtualservers (VS) and the static-replica-based balancing scheme. The analysis shows that thezone load distribution of Chord, Pastry and VS obeys the Gamma distribution withsimilar type of parameters, and if putting replica on k succeeding nodes in Chord, thezone load distribution obeys the Gamma distribution with parameter (k, n). Compared with VS and the trie-based zone balancing scheme, the static-replica-based scheme cannot only achieve the balance of zone load, but also make the system robuster.3. The dissertation presents an identity self-verified secure framework (ISV), whichis combined with Cards-shuffling scheme (ICS) to resist Sybil attacks efficiently. Anexplicit certificate distributor (CD) is introduced into ISV to audit the requiring for newidentifiers, and distribute signatures to nodes, whereas the verifying of identity iscarried out by nodes themselves, which reduces the overhead of CD server greatly. ICSmakes use of the tickets signed by CD to record the joining process of nodes toguarantee the compulsive execution of k-rotation rule. To prevent the enemy fromaccumulating identifiers, ICS utilizes the substituting sections and publish timestampsto determine whether the identifier is overdue. The dissertation analyzes quantitativelythe number of tickets stored on the nodes, and gives the asymptotic closed-formsolution of the problem. The analysis shows that the average number of tickets stored oneach node is O(log~2 n), and the simulation demonstrates that the asymptotic solution isvery accurate, which illustrating the correctness of the analysis.4. The dissertation claims that the problem to estimate the network size is identicalto the parameter estimation of the exponential or Poisson distribution in DHT, andproposes the average-interval-based estimator (AIE) and the node-density-basedestimator (NDE). The dissertation deduces the probabilistic distribution of the networksize estimation in AIE and NDE, and discusses how to select the testing scope andposition of NDE. The analysis shows that the estimating accuracy of AIE is only relatedto the number of intervals measured, and is irrelevant to the network size. This showsthat AIE can adapt the variation of network size. The simulation proves that the measureerror is in line with the theoretical analysis entirely.5. The dissertation proposes a random node sampling scheme based on rejectionprinciple (RPS). The probabilistic distributions of the sampled probability and sampledinterval are analyzed against the single point heuristic sampling algorithm (HUR) andthe multi-points heuristic sampling algorithm (HURk). And a rejection algorithmobeying the reciprocal distribution (RDRP) is constructed. The analysis shows that RPSsamples online nodes with equal probability and the interval between nodes sampledobeys the exponential distribution; its complexity remains constant while the networksize increases; the impact of estimating error of network size on RPS is irrelevant to the network size itself; if the estimation is less than accurate network size, RPS will loserandomness to a certain degree. Finally, the dissertation utilizes the statistical testingand empirical testing methods to verify the analysis strictly. The result shows that RPSis tolerant to the estimating error of network size very much. This proves that RPS is anefficient sampling algorithm.

节点文献中: