Research of Replica Location and Replica Placement for Massive Data (面向海量数据的副本定位与副本放置技术研究)

【Author】 Ruan Wei (阮炜)

【Advisor】 Wang Yijie (王意洁)

【Author information】 National University of Defense Technology, Computer Science and Technology, 2006, Master's thesis

【Abstract】 With the rapid development of the digital revolution and network technology, the volume of data on the Internet is growing exponentially. Data-intensive applications produce massive amounts of data, and how to store and manage massive data on the Internet effectively has become an active research topic. In massive data management, data replication can reduce access latency, improve data availability, and lower the node failure rate. Building on a study of massive data and existing massive data management systems, this thesis summarizes and analyzes the characteristics of massive data and of massive data management systems, and then studies data replication in massive data management in depth from two aspects: replica location and replica placement.

The DHT location techniques commonly used in structured P2P systems have two shortcomings: the overlay topology does not match the underlying network topology, and network hotspots arise. To address them, a hierarchical DHT location method based on geometric space partitioning (HLSP) is proposed. HLSP uses Global Network Positioning (GNP), a coordinate-based network distance prediction technique, to construct an overlay topology based on geographical proximity. It builds a hierarchical DHT based on geographical scope, so that the location information of a data object is stored on multiple nodes that stand in a logical hierarchical relationship in the network. On this basis, resources can be published, withdrawn, and queried; messages are forwarded by a greedy algorithm guided by a destination coordinate stamped in the packet header; and the changes to local node-neighbor lists caused by nodes joining or leaving are maintained dynamically.
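
The greedy forwarding step mentioned above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the two-dimensional coordinates, the neighbor lists, and the names `greedy_route`, `coords`, and `neighbors` are all assumptions made for illustration, with Euclidean distance standing in for a GNP-style distance estimate.

```python
import math

def distance(a, b):
    """Euclidean distance between two coordinate tuples (assumed metric)."""
    return math.dist(a, b)

def greedy_route(coords, neighbors, source, dest_coord):
    """Forward hop by hop to the neighbor closest to dest_coord.

    Stops at the first node that has no neighbor strictly closer to the
    destination coordinate than itself, and returns the path taken.
    """
    path = [source]
    current = source
    while True:
        # Among the current node's neighbors, pick the one nearest the target.
        best = min(neighbors[current],
                   key=lambda n: distance(coords[n], dest_coord),
                   default=current)
        # No neighbor makes progress: the current node is the local closest.
        if distance(coords[best], dest_coord) >= distance(coords[current], dest_coord):
            return path
        current = best
        path.append(current)

# Hypothetical five-node overlay with hand-picked coordinates.
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (2.0, 0.0),
          "D": (2.0, 1.0), "E": (0.0, 2.0)}
neighbors = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"],
             "D": ["C"], "E": ["A"]}

path = greedy_route(coords, neighbors, "A", (2.0, 1.2))
print(path)
```

Because every hop strictly reduces the distance to the destination coordinate, the loop always terminates, although in general greedy forwarding can stop at a local minimum rather than at the globally closest node.
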
Simulation results show that HLSP is an effective DHT location method: the overlay topology based on geographical proximity resolves the mismatch between the overlay and the underlying network topology, and the hierarchical DHT based on geographical scope eliminates network hotspots and achieves efficient lookups.

For the problem of replica placement under latency constraints, a dynamic replica placement algorithm based on latency estimation (DLDRP) is proposed, in which a replica is always placed on a node within the latency bound of the nodes requesting it. DLDRP first defines an abstract geometric model that determines the region in which candidate replica nodes lie, which turns replica placement into the problem of finding a region in geometric space that satisfies given constraints. It then adjusts the placement dynamically according to where the load on a new replica node comes from, either adding a replica on some node or merging several existing replicas. Simulation results show that DLDRP satisfies client latency bounds, reduces data access response time, minimizes the number of replicas, and achieves an even replica distribution with balanced node load.
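
To make the idea of latency-bounded candidate regions concrete, here is a small Python sketch. It does not reproduce DLDRP's geometric model, which the abstract does not specify. As assumptions: latency is estimated as Euclidean distance between coordinates, each client's candidate region is the disk of radius equal to its latency bound, and replicas are chosen by a plain greedy set-cover heuristic. The names `candidates`, `place_replicas`, `node_coords`, and `clients` are hypothetical.

```python
import math

def candidates(client, bound, node_coords):
    """Nodes whose estimated latency to the client is within its bound."""
    return {n for n, c in node_coords.items() if math.dist(c, client) <= bound}

def place_replicas(clients, node_coords):
    """Greedy set cover: pick nodes until every client has a replica in range.

    clients maps a client name to (coordinate, latency_bound).
    Raises ValueError if some client has no node within its bound.
    """
    feasible = {name: candidates(pos, bound, node_coords)
                for name, (pos, bound) in clients.items()}
    unserved = set(clients)
    chosen = []
    while unserved:
        # Pick the node that serves the most still-unserved clients;
        # sorting makes tie-breaking deterministic.
        best = max(sorted(node_coords),
                   key=lambda n: sum(n in feasible[c] for c in unserved))
        served = {c for c in unserved if best in feasible[c]}
        if not served:
            raise ValueError("no node within the latency bound of: %s"
                             % sorted(unserved))
        chosen.append(best)
        unserved -= served
    return chosen

# Hypothetical nodes on a line and three clients with different bounds.
node_coords = {"n1": (0, 0), "n2": (5, 0), "n3": (10, 0)}
clients = {"c1": ((1, 0), 2.0), "c2": ((4, 0), 4.5), "c3": ((9, 0), 1.5)}

placement = place_replicas(clients, node_coords)
print(placement)
```

The greedy heuristic keeps the replica count small but is not guaranteed to be minimal, and it ignores the load-driven adding and merging of replicas described in the abstract.
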

  • 【Classification number】TP393.02
  • 【Citation count】4
  • 【Download count】244