节点文献
P2P网络中Top-k查询算法的设计与实现
Design and Implement of Top-k Query in P2P Network
【作者】 于文婷;
【导师】 李克秋;
【作者基本信息】 大连理工大学 , 计算机系统结构, 2009, 硕士
【摘要】 近年来,随着信息技术的迅猛发展,信息资源极大丰富,如何在动态的P2P网络环境中对海量数据进行查找引起了很大的关注。Top-k查询就是从数量巨大的信息中选择最符合查询条件的k个结果呈现给用户,Top-k查询作为一种新的查询技术引起了学术界广泛的关注,主要包括聚合式top-k查询和非聚合式top-k查询两种。然而现有的聚合式top-k查询算法只按照分值标准选择合适的对象返回给查询节点,相同的阈值标准没有考虑到节点数据分布情况,非聚合式top-k查询算法只能排除非法节点,不能排除有效节点中的非法对象。针对聚合式top-k查询的缺陷,本论文提出了一种基于P2P网络的混合非一致阈值聚合top-k搜索算法HNUTA(hybrid non-uniform threshold algorithm),HNUTA结合位置选择标准和分值选择标准,通过对每个节点重新定义阈值,并且对每个对象估计极大值和极小值,通过比较当前top-k和候选集中对象的极大值,除去候选集中的非法对象,达到减少非法对象传输的效果。针对非聚合式top-k查询,提出了一种依托于超级立方体骨干P2P网络的控制答复数量top-k算法CRNTop-k(control reply number top-k),该算法利用向量空间模型在本地查询top-k,然后在父节点上合并结果,通过控制查询答复数量的方式来减少带宽消耗。最终通过实验评估和性能分析表明本论文提出的算法在网络带宽消耗和查询响应时间方面要优于其他同类方法。
【Abstract】 In recent years, with the rapid development of the information technology, information resources have been greatly enriched. How to retrieve the small amount of the most valuable information from the massive data in P2P network quickly has become a great challenge for the database field. Top-k queries based on ranking elements stop query processing when the top-k ranked results can be safely determined including aggregate top-k query and non-aggregate top-k query. Given an aggregate operation, a top-k query is to find the k objects with the highest (or the lowest) aggregate values. Existing top-k aggregate query algorithm has the uniform threshold and selects the object according to the value standard neglecting the data distribution. The existing non-aggregation top-k algorithm only supports peer pruning search, and users can not search the data based on pruning the illegal object.This paper proposes hybrid non-uniform threshold algorithm in P2P network, it refines the threshold according to the value and position standard. HNUTA can estimate the best and worst value of the object. Compared between the top-k and the best value in the candidate set, it can remove the illegal object from the candidate set. This paper proposes top-k query in P2P network. It dynamically adjusts of the upper score according to the loss function. This method reduces the reply traffic by controlling the number of query replies. In addition, through experimental simulation and theoretical analysis, the proposed algorithm is superior to the current top-k algorithm. It consumes less bandwidth and has better efficiency.
【Key words】 P2P Network; Histogram; Aggregate Top-k; Non-Aggregate Top-k;