节点文献

基于GPU的查询技术并行化研究

The Parallelization Research of Query Technology on GPU

【作者】 黄玉龙

【导师】 奚建清;

【作者基本信息】 华南理工大学 , 计算机应用技术, 2013, 博士

【摘要】 作为数据管理领域一类非常重要的技术,数据库技术在很多行业中得到了广泛应用。其中,查询技术是数据库技术的核心部分。当前,查询技术领域存在如何加快索引更新以及如何设计满足查询任务的高效查询算法这两个研究方向。因此,利用GPU等新兴并行硬件改善索引更新效率以及大规模数据的查询性能受到了越来越多的关注。在索引更新过程中,插入和删除是两个基本并且互逆操作。与此同时,随着互联网应用以及个性化服务的兴起,XML查询以及top-k查询也逐渐成为研究热点。为此,在总结以及分析相关工作的基础上,本文主要研究了以下内容:总结了现有的索引构建及插入、大规模数据查询算法。结合本文的工作内容,详细阐述了GPU通用计算技术及其主流开发工具—CUDA。在此基础上,进一步介绍和分析了本文使用的并行操作原语。针对当前线性哈希表的并发插入需要加锁的现状,结合GPU在排序方面拥有的显著优势,提出了一种无锁批量插入算法GSInsert4LHT以及相应的存储结构。在批量插入过程中,该算法采用一个线程插入一条记录策略,从而极大地提升了线性哈希表的插入性能。在不同参数设置条件下,与4线程多核算法相比,GSInsert4LHT算法的性能提升了10.26~13.54倍。在此基础上,进一步阐述了如何利用该算法来加快Wetoband系统的资源操作权限判断过程。为了优化B+树的构建及插入性能,首先设计了一种键值空间分配策略,在此基础上进一步给出了基于GPU的批量构建及插入算法CUBPT。实验表明,与4线程批量建树算法相比,CUBPT算法的建树性能提升了21.76~25.1倍。此外,由于CUBPT算法在键值的批量插入过程中负载非常均衡,因此该算法在此阶段的性能比4线程PALM算法提升了9.49~19.2倍。由于性能优势明显,在此还给出了如何将该算法集成到Wetoband系统,从而提升资源对象的查找效率。研究了基于GPU的XPath并行查询算法。首先分析了当前XPath并行查询算法M2以及GPU-Twig等在存储空间以及线程资源利用率方面的不足。在此基础上,扩展了流标签存储机制中节点的Zhang编码结构,提出了一种基于关系数组的大规模XPath查询算法GXQ。在构建关系数组时,GXQ算法首先计算出与每个关系节点关联的第一个以及第二个XML节点下标。在此基础上,快速构建关系数组。在并行查询过程中,采用对每个步骤的返回结果标记策略,从而减少了空间分配代价以及去重代价。通过实验比较发现,在不考虑M2算法中关系矩阵构建时间的情况下,与4线程M~2算法相比,GXQ算法的性能加速比最高可达16.1倍。研究快速top-k查询算法。针对当前并行top-k查询算法存储空间利用率差以及不具通用性等方面的劣势,结合经典的NRA以及LARA算法,设计了一种基于GPU的快速查询算法CNRA。与传统算法一次处理一个有序列表层次不同, CNRA算法在增长以及收缩阶段均采用一次处理STRIDE个层次的策略,从而显著减少了每个对象的分值上下界更新时间以及top-k对象的判断比较时间。实验结果表明算法的性能有了明显提升,最高可达107.6倍。然而当STRIDE值增大时,算法的性能提升效果反而有所下降,这与算法在更新分值下界过程中使用的策略有关。

【Abstract】 As a very important technology in database management field, database technology iswidely used in a lot of industries. In here, query technology is the core component. Currently,the query technology field exist two main research directions. One is to find the fast indexupdate algorithm.The other is to design an efficient query algorithm which fits the query task.For this reason, how to leverage the emerging parallel hardware such as GPUs to improve theupdate efficiency of index structure and the query performance of large-scale data set hasdrawn more and more attentions. In the update process of index structure, insertion anddeletion are two basic operations and both are opposite. At the same time, with the widelyapplication of the Internet and recommendation service, XML query and top-k query havegradually become two research hotspots. So, On the basis of summary and analysis of relatedwork, the main content of this paper as follows:Summarizes the existing build and insertion method of index structure and the querymethod of large-scale data set. According to the research work of this paper, the GPUgeneral computing technology and its mainstream development tool--CUDA are describedin detail. On this basis, the parallel primitives which used in this paper have been furtherintroduction and analysis;For the concurrent insertion of current linear hash table need to lock, a lock-free batchrecords insertion algorithm--GSInsert4LHT and its data structure are proposed in thispaper which combined with the significant advantage in GPU sorting. By using thestrategy in which one thread is responsible for one record insertion, this algorithm cangreatly enhance the insertion performance of linear hash table. Under the condition ofdifferent parameter settings, the performance of this algorithm is improved by10.26~13.54times compared with four threads batch algorithm.On this basis,how to usethis algorithm to improve the judgement efficiency of resource operation authority of“Wetoband” system is proposed in detail.For improving the build and insertion performance of B+tree, a space allocation strategyis introduced firstly. On this basis, a batch build and insertion algorithm--CUBPT isfurther proposed. The experimental results show that the build performance of CUBPT is improved by21.76~25.1times compared with4threads algorithm. Furthermore, In theprocess of batch keys insertion, the performance of CUBPT is improved by9.49~19.2times compared with four threads PALM; this is because the workload of CUBPT is verybalanced.Because of the performance advantages, how to use this algorithm to improvethe query efficiency of resources in “Wetoband” system is described in detail.In here, the parallel XPathquery method on GPU architecture has been researched. Firstly,the deficiencies of two parallel XPath query methods M2and GPU-Twig in storage andthread use ratio is analyzed. On this basis, the Zhang coding structure of the nodes instream labels storage mechanism have been expanded and an large-scale XPath querymethod GXQ based on relation array is further proposed. In the stage of building relationarray, GXQ firstly calculate the first and second node indexes which associated with eachrelation node. And on this basis, quickly build the relation array. In the stage of parallelquery, GXQ mark the results of each execution step, so the cost of storage allocation anddeleting duplicate nodes is greatly reduced. The experimental results show that withoutconsidering the building time of relation matrix, the performance speedup of GXQ is alsoup to16.1times compared with four threads M2method.According to the deficiencies ofcurrent parallel top-k query methods in storage utilizationand generality, a fast top-k query algorithm on GPU architecture CNRA is proposed whichbased on the combination of NRA and LARA algorithm. Different with the traditionalalgorithms which process a data level of the ordered lists in one time, CNRA processSTRIDE data levels in one time both in growth and shrinkage stage, So the upper boundand lower bound updating time of aggregation scores and the comparing time of top-kcandidate objects are greatly reduced. The experimental results show that the performanceof CNRA is greatly improved. But when the value of STRIDE is larger, the performanceof this algorithm has declined. This is related to the updating score lower bound strategyused in this algorithm.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络