节点文献

移动计算环境下非确定数据的索引与查询方法研究

Research on the Methods of Uncertainty Data Indexing and Querying in Mobile Environments

【作者】 丁晓锋

【导师】 卢炎生;

【作者基本信息】 华中科技大学 , 计算机软件与理论, 2008, 博士

【摘要】 在移动对象数据库的研究中,如何建模、索引并查询移动对象的位置信息是一个很重要的问题,学者们对此进行了大量研究,并提出了许多空间对象和时空对象的索引方法。但是传统的方法并不能支持此类数据的一个重要应用特性——非确定性,如何对非确定性数据进行查询处理成了数据库研究人员关注的焦点。在移动计算环境中,受限于测量设备误差,数据更新延迟,取样失真等因素,中心数据库很难全程记录移动对象的准确位置信息,鉴于传统的移动对象索引方法和查询处理技术均假设数据库中的数据是精确的,因此这些技术不能直接应用到非确定数据的索引和查询,或者效率极低,这就给研究者提出了新的问题和挑战。U树是具有代表性的一种非确定对象空间索引方法,它是在R~*树基础之上融合了数据非确定性的变种。U树固有的良好动态结构,这使它不仅可以支持非确定数据对象以任何次序更新或插入,而且其提出的域查询处理算法不局限于非确定数据本身的概率密度分布函数。但U树本身只是针对非确定静止对象的索引,并不能支持非确定移动对象的索引或执行效率甚微。基于移动计算环境,针对如何支持频繁位置更新的非确定移动对象当前及未来位置索引的问题,在基本U树结构上增加了记录移动对象非确定状态特征的数据结构,通过利用概率密度分布函数描述移动对象在非确定区域的位置分布,在保留原有位置记录的情况下加入时间特性,这样就可以预测移动对象在未来时间段内的大概位置信息,从而为当前及未来非确定位置信息检索提供可靠保证。在TPU树索引基础上,一种改进的基于p-bound的域查询处理算法MP_BBRQ,利用索引中记录的概率限定性区域和启发式判定准则,能高效的处理概率性域查询问题;一种基于分支定界思想的概率skyline查询处理算法B~2CPS,利用最近邻的最好优先遍历思想,能使查询处理的磁盘开销达到最优。相对于传统的空间查询概念,非确定数据库中常用的查询——概率空间查询,由于其结果集中增加了结果正确性的保证系数,从而使得基于非确定数据的查询更具有可信性。鉴于概率空间查询具有很高的计算代价,需要进一步提高概率空间查询效率,尤其是旨在减少CPU计算时间和磁盘I/O次数的概率κ最近邻查询算法,国内外目前尚无相关研究。概率κ最近邻查询κ-PNN就是返回κ个非确定对象,而且这κ个非确定对象分别作为查询对象Q的第κ个最近邻居的概率,相对于其它非确定对象是最大的。与传统的κ最近邻查询相比,计算非确定对象的κ最近邻概率值需要原始的积分运算或Monte-Carlo方法,在这种情况下,概率κ最近邻查询的计算代价是非常高的。因此,必须尽量缩小查询所需搜索的空间范围,从而在不影响返回正确结果集的情况下,进一步减少所要考虑的非确定对象,在很大程度上避免利用原始计算公式来返回结果集。高效处理κ-PNN查询框架包含四个步骤:R树索引的建立,空间裁剪,概率裁剪和精炼阶段。利用安全可靠的空间裁剪以及概率裁剪方法,把这些方法与R树索引进行完美的结合以减少查询的搜索空间,从而提高κ-PNN查询的处理效率。实验结果表明,空间裁剪和概率裁剪方法具有非常高的裁剪效率,整个κ-PNN查询处理过程是可靠高效的。非确定数据库需要研究的问题还很多,在许多查询问题上欠缺高效的处理算法,例如概率连接问题,概率Top-κ查询,概率反最近邻查询,以及概率反轮廓查询等等,因此,针对不同方面的概率空间和时空查询问题,将相应解决方法融入到数据库管理系统中以支持非确定数据的有效管理,将是研究人员面临的新问题。此外,将已有的研究成果应用于多维空间,并进一步考虑非欧式距离环境下的索引及查询处理方法可作为未来的研究方向。在非确定数据流环境下,如何利用有效的内存索引机制,提出各种高效的数据流查询算法也将是非常有潜力的研究课题。

【Abstract】 One of the most important research issues in MOD is the problem of modeling andquerying the location of moving objects. Much work exists on traditional spatial andspatio-temporal queries, but there has recently attracted much research interest inevaluating such queries in the presence of uncertainty. This is reasonable, due to locationupdate delay, sampling error, or limitation of measuring equipments, it is rarely possiblefor the database to record the exact positions of moving objects at all times. Theconventional objects indexing methods and query processing techniques assume that thedata are precise, so they cannot be directly applied for uncertain data processing. As aresult, in the context of uncertain database, efficient indexing methods and correspondingquery processing techniques should be developed to guarantee the accuracy of answer andspeed up the query performance.U-tree is currently the most popular indexing method for the imprecise positions ofstationary objects. It minimizes the amount of qualification probability computation inrange search and it’s fully dynamic for the objects can be inserted or deleted in any order.However, it can not deal with the moving objects with uncertainty.In the mobile environment, a spatio-temporal indexing structure TPU-tree, which is anovel U-tree based indexing technique, can combine the issues of timestamp to predict thefuture location of uncertain evolving data. Along with the data models capturing thespatial-temporal uncertainty and TPU-tree, a modified p-bound based range queryalgorithms (MP_BBRQ), using probabilistic pruning regions and corresponding decisionheuristic rules, can speed up the probabilistic range queries; and a branch and boundalgorithm for constrained probabilistic skyline (B~2CPS), using the best-first searchstrategy of nearest neighbor, can minimum the node access numbers. New probabilistic spatial queries are more expensive to compute. The disk I/O- andcomputationally-efficient algorithms should be explored to enhance the performance ofprobabilistic range queries, probabilistic k nearest-neighbor queries. Specifically, aprobabilistic k nearest neighbor query returns k objects with the highest probability ofbeing the k-th nearest neighbor to a given query point Q. Comparedto the k-NN query intraditional databases, which retrieves k closest objects to the given query point, thecomputation cost of k-PNN is very expensive due to the costly numerical integration orMonte-Carlo approach it adopts. In order to speed up the k-PNN query processing,efficient spatial and probabilistic pruning approaches are used to reduce the search space,thus the costly numerical evaluation of complex integrals can be avoided as much aspossible. The Spatial and probabilistic pruning approaches can be seamless integrated inthe query procedure. Extensive experiments have been conducted to demonstrate theefficiency and effectiveness of algorithms under various parameter settings.There are many research issues exist in the uncertain database. It would be interestingto apply new probabilistic queries techniques to solve new problems are explored, such asprobabilistic joins, probabilistic Top-k, probabilistic reverse nearest neighbors, andprobabilistic reverse skyline queries. For these kinds of probabilistic queries, how tointegrate these processing methods to DMBS will be new research issues. Furthermore,extend the current works to handle high dimensional data, and consider non-Euclideandistance could be another interesting directions. In the uncertain streaming environment,how to develop efficient memory indexing method, and speed up kind of streamingqueries will also be very potential research problems.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络