
分布式关系数据库上的关键字查询

Keyword Search on Distributed Relational Databases

【Author】 Li Yigang (李一罡)

【Advisor】 Gao Hong (高宏)

【Author information】 Harbin Institute of Technology, Computer Science and Technology, 2009, Master's degree

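【Abstract (translated from the Chinese)】 In an era of rapid information growth, distributed databases have become the preferred way for large enterprises to store information, and querying relational data conveniently and quickly has become a research challenge. With the rise of network and search technology, keyword queries show great advantages over traditional SQL queries. First, users do not need to know the schema of the database; second, users do not need to master a complex database query language such as SQL. How to apply keyword query techniques to distributed databases has therefore become especially important. This thesis studies the problem of keyword queries on distributed databases. It first proposes a keyword query algorithm for a single database. The algorithm begins with a new relevance evaluation function, which redefines the containment relationship between tuples and keywords and evaluates the relevance of a tuple to the query keywords by analyzing the semantic information of the database schema and the query content. Based on this function, a TOP-K query algorithm based on data block iteration is proposed, which effectively reduces the I/O time of the algorithm by estimating the scores of results not yet produced. Building on the single-database algorithm, the thesis then proposes a keyword query algorithm for distributed databases. The algorithm first gives a data model for distributed databases, the definition of keyword query results on that model, and a result evaluation function suited to the distributed environment; it then proposes an extended connection expression generation algorithm; to reduce query execution cost in the distributed environment, it designs a reachability index that filters out invalid queries, together with an update strategy for the index; finally, it gives a TOP-K query algorithm for the distributed environment. Based on these algorithms, a keyword query system was designed and implemented in a real distributed database environment. The system effectively supports queries on a single node as well as on multiple nodes. Using this system, we designed experiments from several angles; the results show that the proposed algorithms improve both accuracy and efficiency.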

【Abstract】 As the volume of information grows rapidly, distributed databases have become the first choice of large enterprises for storing data, and querying relational data conveniently and efficiently has become an important research problem in the database area. With the spread of networking and search technology, keyword search shows clear advantages over traditional SQL queries: users need neither know the database schema nor learn a complex query language such as SQL. Applying keyword search to distributed databases is therefore an important task. This thesis focuses on the keyword search problem on distributed relational databases. It first presents a semantic keyword search algorithm for a single relational database. The algorithm adopts a new relevance ranking function that redefines what it means for a tuple to contain a keyword, and it evaluates the relevance between a tuple and the query keywords by analyzing the semantic information of the database schema and the query content. Based on this ranking function, we propose a top-k query algorithm that iterates over data blocks; it reduces execution time by estimating the scores of candidate results that have not yet been produced. Building on the single-database algorithm, we then propose a keyword search algorithm for distributed relational databases. We first define the data model of the distributed relational database, the keyword search result on that model, and the form of the results, and then give a ranking method for search results. On the schema graph, we propose an extended connection expression generation algorithm. To prune invalid connection expressions and reduce the cost of query execution, we design a reachability index and its update strategy. Finally, we propose a top-k query processing algorithm for the distributed environment. Using the proposed algorithms, we design and implement a keyword search system in a real distributed database environment. The system supports keyword queries on both a single database and multiple databases. We design test cases covering a range of aspects to demonstrate the effectiveness and efficiency of our algorithms.
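
The abstract describes a top-k algorithm that iterates over data blocks and saves work by estimating the scores of results not yet produced. A minimal Python sketch of that general idea follows. It is not the thesis's algorithm: the block layout, the per-block upper bounds, and the name `top_k_by_blocks` are all assumptions made for illustration. Each block is assumed to carry an upper bound on the score of every result in it, and blocks are assumed to arrive in non-increasing order of that bound.

```python
import heapq


def top_k_by_blocks(blocks, k):
    """Return (top-k results, number of blocks read).

    blocks: iterable of (upper_bound, items) pairs (hypothetical layout),
        ordered by non-increasing upper_bound, where every (result_id, score)
        in items satisfies score <= upper_bound.
    k: number of results wanted.
    """
    if k <= 0:
        return [], 0

    heap = []  # min-heap of (score, result_id); heap[0] is the current k-th best
    blocks_read = 0

    for upper_bound, items in blocks:
        # Early termination: once k results are held and the weakest of them
        # already scores at least the bound on everything still unread, no
        # later block can change the answer.
        if len(heap) == k and heap[0][0] >= upper_bound:
            break

        blocks_read += 1
        for result_id, score in items:
            if len(heap) < k:
                heapq.heappush(heap, (score, result_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, result_id))

    # Best score first.
    return sorted(heap, reverse=True), blocks_read
```

The second return value counts how many blocks were actually read, which stands in for I/O cost in this sketch. How tight the upper bounds are determines how early the loop can stop; deriving such bounds from a real ranking function is outside what this sketch covers.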

【Keywords】 distributed database; keyword; top-k
【Key words】 distributed databases; keyword search; top-k