Query Optimization Techniques for Distributed Databases

【Author】 Zhang Xuzhong (张旭中)

【Advisor】 Liu Xinsong (刘心松)

【Author information】 University of Electronic Science and Technology of China, Computer Architecture, 2003, Master's thesis

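【Abstract (translated from the Chinese)】 The development of computer networks and the sharing of information have made distributed databases both inevitable and a focus of attention. Demands for highly reliable and fast data storage and retrieval keep rising, and the limitations of traditional databases have become increasingly evident. Distributed databases emerged to meet this need. In a distributed database, high reliability and high speed are defining features, so the demands on query execution are correspondingly higher. Query optimization is the key stage of query execution and largely determines how efficiently a query runs, which is why it has long been an important research topic for database experts and scholars. Traditional query optimization starts from the low-level execution flow and implementation techniques of a query, examines them theoretically by means of relational algebra, and concentrates much of its research on the parsing stage of query execution. Its core idea is that the query compiler uses metadata and statistics about the data to determine which sequence of operations is likely to be fastest. Sustained effort has gone into the whole process: the low-level disk I/O of the physical query plan, the parse tree built in the parsing stage, the algebraic laws used to improve query plans, improvement of the logical query plan, estimation of operation costs, cost-based plan selection, and the choice of join order. Research in this area is already very mature. Query optimization for distributed databases, however, is still far from mature, not only because distributed database technology is itself still incomplete, but also because of the inherent complexity of distributed databases, which involve many highly variable factors. For a distributed database system in a networked environment, the communication cost between nodes and distributed processing are unavoidable concerns. The optimization discussed in this thesis works only at the upper level and assumes that lower-level optimization is already complete. That is, at the distributed global processing layer, it focuses on optimizing the global processing strategy for distributed query execution, avoiding communication cost as far as possible and, with an eye to the actual cost of query execution, selecting the best execution node in the distributed system. It starts from the observed results of query execution and, by statistical means, continually learns from the costs of the most recent query executions to correct the statistical cost, providing a reference for global query processing and thereby optimizing execution and improving efficiency and speed. The thesis has six chapters: Chapter 1 gives a general overview of distributed databases; Chapter 2 reviews the development of database optimization techniques; Chapter 3 introduces DPSQL, the database system model on which this work is based; Chapter 4 analyzes distributed query execution and, on that basis, studies optimization for LAN-based distributed systems; Chapter 5 describes the implementation of the optimization discussed in Chapter 4; Chapter 6 analyzes and discusses the performance of the optimized system; and the closing remarks summarize the whole thesis.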

【Abstract】 Databases are widely used in many fields. Because centralized databases have many inherent disadvantages when applied to the Internet, distributed databases are becoming more and more popular. With the development of computer networks and information technology, distributed database systems have become one of the research hotspots of computer science. However, there are still many challenging problems in this field that attract researchers. It is well known that the performance of a database relies heavily on the efficiency of query execution, and optimization is the most important step in obtaining efficient query execution. Many researchers have studied this subject by going deep into the lower levels of the query engine, and many mature technologies have emerged at this level, such as the laws of relational algebra, improvement of the logical query plan, cost estimation of operations, cost-based plan selection, and the choice of join order. For distributed databases, however, although many methods have been tried, no remarkable result or noteworthy technology has been realized, because of the complexity of data decomposition and network effects. This thesis concentrates on how to optimize the global query at an upper level: database-level distribution. Based on statistical methods, the optimizing algorithm tries to find a lightly loaded server that can process the query at lower cost. To do so, it uses the historical records of previous executions, from which the optimizing processor determines which node in the system is the best one to execute the query. The whole system is based on MySQL, an open source database that is widely used in Internet applications. The rest of this thesis is organized as follows: the first chapter reviews the progress of distributed databases. The second chapter discusses conventional query optimization technologies. The third chapter introduces the prototype of DP-SQL and its major features. The fourth chapter discusses the optimization of distributed databases in detail. The next chapter discusses the implementation details. The last chapter analyzes the performance of the algorithm and then draws the conclusion.
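
The abstract does not state the statistical rule used to correct node costs from recent executions. The following is only a minimal sketch of the general idea, assuming an exponential moving average as the correction rule and a lowest-estimate rule for node selection. The class name `NodeCostTracker`, the parameter `alpha`, and the node names are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class NodeCostTracker:
    """Keeps a smoothed execution-cost estimate per node (hypothetical helper)."""

    # Assumed weight given to the most recent observation (0 < alpha <= 1).
    alpha: float = 0.5
    estimates: dict = field(default_factory=dict)

    def record(self, node: str, observed_cost: float) -> None:
        """Correct a node's statistical cost with the latest observed cost."""
        previous = self.estimates.get(node)
        if previous is None:
            # First observation for this node: use it directly.
            self.estimates[node] = observed_cost
        else:
            # Exponential moving average (an assumption, not the thesis's rule).
            self.estimates[node] = (
                self.alpha * observed_cost + (1 - self.alpha) * previous
            )

    def best_node(self) -> str:
        """Return the node whose current cost estimate is lowest."""
        if not self.estimates:
            raise ValueError("no execution history recorded yet")
        return min(self.estimates, key=self.estimates.get)


tracker = NodeCostTracker(alpha=0.5)
tracker.record("node_a", 10.0)
tracker.record("node_b", 8.0)
tracker.record("node_b", 20.0)  # node_b estimate: 0.5 * 20 + 0.5 * 8 = 14.0
print(tracker.best_node())      # prints "node_a"
```

A larger `alpha` makes the estimate follow recent load changes more quickly, while a smaller value smooths out one-off slow queries. Which trade-off the thesis actually makes is not stated in the abstract.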

  • 【Classification code】TP311.13
  • 【Times cited】17
  • 【Downloads】1266