

Research on Query Optimization and Correlative Technologies in XML Database

【Author】 Sun Wei (孙伟)

【Advisor】 Liu Daxin (刘大昕)

【Author Information】 Harbin Engineering University (哈尔滨工程大学), Computer Application Technology, 2006, Ph.D.

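【Abstract (Chinese version)】 With the rapid development of the Internet, a large amount of Web data has appeared, much of it in the form of XML documents. How to store and process XML documents effectively, and how to retrieve useful information from large numbers of them, has become an important research topic in the database field. The research in this dissertation centers on query optimization techniques for XML databases, with an emphasis on XML query optimization based on schema information and semantic information.

Because the schema information of XML documents is imprecise, a method for XML schema extraction based on fuzzy decision trees is proposed. After analyzing the shortcomings and open problems of existing XML schema extraction algorithms, a method for extracting approximate XML schemas is proposed. Monadic Datalog is used to represent XML, an incremental clustering algorithm groups the instance objects into the objects of the schema, and a fuzzy decision tree is built to determine the approximate schema of each of these objects. This solves the problems of excess edges and missing edges in schema extraction.

A method for discovering data dependencies in XML based on rough sets is proposed, focusing on functional dependencies and multivalued dependencies. Definitions of XML functional dependency and XML multivalued dependency are given, together with rough-set-based decision theorems for both. Based on these theorems, algorithms for discovering XML functional dependencies and XML multivalued dependencies are proposed.

A DTD-based query optimization method for regular path expressions is proposed. Extended regular expressions are defined and used to simplify the DTD schema tree. The entrance node is defined, and two query optimization strategies, path shortening and path complementing, are proposed. The method rewrites XML query statements, achieving query optimization at the XML language level.

An XML algebra based on tree algebra, together with logical optimization strategies for it, is proposed. The algebra takes schema trees as its operands and defines an operation scope and three classes of operators. Five XML query optimization strategies are proposed to address XML query processing and optimization. For compressed XML databases, a new decompression operator is introduced, extending the ETA algebra to compressed XML databases.

An XML query optimization method based on access control is proposed. A high-security XML access control model is given; it is based on access control views and addresses the problems of implicit inference and structural-information hiding. XML access control views are used to rewrite XML queries, mainly by pruning, achieving query optimization at the XML language level.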
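The incremental clustering step of the schema-extraction method can be illustrated with a minimal sketch. It assumes each XML object is summarized only by the labels on its incoming and outgoing edges, scores similarity with the average Jaccard similarity of those two label sets, and uses a hypothetical `threshold` of 0.5. Jaccard similarity is a swapped-in measure, the names `Cluster` and `incremental_cluster` are hypothetical, and the fuzzy-decision-tree stage that follows clustering in the thesis is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A candidate schema object: member ids plus the union of their edge labels."""
    members: list = field(default_factory=list)
    incoming: set = field(default_factory=set)
    outgoing: set = field(default_factory=set)


def jaccard(a, b):
    """Jaccard similarity of two label sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def incremental_cluster(objects, threshold=0.5):
    """Cluster objects one at a time, in arrival order.

    `objects` maps an object id to (incoming edge labels, outgoing edge labels).
    Each object joins the most similar existing cluster if the averaged Jaccard
    similarity reaches `threshold` (a hypothetical parameter); otherwise it
    starts a new cluster.
    """
    clusters = []
    for oid, (inc, out) in objects.items():
        best, best_sim = None, -1.0
        for c in clusters:
            sim = (jaccard(inc, c.incoming) + jaccard(out, c.outgoing)) / 2
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best.members.append(oid)
            best.incoming |= inc   # the cluster's edge pattern grows to cover the new member
            best.outgoing |= out
        else:
            clusters.append(Cluster([oid], set(inc), set(out)))
    return clusters


if __name__ == "__main__":
    objects = {
        "o1": ({"book"}, {"title", "author", "year"}),
        "o2": ({"book"}, {"title", "author"}),
        "o3": ({"author"}, {"name"}),
        "o4": ({"author"}, {"name", "email"}),
    }
    for c in incremental_cluster(objects):
        print(c.members, sorted(c.outgoing))
```

On this toy input the two book-like objects `o1` and `o2` end up in one cluster and the two author-like objects `o3` and `o4` in another. Because each cluster keeps the union of its members' edges, a member may lack some of those edges, which is the kind of excess/missing-edge mismatch a later classification stage has to resolve.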
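The rough-set idea behind the dependency-discovery step can be sketched on flat data. The sketch assumes the XML document has already been flattened into rows of path values (a hypothetical preprocessing step) and uses the standard rough-set test for functional dependencies: X → Y holds exactly when the partition U/IND(X) has as many blocks as U/IND(X ∪ Y), since the latter always refines the former. The multivalued check uses the textbook relational definition of X →→ Y; the thesis's XML-specific definitions and theorems may differ, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def partition(rows, attrs):
    """U/IND(attrs): groups of row indices that agree on every attribute in `attrs`."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def holds_fd(rows, lhs, rhs):
    """X -> Y holds iff U/IND(X) has as many blocks as U/IND(X ∪ Y)."""
    return len(partition(rows, lhs)) == len(partition(rows, list(lhs) + list(rhs)))


def holds_mvd(rows, lhs, rhs):
    """X ->> Y holds iff, inside each X-block, the (Y, Z) pairs are all of Y-values x Z-values."""
    # Z is every attribute outside X and Y.
    rest = [a for a in (rows[0] if rows else {}) if a not in lhs and a not in rhs]

    def proj(i, attrs):
        return tuple(rows[i][a] for a in attrs)

    for block in partition(rows, lhs):
        ys = {proj(i, rhs) for i in block}
        zs = {proj(i, rest) for i in block}
        pairs = {(proj(i, rhs), proj(i, rest)) for i in block}
        if pairs != set(product(ys, zs)):
            return False
    return True


if __name__ == "__main__":
    rows = [
        {"course": "DB", "teacher": "T1", "book": "B1"},
        {"course": "DB", "teacher": "T1", "book": "B2"},
        {"course": "DB", "teacher": "T2", "book": "B1"},
        {"course": "DB", "teacher": "T2", "book": "B2"},
        {"course": "OS", "teacher": "T3", "book": "B3"},
    ]
    print(holds_fd(rows, ["course"], ["teacher"]))   # a course has several teachers
    print(holds_fd(rows, ["teacher"], ["course"]))
    print(holds_mvd(rows, ["course"], ["teacher"]))  # teachers and books vary independently
```

A discovery algorithm would call tests like these over candidate left- and right-hand sides; pruning that candidate space is where most of the real work lies.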
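How schema information can simplify a regular path query may be illustrated with a small sketch. It assumes a DTD reduced to a hypothetical dictionary mapping each element to the child elements its content model allows. It enumerates every cycle-free label path from the root to a target element and reports the elements shared by all of them. Treating those shared elements as entrance-node candidates is an interpretation made only for illustration, since the abstract does not give the formal definition, and the function names are hypothetical.

```python
def all_paths(dtd, node, target, prefix=()):
    """Enumerate label paths from `node` down to `target` in a DTD element graph.

    `dtd` maps an element name to the child element names its content model allows.
    Only cycle-free paths are enumerated, so recursive DTDs still terminate.
    """
    path = prefix + (node,)
    if node == target:
        return [list(path)]
    found = []
    for child in dtd.get(node, []):
        if child not in path:
            found.extend(all_paths(dtd, child, target, path))
    return found


def nodes_on_every_path(paths):
    """Elements that every root-to-target path passes through, in first-path order."""
    if not paths:
        return []
    shared = set(paths[0]).intersection(*map(set, paths[1:]))
    return [n for n in paths[0] if n in shared]


if __name__ == "__main__":
    dtd = {
        "store": ["books", "magazines"],
        "books": ["book"],
        "magazines": ["magazine"],
        "book": ["title", "price"],
        "magazine": ["title", "issue"],
    }
    print(all_paths(dtd, "store", "price"))
    print(nodes_on_every_path(all_paths(dtd, "store", "title")))
    print(all_paths(dtd, "store", "author"))
```

Under this toy DTD, `//price` has exactly one expansion (`/store/books/book/price`). `//author` has none, so a query for it can be answered as empty without reading any data. `title` is reachable along two paths that share only `store` and `title` themselves.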
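Query rewriting by pruning against an access-control view can be sketched with Python's standard `xml.etree.ElementTree`. The sketch assumes a much simpler view than the thesis's model: every element whose tag is in a hypothetical `denied` set is hidden together with its subtree, and the root is always visible. It does not address implicit inference or structure hiding. Queries are limited to simple ElementTree paths without predicates, and the helper names are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def build_view(root, denied):
    """Copy the tree and drop every element whose tag is in `denied`, with its subtree."""
    view = copy.deepcopy(root)
    # Collect first, then remove, so the tree is not mutated while being iterated.
    doomed = [(parent, child) for parent in view.iter() for child in parent if child.tag in denied]
    for parent, child in doomed:
        parent.remove(child)
    return view


def prune_query(path, denied):
    """Static pruning for simple paths: a step naming a denied tag can only reach hidden data."""
    return None if any(step in denied for step in path.split("/")) else path


def secure_findall(root, path, denied):
    """Answer an ElementTree path against the access-control view, not the raw document."""
    rewritten = prune_query(path, denied)
    if rewritten is None:
        return []  # pruned: answered without touching the data
    return build_view(root, denied).findall(rewritten)


if __name__ == "__main__":
    doc = ET.fromstring(
        "<hospital>"
        "<patient><name>A</name><diagnosis>X</diagnosis></patient>"
        "<patient><name>B</name><record><name>Dr</name></record></patient>"
        "</hospital>"
    )
    print(secure_findall(doc, "patient/diagnosis", {"diagnosis"}))
    print([e.text for e in secure_findall(doc, ".//name", {"record"})])
```

The static check avoids evaluating queries whose answer is known to be empty under the view, while the view itself stops descendant queries such as `.//name` from reaching names inside a hidden `record`.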

【Abstract】 With the rapid development of the Internet, a large amount of Web data has emerged, much of it in the form of XML documents. How to store and process large XML documents efficiently, and how to retrieve information from them, has become an important research topic in the database field. The research in this thesis revolves around query optimization techniques for XML databases, focusing on XML query optimization based on schema and semantic information.

Because the schema information of XML documents is often imprecise, a method of XML schema mining based on fuzzy decision trees is proposed. Based on an analysis of the problems and defects of existing schema mining methods, the concept of an approximate schema is proposed. XML documents are expressed as a monadic Datalog program. Using an incremental clustering method, the approximate schema is constructed by clustering objects with similar incoming and outgoing edge patterns, and the perfect schema of the classified objects is obtained from a fuzzy decision tree. This overcomes two defects of schema mining: excess edges and missing edges.

A method of discovering data dependencies in XML based on rough sets is proposed. Data dependencies, including functional dependencies and multivalued dependencies, are an important concept in database research. The notions of functional dependency and multivalued dependency in XML are defined, and decision theorems for XML functional dependencies and XML multivalued dependencies, based on the indiscernibility relation of rough sets, are given. Based on these theorems, algorithms for discovering data dependencies are proposed.

A query optimization algorithm for regular path expressions based on the DTD is proposed. The concept of an extended regular path expression is defined and used to reduce the DTD. The concept of an entrance node is also defined.
Based on the entrance-node notion, two optimization principles for regular path expressions are proposed, named path shortening and path complementing. Using these two kinds

  • 【Classification Number】 TP311.13
  • 【Times Cited】 14
  • 【Downloads】 1507

