XML数据智能管理若干关键技术研究

Research on Some Key Techniques of XML Data Intelligence Management

【Author】 Liu Bo

【Advisor】 Yang Luming

【Author information】 Central South University, Computer Application Technology, 2008, Ph.D.

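【Abstract, translated from the Chinese】 With the emergence and transmission of massive XML data, XML has become an important standard for information representation and data exchange on the Internet. This has created a demand for XML data management, and how to represent, query, and mine XML data effectively has become an important challenge in the field. To address the problems and shortcomings of current research, this dissertation studies the principles and methods of XML data models, swarm intelligence, pattern recognition, neural networks, data mining, and intelligent computing. On the prototype system XBASE, it proposes a series of new intelligent management methods based on XML keys for data cleansing, querying, and data mining, and it explores effective approaches to XML refactoring. The research centers on intelligent management problems such as querying and mining XML data, and its content and results fall into four areas.

1. Establishment of the XML data management framework XPDM

Existing XML data models have four problems that hinder effective management of XML data:
(1) Data heterogeneity, which complicates the integration of multiple data sources and reduces the effectiveness of information queries.
(2) Data inconsistency: incomplete data constraints often leave data inconsistent, which reduces query accuracy.
(3) Uncertain data dependencies among multiple data sources, which affect merging and querying of data.
(4) Immature semantic standards: because XML is still developing, many specifications are incomplete, which often makes query statements tedious and confusing.

To address these problems, the dissertation proposes XPDM, a framework for managing massive XML data that is based on a vector space model built from XML keys and operated on using probability theory. By extending the semantic specification of the XQuery 1.0 and XPath 2.0 Data Model (XDM) and converting XML data into vectors, the framework largely resolves the four problems above.

2. Intelligent data cleansing and query strategies

To deal with dirty data in XML documents, the dissertation introduces XML key combinations and an XML vector model, builds a metadata model of the XML data cleansing process using Bayesian learning and a Markov chain probability transition strategy, and applies an XML tree similarity algorithm. The result is a new intelligent method that cleanses XML data through predefined rule bases. To overcome the tedious detection and poor flexibility of XML data cleansing, it also proposes an optimized cleansing algorithm that combines XML keys appropriately, incorporates particle swarm optimization, and uses a hidden Markov model information extraction strategy. To make XML data querying more intelligent and effective, it takes a heuristic approach that exploits the semi-structured nature of XML, integrates particle swarm and ant colony algorithms, with corresponding improvements, into probabilistic querying of massive XML data, and thereby improves parallel processing over the query scope and convergence efficiency.

3. Intelligent XML data mining strategies

Massive XML data has accumulated on the Internet. To mine it effectively, the dissertation pursues three directions:
(1) To improve clustering quality for massive XML document collections, it proposes an adaptive chaotic XML clustering algorithm based on particle swarm optimization and a matrix-iteration self-organizing auxiliary XML clustering algorithm based on the vector space model.
(2) To improve parallel processing of massive XML document collections, it proposes an XML fragmentation algorithm based on the chaos principle and an ant colony clustering model, in which a chaotic fitness function measures the similarity between an ant and its neighborhood.
(3) Given the streaming and unbounded nature of XML data and the shortcomings of existing quality detection, it proposes an adaptive XML data monitoring algorithm that uses a vector matrix of XML keys as the window, applies multilevel decomposition and reconstruction with a vector-product wavelet transform, and uses least squares support vector machines to build double sliding windows. The algorithm meets the quality management requirements for XML data transmitted over networks.

4. Intelligent XML refactoring strategy

To better optimize XML semantic specifications, and to handle the fact that XML data structures change as user requirements change over time, the dissertation carries out exploratory research on XML refactoring. Building on the refactoring of XML document fragments, it uses XML semantic constraints and the hierarchical nature of XML paths, together with vector machine principles and the characteristics of frequent patterns, to propose an XML frequent pattern tree (XFP-tree) algorithm for XML structure refactoring, which helps further ensure XML quality.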

【Abstract】 With the emergence and transmission of massive XML data, XML has become an important standard for information representation and data exchange on the Internet. The resulting demand for XML data management poses an important challenge in the XML database field: how to represent, query, and mine XML data effectively is a problem of both theoretical and practical value.

To address the problems and shortcomings of current XML data management research, this dissertation draws on the theories and methods of XML data models, swarm intelligence, pattern recognition, neural networks, data mining, and intelligent computing. On the prototype system XBASE (XML DataBase), it proposes new intelligent management methods based on XML keys for data cleansing, querying, and data mining, and it discusses efficient approaches to XML refactoring. The dissertation focuses on the following four aspects of intelligent management for querying and mining XML data.

1. Establishment of the XML data management framework XPDM

Existing XML data models have four problems that hinder effective management of XML data:
(1) Heterogeneous data: different parties often name and describe the same data object differently, which complicates multi-source integration and reduces the effectiveness of information queries.
(2) Inconsistent data: without complete data constraints, inconsistent data reduces the accuracy of information queries.
(3) Uncertain data dependencies among data sources: these interfere with merging and querying across sources.
(4) Immature semantic standards: because XML is still evolving, many standards are incomplete and no unified standard exists so far, which makes query statements tedious and confusing.

To address these problems, the dissertation proposes XPDM (XML-based Probability Data Model), an object-oriented framework for managing massive XML data that is based on a vector space model built from XML keys and on probability theory. By extending the semantic specification of the XQuery 1.0 and XPath 2.0 Data Model (XDM) and converting XML data into vectors, the framework largely resolves the four problems above.

2. Intelligent data cleansing and query strategies

To solve the dirty data problem in XML documents, the dissertation proposes a new intelligent data cleansing algorithm. It combines XML keys with an XML vector model, builds a metadata model of the cleansing process using Bayesian learning and a Markov chain probability model, applies an XML tree similarity checking algorithm, and performs the cleansing through predefined rule bases. To overcome the tedious detection and poor flexibility of earlier XML data cleansing, it also proposes an optimized cleansing algorithm that combines XML keys, incorporates the PSO algorithm, and introduces a hidden Markov model information extraction strategy. To make XML data querying more intelligent and effective, it applies a heuristic method that exploits the semi-structured nature of XML, integrates improved PSO and ACO algorithms into probabilistic querying of massive XML data, and thereby improves parallel processing over the query scope and convergence efficiency.

3. Intelligent XML data mining strategies

Massive XML data has already accumulated on the Internet. To mine it effectively, the dissertation pursues three directions:
(1) To improve the clustering quality of massive XML document collections, it proposes an XML document clustering algorithm based on adaptive PSO with chaos, and a vector-matrix iterative self-organizing auxiliary clustering algorithm for XML documents based on matrix iteration over the vector space model.
(2) To improve parallel processing of massive XML document collections, it proposes a parallel XML document fragmentation algorithm based on the chaos principle and an ant clustering model, in which a chaotic fitness function measures the similarity between an ant and its neighborhood.
(3) Given the streaming and unbounded nature of XML data and the shortcomings of existing quality detection for XML data, it proposes an algorithm that uses a vector matrix of XML keys as the window, decomposes and reconstructs the XML data with a multilevel vector-product wavelet transform, and uses least squares support vector machines to build double sliding windows for querying and monitoring XML data. The method meets the quality management requirements for XML data transmitted over networks.

4. Intelligent XML refactoring strategy

To optimize XML semantic consistency and to handle changes in XML structure as user requirements change over time, the dissertation carries out exploratory research on intelligent XML refactoring. Building on a document fragment refactoring method, it uses XML semantic constraints and the hierarchy of XML paths, together with vector machine principles and the characteristics of frequent patterns, to propose an XML frequent pattern tree (XFP-tree) algorithm for XML structure refactoring, which helps further ensure XML quality.
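
The idea of representing XML documents as vectors and comparing them by similarity, which underlies the cleansing and clustering methods summarized above, can be illustrated with a minimal sketch. The abstract does not say how XML keys are turned into vectors, so the example assumes a simple stand-in: each document becomes a count vector over its root-to-node tag paths, and two documents are compared by cosine similarity. The function names `path_vector` and `cosine_similarity` are hypothetical, and this is not the dissertation's algorithm.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def path_vector(xml_text: str) -> Counter:
    """Count root-to-node tag paths in one XML document.

    Assumption: tag paths stand in for the XML-key features that the
    abstract mentions but does not define.
    """
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative documents (made up for this sketch).
doc_a = "<book><title>XML</title><author>A</author></book>"
doc_b = "<book><title>XML</title><author>B</author><author>C</author></book>"
doc_c = "<order><item>1</item></order>"

vec_a, vec_b, vec_c = (path_vector(d) for d in (doc_a, doc_b, doc_c))
sim_ab = cosine_similarity(vec_a, vec_b)  # structurally similar documents
sim_ac = cosine_similarity(vec_a, vec_c)  # no shared paths
```

In this sketch the two `book` documents score close to 1 and the unrelated `order` document scores 0, which is the kind of signal a duplicate-detection or clustering step could threshold on. A real system would likely also weight element values and key constraints, which this example ignores.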

  • 【Online publication contributor】 Central South University
  • 【Online publication issue】 2010, No. 02