基于XML的信息管理系统的数据集成技术研究

Research on the Data Integration Engineering of the Information Management System Based on XML

【Author】 Zhai Xuemin (翟学敏)

【Advisor】 Liu Yuan (刘渊)

【Author information】 Jiangnan University, Computer Application Technology, 2008, Master's degree

【Abstract】 With the rapid development of Web technology and its applications, XML has become an important standard for information representation and data exchange on the Internet. It is widely used in e-commerce, data exchange, scientific data representation, data modeling, and search engines, and its influence reaches every corner of the online community. Current database development also shows three main features: support for the XML data format, business intelligence, and support for SOA (service-oriented architecture). As large volumes of XML data emerge and are transmitted, the need to manage XML data has arisen. How to effectively represent, store, manage, query, and mine XML data and data streams has therefore become an important challenge in the XML database field, with significant theoretical and practical value. Against this background, this thesis studies the intelligent management of XML data.

The thesis addresses the representation, querying, and aggregation of XML data and data streams; its work and results lie mainly in intelligent data cleaning and querying. Data cleaning is an effective means of improving data quality and, in turn, query efficiency. With the growth of the Internet, the importance of intelligent XML data cleaning and querying has gradually been recognized. To address the cumbersome detection and poor flexibility of earlier XML data cleaning approaches, the thesis builds a metadata model of the XML data cleaning process by suitably combining XML keys, incorporating the particle swarm optimization (PSO) algorithm, and introducing Bayesian learning and a hidden Markov model information extraction strategy. Drawing on ideas for cleaning approximately duplicate records in structured data, it proposes a new method that uses PSO to improve XML data cleaning.

The thesis also introduces swarm intelligence algorithms to make XML data querying more intelligent and effective. PSO offers fast, stochastic global search but cannot exploit feedback information. The ant colony algorithm converges on the optimal path through the accumulation and updating of pheromone and supports distributed, parallel global search, but pheromone is scarce in the early stage, so it is slow to reach a solution. Using a heuristic approach and taking the semi-structured nature of XML into account, the thesis integrates PSO and the ant colony algorithm into XML probabilistic querying, with corresponding improvements: PSO quickly generates the pheromone distribution, and the ant colony algorithm then solves precisely. The two algorithms complement each other's strengths, extending the scope of data queries and improving convergence efficiency.
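
The hybrid strategy named in the abstract (a PSO phase that quickly produces a pheromone distribution, followed by an ant colony phase that refines the solution) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the toy problem of picking one option per stage, the cost function `path_cost`, the function names, and all parameter values. The thesis's actual formulation for XML probabilistic queries is not given in this record.

```python
import random


def path_cost(choice, cost):
    """Toy objective (assumed): per-stage cost plus a penalty for jumps between stages."""
    stage = sum(cost[i][c] for i, c in enumerate(choice))
    jumps = sum(abs(a - b) for a, b in zip(choice, choice[1:]))
    return stage + jumps


def pso_phase(cost, rng, n_particles=12, iters=20, w=0.7, c1=1.5, c2=1.5):
    """Coarse global search. Returns each particle's personal-best choice list."""
    n, k = len(cost), len(cost[0])

    def decode(x):
        # Map a continuous position to one discrete option index per stage.
        return [min(k - 1, max(0, int(v))) for v in x]

    pos = [[rng.uniform(0, k) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [path_cost(decode(p), cost) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for p in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                vel[p][d] = (w * vel[p][d]
                             + c1 * r1 * (pbest[p][d] - pos[p][d])
                             + c2 * r2 * (gbest[d] - pos[p][d]))
                pos[p][d] = min(k - 1e-9, max(0.0, pos[p][d] + vel[p][d]))
            c = path_cost(decode(pos[p]), cost)
            if c < pbest_cost[p]:
                pbest[p], pbest_cost[p] = pos[p][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[p][:], c
    return [decode(p) for p in pbest]


def aco_phase(cost, seeds, rng, n_ants=12, iters=30, rho=0.3, tau0=0.1):
    """Ant colony refinement whose pheromone table is seeded from the PSO results."""
    n, k = len(cost), len(cost[0])
    tau = [[tau0] * k for _ in range(n)]
    for s in seeds:  # PSO personal bests supply the initial pheromone
        deposit = 1.0 / (1.0 + path_cost(s, cost))
        for i, c in enumerate(s):
            tau[i][c] += deposit
    best = min(seeds, key=lambda s: path_cost(s, cost))
    best_cost = path_cost(best, cost)
    for _ in range(iters):
        # Each ant picks an option per stage with probability proportional to pheromone.
        ants = [[rng.choices(range(k), weights=tau[i])[0] for i in range(n)]
                for _ in range(n_ants)]
        it_best = min(ants, key=lambda a: path_cost(a, cost))
        it_cost = path_cost(it_best, cost)
        if it_cost < best_cost:
            best, best_cost = it_best, it_cost
        for i in range(n):
            for j in range(k):
                tau[i][j] *= (1.0 - rho)                 # evaporation
            tau[i][best[i]] += 1.0 / (1.0 + best_cost)   # reinforce the best path so far
    return best, best_cost


def hybrid_search(cost, seed=0):
    """PSO first for a fast rough distribution, then ACO for refinement."""
    rng = random.Random(seed)
    return aco_phase(cost, pso_phase(cost, rng), rng)


# Hypothetical 4-stage problem with 3 options per stage.
cost = [[4, 1, 3], [2, 5, 1], [3, 2, 4], [1, 3, 2]]
best, best_cost = hybrid_search(cost)
print(best, best_cost)
```

Seeding the pheromone table from the PSO personal bests is what addresses the early pheromone scarcity the abstract mentions: the ants start from a non-uniform distribution instead of a flat one, and the ant colony phase can only match or improve on the best PSO result because it tracks the best path found so far.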

  • 【Online publication contributor】 Jiangnan University
  • 【Online publication year and issue】 2009, Issue 03