Design and Implementation of a Stream-Based XML Query Algorithm

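【Author】 Xu Zhe (徐哲)

【Advisor】 Niu Jizhen (牛纪桢)

【Author information】 Dalian University of Technology, Computer Application Technology, 2004, Master's degree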

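【Abstract (translated from the Chinese)】 XML is widely used in information management, e-commerce, personalized publishing, mobile communications, online education, electronic document interchange, and many other fields, and it has begun to become the de facto standard for describing and exchanging data on the Internet. As XML technology matures and its range of applications grows, more and more data is described, stored, exchanged, and presented in XML. The emergence of XML documents poses new challenges for traditional information management techniques, so strengthening the ability to query information in XML documents is increasingly important.

An analysis of existing XML document query algorithms shows that their implementations load the entire queried document into memory before processing it. They therefore consume a large amount of memory, and they fail altogether when the XML document is too large to fit in memory. To address this problem, this thesis designs and implements a new query algorithm. The algorithm generates a query automaton from an XPath query expression, with the query conditions encoded implicitly in the structure and states of the automaton. The XML stream is parsed into an event stream, and these events serve as the input of the query automaton and trigger its state transitions. The automaton moves between states according to the input event, such as an element-start event, a text event, or an element-end event. The document occupies as little memory as possible: as soon as a part of the document is confirmed to fully match the query expression, the query result is output.

The thesis describes in detail the steps for constructing a query automaton from a query expression and implements a prototype of a stream-based XML document query system, which can process XPath and output query results in a single forward pass over the XML stream. It also tests and compares memory-based and stream-based XML query algorithms and analyzes the results.

Stream-based XML query algorithms were introduced to meet the query-processing needs of certain data-intensive applications, whose data is better modeled as data streams than as persistent, stable relations. Application areas include financial services, network monitoring, telecommunications data management, manufacturing, and sensor monitoring. The research in this thesis has theoretical significance and practical value for such applications.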
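The event-driven evaluation summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a small XPath subset (absolute paths with `/`, `//`, name tests, and `*`, no predicates), uses Python's standard `xml.sax` parser as the event source, and the names `compile_xpath`, `QueryAutomaton`, and `stream_query` are hypothetical. Each state counts how many location steps have been matched, and a result is emitted as soon as the matched element closes.

```python
import io
import xml.sax


def compile_xpath(expr):
    """Compile e.g. '/a//b/c' into steps [(axis, name_test), ...].

    Assumption: only absolute paths with child ('/') and descendant ('//')
    axes are handled; predicates and attributes are out of scope here.
    """
    if not expr.startswith("/"):
        raise ValueError("only absolute paths are supported in this sketch")
    steps, axis = [], "child"
    for token in expr.split("/")[1:]:
        if token == "":              # an empty token is produced by '//'
            axis = "descendant"
            continue
        steps.append((axis, token))
        axis = "child"
    return steps


class QueryAutomaton(xml.sax.ContentHandler):
    """Nondeterministic automaton driven by start, text, and end events."""

    def __init__(self, steps, emit):
        super().__init__()
        self.steps = steps
        self.emit = emit                   # callback that receives each result
        self.stack = [frozenset({0})]      # active states per open element
        self.captures = []                 # (depth, text buffer) of open matches

    def startElement(self, name, attrs):
        final = len(self.steps)
        active = set()
        for state in self.stack[-1]:
            if state == final:
                continue                   # the final state has no outgoing edges
            axis, test = self.steps[state]
            if test in ("*", name):
                active.add(state + 1)      # this step is matched by the element
            if axis == "descendant":
                active.add(state)          # self-loop: keep looking deeper
        self.stack.append(frozenset(active))
        if final in active:
            self.captures.append((len(self.stack), []))

    def characters(self, content):
        for _, buffer in self.captures:
            buffer.append(content)

    def endElement(self, name):
        # Emit the result as soon as the matched element is complete.
        if self.captures and self.captures[-1][0] == len(self.stack):
            _, buffer = self.captures.pop()
            self.emit("".join(buffer))
        self.stack.pop()


def stream_query(xpath, source, emit):
    """Evaluate xpath over a file-like XML source in one forward pass."""
    xml.sax.parse(source, QueryAutomaton(compile_xpath(xpath), emit))


if __name__ == "__main__":
    doc = b"<orders><order><id>1</id><item>pen</item></order></orders>"
    stream_query("/orders/order/item", io.BytesIO(doc), print)
```

The only per-document state is one set of automaton states per open element plus the text buffers of elements currently matching, so memory grows with document depth and match size rather than with document length.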

【Abstract】 XML is used extensively in fields such as information management, e-business, personalized publishing, mobile communication, online education, and electronic data interchange, and it has become the de facto standard for data description and exchange. As XML technology evolves and XML applications spread, more and more information is described, stored, exchanged, and presented in XML. Conventional information management technology is challenged by XML, so it is increasingly important to develop techniques for querying information in XML documents.

After investigating existing algorithms, we find that their implementations assume the whole XML document must be loaded into memory. When the XML document is too large to be loaded into memory, these algorithms cannot be used. To solve this problem, we present a different method for query processing in this thesis. A query automaton is built from the query expression, and its structure and states encode the query predicates. The XML stream is parsed into a stream of element-tag and text events. These events, such as start-element, text, and end-element events, trigger state transitions in the query automaton. To reduce memory requirements, a document fragment is written to the output as soon as it is found to match the query expression.

We present a method for building the query automaton and an implementation of an XML stream query system. The implementation can evaluate an XPath expression in a one-pass scan of the XML stream. Finally, we run a comparative experiment to investigate the memory use of the memory-based algorithm and the stream-based algorithm.

The XML stream querying problem was introduced to meet the querying requirements of some data-intensive applications. These applications are better suited to a stream data model than to a persistent data model. The stream data model can be used in many fields, such as financial services, network monitoring, communication data management, manufacturing, and sensor networks. The research in this thesis contributes to the theory and practice of such applications.
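The comparative experiment mentioned in the abstract is not described in this record. As a rough illustration of how such a memory comparison could be set up, the sketch below contrasts a tree-building query with a streaming one using Python's standard `xml.etree.ElementTree` and `tracemalloc`. The synthetic document, the fixed query `/items/item/id`, and the helper names are all assumptions for illustration, and the measured figures depend on the interpreter and document size.

```python
import io
import tracemalloc
import xml.etree.ElementTree as ET


def make_document(n):
    """Build a synthetic XML document with n <item> records (made-up test data)."""
    body = "".join(f"<item><id>{i}</id></item>" for i in range(n))
    return f"<items>{body}</items>".encode("utf-8")


def query_in_memory(data):
    """Memory-based: build the whole tree first, then select /items/item/id."""
    root = ET.parse(io.BytesIO(data)).getroot()
    return [elem.text for elem in root.findall("./item/id")]


def query_streaming(data):
    """Stream-based: handle each <item> when it closes, then discard it."""
    results = []
    context = ET.iterparse(io.BytesIO(data), events=("start", "end"))
    _, root = next(context)              # the first event is the root's start tag
    for event, elem in context:
        if event == "end" and elem.tag == "item":
            results.append(elem.findtext("id"))
            root.clear()                 # drop children that were already handled
    return results


def peak_bytes(func, data):
    """Run func(data) and return (result, peak bytes traced by tracemalloc)."""
    tracemalloc.start()
    try:
        result = func(data)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    data = make_document(50_000)
    for func in (query_in_memory, query_streaming):
        result, peak = peak_bytes(func, data)
        print(f"{func.__name__}: {len(result)} results, peak {peak} bytes")
```

Both functions return the same list of results, so any difference in the reported peak comes from how much of the document each approach keeps alive while it runs.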

【Key words】 XML Stream; XPath; Automaton; Query
  • 【Classification code】TP311.13
  • 【Times cited】2
  • 【Downloads】175