一种快速的非提取式XML解析器的设计与实现

Design and Implementation of a Fast Non-extractive XML Parser

【Author】 Zhang Yunsong

【Advisor】 Qian Peide

【Author information】 Soochow University, Computer Software and Theory, 2010, Master's

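【Abstract (Chinese original, translated)】 With the widespread adoption of XML, improving the performance of XML parsers has become a pressing problem. The parsing model directly determines a parser's performance, so the problem should be approached from the parsing model. Most current research is based on extractive XML parsing models, and non-extractive models have received little attention. VTD-XML is a new non-extractive XML parsing model. Building on VTD-XML, this thesis designs and implements a fast non-extractive XML parser called NEM-XML. First, NEM-XML is a non-extractive XML parser. It abandons the XML DOM practice of creating a node object for every XML node and instead stores each node's metadata in a 64-bit integer, which greatly reduces the time and memory needed to parse an XML document. NEM-XML organizes its internal data structures as a static linked list, which makes it easy to add and delete element nodes and also speeds up traversal of the document. Second, the thesis explores how to reuse XML parsing results: the document is parsed normally the first time it is used and the result is saved to a binary file, and on later uses the original parsing result is restored directly from that file. This is of great practical value in applications that only read XML documents and never update them. To support this reuse, the thesis refines NEM-XML's data structures to reduce both the space needed to save the parsing result and the time needed to restore it. Finally, parallel computing is currently a major research area, and parallel XML parsing is attracting growing attention. The thesis studies a parallel parsing algorithm for NEM-XML and proposes a restricted XML document partitioning method that can quickly determine the initial parsing state of each document chunk. The partitioning algorithm takes both the hierarchical structure of the document and load balancing into account, and the resulting partitions are fairly good. This research on XML parsing has practical significance. It extends the non-extractive parsing idea embodied in VTD-XML and further studies how to reuse NEM-XML's parsing results, which can promote the use of XML in various fields. In addition, the restricted partitioning method proposed here offers a useful reference for other research on parallel XML parsing.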

【Abstract】 As a platform- and language-neutral markup language, XML plays an important role in data representation and data exchange over the Internet. However, improving the performance of XML parsing remains an urgent task. Most current research is based on XML DOM, the most widely used XML parsing model. This paper presents a fast non-extractive XML parser based on VTD-XML, called NEM-XML. Firstly, NEM-XML is a non-extractive XML parsing model, which means that it does not create node objects for all XML nodes during parsing. Instead, it encodes the node information in 64-bit integers. In this way, a lot of memory space is saved and parsing performance is improved. To gain more flexibility and usability, NEM-XML keeps the structure information in a static linked array. This kind of data structure significantly facilitates updating and navigation among XML nodes. Secondly, it is quite promising to reuse XML parsing results when there is no need to update the XML document. This is a good way to avoid parsing the same XML document repeatedly. This paper further modifies NEM-XML to reduce the space needed to save the XML parsing results and the time needed to restore them. Finally, parallel computing is an active research field, and parallel XML parsing is therefore attracting more and more attention. This paper proposes a restricted XML partition method to reduce the uncertainty of each chunk. The partition method takes both the document structure and load balancing into consideration, and the partition result is quite satisfactory. The work on XML parsing technology has practical significance. On the one hand, it extends VTD-XML to make it more flexible and studies in depth how to reuse XML parsing results, which promotes the application of XML in various fields to some extent. On the other hand, the partition algorithm this paper presents is quite efficient and provides a reference for related research on parallel XML parsing.
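
The non-extractive idea in the abstract (node information kept in 64-bit integers rather than in node objects) can be illustrated with a minimal sketch. The field widths, the token kinds, and the names `pack`, `unpack`, `token_text`, `DOC`, and `RECORDS` are all assumptions made for illustration. The abstract does not give NEM-XML's actual record layout, and this is not VTD-XML's API.

```python
# Hypothetical 64-bit record layout (an assumption, not the thesis's layout):
#   bits  0-31  byte offset of the token in the original document
#   bits 32-51  token length in bytes
#   bits 52-59  nesting depth
#   bits 60-63  token kind
OFFSET_BITS, LENGTH_BITS, DEPTH_BITS, KIND_BITS = 32, 20, 8, 4
ELEMENT, TEXT = 0, 1  # hypothetical token kinds


def pack(offset: int, length: int, depth: int, kind: int) -> int:
    """Encode one node's metadata into a single 64-bit integer."""
    fields = ((offset, OFFSET_BITS), (length, LENGTH_BITS),
              (depth, DEPTH_BITS), (kind, KIND_BITS))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
    return (kind << 60) | (depth << 52) | (length << 32) | offset


def unpack(record: int) -> tuple[int, int, int, int]:
    """Decode a record back into (offset, length, depth, kind)."""
    offset = record & 0xFFFFFFFF
    length = (record >> 32) & 0xFFFFF
    depth = (record >> 52) & 0xFF
    kind = (record >> 60) & 0xF
    return offset, length, depth, kind


def token_text(doc: bytes, record: int) -> str:
    """Non-extractive access: slice the original bytes only when asked."""
    offset, length, _, _ = unpack(record)
    return doc[offset:offset + length].decode("utf-8")


# Records written by hand for a tiny document; a real parser would emit them.
DOC = b"<a><b>hi</b></a>"
RECORDS = [
    pack(1, 1, 0, ELEMENT),  # element name "a"
    pack(4, 1, 1, ELEMENT),  # element name "b"
    pack(6, 2, 1, TEXT),     # text content "hi"
]
```

Because every record is a fixed-size integer, the whole parse result is a flat integer array that sits next to the untouched document bytes. That is what makes saving it to a binary file and restoring it later, as the abstract describes, a plausible design.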

【Key words】 XML Parsing; VTD-XML; Non-extractive; Reusability; Parallel Computing
  • 【Online publication contributor】 Soochow University
  • 【Online publication year/issue】 2011, Issue 01