节点文献

XML关键字查询中数据索引和查询结果排序算法研究

Research on Data Index and Query Result Sort Algorithm in XML Keyword Query

【作者】 王雪松

【导师】 陈俊杰;

【作者基本信息】 太原理工大学 , 计算机应用技术, 2007, 硕士

【摘要】 90年代以来,信息技术进入了一个历史上前所未有的飞速发展时期。INTERNET为用户提供了资源共享和信息交互的方便快捷的手段和平台。目前,大多数WEB上的文档是用HTML格式存放和传送的,但在扩展性、结构化和可验证性等方面的不足限制了HTML的应用能力。于是,可扩展标记语言(XML)应运而生,它是专门为WEB设计的一个简单的SGML的子集,既克服了HTML的不足,又去除了SGML中那些对于当前WEB用户来说不必要的特性。目前,XML已成为万维网数据表示和交换的标准。随着XML文档的大量涌现,针对XML文档的检索查询成为目前研究的热点方向之一。尽管基于HTML关键字的搜索引擎已取得很大的成功,但由于XML与HTML在诸多方面存在差异,若不加修改直接用于XML数据查询,则不能完全发挥XML所带来的好处。特别是在关键字搜索查询时,针对XML数据建立有效的索引机制是实现高效查询的重要手段;此外,由于关键字搜索查询有可能返回大量的查询结果,对关键字查询的结果进行有效的排序,也显得尤为重要。本论文即是针对XML数据索引和查询结果排序这两个与XML关键字查询相关的技术开展研究。针对XML数据索引问题,本文在对现有的XML数据索引技术进行分析的基础上,并通过对XML文档树进行压缩的方式,建立了Ttree变形树索引结构和相应算法。我们详细地讨论了Ttree变形树索引的数据结构及其相关算法,并对现有的XML数据索引和Ttree变形树索引进行了对比实验,通过实验验证了Ttree变形树索引的优越性。针对XML关键字搜索查询结果的排序问题,本文探讨了一种关键字搜索查询结果的排序算法ElemRank,我们也详细地讨论了该算法的每个步骤,并通过实验验证了该算法的有效性。

【Abstract】 Since the 90’s,the information technology entered in a history the unprecedented rapid development time. INTERNET was the user has provided resources sharing and the information interactive convenience quick method and the platform. At present, on the majority WEB documents are with HTML form depositing and the transmission, but in the extension, the structure and might the confirmation and so on the aspect insufficiency limits the HTML application ability. Thereupon, may expand the mark language (XML) to arise at the historic moment, it’s specially is a WEB design simple SGML subset, both has overcome the HTML insufficiency, and removed in SGML these to say the nonessential characteristic regarding the current WEB user. At present, XML has become the World Wide Web data expression and the exchange standard.Massively emerges along with the XML documents, inquires one of hotspot directions in view of the XML documents retrieval which into at present studies.Although has obtained the very big success based on the HTML essential character search engine, but because XML and HTML have the difference in many aspects, if does not revise directly uses in the XML data inquiry, then cannot completely display the advantage which XML brings. Specially when essential character search inquiry, in view ofthe XML data establishment effective index mechanism is the realization highly effective inquiry important method; In addition, because the essential character search inquiry has the possibility to return to the massive inquiries result, carries on the effective arrangement to the essential character inquiry result, also appears especially importantly. The present paper is arranges these two in view of the XML data index and the inquiry result with the XML essential character inquiry related technology development research.In view of the XML data index question, this article in carries on the analysis to the existing XML data index technology in the foundation,and through carries on the compression to the XML documents tree the way, has established the Ttree distortion tree index structure and the corresponding algorithm. We in detail discussed the Ttree distortion tree index construction of data and its the conelation algorithm, and has carried on the contrast experiment to the existing XML data indexand the Ttree distortion tree index, has confirmed the Ttree distortion tree index superiority through the experiment. In view of the XML essential character search inquiry result arrangement question, this article has discussed one kind of essential character search inquiry result sort algorithm ElemRank, we also in detail discussed this algorithm each step, and has confirmed this algorithm validity through the experiment.

  • 【分类号】TP312.2
  • 【被引频次】3
  • 【下载频次】303
节点文献中: 

本文链接的文献网络图示:

本文的引文网络