Research and Implementation of System Extension Techniques for a Yellow Pages Search Engine

【Author】 杨义传

【Advisor】 胡运发

【Author information】 Fudan University, Software Engineering, 2009, Master's degree

【Abstract】 To handle the massive volumes of unstructured full-text data stored in computer systems, researchers in China and abroad have studied full-text database technology extensively. Full-text retrieval, one of the key technologies of full-text databases, has attracted particular attention. Building on existing results, this thesis first introduces the Inter-Relevant Successive Tree (IRST) model, a new full-text retrieval model that is still at an early stage, and compares the performance of IRST and its main improved variants with several mainstream full-text retrieval models, such as the bitmap, inverted file, and Pat array models. To further explore how the storage method affects the performance of a retrieval model, it then studies in detail the performance differences between the IRST model and the Pat array model under different storage methods. In addition, it proposes improvements for some of the problems in existing IRST models: it introduces a binary-search-plus-verification retrieval algorithm for the dual-sorted IRST and a retrieval algorithm based on a preprocessed successor interval table, and it improves the existing model for cooperative retrieval between relational databases and full-text databases (cooperative querying of IRST and B-Tree). Finally, to advance the basic capacity building of China's civil aviation airworthiness certification and to apply advanced technology to engineering practice, it puts forward a preliminary plan for the certification information center construction project that this capacity building urgently needs.

  • 【Online publication contributor】 Fudan University
  • 【Online publication issue】 2011, Issue S1
  • 【Classification number】 TP391.3
  • 【Downloads】 28