

Re-ranking Methods Research Based on Cloud Model Theory

【Author】 娄振霞

【Advisor】 张茂元

【Author information】 Central China Normal University, Computer Application Technology, 2012, Master's

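【Abstract, translated from the Chinese】 In recent years, computer and Internet technology have seen unprecedented popularization and development in China's informatization, and the volume of information has grown continually as a result. Faced with this ever-expanding mass of information, improving retrieval efficiency so as to improve the user's retrieval experience poses a major challenge for information retrieval. This thesis first introduces the concept of document re-ranking and the current state of research on it. By analyzing the two classes of document re-ranking methods, statistics-based and semantics-based, it finds that both neglect the uncertainty inherent in natural language. It then draws on cloud model theory to study document re-ranking in information retrieval from the perspective of discovering uncertain knowledge. By discovering uncertain knowledge at the query-term level, the thesis proposes a document re-ranking method based on the cloud model. This method obtains the distribution of the query keywords in each document, uses the cloud model to convert between qualitative and quantitative representations, derives the uncertainty with which the document represents the query, and re-ranks the documents accordingly. By further discovering uncertain knowledge at the query-statement level, the thesis proposes a document re-ranking method based on concept elevation. Starting from the query-term-level uncertainty with which a document represents the query, this method applies a cloud synthesis algorithm to elevate the query terms to a higher-level concept, obtains the query-statement-level uncertainty with which the document represents the query, and combines the uncertain knowledge from both levels to re-rank the documents. The thesis designs and implements an information retrieval system based on cloud model theory. After obtaining the initial retrieval results, the system uses the three numerical characteristics of the cloud model to compute, at both the query-term level and the query-statement level, the degree of uncertainty with which each document represents the query; it then re-ranks the documents in order of increasing uncertainty and returns the re-ranked results to the user. The proposed methods are compared experimentally on the NTCIR-5 information retrieval test collection under the TREC evaluation criteria. The results show that the proposed methods yield improvements under both the relax and rigid evaluation criteria, with the larger gain under the rigid criterion.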

【Abstract】 In recent years, computer and Internet technology have seen unprecedented popularization and development in China's information infrastructure, which has led to continual growth in the volume of information. Improving retrieval efficiency and the user experience in the face of this ever-expanding mass of information is a major challenge for information retrieval (IR). This thesis first introduces the concept of document re-ranking and its research progress, and analyzes the two main classes of document re-ranking methods: statistics-based and semantics-based. Both are found to neglect the uncertainty inherent in natural language. The thesis therefore studies document re-ranking in information retrieval on the basis of cloud model theory, from the perspective of uncertain knowledge discovery. It first proposes a re-ranking method based on the cloud model, which discovers uncertain knowledge at the query-term level. This method obtains the distribution of the query key terms in each document, uses the cloud model to convert that distribution into the uncertainty with which the document represents the query at the query-term level, and then re-ranks the documents. The thesis next proposes a re-ranking method based on concept hierarchy using the cloud model, which discovers uncertain knowledge at the query level. This method first obtains the degree of uncertainty with which a document represents the query at the query-term level, then uses the cloud synthesis algorithm to elevate the query terms to the query level according to concept hierarchy theory, thereby obtaining the degree of uncertainty with which the document represents the query at the query level, and finally uses the uncertainty at both levels to re-rank the documents. The thesis applies the proposed methods to document re-ranking in an IR system that was designed and implemented for this purpose. The system employs the three numerical characteristics of the cloud model to obtain the uncertainty with which each document in the initial retrieval results represents the query at two levels: the query-term level, and the query level obtained from it by concept elevation. It then re-ranks the documents according to that uncertainty and returns the re-ranked documents to the user. Experiments were performed on the NTCIR-5 information retrieval test collection and evaluated under the TREC assessment standards. The results show that the proposed methods improve performance under both the relax and rigid assessments, with the larger improvement under the rigid assessment.
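
The abstract does not give the formulas used, so the following is only a minimal sketch of the query-term-level idea under stated assumptions. It uses the standard certainty-free backward cloud generator to estimate the three numerical characteristics of a cloud (expectation Ex, entropy En, hyper-entropy He) from the normalized positions of a query term in a document. The helper names (`term_positions`, `uncertainty`, `rerank`) and, in particular, the score `En + He` are hypothetical illustrations, not the thesis's actual measure.

```python
import math


def backward_cloud(samples):
    """Estimate (Ex, En, He) with the certainty-free backward cloud generator."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    ex = sum(samples) / n
    # En is estimated from the first-order absolute central moment.
    en = math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in samples) / n
    # Unbiased sample variance.
    s2 = sum((x - ex) ** 2 for x in samples) / (n - 1)
    # He^2 = S^2 - En^2; clamped at zero here because the estimate can go
    # negative on small samples (a choice made for this sketch).
    he = math.sqrt(max(s2 - en ** 2, 0.0))
    return ex, en, he


def term_positions(tokens, term):
    """Hypothetical helper: positions of `term`, normalized by document length."""
    n = len(tokens)
    return [i / n for i, tok in enumerate(tokens) if tok == term]


def uncertainty(tokens, term):
    """Hypothetical score (an assumption, not the thesis's formula): En + He."""
    positions = term_positions(tokens, term)
    if len(positions) < 2:
        return math.inf  # too few occurrences to estimate a cloud
    _, en, he = backward_cloud(positions)
    return en + he


def rerank(docs, term):
    """Order tokenized documents by increasing uncertainty."""
    return sorted(docs, key=lambda tokens: uncertainty(tokens, term))
```

Calling `rerank(first_pass_docs, "cloud")` on a list of tokenized first-pass results would place documents whose occurrences of the term yield a lower score first, mirroring the low-to-high ordering described in the Chinese abstract. The query-level step (cloud synthesis across several terms) is not sketched here because the abstract does not specify the synthesis rule.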
