
Research on an Information Retrieval Model Based on Semantic Processing Technology

Research on Semantic Processing Technology Based Information Retrieval Model

【Author】 Wang Ruiqin (王瑞琴)

【Supervisor】 Kong Fansheng (孔繁胜)

【Author Information】 Zhejiang University, Computer Science and Technology, 2009, Ph.D.

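【Abstract】 Information explosion is a defining feature of today's information society. Information retrieval technology faces the serious challenge of Internet information that is updated ever faster and of users who demand ever more precise search results. Finding the required information effectively in massive amounts of data has therefore become a key problem, and semantic retrieval is a very promising way to address it. However, because the Semantic Web has not yet been fully realized, research on semantic retrieval techniques for this transition period has become a rapidly developing research topic in recent years. This dissertation studies several key problems in information retrieval and proposes an information retrieval model based on semantic processing technology, SPTIR (Semantic Processing Technology based Information Retrieval). The model is built around query expansion and search-result re-ranking and consists of four parts: semantic query expansion based on word sense disambiguation, query optimization based on lexical semantic relatedness measurement, search-result re-ranking based on document semantic relevance, and semantically enhanced personalized information recommendation.

1. In a keyword-based search engine, a well-constructed query is the objective expression of the user's subjective information need and the basic guarantee of retrieval service quality. Taking the semantic associations among query keywords as its starting point, and using implicit feedback to obtain the disambiguation context, this dissertation maps query keywords to ontology concepts through unsupervised word sense disambiguation and then performs semantic query expansion based on concept-word associations. WSD-based semantic query expansion addresses the inability of traditional retrieval systems to understand the user's query intent.
2. For query keywords whose disambiguation fails, this dissertation proposes using implicit feedback to extract candidate expansion terms directly from relevant documents. To further condense and optimize the expansion terms produced by feedback, and to avoid "topic drift" in query expansion, the expansion terms are filtered with a lexical semantic relatedness measure.
3. Because traditional keyword retrieval returns too many results, relevance evaluation of search results has become a research focus. Depending on whether query disambiguation succeeds or fails, this dissertation proposes two document semantic relevance measures: document relevance based on a semantic vector space model and document relevance based on a word vector space model. Search results are re-ranked by document relevance, so that documents strongly semantically related to the query are presented to the user first.
4. This dissertation studies how to meet the personalized query needs of different users and proposes a semantically enhanced personalized information recommendation method. The method combines semantic data sources with historical rating data for hybrid recommendation; introducing semantic data sources addresses the data sparsity and cold-start problems of traditional collaborative filtering systems. In addition, to improve the scalability and real-time performance of the recommender system, users and items are fuzzily clustered with a data mining method in the offline preprocessing stage.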
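
Point 1 maps query keywords to ontology concepts with unsupervised word sense disambiguation and then expands the query along concept-word associations. The abstract does not name the disambiguation algorithm, so the Python sketch below swaps in simplified Lesk (gloss-overlap) disambiguation. The toy sense inventory `SENSES`, the `Sense` class, the stopword list, and the helper names are hypothetical stand-ins for a real ontology such as WordNet. Keywords that cannot be disambiguated are returned separately, which is the case point 2 handles.

```python
from dataclasses import dataclass, field
from typing import Optional

# Words ignored when comparing glosses with the context (hypothetical list).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


@dataclass
class Sense:
    """One hypothetical ontology concept that a surface word can denote."""
    concept_id: str
    gloss: str
    related_terms: list[str] = field(default_factory=list)


# Toy sense inventory standing in for a real ontology (hypothetical data).
SENSES: dict[str, list[Sense]] = {
    "java": [
        Sense("java.island", "island of indonesia south of borneo",
              ["indonesia", "island"]),
        Sense("java.language", "object oriented programming language for software",
              ["jvm", "programming"]),
    ],
}


def tokens(text: str) -> set[str]:
    """Lower-cased content words of a text."""
    return {t for t in text.lower().split() if t not in STOPWORDS}


def disambiguate(word: str, context: set[str]) -> Optional[Sense]:
    """Simplified Lesk: pick the sense whose gloss shares most words with the context.

    Returns None when no sense overlaps the context (disambiguation fails).
    Ties keep the first sense listed.
    """
    best, best_score = None, 0
    for sense in SENSES.get(word, []):
        score = len(tokens(sense.gloss) & context)
        if score > best_score:
            best, best_score = sense, score
    return best


def expand_query(keywords: list[str], feedback_text: str = "") -> tuple[list[str], list[str]]:
    """Expand each disambiguated keyword with the related terms of its concept.

    The context is the other query keywords plus words from implicit-feedback
    text (e.g. snippets of clicked results). Returns (expanded query,
    keywords whose disambiguation failed).
    """
    context = {k.lower() for k in keywords} | tokens(feedback_text)
    expanded, failed = list(keywords), []
    for kw in keywords:
        sense = disambiguate(kw.lower(), context - {kw.lower()})
        if sense is None:
            failed.append(kw)
            continue
        for term in sense.related_terms:
            if term not in expanded:
                expanded.append(term)
    return expanded, failed


print(expand_query(["java", "programming"]))
# (['java', 'programming', 'jvm'], ['programming'])
```

With feedback text such as "travel guide to the island beaches", the same keyword `java` would instead resolve to the island sense, which illustrates why the feedback-derived context matters.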
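
For the keywords that fail disambiguation (point 2), the following is a minimal sketch, assuming candidate expansion terms are simply the most frequent non-query terms in implicitly relevant documents, and that lexical relatedness can be approximated by document co-occurrence (Jaccard overlap over a background corpus). The dissertation's actual relatedness measure is not given in the abstract; `build_index`, `relatedness`, `candidate_terms`, `filter_expansion`, the threshold, and all data are hypothetical.

```python
from collections import Counter


def build_index(corpus: list[str]) -> dict[str, set[int]]:
    """Map each term to the ids of the documents that contain it."""
    index: dict[str, set[int]] = {}
    for doc_id, doc in enumerate(corpus):
        for term in set(doc.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def relatedness(a: str, b: str, index: dict[str, set[int]]) -> float:
    """Stand-in relatedness: Jaccard overlap of the documents containing a and b."""
    docs_a, docs_b = index.get(a, set()), index.get(b, set())
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def candidate_terms(feedback_docs: list[str], query: list[str], k: int) -> list[str]:
    """The k most frequent non-query terms in the implicitly relevant documents."""
    counts = Counter(t for doc in feedback_docs for t in doc.lower().split() if t not in query)
    return [term for term, _ in counts.most_common(k)]


def filter_expansion(candidates: list[str], query: list[str],
                     index: dict[str, set[int]], threshold: float) -> list[str]:
    """Keep candidates whose mean relatedness to the query terms reaches the threshold."""
    kept = []
    for term in candidates:
        score = sum(relatedness(term, q, index) for q in query) / len(query)
        if score >= threshold:
            kept.append(term)
    return kept


# Hypothetical background corpus and feedback documents for the ambiguous query "jaguar".
corpus = ["jaguar cat jungle", "jaguar cat prey", "jaguar jungle", "advert banner", "cat prey jungle"]
feedback = ["jaguar cat jungle prey cat", "jaguar jungle cat advert"]
query = ["jaguar"]

cands = candidate_terms(feedback, query, k=4)
print(filter_expansion(cands, query, build_index(corpus), threshold=0.3))
# ['cat', 'jungle']
```

Here `prey` (relatedness 0.25) and `advert` (0.0) fall below the threshold, so only terms that co-occur strongly with the query term survive; this is the kind of filtering meant to curb topic drift.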

【Abstract】 We live in an information age characterized mainly by information explosion, and information retrieval techniques are seriously challenged by ever more frequent updates of Internet information and by users' increasing demand for more precise search results. Semantic search is a promising way to address the key issue of effectively finding exactly the information needed within massive amounts of data. However, because the Semantic Web has not yet been fully realized, recent research has focused on semantic retrieval techniques for this transition period, making it a hot research topic.

This dissertation addresses several key problems in the Information Retrieval (IR) domain and proposes a novel Semantic Processing Technology based Information Retrieval (SPTIR) model. SPTIR is built around Query Expansion (QE) and Search Result Re-ranking and consists of four parts: semantic query expansion based on Word Sense Disambiguation (WSD), query optimization based on word semantic relatedness, search result re-ranking based on document semantic relevance, and semantic-enhanced personalized information recommendation.

Firstly, in keyword-based search engines, a well-structured and meaningful user query not only expresses the user's personal information need precisely but also underpins the Quality of Service (QoS) of information retrieval. Starting from the semantic associations among query keywords, supplemented by implicit feedback, and using unsupervised Word Sense Disambiguation, this dissertation presents a technique that maps query keywords to ontology concepts, together with a semantic query expansion technique based on concept-word associations. WSD-based semantic query expansion addresses the failure of traditional retrieval systems to understand the user's query intention.

Secondly, for query keywords that cannot be disambiguated, this dissertation presents a strategy that uses implicit feedback to select candidate expansion keywords directly from the relevant documents. To further condense and optimize the expansion keywords generated from feedback, and to avoid "topic drift" in query expansion, the expanded keywords are filtered by a semantic relatedness measure between terms.

Thirdly, traditional keyword-based search often returns millions of results, so the relevance evaluation of retrieval results has become a hot research topic. Depending on whether query disambiguation succeeds or fails, this dissertation proposes two distinct document semantic relevance measures: Semantic Vector Space Model based Document Relevance and Word Vector Space Model based Document Relevance. The search results are re-ranked by semantic relevance, and documents strongly semantically correlated with the query words are presented to the user first.

Fourthly, the problem of how to meet the information needs of different users is studied, and a semantic-enhanced personalized information recommendation model is proposed. This model uses semantic data sources and historical rating data to implement hybrid recommendation. Introducing semantic data sources alleviates the data sparsity and cold-start problems of traditional collaborative filtering systems. In addition, to improve system scalability and enable real-time recommendation, fuzzy clustering is used to cluster users and items in the offline data preprocessing stage.
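
The Thirdly paragraph above switches between a semantic vector space model (when disambiguation succeeds) and a word vector space model (when it fails). The Python sketch below illustrates that switch under simplifying assumptions: raw term frequencies without IDF weighting, cosine similarity, and a hypothetical `concept_of` dictionary standing in for the word-to-concept mapping produced by disambiguation. It is not the dissertation's exact weighting scheme; the documents and concept ids are invented for illustration.

```python
import math
from collections import Counter
from typing import Optional


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def to_vector(text: str, concept_of: Optional[dict[str, str]] = None) -> Counter:
    """Raw term-frequency vector; with concept_of, mapped words become concept ids."""
    terms = text.lower().split()
    if concept_of is not None:
        terms = [concept_of.get(t, t) for t in terms]
    return Counter(terms)


def rerank(query: str, docs: list[str],
           concept_of: Optional[dict[str, str]] = None) -> list[int]:
    """Document indices ordered by decreasing cosine relevance (ties keep index order)."""
    q = to_vector(query, concept_of)
    scored = [(cosine(q, to_vector(d, concept_of)), i) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored]


docs = ["automobile motor repair", "car wash", "cat food"]
# Hypothetical output of successful disambiguation: synonyms share one concept id.
concepts = {"car": "C_auto", "automobile": "C_auto", "engine": "C_engine", "motor": "C_engine"}

print(rerank("car engine", docs))            # word VSM:     [1, 0, 2]
print(rerank("car engine", docs, concepts))  # semantic VSM: [0, 1, 2]
```

With word vectors, the first document shares no surface term with the query and ranks below "car wash"; once synonyms collapse to shared concept ids, it moves to the top.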
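
The Fourthly paragraph combines semantic data sources with historical ratings. One common way to do this, assumed here rather than taken from the dissertation, is to blend an attribute-based item similarity (Jaccard overlap of hypothetical semantic attributes) with rating-based item-item cosine similarity, then predict a rating as the similarity-weighted average of the user's own ratings. The blending weight `alpha`, the attribute sets, and all ratings are hypothetical.

```python
import math


def jaccard(a: set[str], b: set[str]) -> float:
    """Semantic similarity from shared item attributes (e.g. ontology classes)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rating_similarity(i: str, j: str, ratings: dict[str, dict[str, float]]) -> float:
    """Item-item cosine over user rating vectors (missing ratings count as 0)."""
    vi = {u: r[i] for u, r in ratings.items() if i in r}
    vj = {u: r[j] for u, r in ratings.items() if j in r}
    dot = sum(vi[u] * vj[u] for u in vi.keys() & vj.keys())
    norm_i = math.sqrt(sum(x * x for x in vi.values()))
    norm_j = math.sqrt(sum(x * x for x in vj.values()))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def predict(user: str, item: str, attrs: dict[str, set[str]],
            ratings: dict[str, dict[str, float]], alpha: float = 0.5) -> float:
    """Similarity-weighted average of the user's ratings on other items.

    alpha blends semantic (attribute) similarity with rating-based similarity;
    alpha = 0 reduces to plain item-based collaborative filtering. Both
    similarities are non-negative here, so their sum is a valid normalizer.
    """
    num = den = 0.0
    for j, r in ratings[user].items():
        if j == item:
            continue
        sim = alpha * jaccard(attrs[item], attrs[j]) + (1 - alpha) * rating_similarity(item, j, ratings)
        num += sim * r
        den += sim
    return num / den if den else 0.0


attrs = {"m1": {"sci-fi", "space"}, "m2": {"sci-fi", "robots"},
         "m3": {"romance", "drama"}, "m4": {"sci-fi", "space", "robots"}}
ratings = {"alice": {"m1": 5, "m3": 1}, "bob": {"m1": 4, "m2": 5}}  # m4 is unrated

print(round(predict("alice", "m4", attrs, ratings), 3))  # hybrid: 5.0
print(predict("alice", "m4", attrs, ratings, alpha=0.0))  # pure CF: 0.0
```

Item `m4` has no ratings yet, so pure collaborative filtering (`alpha=0.0`) has no signal for it, while the semantic term still links it to `m1`; this is the cold-start case the abstract mentions.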
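
The offline stage clusters users and items fuzzily. The abstract names only "fuzzy clustering", so this Python sketch assumes standard fuzzy c-means with fuzzifier `m = 2`, Euclidean distance, and caller-supplied initial centers; the user rating vectors are hypothetical. Unlike hard clustering, each user receives a degree of membership in every cluster.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy c-means with fuzzifier m and caller-chosen initial centers.

    Returns (centers, memberships), where memberships[k][i] is the degree to
    which point k belongs to cluster i (from the last membership step); each
    row sums to 1.
    """
    centers = [list(c) for c in centers]
    c, dim = len(centers), len(points[0])
    memberships = []
    for _ in range(max_iter):
        # Membership step: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        memberships = []
        for x in points:
            d = [math.dist(x, v) for v in centers]
            zeros = d.count(0.0)
            if zeros:  # point sits on a center: share membership among those centers
                row = [1.0 / zeros if dk == 0.0 else 0.0 for dk in d]
            else:
                row = [1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c))
                       for i in range(c)]
            memberships.append(row)
        # Center step: mean of the points weighted by u_ki ** m.
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in memberships]
            total = sum(w)
            if total == 0.0:  # no point belongs to this cluster: keep the old center
                new_centers.append(centers[i])
                continue
            new_centers.append([sum(wk * p[t] for wk, p in zip(w, points)) / total
                                for t in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Hypothetical user rating vectors over three items.
users = [(5, 4, 1), (4, 5, 1), (1, 1, 5), (2, 1, 4)]
centers, u = fuzzy_c_means(users, centers=[users[0], users[2]])
print([[round(x, 2) for x in row] for row in u])
```

At recommendation time, neighbors could then be searched only among users with high membership in the same clusters, which is one way such offline precomputation can support scalability and real-time response.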

  • 【Online Publication Submitter】 Zhejiang University
  • 【Online Publication Year/Issue】 2011, Issue 03