基于本体的文本内容相关性的研究与实现

The Research and Realization of Text Content Relevance Based on Ontology

【Author】 Qin Jiuying (秦久英)

【Advisor】 Wang Hongsheng (王宏生)

【Author information】 Shenyang University of Technology, Computer Application Technology, 2010, Master's thesis

【Abstract】 In the Internet era, users can obtain large amounts of information quickly and conveniently, but not all of it is relevant to their needs, and the irrelevant material considerably reduces the efficiency with which they find what they want. Exploring new retrieval models that further improve relevance and let users obtain the information they need quickly and accurately is therefore an inevitable direction for information retrieval research. Ontology, a key technology of the Semantic Web, has been a research hotspot in recent years. It provides a well-defined concept hierarchy and supports logical reasoning; by modeling domain knowledge it expresses semantic knowledge that machines can understand, which enables content-based retrieval. Applying ontologies can therefore have a substantial positive effect on retrieval relevance. After studying ontology and relevance, this thesis combines the two and designs an ontology-based semantic query expansion algorithm, whose aim is to expand the query terms entered by the user semantically and thereby improve recall and precision. It also proposes a multi-factor weighted document ranking algorithm that sorts documents by relevance and improves the relevance between the query terms and the retrieval results. Building on the ontology-based semantic query expansion algorithm, an ontology-based search system was designed and implemented in Java on the Eclipse platform. It consists of six parts: a user interface, a query request processing module, an ontology processing module, a network resource preprocessing module, a retrieval module, and a query result processing module. Experiments show that, with these modules working together, the system returns fairly accurate query results. By expanding query terms through an ontology knowledge base and ranking documents with the multi-factor weighted algorithm, the thesis improves the relevance between search results and query terms as well as recall and precision. The resulting system demonstrates how query expansion and relevance ranking can improve retrieval performance, and offers a useful reference for building intelligent information retrieval systems.

【Keywords】 Relevance; Ontology; Query Expansion; Information Retrieval