Research on Ontology-Based Knowledge Representation and Information Retrieval (基于本体的知识表示及信息检索研究)

【Author】 Li Dandan (李丹丹)

【Advisor】 Huang Wenpei (黄文培)

【Author information】 Southwest Jiaotong University, Educational Technology, 2011, Master's thesis

【Abstract】 With the rapid growth of online information, traditional keyword-based information retrieval is increasingly unable to meet users' needs. Knowledge retrieval, which emphasizes matching on knowledge and semantics and offers higher precision and recall, has become a research focus in both artificial intelligence and information retrieval. The well-defined concept hierarchy of an ontology suits knowledge representation: an ontology can fully describe a domain knowledge model, reflect the semantic relations between concepts, and support logical inference. Ontology-based knowledge retrieval can therefore better realize semantic retrieval and improve retrieval accuracy.

Based on an in-depth analysis and comparison of ontology knowledge representation, ontology description languages, editing tools, and modeling methods, this thesis designs a process for building a domain ontology for computer science and technology. The process acquires the knowledge representation elements of the ontology in turn (concepts, inheritance relationships, attribute relationships, instances, and so on), and the algorithms and techniques involved in each step are analyzed and implemented. First, secondary development on top of the open-source ICTCLAS system implements batch preprocessing of the corpus: word segmentation and stop-word removal. Second, the TF-IDF term-weighting algorithm is used to extract feature words from the computer-domain corpus, yielding the set of candidate concepts. Third, concept vectors are built by computing the degree of correlation between concepts, the similarity between concepts is computed with the cosine formula, and manual clustering then yields the inheritance hierarchy of the computer domain. Finally, based on the concept inheritance relationships, the object properties between concepts, the data properties of concepts, and the restrictions on properties are obtained.
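The feature-word extraction and concept-similarity steps described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the TF-IDF variant, so the sketch assumes tf = count / document length and idf = log(N / df), and the function names `tf_idf` and `cosine` are hypothetical. Input documents are assumed to be already segmented and stripped of stop words.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists (already segmented, stop words removed).
    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Terms with the highest weights in each document would serve as candidate concepts, and `cosine` would be applied to whatever concept vectors the correlation step produces before the manual clustering.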
With these knowledge elements acquired, the computer-domain ontology was built with Protégé and evaluated. On the basis of this ontology, the thesis studies the key technologies of ontology-based information retrieval. First, it analyzes and compares the retrieval characteristics of data retrieval, full-text retrieval, and knowledge retrieval, and identifies the advantages of ontology-based knowledge retrieval. Second, building on general ontology reasoning rules and reasoning rules for typical ontology relationships, it constructs a set of domain-specific reasoning rules for the computer-domain ontology to support the reasoning functions of a knowledge retrieval system. Third, it proposes a heuristic, ontology-based query expansion algorithm and workflow to safeguard the recall of information retrieval.

Finally, building on this theoretical and technical work, an experimental prototype of a paper retrieval system based on the computer-domain ontology was designed and implemented. The system offers two retrieval modes, conditional retrieval and navigational retrieval, and provides semantic reasoning and query expansion; it also serves to validate the theory and techniques presented in the thesis.
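As a rough illustration of ontology-based query expansion, the sketch below adds the sub-concepts of a query term to the query, relying on the transitivity of the inheritance relation. The abstract does not describe the thesis's heuristic, so this is a generic breadth-first expansion with a depth limit, not the author's algorithm; the toy hierarchy `SUBCLASSES`, the function `expand_query`, and the `max_depth` parameter are all hypothetical.

```python
from collections import deque

# Hypothetical toy hierarchy: parent concept -> direct sub-concepts.
SUBCLASSES = {
    "computer network": ["network protocol", "network security"],
    "network protocol": ["TCP", "HTTP"],
    "network security": ["firewall"],
}


def expand_query(term, subclasses, max_depth=2):
    """Return the term plus its sub-concepts within max_depth levels.

    Results are listed in breadth-first order, so closer concepts come first.
    """
    expanded = [term]
    seen = {term}
    queue = deque([(term, 0)])
    while queue:
        concept, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not descend below the depth limit
        for child in subclasses.get(concept, []):
            if child not in seen:
                seen.add(child)
                expanded.append(child)
                queue.append((child, depth + 1))
    return expanded
```

Limiting the depth is one simple way to trade recall against precision: a larger `max_depth` retrieves more related papers but also pulls in more distant concepts.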
