Research on a Thesaurus-Based Method for Building Domain Ontologies (基于叙词表的领域本体构建方法研究)

【Author】 Li Mengsha (李梦莎)

【Advisor】 Jiang Tongqiang (姜同强)

【Author information】 Beijing Technology and Business University, Management Science and Engineering, 2010, Master's thesis

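【Abstract (translated from Chinese)】 Ontologies can be built in two ways: manually by domain experts, or automatically or semi-automatically through machine learning. The manual approach relies mainly on human effort, and the semantic content of the resulting ontology depends on the builder's personal knowledge, so it only eases the knowledge bottleneck. The machine-learning approach acquires knowledge automatically from massive amounts of information and is an important route to fundamentally resolving the knowledge bottleneck in ontology building. Research on automatic ontology building is growing, but problems such as strong domain dependence, a low degree of automation, and unsatisfactory learning results have not yet been solved well. In particular, very little research, in China or abroad, addresses the automatic building of Chinese ontologies. Building on an in-depth study of current ontology building techniques and ontology learning methods, this thesis therefore proposes a new approach to automatic domain ontology building and focuses on the following:

(1) A model of a thesaurus-based domain ontology learning system is proposed. The model combines thesaurus-to-ontology conversion with the relation acquisition techniques of ontology learning. It uses the inherent strengths of a thesaurus to compensate for the poor acquisition of concepts and taxonomic relations during ontology learning. On that basis it learns relations from plain-text sources to obtain the non-taxonomic relations between concepts, so that the resulting domain ontology carries richer semantic information.

(2) A thesaurus-based domain ontology learning system is designed and implemented. The system consists of a thesaurus conversion module and a non-taxonomic relation learning module. In the thesaurus conversion module, the thesis summarizes a set of rules for converting a domain thesaurus into an ontology and uses them to convert the thesaurus into an initial domain ontology. In the non-taxonomic relation learning module, an extended association rule mining method serves as the theoretical basis: Chinese natural language processing techniques are used to acquire relations from a Chinese corpus, and the learned relations are added to the initial ontology.

(3) The system is used to build a domain ontology, which is then evaluated. No standard for ontology evaluation has yet been established, so the thesis evaluates the automatically built ontology with only a few indicators: reusability, extensibility, and the reference degree of related relations.

The thesaurus-based domain ontology learning system designed and implemented in this thesis offers a useful reference for the automatic building of Chinese domain ontologies and benefits concrete applications of semantic knowledge based on Chinese ontologies.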

【Abstract】 Ontology building methods fall into two types: manual construction by domain experts, and automatic construction based on machine learning. The first relies on manual work, so the semantics of the ontology depend on the builder's personal knowledge, and it only mitigates the knowledge bottleneck. The second obtains knowledge automatically from massive amounts of information through machine learning and is a fundamental solution to the knowledge bottleneck in ontology building. Although more and more organizations are doing research on ontology building, a low level of automation, dependence on experts, and other issues remain unresolved. Research on building Chinese ontologies in particular is insufficient. Therefore, based on a study of current ontology building and learning methods, this thesis focuses on the following aspects:

1. Propose a model of a thesaurus-based automated ontology building system. The model combines thesaurus conversion with relation learning techniques. It exploits the inherent advantages of the thesaurus to make concepts and taxonomic relations more accurate, then uses relation learning to extract non-taxonomic relations from document resources, enriching the semantic information about the relations between concepts.

2. Design and develop a thesaurus-based automated ontology building system. The system includes a thesaurus conversion module and a relation learning module. In the first module, a set of thesaurus conversion rules is presented as the basis for converting the thesaurus into an initial ontology. The second module takes association rule mining as its basis, uses Chinese natural language processing techniques to extract relations from document resources, and adds these relations to the initial ontology.

3. Evaluate the ontology built by the system in terms of extensibility, reusability, and reference degree.

This research provides a valuable reference for Chinese ontology building and has a positive effect on ontology-based applications.
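
The two modules described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not list the conversion rules, so the mapping in `RULES`, the predicate names (`subClassOf`, `relatedTo`, `altLabel`, `associatedWith`), the sample terms, and the thresholds are all assumptions made for illustration. The mining step is plain support/confidence association rule mining, not the extended variant the thesis uses, and it assumes the corpus has already been segmented into tokens.

```python
from collections import Counter
from itertools import combinations

# Hypothetical thesaurus entries: (term, relation, target). BT (broader term),
# RT (related term) and UF (used for) are standard thesaurus notation; the
# sample terms are invented for illustration.
THESAURUS = [
    ("green tea", "BT", "tea"),
    ("black tea", "BT", "tea"),
    ("tea", "BT", "beverage"),
    ("tea", "RT", "caffeine"),
    ("tea", "UF", "cha"),
]

# Assumed conversion rules, not the rule set summarized in the thesis.
RULES = {"BT": "subClassOf", "RT": "relatedTo", "UF": "altLabel"}

# Hypothetical corpus, one pre-tokenized sentence per list.
SENTENCES = [
    ["tea", "contains", "caffeine"],
    ["green tea", "contains", "catechin"],
    ["tea", "and", "caffeine"],
    ["catechin", "in", "green tea"],
    ["tea", "is", "popular"],
]
CONCEPTS = {"tea", "green tea", "caffeine", "catechin"}


def thesaurus_to_ontology(entries):
    """Convert thesaurus entries into (subject, predicate, object) triples."""
    return [(term, RULES[rel], target) for term, rel, target in entries if rel in RULES]


def mine_relations(sentences, concepts, min_support=0.3, min_confidence=0.6):
    """Find concept pairs that co-occur in sentences often enough.

    support    = sentences containing both concepts / all sentences
    confidence = sentences containing both concepts / sentences containing the head
    """
    total = len(sentences)
    single = Counter()
    pair = Counter()
    for tokens in sentences:
        present = sorted(concepts & set(tokens))
        single.update(present)
        pair.update(combinations(present, 2))
    rules = []
    for (a, b), count in pair.items():
        support = count / total
        if support < min_support:
            continue
        for head, tail in ((a, b), (b, a)):
            confidence = count / single[head]
            if confidence >= min_confidence:
                rules.append((head, tail, round(support, 2), round(confidence, 2)))
    return sorted(rules)


def enrich(ontology, rules):
    """Add mined pairs to the initial ontology unless the pair is already linked."""
    linked = {frozenset((s, o)) for s, _, o in ontology}
    extra = [(h, "associatedWith", t) for h, t, _, _ in rules
             if frozenset((h, t)) not in linked]
    return ontology + extra


initial = thesaurus_to_ontology(THESAURUS)
mined = mine_relations(SENTENCES, CONCEPTS)
enriched = enrich(initial, mined)
```

In this sketch the thesaurus supplies the concepts and the taxonomic backbone, and the mining step only proposes unlabeled non-taxonomic links between concepts that the thesaurus does not already connect. Naming those links would need further processing that the sketch does not attempt.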
