领域本体构造中数据源选取及构造方法的研究

Research on Method of Data Sources Selection and Constructing Domain Ontology

【Author】 Xing Jun (邢军)

【Advisor】 Han Min (韩敏)

【Author information】 Dalian University of Technology, Computer Application Technology, 2008, Ph.D.

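【Abstract (translation of the Chinese original)】 Research on ontology construction methods supports the wider application and adoption of ontologies, and is of practical significance for the transition of the WWW to its next generation. Current work on ontology construction concentrates mostly on the construction process; closer analysis of the internal features of ontology data sources would further reduce problems such as the loss of useful information and the adoption of useless data. Based on a detailed analysis of data source characteristics, the thesis proposes a document decomposition model, an input-output driven model, and a two-layer vector space model; integrates several intelligent methods, including neural networks and fuzzy FCA; establishes manual and (semi-)automatic ontology construction methods; and implements an ontology construction tool. The main research content and results are as follows: (1) Ontology data source selection. The quality of data source selection directly affects the quality of the constructed ontology, and most existing research on ontology data sources concentrates on text sources. Analysis of a text source should consider not only the frequency of terms and concepts within a document and the percentage of documents in the collection that contain a term, but also the position of terms within the document and the positional properties of the document's indexing sources. By building a document decomposition model and applying an abstraction method, the thesis analyzes the conceptual, relational, and predictive characteristics of ontology data sources, and computes the corresponding weights with an improved VSM method, an ontology-relation-distance method, and a neural network method, respectively. An ontology data source selection system is designed and implemented with Java and Oracle; the method is validated on real documents related to "wetland protection" and yields good selection results. (2) Manual construction of an ontology for a specific domain: a construction method for the wetland protection domain ontology. The purpose of building a "digital" wetland is to achieve knowledge management and information sharing for wetlands, and constructing a wetland ontology is the basis for reaching that goal. Based on an analysis of existing manual ontology construction techniques, the thesis proposes the WP-Onto method for constructing a wetland protection ontology: it organizes the ontology data sources with the input-output driven model, classifies the relevant knowledge into knowledge sets, refines and extracts concepts and relations, and produces the ontology encoding and formal representation. The thesis also studies applications of the wetland protection ontology, covering information sharing and knowledge management. (3) Ontology construction from Web resources. Using Web resources not only shortens the ontology construction cycle but also broadens the range of ontology applications. However, Web-based data extraction and knowledge acquisition are difficult, and a gap to practical application remains. The thesis analyzes the characteristics of Web data sources for ontology construction (dynamic, massive, heterogeneous, changing, and open) and the formal representation method, a fundamental problem of ontology construction, and summarizes the key techniques and technical difficulties. A system architecture for Web-based ontology construction is designed, offering a framework for realizing Web-based domain ontology construction. (4) Implementation of an ontology learning tool. To build an efficient and accurate ontology learning tool, the thesis applies object-oriented analysis to extend the traditional single-layer text vector space model into a two-layer vector space model (Double Vector Space Model, D-VSM), which captures not only attribute characteristics but also strong relational characteristics. On this model, the FFCA (Fuzzy Formal Concept Analysis) ontology learning technique is introduced. It takes the data distribution in the D-VSM into account and addresses problems such as the generality of ontology learning and the acquisition of ontology relations. An ontology learning tool is implemented on this basis, providing strong support for (semi-)automatic ontology construction. In summary, the thesis studies several key problems in ontology construction: it builds an ontology data source selection system on the document decomposition model; proposes the WP-Onto manual construction method for the wetland protection domain on the input-output driven model; and, on the two-layer vector space model, analyzes the characteristics of Web data and combines the fuzzy FCA method to implement an ontology learning tool. Taking data source selection as the foundation, the work covers both manual and (semi-)automatic ontology construction and obtains good results.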
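
The abstract names three signals for ranking candidate documents: term frequency, the share of documents containing a term, and where in the document the term appears. The record does not give the improved VSM formula, so the following is only a minimal sketch of one way such signals could be combined. The section names, the position factors, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical position factors; the abstract does not state any values.
POSITION_WEIGHT = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase whitespace tokenizer, enough for a sketch."""
    return text.lower().split()


def position_weighted_tf(doc):
    """Sum position factors over every occurrence of each term.

    `doc` maps a section name ("title", "abstract", "body") to its text.
    """
    tf = Counter()
    for section, text in doc.items():
        factor = POSITION_WEIGHT.get(section, 1.0)
        for term in tokenize(text):
            tf[term] += factor
    return tf


def idf(term, docs):
    """Smoothed inverse document frequency over the collection."""
    containing = sum(
        1 for doc in docs
        if any(term in tokenize(text) for text in doc.values())
    )
    return math.log((1 + len(docs)) / (1 + containing)) + 1.0


def score(doc, docs, domain_terms):
    """Score a document by its position-weighted domain-term mass."""
    tf = position_weighted_tf(doc)
    return sum(tf[t] * idf(t, docs) for t in domain_terms)


# Toy collection, invented for this example.
docs = [
    {"title": "wetland protection policy", "body": "wetland birds and water"},
    {"title": "urban traffic", "body": "a wetland is near the road"},
    {"title": "database tuning", "body": "index and query plans"},
]
domain_terms = ["wetland", "protection"]
ranked = sorted(range(len(docs)),
                key=lambda i: score(docs[i], docs, domain_terms),
                reverse=True)
print(ranked)
```

A term in the title counts three times as much as the same term in the body here, so a document that is about the domain outranks one that merely mentions it. The relation-distance and neural network weights mentioned in the abstract are not modeled.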

【Abstract】 Research on ontology construction methods is necessary for the wide application of ontologies and has practical value for the transition to the next generation of the WWW. Current ontology construction research mostly addresses the construction process and seldom analyzes the internal features of information sources, which makes good application performance difficult to achieve. Based on an intensive analysis of information sources, this study proposes a document decomposition model, an input-output driven model, and a double vector space model, and integrates several intelligent methods, such as artificial neural networks and fuzzy formal concept analysis. On this basis, manual and (semi-)automatic ontology construction methods are derived and an ontology construction tool is implemented. The main research and results are as follows: (1) The information source is critical to the quality and efficiency of domain ontology construction. Considerable progress has been made in this respect, yet traditional methods take into account only the frequency of terms and concepts and the percentage of documents containing them, not their location in the document, which leads to low accuracy. Using an abstraction method, this study constructs a document decomposition model. It first identifies the characteristics of an information source (conceptual, relational, and predictive); the weights of these characteristics are then determined by an improved Vector Space Model (VSM), ontology relation distance, and a neural network, respectively. An information source selection system is designed and implemented with Java and Oracle. In this system, the document weights are obtained by training a neural network with simulated data. Tested on a real document set on "Wetland Protection", the model produces a good ranking of documents for selection. (2) A manual method for ontology construction in a specific domain: the wetland protection domain ontology. The primary objective is to build a digitized wetland and realize knowledge management and information sharing; a wetland ontology is the basis for achieving this objective. The analysis indicates that current manual ontology construction techniques suffer from several disadvantages, such as insufficient requirements analysis, lack of planning, lack of formalization, and neglect of ontology sharing and reuse. To overcome these disadvantages, the study proposes the WP-Onto (Wetland Protection Ontology) method. It begins with a requirements analysis of the wetland protection domain, followed by the building of an input-output driven model with wetland resources as its object. The method then collects concepts and terms related to wetland protection and generates each knowledge set in the driven model; finally, the ontology is established after refinement, extraction, and supplementation. In addition, the study examines applications of the wetland ontology, namely information sharing and knowledge management. (3) Building ontologies from Web resources will not only shorten the construction period but also extend the field of application of ontologies. Much progress has been made, but difficulties remain, such as Web data extraction and knowledge acquisition. This study examines the characteristics of Web data for ontology construction, such as its dynamic nature, large scale, variability, and openness, as well as the fundamental problem of formal representation. It also summarizes the key techniques and difficulties of ontology construction. An initial system architecture is proposed, which provides a guideline for Web-based ontology construction. (4) To build an efficient and accurate ontology learning tool, this study proposes a double vector space model (D-VSM), developed from the classical single-layer vector space model on the basis of object-oriented ideas. The model has not only attribute characteristics but also strong relational characteristics. On the basis of this model, fuzzy formal concept analysis (FFCA) is introduced as the ontology learning technique because it takes the data distribution in the D-VSM into account and is well suited to problems such as the generality of ontology learning and the acquisition of ontology relations. An ontology learning tool has been implemented based on this method, and it provides strong support for automatic and semi-automatic ontology construction. In summary, the study presents several results on ontology construction: an ontology information source selection system based on the document decomposition model; WP-Onto, a manual ontology construction method for the wetland protection domain; and an ontology learning tool based on the analysis of Web data combined with the FFCA method. Building on the information source selection method, the study investigates both manual and automatic ontology construction and obtains good results.
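
For readers unfamiliar with fuzzy formal concept analysis, the sketch below shows one common formulation on a toy document-term context: memberships below a threshold are cut away, formal concepts are found in the resulting crisp context, and each object's membership in a concept is the minimum of its memberships over the concept's attributes. It is a minimal illustration under these assumptions, not the thesis's tool; the context, the threshold, and all names are hypothetical, and the D-VSM itself is not modeled.

```python
from itertools import combinations

# Hypothetical fuzzy context: document -> {term: membership in [0, 1]}.
CONTEXT = {
    "d1": {"wetland": 0.9, "bird": 0.8, "policy": 0.1},
    "d2": {"wetland": 0.7, "bird": 0.2, "policy": 0.6},
    "d3": {"wetland": 0.8, "bird": 0.9, "policy": 0.7},
}
ATTRIBUTES = ["wetland", "bird", "policy"]
THRESHOLD = 0.5  # assumed confidence threshold for the cut


def has(obj, attr):
    """Crisp incidence after the threshold cut."""
    return CONTEXT[obj].get(attr, 0.0) >= THRESHOLD


def extent(attrs):
    """Objects that have every attribute in `attrs`."""
    return frozenset(o for o in CONTEXT if all(has(o, a) for a in attrs))


def intent(objs):
    """Attributes shared by every object in `objs`."""
    return frozenset(a for a in ATTRIBUTES if all(has(o, a) for o in objs))


def concepts():
    """Enumerate all formal concepts by closing every attribute subset.

    Exponential in the number of attributes, so only for tiny examples.
    """
    found = set()
    for k in range(len(ATTRIBUTES) + 1):
        for attrs in combinations(ATTRIBUTES, k):
            ext = extent(attrs)
            found.add((ext, intent(ext)))
    return found


def fuzzy_extent(ext, itt):
    """Membership of each object: minimum over the intent's attributes."""
    return {o: min((CONTEXT[o][a] for a in itt), default=1.0)
            for o in sorted(ext)}


# Print concepts from the most general (largest extent) downward.
for ext, itt in sorted(concepts(), key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(itt), fuzzy_extent(ext, itt))
```

The subset ordering of the extents gives a concept lattice, which is the structure from which a concept hierarchy could be read off in an ontology learning setting. Practical implementations replace the brute-force enumeration with an incremental lattice algorithm.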
