构建语义Web中文本体的粗糙概念格方法
A Method of Building Semantic Web Chinese Ontology Based on Rough Concept Lattice
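【Author】 Hou Yanan (侯亚南);
【Advisor】 Huang Yinghui (黄映辉);
【Author information】 Dalian Maritime University, Computer Science and Technology, 2010, Master's thesis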
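【Abstract (translated from the Chinese)】 A Semantic Web ontology is the knowledge base that supports the actual operation of the Semantic Web. It formally defines the knowledge commonly accepted within a domain and the relationships among that knowledge, expressed concretely as the domain's commonly accepted concepts and the relationships among them. A concept lattice also formally defines concepts and the relationships among them. Although both are knowledge bases, the concept lattice rests on mathematical rigor, whereas the ontology aims at convenience of application. The development route "practical domain→concept lattice→ontology" therefore favors building high-quality Semantic Web ontologies. The vast majority of existing ontologies are in English, and no fully functional tool exists for building Chinese ontologies. The real world is uncertain, so human cognition of it is bound to be fuzzy and rough, and the rough concept is one of the basic forms of human cognition. A rough concept lattice is a knowledge base expressed as rough concepts and the relationships among them. The research goal and technical route of this thesis is "practical domain→rough concept lattice→Chinese ontology", with the result applied to support the operation of a Semantic Web that is closer to the real world. The formal context determines how complex the structure of the resulting rough concept lattice is, so this thesis concentrates on the formal context extracted from the data source. First, the formal context is extracted. For practical reasons, unstructured Chinese text is taken as the data source for the formal context. To obtain a concise formal context, the notion of a set of similar-word sets is proposed to reduce the redundancy that single words introduce. Second, the formal context is reduced, by extending the reduction algorithm for formal contexts that correspond to concept lattices. Third, rough formal concepts are extracted from the reduced formal context. Fourth, the rough concept lattice is built. Because a rough formal concept and a formal concept are tuples with different numbers of components, concept lattice construction algorithms are studied and extended to build the rough concept lattice. Fifth, the rough concept lattice is transformed into a Semantic Web Chinese ontology. Building on research into the use of formal concept analysis for ontology construction, the rough concept lattice is processed into an ontology prototype, which is then described in an ontology description language, completing the transformation. Finally, the method is verified on an example. Using 214 commonly used Chinese texts in the traffic category provided by Sogou Labs as the data source, the feasibility and practical value of the rough concept lattice method for building Semantic Web Chinese ontologies are verified.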
【Abstract】 A Semantic Web ontology is a knowledge base that supports the actual running of the Semantic Web. It formally defines the widely acknowledged knowledge in a domain and the relationships between pieces of that knowledge, concretely expressed as the widely acknowledged concepts in the domain and the relationships between them. A concept lattice formally defines concepts and the relationships between them. Although both the concept lattice and the ontology are knowledge bases, the former rests on mathematical rigor and the latter pursues convenience of application. Taking "practical domain→concept lattice→ontology" as the development route is therefore beneficial for building high-quality Semantic Web ontologies. Most existing ontologies are in English, and there is no tool with complete functionality for building Chinese ontologies. The real world is uncertain, so human cognition of it inevitably shows fuzziness and roughness, and the rough concept is one of the basic forms of human cognition. A rough concept lattice is a knowledge base expressed as rough concepts and the relationships between them. The research target and technical route of this thesis is "practical domain→rough concept lattice→Chinese ontology", with the resulting ontology applied to support the practical operation of a Semantic Web that is closer to the real world. Because the formal context determines the structural complexity of the rough concept lattice built from it, this thesis focuses on the formal context extracted from the data source. First, the formal context is extracted. For reasons of practicality, unstructured Chinese texts are taken as the data source of the formal context. To extract a concise formal context, the notion of a set of similar-word sets is proposed to reduce the redundancy caused by single words. Second, the formal context is reduced: the reduction algorithm for the formal context corresponding to a concept lattice is extended. Third, rough formal concepts are extracted from the reduced formal context. Fourth, the rough concept lattice is built. Because a rough formal concept and a formal concept are tuples with different numbers of components, concept lattice construction algorithms are studied and extended to build the rough concept lattice. Fifth, the rough concept lattice is transformed into a Semantic Web Chinese ontology. Based on the study of formal concept analysis applied to ontology building, the rough concept lattice is processed to form an ontology prototype, and an ontology description language is used to describe the ontology, completing the transformation from rough concept lattice to Semantic Web Chinese ontology. Finally, the method is verified on an example. Using 214 commonly used Chinese texts in the traffic category from Sogou Labs as the data source, the feasibility and practical value of the method of building Semantic Web Chinese ontologies based on the rough concept lattice are verified.
【Key words】 Semantic Web Chinese Ontology; Rough Concept Lattice; Rough Formal Concept; Concept Lattice; Formal Context;
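【Illustrative sketch】 The first steps of the pipeline described in the abstract (similar-word sets → formal context → concepts) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm. The toy documents, the word-set labels, and all function names are hypothetical. The rough concept is modeled as a triple (upper extent, lower extent, intent), which is one formulation found in the rough concept lattice literature and is only assumed here, since the abstract says no more than that the tuple sizes differ. Concepts are enumerated by brute force, which is feasible only for very small contexts.

```python
from itertools import combinations

# Hypothetical similar-word sets: each label stands for a group of near-synonyms,
# so one attribute replaces several redundant single-word attributes.
SIMILAR_WORD_SETS = {
    "transit": {"bus", "coach"},
    "rail": {"subway", "metro"},
    "road": {"road", "highway"},
}

# Hypothetical segmented documents (objects) and the words found in them.
DOCS = {
    "doc1": ["bus", "highway"],
    "doc2": ["coach", "metro"],
    "doc3": ["road", "subway"],
}


def build_context(docs, word_sets):
    """Map each document to the labels of the similar-word sets it mentions."""
    word_to_label = {w: label for label, words in word_sets.items() for w in words}
    return {
        doc: {word_to_label[w] for w in words if w in word_to_label}
        for doc, words in docs.items()
    }


def lower_extent(attrs, context):
    """Objects that have every attribute in attrs."""
    return {obj for obj, has in context.items() if attrs <= has}


def upper_extent(attrs, context):
    """Objects that have at least one attribute in attrs."""
    return {obj for obj, has in context.items() if attrs & has}


def common_attributes(objects, context):
    """Attributes shared by all given objects (all attributes if none given)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def formal_concepts(context):
    """Brute-force enumeration of classical (extent, intent) pairs."""
    all_attrs = sorted(set().union(*context.values()))
    found = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, size):
            extent = lower_extent(set(subset), context)
            intent = common_attributes(extent, context)
            found.add((frozenset(extent), frozenset(intent)))
    return found


def rough_concept(attrs, context):
    """Assumed triple form: (upper extent, lower extent, intent)."""
    return upper_extent(attrs, context), lower_extent(attrs, context), set(attrs)


context = build_context(DOCS, SIMILAR_WORD_SETS)
concepts = formal_concepts(context)
upper, lower, intent = rough_concept({"transit", "road"}, context)
```

In this toy context, six raw words collapse into three attributes. For the intent {transit, road}, the lower extent holds only the document that mentions both, while the upper extent holds every document that mentions either one.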