基于语义网络的知识协作关键技术研究

Research on the Key Techniques of Semantic Networks Based Knowledge Collaboration

【Author】 Dai Yintang (代印唐)

【Advisor】 Zhang Shiyong (张世永)

【Author information】 Fudan University, Computer Application Technology, 2009, PhD

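【Abstract (Chinese version)】 Semantic networks are an important method of knowledge representation and knowledge processing. Knowledge intelligence based on semantic networks is widely applied in expert systems, business intelligence, intelligent agents, Internet semantic search, and other areas. Taking semantic-network-based knowledge collaboration as its application background, this thesis studies three closely related topics in semantic network knowledge intelligence: complex knowledge representation on semantic networks, intelligent knowledge processing on semantic networks, and natural language input and output of semantic network knowledge.

First, the thesis studies complex knowledge representation on semantic networks. It formally defines semantic network topologies such as the semantic tree, the semantic lattice, and the semantic star, and proposes an abstract relational structure model together with a new complex knowledge representation method called the abstract semantic network. The main feature of the abstract semantic network is an association mapping from a subnetwork to a node of the semantic network. This mapping can express relations between a subnetwork and a node and between two subnetworks, so that complex knowledge such as events, changes, regularities, and beliefs can be represented flexibly, conveniently, and uniformly; the approach is expected to extend to the representation of processes and plans. Building on the abstract semantic network, the thesis examines the special requirements of graph transformation on semantic networks, analyzes earlier graph grammars and graph rewriting techniques, and, improving on the algorithmic graph rewriting graph grammar, proposes a new graph grammar called the universal graph grammar. The universal graph grammar suits the requirements of graph transformation on semantic networks and is used to represent semantic network inference rules.

Second, the thesis studies knowledge processing on semantic networks based on graph matching and graph transformation. It analyzes the characteristics of the graph matching problem and the special requirements of graph matching on semantic networks, and proposes a new fast matching algorithm for labeled graphs called the Graph Explorer (GE) algorithm. GE is a non-indexed graph matching algorithm based on tree search. Unlike traditional tree-search graph matching, which is centered on node matching, GE exploits the fact that a semantic network is a labeled graph and is centered on edge matching. GE converts the pattern graph into a path that is as continuous as possible. This path guides the search over matching substates and reduces visits to invalid states, and through state inheritance it also greatly reduces the number of label-matching checks. GE further avoids invalid state visits through intelligent backtracking. Experiments show that the time and space performance of GE is considerably better than that of similar algorithms such as the well-known VF2 algorithm. In addition, GE replaces the recursive function calls of traditional algorithms with a queue of search states, which avoids the stack overflow problem of those algorithms, improves robustness, and raises the size of pattern graphs that can be handled to 10,000 nodes. In the experiments comparing GE with other tree-search graph matching algorithms, the thesis also proposes a new performance measure for this class of algorithms based on the number of search states and the number of label checks. This measure is independent of platform and implementation, which improves comparability among such algorithms. The thesis applies the abstract semantic network, the GE graph matching algorithm, and graph transformation based on the universal graph grammar to knowledge processing on semantic networks, implementing knowledge intelligence functions such as query, reasoning, and recognition. It also studies knowledge merging and knowledge consistency maintenance on semantic networks, proposes a merge-time reasoning mechanism and an element-level knowledge measure for semantic networks, and attempts to use the knowledge measure and the degree of knowledge contribution to solve the termination problem of graph transformation.

Next, the thesis studies the extraction of semantic networks and natural language understanding for semantic networks, and proposes a new grammar for semantic network natural language processing called the Semantic Networks Grammar (SNG). SNG uses semantic patterns to serialize a semantic star topology into a node sequence and to extract it back. Through transformative production and compositional construction, it reduces serialization and extraction between a semantic tree and a sequence of language nodes to serialization and extraction between several semantic stars and sequences of language nodes, thereby achieving language understanding and generation for semantic networks. SNG also uses a pattern net to determine the order in which semantic patterns are applied, and it specifies the principles for resolving structural ambiguity in language understanding and the criteria for well-formed sentences in language generation. SNG additionally treats words as a kind of semantic pattern. SNG thus unifies word segmentation, grammar parsing, and semantic understanding in a single framework.

The thesis then studies the process model of grammar parsing and the disambiguation problem. It analyzes what classified structural grammars have in common, dissects how natural language understanding based on such grammars recognizes language structure and semantic structure from a symbol stream, formally defines the language lattice and semantic lattice models of the structural grammar parsing process, and points out the relationship between the syntax tree and the language lattice. Based on the lattice model, it analyzes the causes and nature of structural ambiguity and classification ambiguity in parsing, gives a probabilistic interpretation of context disambiguation, formally analyzes the principle and effectiveness of disambiguation by type subdivision, and analyzes the data sparsity problem that type subdivision introduces. It then proposes a hierarchically classified probabilistic context-free grammar parsing method for SNG and a hierarchically classified probabilistic context-sensitive grammar parsing method. The hierarchical classification method makes full use of the knowledge provided by the corpus and balances over-classification against under-classification. For classification ambiguity in structural grammar parsing, the thesis also proposes a disambiguation method based on clustering phrase instances and a local classification disambiguation method based on maximum entropy. Experimental results on the Penn Chinese Treebank and other corpora show that the pass rate and accuracy are improved over similar methods.

The thesis also studies language generation from semantic networks. It applies SNG to semantic network language generation and proposes an SNG-based semantic network language generation (SNLG) scheme, with content planning and discourse planning methods based on path search over the semantic pattern net. It describes the application of semantic patterns and the semantic pattern net to sentence planning in SNLG, and designs a trivialization method for generating natural language from a semantic star and a method that labels a semantic network with semantic patterns by path search over the pattern net. It further proposes an improved distance-based content planning method for SNLG and new trivialization-time splitting and splitting-time aggregation methods for SNLG discourse planning, which improve the fluency and readability of the generated sentences.

Finally, the thesis combines these results to implement a new semantic network middleware component and, on top of that middleware, designs a fully semantic Semantic Wiki system as a demonstration and validation of the theories and techniques of the thesis.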
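The edge-centered, queue-driven search described for the GE algorithm can be illustrated with a minimal Python sketch. This is not the GE algorithm of the thesis: it omits the path ordering of the pattern graph, state inheritance, and intelligent backtracking. The function names `find_matches` and `extend`, the encoding of edges as (source, label, destination) triples, and the sample data are all assumptions made for illustration. The sketch shows only two ideas from the abstract: each search state is extended by matching one pattern edge, and pending states are kept in an explicit queue instead of on the call stack.

```python
from collections import deque


def extend(mapping, pairs):
    """Return a copy of `mapping` extended with `pairs`, or None on conflict.

    The mapping is kept injective: two pattern nodes never share a target node.
    """
    new = dict(mapping)
    used = set(new.values())
    for pattern_node, target_node in pairs:
        if pattern_node in new:
            if new[pattern_node] != target_node:
                return None
        elif target_node in used:
            return None
        else:
            new[pattern_node] = target_node
            used.add(target_node)
    return new


def find_matches(pattern, target):
    """Find all injective node mappings that embed `pattern` into `target`.

    Both graphs are lists of directed labeled edges (source, label, destination).
    Pattern nodes act as variables; only edge labels are compared.
    """
    # Group target edges by label so a pattern edge only meets candidates
    # whose label already agrees.
    by_label = {}
    for src, label, dst in target:
        by_label.setdefault(label, []).append((src, dst))

    matches = []
    # A search state is (index of the next pattern edge, partial node mapping).
    # The explicit queue replaces recursive calls.
    queue = deque([(0, {})])
    while queue:
        index, mapping = queue.popleft()
        if index == len(pattern):
            matches.append(mapping)
            continue
        p_src, label, p_dst = pattern[index]
        for t_src, t_dst in by_label.get(label, []):
            extended = extend(mapping, [(p_src, t_src), (p_dst, t_dst)])
            if extended is not None:
                queue.append((index + 1, extended))
    return matches


# Hypothetical sample data, not taken from the thesis.
target = [("Tom", "is_a", "Cat"), ("Cat", "is_a", "Animal"), ("Tom", "likes", "Fish")]
pattern = [("x", "is_a", "y"), ("y", "is_a", "z")]
print(find_matches(pattern, target))  # [{'x': 'Tom', 'y': 'Cat', 'z': 'Animal'}]
```

Because the queue lives on the heap, the depth of the search is bounded by available memory rather than by the recursion limit, which is the robustness argument the abstract makes for replacing recursion with a state queue.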

【Abstract】 Semantic networks (SN) are a powerful method of knowledge representation and processing. This thesis studies three tightly interrelated aspects of knowledge collaboration based on semantic networks: (a) SN-based knowledge representation, (b) SN-based knowledge processing, and (c) input and output between semantic networks and natural language. The thesis makes the following contributions.

Firstly, complex knowledge representation based on semantic networks is studied. The types of knowledge are analyzed, and a categorization of knowledge is proposed. A new extension of semantic networks, called the Abstract Semantic Network (ASN), is proposed. ASN provides an abstraction mapping between a subnetwork of an SN and a node of that SN. ASN can be used to represent complex knowledge types and relations, such as multi-view knowledge, inconsistent knowledge, transformational knowledge, and procedural knowledge. Based on ASN, graph transformation on semantic networks is examined, and a new ASN-based graph grammar, called the Universal Graph Grammar (UGG), is proposed. The main improvement of UGG is that it can represent the conversion between nodes and edges during graph transformation. A representation of reasoning rules based on ASN and UGG is proposed for reasoning on semantic networks.

Secondly, knowledge processing based on semantic networks is studied. A fast labeled graph matching algorithm, called the Graph Explorer (GE) algorithm, is presented. GE belongs to the non-indexed tree-search algorithms for deterministic graph and subgraph matching. It converts the graph matching problem into a path search problem in the space of states of partially matched subgraphs. It avoids repeated label checking by using a state tree structure to cache matched nodes and edges and to access them quickly. With a carefully optimized search path, GE avoids a large share of invalid search states and achieves performance that is almost linear in the number of edges of the pattern graph when ambiguity is low. It employs a dynamic state queue to overcome the stack overflow problem caused by the recursive calls of traditional tree-search graph matching algorithms, and it can handle pattern graphs with up to 10,000 nodes. Analysis and experiments show that GE performs better than similar algorithms. The graph matching algorithm is applied to recognition, reasoning, and query on semantic networks. A knowledge merging method is also proposed.

Thirdly, natural language processing for semantic networks is studied. A new grammar, called the Semantic Networks Grammar (SNG), is proposed to support and guide natural language generation from semantic networks and natural language understanding into semantic networks. SNG presents a model of semantic topology, mainly comprising the semantic star and the semantic tree, as the intermediate structure of meaning, and a model of language structure to describe the internal structure of language. It defines semantic patterns to serialize and deserialize a semantic star, and it uses transformative production to serialize and deserialize a semantic tree.

Fourthly, the process model and disambiguation of grammar parsing are studied. A so-called Lattice Model, which comprises a Semantic Lattice and a Language Lattice, is proposed to describe the parsing process of structural grammars. After an analysis of the root causes of parsing ambiguities, a Hierarchical Classification (HC) approach is proposed to respond to both the under-classification and the over-classification problems. The HC method improves the conventional Head-Driven PSG into a Hierarchically Classified Phrase Structure Grammar (HC-PSG) by replacing the flat feature space and flat rule set of PSG with a classification hierarchy and a hierarchical rule set. The conventional PCFG is then upgraded to a Hierarchically Classified Probabilistic Context-Free Grammar (HC-PCFG) to provide basic disambiguation. HC-PCFG uses pattern clustering to resolve ambiguous rules and Maximum-Entropy Local Disambiguation to eliminate invalid branches as early as possible. The HC method is also extended to context disambiguation. Experimental results on the Penn Chinese Treebank demonstrate the effectiveness of the HC method.

A Semantic Network Language Generation (SNLG) solution based on SNG is proposed. SNLG provides a trivialization procedure that converts an arbitrary semantic star into a trivial semantic tree to be serialized by SNG. For content planning, an improved distance-based content planning approach is proposed. For discourse planning, a trivialization-time splitting method is presented to produce well-formed sentences, and a splitting-time aggregation method is proposed to improve the readability of sentences.

Finally, to verify and demonstrate the theories and technologies of this thesis, a semantic network middleware called Knoware has been implemented that applies them together. A fully semantic Semantic Wiki system called NaturalWiki has been implemented on top of Knoware to verify and demonstrate all theories and methods of this thesis.
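The abstraction mapping at the core of ASN, which ties a subnetwork to a single node so that other nodes can refer to the subnetwork as a whole, can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the representation used in the thesis or in Knoware: the class name `AbstractSemanticNetwork`, its methods, the triple encoding of edges, and the belief example are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractSemanticNetwork:
    """Toy semantic network with a node-to-subnetwork abstraction mapping."""

    # Edges are (source, relation, target) triples.
    edges: set = field(default_factory=set)
    # Maps an abstract node to the frozenset of triples it stands for.
    abstractions: dict = field(default_factory=dict)

    def add_edge(self, source, relation, target):
        """Add one labeled edge and return it as a triple."""
        triple = (source, relation, target)
        self.edges.add(triple)
        return triple

    def abstract(self, node, triples):
        """Bind `node` to the subnetwork formed by `triples`."""
        subnet = frozenset(triples)
        missing = subnet - self.edges
        if missing:
            raise ValueError(f"triples not in the network: {sorted(missing)}")
        self.abstractions[node] = subnet

    def subnetwork(self, node):
        """Return the subnetwork bound to `node`, or an empty set."""
        return self.abstractions.get(node, frozenset())


# Hypothetical example: a belief is a relation between an agent and a
# subnetwork, expressed through the node that abstracts that subnetwork.
net = AbstractSemanticNetwork()
fact = net.add_edge("Earth", "orbits", "Sun")
net.abstract("claim1", [fact])
net.add_edge("Alice", "believes", "claim1")
print(net.subnetwork("claim1"))
```

In this sketch the node `claim1` takes part in ordinary edges, so a relation between a node and a subnetwork, or between two subnetworks, reduces to a relation between nodes.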

  • 【Online publication contributor】 Fudan University
  • 【Online publication year and issue】 2009, Issue 12