Research on Emergency Causality Extraction from Chinese Corpus (汉语文本中突发事件因果关系抽取方法研究)

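【Author】 Qiu Jiangnan (裘江南)

【Advisor】 Wang Yanzhang (王延章)

【Author information】 Dalian University of Technology, Management Science and Engineering, 2012, PhD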

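【Abstract (translated from Chinese)】 An emergency is a complex system, and modeling it qualitatively starts with analyzing the causal relationships among its internal elements; this analysis underlies other emergency prediction and simulation models. Expert-knowledge-based methods, however, acquire causal relationships through questionnaires and interviews with domain experts, which is time-consuming, labor-intensive, and hard to carry out in practice. Data-based methods depend on data samples of sufficient scale and completeness, yet data on many emergencies have not been accumulated systematically and lack completeness and continuity. Meanwhile, as emergency management mechanisms at all levels of government in China and the related academic research have developed, a massive body of text about emergencies has formed. These texts contain a large amount of qualitative knowledge about how emergencies evolve, and in particular they reflect the causal relationships among the elements of various emergency systems; these are what this dissertation calls emergency causality. Such texts can replace experts and data as a source of emergency causality. How to extract the causal relationships among the internal elements of emergencies from emergency-domain text, and how to build an emergency causality model from them, is therefore a scientific problem in urgent need of a solution. Methods for extracting causality from Chinese text have not been studied systematically in China or abroad, and effective methods for integrating causal relationships extracted from text are lacking. Using the textual knowledge sources accumulated in emergency management, and centering on the core scientific problem of extracting emergency causality from Chinese emergency-domain text, this dissertation explores a method for integrating emergency causality based on extraction from multiple texts. The research covers the following four parts.

(1) Emergency causality model. An analysis of the elements common to emergency systems first identifies the shared characteristics of emergencies. A causality model of emergencies is then constructed with systems engineering methods, and its causal relationships are analyzed. Based on the input, state, and output element sets of an emergency, an extensible emergency causality model is established that makes the causal structure among the internal elements of an emergency explicit and provides a representation model for emergency causality extracted from text.

(2) Causal syntactic patterns in Chinese text and explicit causality extraction. Explicit causality in emergency-domain text is an important source of emergency causality, and methods for extracting causality from Chinese text have not been studied systematically in China or abroad. First, explicit causal sentences in Chinese text are divided into definite causal sentences and ambiguous causal sentences; five causal syntactic patterns are summarized for explicit causal sentences on the basis of Chinese grammar; and matching rules for extracting causal sentences and a method for matching causal syntactic patterns are proposed. Next, a classification model for ambiguous causal sentences based on Naive Bayes is studied. Finally, a causality extraction method is proposed for the classified causal sentences, and experiments show good results. The innovation is twofold. The five explicit causal syntactic patterns systematically reveal the basic ways in which causal sentences are expressed in Chinese text and further develop the theory of computer-aided causality extraction from Chinese text. The proposed pattern-based explicit causality extraction method, which distinguishes ambiguous from definite causal sentences, overcomes the limitation that current text causality extraction methods do not treat ambiguous causal sentences separately.

(3) Implicit causality extraction from Chinese text. Implicit causality in Chinese text is another important source of emergency causality in emergency-domain text. Based on an analysis of the characteristics of implicit causality in Chinese emergency-domain text, a concept-entity-based method for extracting implicit causality is studied. First, frequent concept sets are generated from the concepts in preprocessed sentences; next, the causality of the frequent concept sets is analyzed; finally, the cause and effect components are identified. The innovation lies in combining causal theory from philosophy and probability statistics with linguistics. Drawing on the causal theories of Hume, Suppes, and others, the method improves the confidence calculation used in association analysis, and designs a causality evaluation function and a method for identifying cause and effect components that jointly consider temporal priority, causal probability, and causal dependence. This addresses the problem that association analysis is not fully suited to mining causality from text and provides a new, causal-theory-based method for extracting implicit causality from text.

(4) Integration of redundant, conflicting, and sparse emergency causality from multiple texts. Causal relationships extracted from emergency-domain text are redundant, conflicting, and sparse, and isolated causal relationships cannot form an overall understanding of an emergency. The dissertation therefore studies how to fuse the causal cognition of individual texts into an integrated, global causal cognition of an emergency. A method for screening domain texts based on the vector space model is studied first, followed by a method for integrating causal relationships from multiple texts based on D-S (Dempster-Shafer) evidence theory that also takes the domain characteristics of the texts into account. The innovation is a multi-text causality integration method based on D-S evidence theory that accounts for the quality of domain texts. It resolves sparsity, redundancy, and conflict among causal relationships from multiple texts and compensates for the bias and incompleteness of any single text's description of an emergency, so that the emergency causality model built from the extracted relationships reflects the causal relationships among the internal elements of an emergency accurately and comprehensively. It also provides a new text-mining-based method for generating the structure of Bayesian networks for emergencies. On the one hand, resolving conflicting and redundant information yields a consistent understanding of the causal relationships in an emergency; on the other hand, exploiting complementary information yields a complete understanding of them.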
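Part (3) above describes replacing plain association-rule confidence with a score that also reflects temporal priority and probability raising in the sense of Suppes. The abstract does not give the dissertation's actual evaluation function, so the following is only a minimal sketch under stated assumptions: the function name `causal_scores`, the `min_support` parameter, the use of within-sentence order as a stand-in for temporal priority, and the score P(b | a) - P(b) are all illustrative choices, not the author's method.

```python
from collections import Counter
from itertools import combinations


def causal_scores(sentences, min_support=2):
    """Score ordered concept pairs (a, b) with a Suppes-style test.

    sentences: list of lists of concept strings, in order of appearance.
    Returns {(a, b): score}, where score = P(b | a) - P(b), kept only when
    a appears before b in at least `min_support` sentences and the score
    is positive (a raises the probability of b).
    Hypothetical illustration; not the dissertation's evaluation function.
    """
    n = len(sentences)
    concept_count = Counter()       # sentences containing each concept
    ordered_pair_count = Counter()  # sentences where a precedes b

    for sent in sentences:
        # Unique concepts, kept in order of first appearance.
        seen = list(dict.fromkeys(sent))
        concept_count.update(seen)
        # combinations() preserves list order, so a precedes b.
        for a, b in combinations(seen, 2):
            ordered_pair_count[(a, b)] += 1

    scores = {}
    for (a, b), together in ordered_pair_count.items():
        if together < min_support:
            continue  # not a frequent concept pair
        p_b_given_a = together / concept_count[a]  # b follows a, given a
        p_b = concept_count[b] / n                 # baseline frequency of b
        if p_b_given_a > p_b:                      # probability raising
            scores[(a, b)] = p_b_given_a - p_b
    return scores


corpus = [
    ["earthquake", "landslide"],
    ["earthquake", "landslide", "flood"],
    ["rain", "flood"],
    ["rain"],
]
print(causal_scores(corpus))
```

On this toy corpus only the pair ("earthquake", "landslide") meets the support threshold, and it is kept because "landslide" follows "earthquake" more often than its baseline frequency would suggest. A fuller treatment would also need the causal dependence criterion the abstract mentions, which this sketch omits.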

【Abstract】 An emergency is a complex system, and an important task in modeling it qualitatively is to analyze the causality among the elements of the emergency system, which is the basis for establishing simulation and prediction models of emergencies. However, expert-based methods acquire causality through questionnaires and interviews, which are time-consuming, labor-intensive, and hard to carry out in practice. Data-based methods depend on data samples of sufficient scale and completeness, whereas data on many emergencies have not been accumulated systematically and lack completeness and continuity. As emergency management mechanisms in China and the related academic research have developed, a huge amount of text about emergencies has accumulated. These texts contain a great deal of knowledge about how emergencies evolve, and in particular they reflect the causality among the factors of an emergency system, which this dissertation calls emergency causality. These texts can replace experts and data as sources of emergency causal knowledge. How to extract causal knowledge about the internal elements of emergencies from text, and how to establish an emergency causality model, therefore needs to be solved urgently. Causality extraction from Chinese corpora has been little studied, and no efficient approach for integrating causality extracted from Chinese corpora has been found. Building on previous domestic and foreign research, this dissertation therefore studies methods for extracting causality from Chinese corpora and for integrating emergency causality extracted from multiple texts. The research is as follows.

(1) An emergency causal model is built. First, the common characteristics of emergencies are identified by analyzing emergency systems. Second, an emergency causal model is built using systems engineering methods, and the causality in the model is analyzed. The innovation is a scalable emergency causality model built on the common elements of an emergency system's inputs, states, and outputs. It makes clear the causality among the elements of an emergency and the causal relations between emergencies, and it provides a representation model for causality extracted from text.

(2) Causal syntactic patterns in Chinese are identified, and a method for extracting explicit causality from Chinese corpora is studied. Explicit causality is an important source of causality in emergency text, and causality extraction from Chinese corpora has been little researched. First, explicit causal sentences in Chinese are divided into definite causal sentences and ambiguous causal sentences; five causal syntactic patterns are identified on the basis of Chinese grammar; and matching rules for causal sentences and a method for matching causal syntactic patterns are proposed. Then, a classification model for ambiguous causal sentences based on Naive Bayes is studied. Finally, a causality extraction method is proposed, and good results are achieved in experiments. The innovation is that the five causal syntactic patterns reveal the basic ways causal sentences are expressed in Chinese and advance the theory of computer-aided causality extraction from Chinese. The proposed pattern-based method for explicit causality extraction overcomes the limitation that existing methods do not distinguish ambiguous causal sentences.

(3) A method for extracting implicit causality from Chinese corpora is studied. Implicit causality is another important source of causality in emergency text. Based on an analysis of the characteristics of implicit causality in Chinese emergency-domain corpora, an implicit causality extraction method based on concept entities is studied. First, frequent concept sets are generated from the concepts in preprocessed sentences; then the causality of the frequent concept sets is analyzed; finally, cause and effect are identified. The innovation is that the proposed method integrates linguistics with causal theory from philosophy and statistics. Based on the causal theories of Hume and Suppes, the confidence calculation used in association analysis is improved. The causality evaluation function and the method for identifying causal components are designed by jointly considering temporal priority, causal probability, and causal dependence, which addresses the problem that association analysis is not well suited to mining causality from text. The proposed method is a novel approach to implicit causality extraction grounded in causal theory.

(4) A method for integrating redundant, conflicting, and sparse emergency causality is studied. Causal relations extracted from text may be redundant, conflicting, and sparse. To address this, a method is studied for integrating the causal cognition of individual texts into a global causal cognition of an emergency. First, a method for selecting domain texts based on the vector space model is studied. Then, an integration method based on D-S evidence theory for causality extracted from multiple texts is studied. The innovation is the proposed D-S evidence theory based method for integrating causality extracted from multiple texts, which resolves the sparsity, conflict, and redundancy of the extracted causal relations. The emergency causality model generated by the proposed method can faithfully reflect the internal causality among the elements of an emergency and overcomes the deficiencies of any single text's description of it. It also provides a new text-mining-based method for generating Bayesian network structures for emergencies. On the one hand, a common causal cognition of an emergency is achieved by eliminating conflicting and redundant causal relations; on the other hand, an overall causal cognition is achieved by exploiting complementary causal relations.
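Part (4) combines causal evidence from several texts using D-S evidence theory while taking text quality into account. As a rough illustration only, the sketch below applies the standard Dempster rule of combination to two mass functions over the frame {causal, not_causal}, and uses classical Shafer discounting as one possible way to weight a text by quality. The abstract does not say how the dissertation models text quality, so the `discount` step, the reliability value, and all names and mass values here are assumptions.

```python
def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset (a subset of the frame of
    discernment) to its mass; masses in each function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb  # mass assigned to the empty set
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict; rule undefined")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def discount(m, reliability, frame):
    """Shafer discounting: move (1 - reliability) of the mass to the frame.

    Used here as a hypothetical stand-in for text-quality weighting.
    """
    out = {k: reliability * v for k, v in m.items()}
    out[frame] = out.get(frame, 0.0) + (1.0 - reliability)
    return out


YES = frozenset({"causal"})
NO = frozenset({"not_causal"})
FRAME = YES | NO  # "unknown": either hypothesis may hold

# Illustrative masses for one candidate relation, from two texts.
text1 = {YES: 0.6, FRAME: 0.4}
text2 = {YES: 0.5, NO: 0.3, FRAME: 0.2}

fused = combine(text1, text2)
print(fused)

# Same fusion, but treating text2 as only 80% reliable.
fused_weighted = combine(text1, discount(text2, 0.8, FRAME))
print(fused_weighted)
```

In this example the conflicting mass is 0.18, so the fused belief that the relation is causal is 0.62 / 0.82, higher than either text asserts alone. Dempster's rule is known to behave counterintuitively under high conflict, which is one reason a quality-aware adjustment such as discounting may be applied before combining.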
