中文文本中事件时空与属性信息解析方法研究

Interpretation of Event Spatio-temporal and Attribute Information in Chinese Text

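【Author】 张春菊 (Zhang Chunju)

【Supervisors】 吉根林 (Ji Genlin); 张雪英 (Zhang Xueying)

【Author Information】 Nanjing Normal University, Cartography and Geographic Information Systems, 2013, Ph.D.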

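【Chinese Abstract】 Supported by the national "863" project "Research on Associated Updating of Ubiquitous Spatial Information and Mining of Subject-oriented Spatio-temporal Information", this thesis systematically explores methods for interpreting event spatio-temporal and attribute information in Chinese text. The work provides data and technical support for the dynamic associated updating of ubiquitous spatial information, and for spatial information and knowledge services within a globally unified spatio-temporal framework. It also lays a data foundation for mining event spatio-temporal patterns, and through that supports decision-making on major issues such as event risk assessment and public safety. Event spatio-temporal and attribute information is described in Chinese text in an unstructured, qualitative and uncertain way. To address this, the research follows the technical pipeline "text description - normalized expression - structured extraction - visual reconstruction" and focuses on interpretation methods for event spatio-temporal and attribute information. The main research contents and conclusions are as follows:

(1) Structured expression of event spatio-temporal and attribute information: the thesis summarizes the linguistic features and semantic structures used to describe event spatio-temporal and attribute information in Chinese text. From this it designs a knowledge representation framework and an annotation schema. Taking public emergencies as a case study and Web text as the data source, an annotated corpus of event spatio-temporal and attribute information in Chinese text was built on the GATE platform. The corpus provides standardized training and test data for extraction research.

(2) Extraction of event spatio-temporal and attribute information: the thesis analyzes regularities in how temporal information is described in Chinese text. On that basis it implements temporal information extraction, reasoning and normalization by combining trigger words with a rule-based model, reaching a precision, recall and F-measure of 75.00%, 88.24% and 81.08% respectively. Conditional random field (CRF) and rule-based models were used to recognize event names and to extract spatial locations (including place names and spatial relations). For event name recognition, precision, recall and F-measure were 82.08%, 80.18% and 81.12% respectively. A Bootstrapping-based algorithm was designed to extract event attribute information; for quantitative attributes, precision and recall reached 80.80% and 85.16%.

(3) Spatio-temporally driven event classification: based on an analysis of how events are spatio-temporally perceived and expressed, the thesis proposes an event classification method. The method fuses several kinds of contextual semantic and situational information, including time, space, attributes, event names and trigger words. Reasoning methods for alternative event names (abbreviations and aliases) were explored at three levels of linguistic unit: sentence, paragraph and document. In experiments, event classification accuracy reached 92.30% in closed testing and 80.60% in open testing.

(4) Matching and visualization of event spatio-temporal information: using a gazetteer database as the spatial data source, the thesis proposes a method for matching and visualizing qualitative spatio-temporal information (place names, spatial relations and temporal information). It also explores methods for judging theme events and reconstructing their spatio-temporal processes under a multiple consistency constraint of "time - space - concept type". Together these give an integrated, intuitive visual representation of event information in a spatio-temporal information system. Finally, a clustering analysis was performed on the spatio-temporal distribution patterns of events.

The results show the following. Combining rule-based and statistical models effectively extracts event spatio-temporal and attribute information from Chinese text, although the choice of feature items plays a decisive role in training the statistical models. The extraction models for time, place names, spatial relations, event names and event types generalize and transfer across event types. Attribute information, by contrast, varies considerably, so a knowledge base and learning model must be built for each specific event type. Judging event type is flexible, complex, semantically ambiguous and uncertain, and is a multi-label classification problem. Fusing part of speech, trigger words, time, space, attributes, event names and other contextual semantic and situational information effectively improves classification. Finally, the coverage and quality of the spatial data, together with the spatial-relation interpretation model, strongly affect both the matching of event spatio-temporal and attribute information and the reconstruction of spatio-temporal processes.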
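The trigger-word and rule-based temporal normalization described in item (2) can be illustrated with a minimal Python sketch. The trigger lexicon, the regular expression and the use of the report's publication date as the reference date are all assumptions made for illustration; none of them is the thesis's actual rule set. A pattern matches absolute dates with an optional year. Relative day words act as triggers and are resolved against the reference date. Every match is then normalized to ISO 8601.

```python
import re
from datetime import date, timedelta

# Hypothetical trigger words for relative day references, mapped to day offsets
# (an illustrative lexicon, not the one used in the thesis).
RELATIVE_DAYS = {"前天": -2, "昨天": -1, "今天": 0, "明天": 1, "后天": 2}

# Absolute date rule: optional 4-digit year, then month and day,
# e.g. "2013年4月20日", "5月3日" or "5月3号".
ABS_DATE = re.compile(r"(?:(\d{4})年)?(\d{1,2})月(\d{1,2})[日号]")
# Relative date rule: any trigger word from the lexicon.
REL_DATE = re.compile("|".join(RELATIVE_DAYS))


def normalize_dates(text: str, ref: date) -> list[tuple[str, str]]:
    """Return (surface string, ISO 8601 date) pairs in order of appearance.

    `ref` is the reference date (assumed here to be the report's publication
    date); it fills in a missing year and anchors relative expressions.
    """
    found = []
    for m in ABS_DATE.finditer(text):
        year = int(m.group(1)) if m.group(1) else ref.year
        try:
            d = date(year, int(m.group(2)), int(m.group(3)))
        except ValueError:
            continue  # skip impossible dates such as "2月30日"
        found.append((m.start(), m.group(0), d.isoformat()))
    for m in REL_DATE.finditer(text):
        d = ref + timedelta(days=RELATIVE_DAYS[m.group(0)])
        found.append((m.start(), m.group(0), d.isoformat()))
    found.sort()  # merge the outputs of both rules by character offset
    return [(surface, iso) for _, surface, iso in found]


# Hypothetical sentence: "On the afternoon of 3 May a landslide occurred in a
# county; rescuers arrived yesterday.", read against a reference date of 2013-05-05.
print(normalize_dates("5月3日下午,某县发生山体滑坡,昨天救援队已抵达。", date(2013, 5, 5)))
# → [('5月3日', '2013-05-03'), ('昨天', '2013-05-04')]
```

A fuller system would also need times of day, weeks, and anaphoric expressions such as 当天 ("that day") that refer back to an earlier date. That is where a reasoning step beyond simple pattern matching comes in.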

【Abstract】 This thesis is supported by the national "863" project "The Research of Associated Updating of the Ubiquitous Spatial Information and Mining of Subject-oriented Spatio-temporal Information". It explores an approach for interpreting event spatio-temporal and attribute information in Chinese text. The contributions provide data and technical support for the associated updating of ubiquitous spatial information, for spatial information and knowledge services under a unified spatio-temporal framework, and for the spatio-temporal mining of event information. They also support decision-making for event risk assessment, public safety and other major issues. In Chinese text, descriptions of event spatio-temporal and attribute information are unstructured, qualitative and uncertain. Given these characteristics, the research follows the main line of "text description, normalized expression, structured extraction, visual reconstruction" of event information in Chinese text. The main research contents and results are as follows:

(1) Structured expression of event spatio-temporal and attribute information in Chinese text: the linguistic features and semantic structures with which event, spatial, temporal and attribute information is described in Chinese text are analyzed. From this analysis a representation framework and an annotation schema are specified. GATE (General Architecture for Text Engineering) is adopted as the annotation platform, and an annotated corpus is built from Web data, taking public emergency events as a case study. The annotation schema and annotated corpus provide standard training and test data for event information extraction.

(2) Extraction of event spatio-temporal and attribute information in Chinese text: building on regularities in how temporal information is described in Chinese text, an interpretation approach is presented for the extraction, reasoning and normalization of temporal information. It combines trigger words with a rule-based model. Its precision, recall and F-measure are 75.00%, 88.24% and 81.08% respectively. Place names and event names are recognized with a Conditional Random Field (CRF) model, and spatial relations are extracted with a rule-based model. For event name recognition, precision, recall and F-measure are 82.08%, 80.18% and 81.12% respectively. In addition, a bootstrapping method is developed for extracting event attributes; for quantitative attribute information, precision and recall reach 80.80% and 85.16% respectively.

(3) Automatic event classification based on spatio-temporal information: every event has temporal, spatial and attribute properties. A classification method is developed that integrates contextual and semantic information. The method emphasizes the spatial and temporal elements used for event tracking. It finds that trigger words, part of speech, place names, temporal information, event names and attributes are important features for event classification. Abbreviations and aliases of event names are also resolved at three levels of language unit: sentence, paragraph and document. Experiments show classification accuracies of 92.30% and 80.60% in closed and open tests respectively.

(4) Matching and visualization of event spatio-temporal information: using a national gazetteer as the spatial data source, a matching and visualization method for event information is presented. Through hierarchical matching of place names, spatial relations and temporal information, event information is expressed in a GIS spatio-temporal framework. Under a consistency constraint of "temporal information - spatial information - concept type", a method for judging theme events and reconstructing their spatio-temporal processes is presented. Finally, a clustering analysis of the spatio-temporal patterns of event information is carried out.

The studies in this thesis suggest that combining rule-based and statistical models can effectively extract event information from Chinese text. Reasonable and effective feature items, however, play an important role in training the statistical models. Across different types of events, the extraction models for temporal information, place names, spatial relations, event names and event types are general and transferable. Attribute information differs considerably between event types, so the knowledge base and learning model need to be adapted to each specific type. Judging event type is flexible, complex, semantically ambiguous and uncertain; in other words, it is a multi-label classification problem. Integrating contextual and semantic information from part of speech, place names, temporal information, event names, attributes and trigger words effectively improves event classification performance. In matching and visualizing spatio-temporal information, performance depends heavily on the coverage and quality of the spatial data and on the interpretation model for spatial relations. Overall, the approach proposed in this dissertation for interpreting spatio-temporal and attribute information and for classifying events in Chinese text is effective, but its integration with GIS depends heavily on the spatial data available for mapping.
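The F-measure figures above appear to be the balanced F1 score, F1 = 2PR / (P + R), which is the harmonic mean of precision P and recall R. This assumption reproduces the reported 81.12% for event name recognition. The minimal Python sketch below recomputes F1 from the reported precision and recall pairs as a quick consistency check. The function name `f1_score` and the `reported` dictionary are illustrative only and do not come from the thesis.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in the abstract.
reported = {
    "temporal information": (75.00, 88.24),
    "event names": (82.08, 80.18),
}

for task, (p, r) in reported.items():
    print(f"{task}: F1 = {f1_score(p, r):.2f}%")
# → temporal information: F1 = 81.08%
# → event names: F1 = 81.12%
```

A harmonic mean always lies between the smaller and the larger of its two inputs. An F-measure reported with a given precision and recall should therefore fall between those two values, which gives a quick way to spot a transcription or calculation slip.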

  • 【Classification Number】 P208; TP391.1
  • 【Times Cited】 3
  • 【Downloads】 438