音乐领域典型事件抽取技术的研究

Research on Typical Event Extraction Technology in the Field of Music

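【Author】 Song Fan (宋凡)

【Advisor】 Qin Bing (秦兵)

【Author information】 Harbin Institute of Technology, Computer Science and Technology, 2009, Master's thesis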

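【Abstract (translated from Chinese)】 Event extraction is an important research direction in information extraction. It presents events of interest that are expressed in natural language in a structured form (who did what, where, and when), and it is widely applied in automatic summarization, question answering, and information retrieval. This thesis addresses event extraction in the music domain and studies two representative event types in depth: concert events and album events. Drawing on the concepts of the event extraction task in the ACE evaluation and on experience in corpus construction, the thesis defines the two event types in detail and builds a corpus, describing the source of the annotated data, the annotation process, the annotation guidelines, and the storage format. The two key technologies of event extraction, event type recognition and event argument recognition, are handled with different strategies. Event type recognition is simplified by a filtering method that combines keywords with trigger words. For event argument recognition, the central question is how to pick the event arguments out of the many entities present. Two methods are proposed: one based on pattern matching and one based on maximum entropy. Building on three event representation models from earlier work, and taking into account the characteristics of Chinese and of the syntactic parser used, the thesis proposes a pattern matching method based on simplified dependency trees. The maximum entropy method treats argument recognition as a classification problem: every entity that occurs is a candidate argument, features such as context, neighboring entities, and syntactic structure describe each candidate from different angles, and a maximum entropy classifier makes a binary decision. To exploit the strengths of both, the pattern matching method and the maximum entropy method are chained in a cascade to form the final solution for argument recognition. On the corpus built in this thesis, the average F-measure reaches 83.84% for event type recognition, 76.41% for event argument recognition, and 67.31% for overall event recognition.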

【Abstract】 Event extraction is an important research topic in information extraction. It presents events described in natural language in a structured form, e.g. who did what, where, and when, and it can be applied widely in NLP tasks such as summarization, question answering, and information retrieval. This paper focuses on event extraction in the music domain, and we choose concert events and album events as two representative event types for in-depth study. Drawing on concepts and corpus construction experience from the event extraction task in ACE, we define two types of events in the music domain and construct a corpus. The source of the corpus, the annotation specification and process, and the storage format are described in detail. We adopt different strategies for the two key issues in event extraction: event type recognition and event argument recognition. Event type recognition is greatly simplified in our method, which filters on a combination of keywords and triggers. For event argument recognition, our work concentrates on how to identify the correct arguments among the many entities present. Two methods are proposed in this paper, one based on pattern matching and one based on a maximum entropy model. Building on three kinds of event representation models from previous work, our pattern matching method operates on a reduced dependency tree and takes into account the characteristics of the Chinese language and of the parser used. In our binary classifier based on the maximum entropy model, every entity that occurs is treated as a candidate argument and is classified as true or false according to features including the context, the adjacent entities, and the syntactic structure of the entity under consideration. To combine the strengths of the two methods, our final solution chains them in a cascade. Experiments on our corpus show average F-measures of 83.84%, 76.41%, and 67.31% for type recognition, argument recognition, and overall event recognition, respectively.
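
The pipeline summarized above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the keyword and trigger lists are invented, the filter is assumed to require both a keyword and a trigger, the cascade is assumed to run pattern matching first and fall back to the classifier, and `pattern_matcher` and `classifier` are hypothetical stand-ins for the dependency-tree patterns and the maximum entropy model.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical word lists; the abstract does not give the actual ones.
DOMAIN_KEYWORDS = {
    "concert": {"concert", "tour"},
    "album": {"album", "record"},
}
TRIGGER_WORDS = {
    "concert": {"hold", "perform"},
    "album": {"release", "launch"},
}


def detect_event_type(tokens: Iterable[str]) -> Optional[str]:
    """Return an event type if the sentence contains both a domain keyword
    and a trigger word for that type (assumed filter rule), else None."""
    words = {token.lower() for token in tokens}
    for event_type, keywords in DOMAIN_KEYWORDS.items():
        if words & keywords and words & TRIGGER_WORDS[event_type]:
            return event_type
    return None


def identify_arguments(
    candidates: Iterable[str],
    pattern_matcher: Callable[[str], Optional[bool]],
    classifier: Callable[[str], bool],
) -> List[str]:
    """Cascade (assumed order): ask the pattern matcher first; if no pattern
    applies (it returns None), fall back to the binary classifier."""
    arguments = []
    for entity in candidates:
        decision = pattern_matcher(entity)
        if decision is None:
            decision = classifier(entity)
        if decision:
            arguments.append(entity)
    return arguments


if __name__ == "__main__":
    sentence = "Jay Chou will release a new album in October".split()
    print(detect_event_type(sentence))

    # Toy stand-ins: the pattern only recognizes one entity, the classifier one other.
    toy_pattern = lambda entity: True if entity == "Jay Chou" else None
    toy_classifier = lambda entity: entity == "October"
    print(identify_arguments(["Jay Chou", "October", "Taipei"], toy_pattern, toy_classifier))
```

In a fuller version the classifier stand-in would be a trained model over the context, adjacent-entity, and syntactic features the abstract mentions; a binary maximum entropy classifier is equivalent to logistic regression, so any logistic regression implementation could fill that role.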

  • 【Classification code】TP391.1
  • 【Citations】11
  • 【Downloads】168