A Study on Chinese Personal Name Recognition Based on Conditional Random Fields
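【Author】 Wang Dongliang (王东亮)
【Advisor】 Huang Degen (黄德根)
【Author information】 Dalian University of Technology, Computer Software and Theory, 2010, Master's thesis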
【Abstract】 Chinese Personal Name Recognition (CPNR) is a key subtask of Chinese Named Entity Recognition (NER) and is widely used in information retrieval, information extraction, machine translation, and related fields. Chinese personal names account for a large proportion of named entities, and because their structure is complex and their forms are diverse, recognizing them has long been a difficult problem in Chinese information processing. Building on previous work, this thesis uses the Conditional Random Fields (CRFs) model together with discourse-level information to perform Chinese personal name recognition. The main contributions are as follows: (1) The CRFs model is described in detail and compared with other machine-learning models. CRFs are a strong conditional probability model: they overcome the independence assumptions of generative models and avoid the label-bias problem of directed graphical models, while retaining the advantages of both. (2) A Chinese personal name (CPN) may occur many times in the same text, but each occurrence has a different context. Occurrences with rich context are easily recalled by the model, whereas occurrences with insufficient context may be missed. Using discourse information, the names recognized by the CRFs model are extracted into a personal-name dictionary, which is then used in a second recognition pass to further improve performance. The approach also applies to other named entities such as Chinese location names and organization names, and experiments show that the proposed method is effective.
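The second recognition pass in point (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how the dictionary is applied in the second pass, so plain longest-first string matching is assumed here, and the first-pass CRF output is replaced by hand-written spans. The names `build_name_dictionary`, `second_pass`, and the sample sentences are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # [start, end) character offsets within one sentence


def build_name_dictionary(sentences: List[str], spans: List[List[Span]]) -> Set[str]:
    """Collect every name string that the first pass recognized in the document."""
    return {sent[s:e] for sent, sent_spans in zip(sentences, spans) for s, e in sent_spans}


def second_pass(sentences: List[str], spans: List[List[Span]], names: Set[str]) -> List[List[Span]]:
    """Add dictionary matches that do not overlap an already accepted span.

    Assumption: the second pass is simple string matching; the thesis may
    instead feed the dictionary back into the model.
    """
    ordered = sorted(names, key=len, reverse=True)  # try longer names first
    result = []
    for sent, sent_spans in zip(sentences, spans):
        found = list(sent_spans)
        for name in ordered:
            start = sent.find(name)
            while start != -1:
                end = start + len(name)
                # Accept the match only if it overlaps no accepted span.
                if all(end <= s or start >= e for s, e in found):
                    found.append((start, end))
                start = sent.find(name, start + 1)
        result.append(sorted(found))
    return result


# Hypothetical document: the first pass finds the name next to a title
# ("教授") but misses it in the second sentence, where context is weaker.
sentences = ["记者采访了张伟教授。", "昨天张伟去了北京。"]
first_pass = [[(5, 7)], []]  # stand-in for CRF output

names = build_name_dictionary(sentences, first_pass)
merged = second_pass(sentences, first_pass, names)
print(merged)  # [[(5, 7)], [(2, 4)]]
```

In this sketch the occurrence missed by the first pass is recovered because the same string was recognized elsewhere in the document, which is the intuition behind using discourse information to raise recall.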