体育领域信息抽取系统的研究
Research on the Information Extraction System in Sports Domain
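【Author】 Gao Guoyang (高国洋)
【Author information】 North China Electric Power University (Hebei), Communication and Information Systems, 2010, Master's thesis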
【Abstract】 Information extraction, an automated information-processing technology, has become a research focus in natural language processing. This thesis first studies two key techniques of information extraction: named entity recognition and automatic entity relation extraction. It proposes a Chinese named entity recognition method based on conditional random fields that fuses multiple knowledge sources, and a method for automatically extracting entity relations in sports news, also based on conditional random fields. Building on these, it then designs and implements a system for extracting sports event information, combining statistical and rule-based methods. The experimental corpus was drawn from Sina (www.sina.com) and Sohu (www.sohu.com). In the experiments the system achieved a precision of 95.70%, a recall of 93.00%, and an F-measure of 94.33%, which supports the effectiveness of the proposed approach.
【Key words】 information extraction; named entity recognition; entity relation extraction; conditional random fields
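- 【Online publication contributor】 North China Electric Power University (Hebei) 【Online publication year and issue】 2011, Issue 05
- 【Classification number】 TP391.1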
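
The three scores reported in the abstract can be cross-checked against one another. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall, with beta = 1); the abstract does not state the weighting, so this is an assumption, and the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return F_beta, the weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (an assumption here; the
    abstract does not say which weighting was used).
    """
    if precision <= 0.0 and recall <= 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
precision = 0.9570
recall = 0.9300

f1 = f_measure(precision, recall)
print(f"F-measure: {f1 * 100:.2f}%")  # prints "F-measure: 94.33%"
```

Under that assumption the computed value matches the reported 94.33%, so the three figures are mutually consistent.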