Research on Online Algorithms for Sequence Labeling

【Author】 Gao Wenjun (高文君)

【Advisor】 Huang Xuanjing (黄萱菁)

【Author Information】 Fudan University, Computer Application Technology, 2011, Master's degree

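【Abstract (Chinese version)】 The sequence model is a classic structured model and has been widely applied in natural language processing, computer vision, bioinformatics and other fields. Research on and improvement of its models and algorithms is of considerable significance and practical value. Over the past few years, researchers have made progress on sequence models, but many problems remain open, and this thesis studies several of them in depth.

First, the thesis proposes a method based on labelwise margin maximization. Traditional margin maximization tends to maximize the margin of the whole sequence while neglecting the correctness of individual labels; the proposed method overcomes this shortcoming. Replacing the optimization objective of the parameter-update step in the online passive-aggressive (PA) algorithm with the labelwise margin maximization criterion yields a new online algorithm, and experiments show that it improves sequence labeling performance over the original algorithm.

Second, the thesis studies the parallelization of the online passive-aggressive algorithm. It modifies the algorithm's training procedure and proposes a parallel online passive-aggressive algorithm. It proves theoretically that the parallel algorithm has the same upper bound on cumulative classification errors as the original algorithm, verifies the correctness of the parallel algorithm experimentally, and shows that it achieves some speedup on a distributed platform. This provides a feasible and effective way to exploit the computing power of multi-core platforms.

Overall, the thesis studies two problems related to the online passive-aggressive algorithm for sequence labeling, proposes improvements to the model and algorithm, and shows experimentally that the improved algorithms achieve better performance.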
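The abstract does not spell out the update rule. For reference, the sketch below shows the standard whole-sequence PA-I update for sequence labeling (Crammer et al., 2006, cost-sensitive form). This is the kind of sequence-level margin objective that the labelwise criterion replaces. The labelwise variant itself is not reconstructed here. The feature templates (one emission and one transition feature per position), the Hamming cost, the `<S>` start symbol, and all function names are illustrative assumptions.

```python
from collections import defaultdict
import math

START = "<S>"  # hypothetical start-of-sentence symbol used by transition features

def features(words, tags):
    """Assumed feature map: emission (tag, word) and transition (prev, tag) counts."""
    phi = defaultdict(float)
    prev = START
    for word, tag in zip(words, tags):
        phi[("emit", tag, word)] += 1.0
        phi[("trans", prev, tag)] += 1.0
        prev = tag
    return phi

def viterbi(w, words, tagset):
    """Return the highest-scoring tag sequence under weights w (first-order model)."""
    # best[t] = (score, path) of the best partial path ending in tag t
    best = {t: (w[("trans", START, t)] + w[("emit", t, words[0])], [t]) for t in tagset}
    for word in words[1:]:
        new = {}
        for t in tagset:
            s, path = max(((best[p][0] + w[("trans", p, t)], best[p][1]) for p in tagset),
                          key=lambda item: item[0])
            new[t] = (s + w[("emit", t, word)], path + [t])
        best = new
    return max(best.values(), key=lambda item: item[0])[1]

def pa_update(w, words, gold, tagset, C=1.0):
    """One whole-sequence PA-I step with a Hamming-cost margin; returns the prediction."""
    pred = viterbi(w, words, tagset)
    cost = sum(g != p for g, p in zip(gold, pred))  # number of wrong labels
    if cost == 0:
        return pred
    diff = features(words, gold)  # becomes Phi(x, gold) - Phi(x, pred)
    for f, v in features(words, pred).items():
        diff[f] -= v
    sq_norm = sum(v * v for v in diff.values())
    if sq_norm == 0.0:
        return pred
    # hinge loss: gold must outscore the prediction by at least sqrt(cost)
    loss = math.sqrt(cost) - sum(w[f] * v for f, v in diff.items())
    tau = min(C, loss / sq_norm)  # PA-I step size, capped by aggressiveness C
    for f, v in diff.items():
        w[f] += tau * v
    return pred

def train(data, tagset, epochs=5, C=1.0):
    """Run PA-I over the training data for a fixed number of epochs."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pa_update(w, words, gold, tagset, C)
    return w

tagset = ["D", "N", "V"]
data = [(["the", "dog", "runs"], ["D", "N", "V"]),
        (["a", "cat", "sleeps"], ["D", "N", "V"])]
w = train(data, tagset)
print(viterbi(w, ["the", "dog", "runs"], tagset))  # ['D', 'N', 'V']
```

Because the margin requirement is imposed on the whole sequence, a single update can trade off errors at different positions. That is the behavior the labelwise criterion is described as addressing.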

【Abstract】 The sequence model is a classic structured model that has been widely applied in many areas such as Natural Language Processing, Computer Vision and Bioinformatics. In the past several years, researchers have made progress in studying sequence models, but many problems remain to be explored. In this thesis, several of these problems are studied in depth.

First, we propose a labelwise margin maximization method to overcome a shortcoming of the traditional method, which tends to maximize the margin of the whole sequence and ignores the correctness of local labels. By using the labelwise margin maximization criterion instead of the margin maximization criterion in the online passive-aggressive algorithm, we obtain a new online algorithm called labelwise PA. Experiments show that the new algorithm achieves better performance.

Second, we study the parallelization of the online passive-aggressive algorithm. We propose a parallel PA algorithm by modifying the training process of the original online passive-aggressive algorithm. We prove theoretically that the parallel algorithm has the same upper bound on cumulative errors as the original algorithm. We also verify the correctness of the parallel algorithm experimentally and measure its speedup on a distributed platform. This work provides an effective solution for using multi-core computing platforms.

In this thesis, we study two problems related to the online passive-aggressive algorithm for sequence labeling and propose improvements to the model and algorithm. Experiments show that the improved algorithms achieve better performance.
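The abstract does not describe how the parallel PA algorithm modifies training. One well-known scheme of this kind is iterative parameter mixing (McDonald et al., 2010). It was proposed for the structured perceptron and comes with a mistake bound in that setting. Each shard runs one epoch starting from the current mixed weights, and the shard weights are then averaged. The sketch applies that idea to a binary PA-I learner for brevity. It is an illustrative assumption, not necessarily the thesis's method. The function names, uniform mixing weights, and toy data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def pa_epoch(w, shard, C=1.0):
    """One pass of binary PA-I over one shard, starting from a copy of w."""
    w = list(w)
    for x, y in shard:  # y is -1 or +1
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        loss = max(0.0, 1.0 - margin)  # hinge loss
        sq_norm = sum(xi * xi for xi in x)
        if loss > 0.0 and sq_norm > 0.0:
            tau = min(C, loss / sq_norm)  # PA-I step size
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return w

def iterative_parameter_mixing(data, dim, n_shards=2, epochs=10, C=1.0):
    """Train shards in parallel and average ("mix") their weights after each epoch."""
    shards = [data[i::n_shards] for i in range(n_shards)]
    w = [0.0] * dim
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        for _ in range(epochs):
            # every shard starts the epoch from the same mixed weight vector
            local = list(pool.map(pa_epoch, [w] * n_shards, shards, [C] * n_shards))
            # uniform mixing coefficients: plain average of the shard weights
            w = [sum(col) / n_shards for col in zip(*local)]
    return w

data = [((1.0, 0.0), 1), ((0.0, 1.0), -1), ((2.0, 0.0), 1), ((0.0, 2.0), -1)]
w = iterative_parameter_mixing(data, dim=2)
```

Threads keep the example short. In CPython, the GIL prevents CPU-bound pure-Python shards from running truly in parallel. A real speedup would need separate processes (for example `ProcessPoolExecutor` with module-level functions) or a distributed framework.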

  • 【Online Publication Submitter】 Fudan University
  • 【Online Publication Year/Issue】 2012, Issue 01