自然语言处理中序列标注问题的联合学习方法研究

Research on Joint Learning of Sequence Labeling in Natural Language Processing

【Author】 Li Xinxin (李鑫鑫)

【Supervisor】 Wang Xuan (王轩)

【Author Information】 Harbin Institute of Technology, Computer Application Technology, 2014, Doctoral dissertation

【Abstract (translated from the Chinese)】 Sequence labeling is one of the fundamental problems in natural language processing and can be divided into two categories: single sequence labeling problems, which predict one output label sequence, and multiple sequence labeling problems, which predict several output label sequences. Multiple sequence labeling problems are usually handled with a cascaded approach, which treats them as several single sequence labeling problems processed one after another; this often suffers from error propagation and a lack of information sharing. Joint learning overcomes these drawbacks by processing the constituent single sequence labeling problems simultaneously, promoting information exchange among them. This thesis examines different types of sequence labeling problems and studies both single sequence labeling approaches and joint learning approaches, with joint learning as the main focus. The specific research contributions are as follows.

First, traditional sequence labeling approaches generally use the neighborhood of the prediction unit as model features and rarely consider global information in the sequence, which limits prediction accuracy. To address this, the thesis proposes a cascaded reranking approach that fuses global information. For single sequence labeling problems, the cascaded reranking approach introduces models carrying sequence-level global information and syntactic information: first, a linear reranking method combines these models; next, features extracted from their predictions are used to train a structured perceptron reranking model; finally, the linear reranking and structured perceptron reranking methods are cascaded to select the optimal label sequence. For multiple sequence labeling problems, the cascaded reranking approach can exploit the global information of each single problem as well as information shared across problems; this is called the cascaded reranking joint learning approach. Experiments show that cascaded reranking improves accuracy on Chinese pinyin-to-character conversion and Mandarin speech recognition, outperforming either reranking method alone, and that the cascaded reranking joint learning approach outperforms both the cascaded approach and the tag combination approach on English part-of-speech tagging and chunking.

Second, compared with a single learning method, joint decoding improves prediction performance by combining multiple single models during decoding. For multiple sequence labeling problems, the thesis proposes supervised and semi-supervised joint decoding approaches. The supervised variant combines multiple joint learning models by probability weighting during decoding. The semi-supervised variant first tags unlabeled text with two joint learning models, then takes the sentences on which the two models produce identical label sequences as new training data, and finally trains a semi-supervised model on the original and new training data together. Applied to Chinese word segmentation and part-of-speech tagging, experiments show that the supervised joint decoding approach outperforms single supervised methods, and the semi-supervised variant outperforms other current semi-supervised methods.

Third, when the single sequence labeling problems within a multiple sequence labeling problem have inconsistent training sets, neither the cascaded reranking joint learning approach nor the joint decoding approach applies. For this case, the thesis proposes an iterative joint learning approach in which the individual problems exchange information through feature propagation. In each iteration, every single problem uses a structured perceptron to combine its basic model with models carrying information from the other problems, and the resulting ensemble model is used for prediction. Experiments on English part-of-speech tagging and chunking, and on Chinese word segmentation, part-of-speech tagging, and named entity recognition, demonstrate the effectiveness of the iterative approach.

Fourth, traditional Chinese sequence labeling methods train models on discrete features such as characters and words, which leads to very large models and requires manual feature selection. To address this, the thesis first proposes a deep neural network model with word-boundary-based character embeddings for Chinese single sequence labeling: in the character representation layer, each character's input is represented as a combination of word-boundary character vectors; in the tag inference layer, a second-order tag transition matrix strengthens the constraints between neighboring tags. A deep neural network joint learning approach is then applied to Chinese multiple sequence labeling problems, sharing the character representation layer across the single-problem models to promote information exchange. Experiments on Chinese word segmentation, part-of-speech tagging, and named entity recognition show that the model with word-boundary character embeddings outperforms the model with basic character embeddings, and that the joint learning approach further improves performance. Finally, the four proposed joint learning approaches are compared experimentally.

【Abstract】 Sequence labeling is one of the fundamental problems in natural language processing. In this thesis, we classify it into two categories: single sequence labeling problems (SSLPs), which predict one output label sequence, and multiple sequence labeling problems (MSLPs), which predict several output label sequences. The cascaded approach treats MSLPs as multiple SSLPs and processes them in a pipeline. However, this approach suffers from error propagation and a lack of information sharing among the SSLPs. The joint learning approach overcomes these drawbacks by processing multiple SSLPs jointly in one model or one framework; it effectively enhances information exchange among the SSLPs and improves their prediction performance. This thesis discusses different types of sequence labeling problems and studies single sequence labeling approaches and joint learning approaches. Our main research topics include:

1. Traditional sequence labeling approaches use the neighboring information of the input unit as features and usually lack global information, which tends to cause mislabeling. A cascaded reranking approach with global information fusion is proposed to solve this problem. For an SSLP, the cascaded reranking approach introduces several models carrying sequence-level global information and syntactic information. First, a linear reranking approach combines these models. Second, a structured perceptron reranking approach builds a reranking model from features extracted from these models. Finally, the linear reranking approach and the structured perceptron reranking approach are cascaded to choose the optimal output label sequence. For MSLPs, the cascaded reranking joint learning approach can employ the global information of each SSLP as well as combination information shared among the SSLPs.
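The structured perceptron reranking step can be sketched as follows. The feature set (label bigrams plus the base model score), the candidate format, and the simplifying assumption that the gold sequence appears among the candidates are illustrative choices for this sketch, not the thesis's actual configuration.

```python
# Sketch of structured-perceptron reranking over candidate label sequences.
# Feature names and the training setup are illustrative, not from the thesis.
from collections import defaultdict

def features(candidate):
    """Extract global features from a candidate: label bigrams plus the
    base model's score (a stand-in for richer global/syntactic features)."""
    feats = defaultdict(float)
    labels = candidate["labels"]
    for a, b in zip(labels, labels[1:]):
        feats[f"bigram:{a}->{b}"] += 1.0
    feats["base_score"] = candidate["base_score"]
    return feats

def score(weights, feats):
    return sum(weights[k] * v for k, v in feats.items())

def rerank(weights, candidates):
    """Pick the highest-scoring candidate under the current weights."""
    return max(candidates, key=lambda c: score(weights, features(c)))

def perceptron_train(train, epochs=5):
    """train: list of (candidates, gold_labels) pairs. Assumes the gold
    sequence is among the candidates, an oracle-style simplification."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for candidates, gold in train:
            pred = rerank(weights, candidates)
            if pred["labels"] != gold:
                # Standard structured-perceptron update: promote the gold
                # candidate's features, demote the wrong prediction's.
                gold_cand = next(c for c in candidates if c["labels"] == gold)
                for k, v in features(gold_cand).items():
                    weights[k] += v
                for k, v in features(pred).items():
                    weights[k] -= v
    return weights
```

After a few epochs the weights favor the gold candidate even when the base model initially scored it lower.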
Experimental results show that the cascaded reranking approach improves recognition accuracy on Chinese pinyin-to-character conversion and Mandarin speech recognition by incorporating part-of-speech and syntactic information, and that the cascaded reranking joint learning approach outperforms the cascaded approach and the tag combination approach on English part-of-speech tagging and chunking.

2. Compared with a single learning approach, a joint decoding approach can integrate different models in the decoding phase and improve prediction performance. The thesis proposes supervised and semi-supervised joint decoding approaches for MSLPs. The supervised joint decoding approach integrates different models with linear weights in the decoding phase, while the semi-supervised joint decoding approach selects the sentences annotated identically by two models as new training data. The joint decoding approaches are then applied to Chinese word segmentation and part-of-speech tagging. Experimental results show that the supervised joint decoding approach outperforms other single supervised approaches, and the semi-supervised joint decoding approach outperforms the state-of-the-art supervised and semi-supervised approaches.

3. The cascaded reranking joint learning approach and the joint decoding approach cannot be applied to MSLPs with inconsistent training data. An iterative joint learning approach is proposed to solve this problem; it allows each SSLP in the MSLPs to share information with the other SSLPs through feature propagation. In each iteration, each problem uses a structured-perceptron-based ensemble method to combine the models using basic information with the models using information from the other problems.
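The data-selection step of the semi-supervised approach, keeping only the unlabeled sentences that two models tag identically, can be sketched as follows. The stub "models" and the toy tags are hypothetical; real models would be trained joint segmentation-and-tagging models.

```python
# Sketch of agreement-based selection: two independently trained models tag
# unlabeled sentences, and only sentences on which both predictions coincide
# are kept as new training data.

def select_agreed(unlabeled, model_a, model_b):
    """Return (sentence, tags) pairs on which both models agree."""
    selected = []
    for sent in unlabeled:
        tags_a = model_a(sent)
        if tags_a == model_b(sent):
            selected.append((sent, tags_a))
    return selected

# Toy usage with hypothetical stub models (sentence = list of tokens):
model_a = lambda s: ["B" if len(tok) > 1 else "S" for tok in s]
model_b = lambda s: ["B" if len(tok) > 1 else "S" for tok in s]
new_data = select_agreed([["ab", "c"]], model_a, model_b)
```

The selected pairs are then concatenated with the original training data to train the final semi-supervised model.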
Experimental results on English part-of-speech tagging and chunking, and on Chinese word segmentation, part-of-speech tagging, and named entity recognition, show that the iterative joint learning approach achieves better performance than the pipeline approach, the tag combination approach, and other ensemble methods.

4. Traditional approaches to Chinese sequence labeling problems use discrete linguistic information as features. However, the model grows very large during training, and features for different problems must be manually chosen and tuned on development data. To solve this problem, a deep neural network model with word-boundary-based character representations is proposed and applied to Chinese single sequence labeling problems. In the character representation layer of the model, each Chinese character is converted to a combination of four word-boundary-based character representations; in the tag inference layer, the model uses a second-order tag transition matrix to enhance tag constraints. A deep neural network based joint learning approach is then used for MSLPs, increasing information exchange among the SSLPs by sharing their character representation layer. Experimental results on Chinese word segmentation, part-of-speech tagging, and named entity recognition show that the model with word-boundary-based character representations outperforms the model with baseline character representations, and that the deep neural network based joint learning approach further improves the single models.
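A minimal sketch of a word-boundary-based character representation layer, assuming one embedding table per boundary tag (B/M/E/S) and combination by concatenation; the dimensions and the concatenation choice are assumptions for illustration, not necessarily the thesis's design.

```python
# Sketch: each character keeps one embedding per word-boundary tag, and its
# input vector is the concatenation of the four boundary-specific vectors.
import numpy as np

BOUNDARY_TAGS = ("B", "M", "E", "S")  # begin / middle / end / single-char word

class BoundaryCharEmbedding:
    def __init__(self, vocab, dim=50, seed=0):
        rng = np.random.default_rng(seed)
        self.index = {ch: i for i, ch in enumerate(vocab)}
        # One (|V|, dim) table per boundary tag, randomly initialized.
        self.tables = {t: rng.normal(scale=0.1, size=(len(vocab), dim))
                       for t in BOUNDARY_TAGS}
        self.dim = dim

    def lookup(self, char):
        """Concatenate the four boundary-specific vectors for `char`."""
        i = self.index[char]
        return np.concatenate([self.tables[t][i] for t in BOUNDARY_TAGS])

emb = BoundaryCharEmbedding(vocab=["我", "们"], dim=8)
vec = emb.lookup("我")  # a (4 * 8,) = (32,) input vector
```

In the joint learning setting, this representation layer would be shared across the segmentation, tagging, and named-entity models so that gradients from each task update the same character tables.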
