
Research on Portable and Robust Spoken Language Understanding Methods

Robust Spoken Language Understanding Across Domains and Languages

【Author】 吴尉林 (Wu Weilin)

【Supervisors】 徐良贤 (Xu Liangxian); 陆汝占 (Lu Ruzhan)

【Author information】 Shanghai Jiao Tong University, Computer Software and Theory, 2007, PhD

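【Summary】 Research on spoken dialogue systems has considerable theoretical significance and practical value. Spoken language understanding (SLU) is one of the key technologies for building such systems. SLU currently faces two main challenges. The first is robustness: speech recognition errors are unavoidable, and spoken language itself is often ill-formed and ungrammatical. The second is portability: developing the SLU module of a dialogue system usually requires a large amount of manual work (for example, writing semantic grammars), which is one of the main bottlenecks in developing spoken dialogue systems. To shorten the development cycle of the SLU module, reduce development cost, and improve portability, the key is therefore to reduce the dependence on manual work so that the whole development process can be automated.

This dissertation proposes a new portable and robust approach to spoken language understanding. The approach is essentially data-driven and requires only simply annotated data, which ensures good portability. It achieves deep understanding of spoken language while remaining robust. The main work and contributions are as follows.

An SLU framework based on two-stage classification is proposed. The first-stage classifier identifies the topic of the user's input utterance (topic classification). The identified topic then helps the second-stage classifiers extract the corresponding semantic slot/value pairs (semantic slot classification). Both kinds of classifier can be learned automatically and require only simply annotated training data. The framework provides deep understanding of the input utterance while remaining robust.

A robust chart-based local parser is used to preprocess the user's input utterance. The parser can skip words and rule symbols, which ensures robustness at the lowest level of the system. To avoid the side effects of this skipping ability, a built-in machine learning system is introduced for pruning and disambiguation. Preprocessing simplifies the form of data annotation, supplies deep features for topic classification, and reduces the number of semantic slot classifiers.

For topic classification, the various features that can be used are examined and their classification power is compared, and a combination of multiple classifiers is used to improve topic classification accuracy. Semantic slot extraction is modelled as a classification problem: an initial semantic slot classification is made from the literal context, the consistency of the semantic slots is then checked, and, if necessary, the slots are re-classified using the semantic slot context to correct errors. Two algorithms for semantic slot classification are compared: decision lists and Winnow.

To further reduce the work of manually annotating data, weakly supervised training methods are studied for both kinds of classifier: (1) a method that combines active learning and semi-supervised learning to train the topic classifier; (2) a practical bootstrapping method to train the semantic slot classifiers. With these two techniques, training the two-stage classification model requires only a small amount of labelled data and can exploit a larger amount of unlabelled data to improve performance.

Finally, the proposed approach is validated experimentally on two corpora from different domains and in different languages. The results show that it outperforms existing rule-based methods and performs comparably to other recent data-driven methods, while greatly reducing development cost.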
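The two-stage framework described in the summary can be illustrated with a small sketch. This is only an assumption-laden toy, not the dissertation's implementation: the topic names (`ListFlight`, `ShowFare`), the city lexicon standing in for the local parser's concepts, the word-count topic scorer, and the previous-word rules standing in for learned decision lists are all hypothetical.

```python
from collections import Counter

# Hypothetical toy training data: (utterance tokens, topic label).
TOPIC_TRAIN = [
    ("show flights from boston to denver".split(), "ListFlight"),
    ("i want to fly from dallas to boston".split(), "ListFlight"),
    ("what is the fare from denver to dallas".split(), "ShowFare"),
    ("how much is a ticket to boston".split(), "ShowFare"),
]

# Hypothetical concept lexicon, standing in for the local parser's output.
CITIES = {"boston", "denver", "dallas"}

# Hypothetical topic-dependent rules: ordered (previous word, slot) pairs,
# a crude stand-in for a learned decision list over literal context.
SLOT_RULES = {
    "ListFlight": [("from", "origin"), ("to", "destination")],
    "ShowFare": [("from", "origin"), ("to", "destination")],
}


def train_topic_model(data):
    """Count word occurrences per topic (stand-in for a learned topic classifier)."""
    model = {}
    for tokens, topic in data:
        model.setdefault(topic, Counter()).update(tokens)
    return model


def classify_topic(model, tokens):
    """Stage 1: choose the topic whose training words best match the utterance."""
    return max(sorted(model), key=lambda t: sum(model[t][w] for w in tokens))


def classify_slots(topic, tokens):
    """Stage 2: assign each recognized concept to a slot of the chosen topic."""
    slots = {}
    for i, word in enumerate(tokens):
        if word not in CITIES:
            continue  # non-concept words are simply skipped
        previous = tokens[i - 1] if i > 0 else ""
        for context, slot in SLOT_RULES[topic]:
            if previous == context:  # first matching rule wins
                slots[slot] = word
                break
    return slots


def understand(model, utterance):
    """Run both stages and return (topic, slot/value pairs)."""
    tokens = utterance.lower().split()
    topic = classify_topic(model, tokens)
    return topic, classify_slots(topic, tokens)


MODEL = train_topic_model(TOPIC_TRAIN)
print(understand(MODEL, "uh flights from denver to boston please"))
```

Because the second stage looks only at recognized concepts and their immediate context, filler words such as "uh" and "please" do not affect the extracted slots, which mirrors, in a very simplified way, the robustness argument made in the summary.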

【Abstract】 Spoken dialogue interfaces have attracted extensive attention in both the research community and commercial applications owing to their great theoretical and practical value. Spoken Language Understanding (SLU) is one of the key technologies for implementing spoken dialogue systems.

One challenge for spoken language understanding is robustness, since speech recognition errors are inevitable and spoken language is often ungrammatical or ill-formed. The other challenge is portability. Currently, the development of spoken language systems often relies heavily on human effort, which is one of the main bottlenecks for the rapid development of spoken dialogue systems. For example, linguistic experts handcraft the semantic grammar used for parsing. Therefore, the key issue is to reduce the need for manual work in the development of SLU systems and to automate the whole process as much as possible, which helps to reduce the overall development cost and increases the portability of the spoken dialogue system.

This dissertation proposes a robust and portable approach to spoken language understanding. The advantage of the proposed approach is that it is mainly data-driven and requires only a minimally annotated corpus for training, while preserving both the robustness and the depth of understanding of spoken language. The research work in this thesis includes the following.

This thesis proposes a novel spoken language understanding approach, which mainly consists of two successive classifiers. First, the topic classifier is used to identify the topic of an input utterance. Then, constrained by the recognized topic, the semantic classifiers extract the corresponding slot-value pairs. Both kinds of classifiers can be learned automatically from minimally labelled training sentences. This SLU approach is robust to spoken language while preserving the depth of understanding.

A robust chart-based local parser is used to preprocess the input utterance in order to recognize the concepts that are relevant to the application domain. This local parser can skip noise words or rule symbols, which gives the SLU system low-level robustness. To avoid the side effects of the skipping ability, a machine learning system is embedded into the parser for pruning. The preprocessing step not only facilitates the labelling of training sentences but also reduces the number of semantic slot classifiers.

For topic classification, we investigate different kinds of features and compare their performance. A strategy of combining diverse classifiers is applied to improve the precision of topic classification. The slot-filling task is also modelled as a classification problem, called semantic slot classification. Initially, literal context features are used for semantic slot classification. Then, the consistency of the semantic slots in a sentence is checked. If the slots clash, semantic slot re-classification is carried out to correct the misclassified slots. Two learning algorithms are employed for semantic slot classification: the decision list and the Winnow algorithm.

To further reduce the cost of labelling training utterances, weakly supervised learning techniques are employed to train the topic and semantic classifiers. First, a strategy combining active learning and self-training is adopted to train the topic classifier. Second, a practical method is proposed for bootstrapping the topic-dependent semantic classifiers from a small amount of labelled sentences. These two weakly supervised strategies allow our SLU framework to begin with a small amount of labelled data and make use of a larger amount of unlabelled data to improve performance.

The SLU approach proposed in this dissertation has been evaluated in different domains and languages. The experimental results show that the performance of our system is better than that of the rule-based parser and comparable to that of state-of-the-art data-driven SLU systems. Furthermore, our system requires less labelled data and hence significantly reduces the development cost.
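The idea of combining active learning with self-training for the topic classifier can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure used in the dissertation: the word-count classifier, the confidence measure, the `high` and `low` thresholds, and the `oracle` callback (standing in for a human annotator) are all hypothetical.

```python
from collections import Counter


def train(labelled):
    """Count word occurrences per topic (stand-in for a real topic classifier)."""
    model = {}
    for tokens, topic in labelled:
        model.setdefault(topic, Counter()).update(tokens)
    return model


def predict(model, tokens):
    """Return (topic, confidence); confidence is the winner's share of all scores."""
    scores = {t: sum(c[w] for w in tokens) for t, c in model.items()}
    total = sum(scores.values())
    best = max(sorted(scores), key=scores.get)
    confidence = scores[best] / total if total else 0.0
    return best, confidence


def run_round(labelled, unlabelled, oracle, high=0.8, low=0.6):
    """One round of weakly supervised training over a pool of unlabelled utterances.

    - confidence >= high: keep the predicted label (self-training)
    - confidence <  low : ask the oracle for the label (active learning)
    - otherwise         : leave the utterance in the pool for a later round
    """
    model = train(labelled)
    new_labelled = list(labelled)
    remaining = []
    for tokens in unlabelled:
        topic, confidence = predict(model, tokens)
        if confidence >= high:
            new_labelled.append((tokens, topic))
        elif confidence < low:
            new_labelled.append((tokens, oracle(tokens)))
        else:
            remaining.append(tokens)
    return new_labelled, remaining


# Hypothetical seed data and unlabelled pool.
seed = [
    ("flights from boston".split(), "ListFlight"),
    ("fare to denver".split(), "ShowFare"),
]
pool = [
    "flights from dallas".split(),
    "ticket to boston".split(),
    "fare from boston".split(),
]
```

In practice, `run_round` would be repeated, retraining on the growing labelled set each time, so that human labelling effort is spent only on the utterances the current classifier is least sure about. Fixed thresholds are a simplification; selecting the least confident utterances in each batch is another common choice.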
