Research on Portable and Robust Spoken Language Understanding Methods (可移植的稳健口语理解方法研究)

Robust Spoken Language Understanding Across Domains and Languages

【Author】 Wu Weilin (吴尉林)

【Author information】 Shanghai Jiao Tong University, Computer Software and Theory, 2007, PhD

【Abstract, translated from the Chinese】 Research on spoken dialogue systems has strong theoretical significance and practical value, and spoken language understanding (SLU) is one of the key technologies for building such systems. SLU currently faces two main challenges. The first is robustness: speech recognition errors are unavoidable, and spoken language itself is often ill-formed and ungrammatical. The second is portability: developing the SLU module of a dialogue system usually requires a large amount of manual work, such as writing semantic grammars, and this is one of the main bottlenecks in developing spoken dialogue systems. Shortening the development cycle of the SLU module, cutting development cost, and improving portability therefore depend on reducing the reliance on manual work so that the whole development process can be automated.

This dissertation proposes a new portable and robust SLU method. The method is essentially data-driven and needs only simply annotated data, which ensures good portability. It achieves deep understanding of spoken language while remaining robust. The main work and contributions are as follows.

The dissertation proposes an SLU framework based on two-stage classification. The first-stage classifier identifies the topic of the user's input utterance (topic classification). The identified topic then helps the second-stage classifiers extract the corresponding semantic slot/value pairs (semantic slot classification). Both kinds of classifiers can be learned automatically and need only simply annotated training data. The framework provides deep understanding of the input utterance while remaining robust.

A robust chart-based local parser preprocesses the user's input utterance. The parser can skip words and rule symbols, which secures the robustness of the system at the lowest level. To avoid the side effects of this skipping ability, a built-in machine learning system is introduced for pruning and disambiguation. Preprocessing simplifies the form of data annotation, provides deep features for topic classification, and reduces the number of semantic slot classifiers.

For topic classification, the various features that can be used are examined and their classification power is compared, and multiple classifiers are combined to improve the precision of topic classification. Semantic slot extraction is modelled as a classification problem: literal context is first used for an initial semantic slot classification, the consistency of the slots is then checked, and, where necessary, slot context is used for re-classification to correct errors. Two algorithms for semantic slot classification are compared: the decision list and the Winnow algorithm.

To further reduce the manual annotation effort, weakly supervised training methods are studied for the two kinds of classifiers: (1) the topic classifier is trained by combining active learning and semi-supervised learning; (2) a practical bootstrapping method is proposed for training the semantic slot classifiers. With these two techniques, training the two-stage classification model needs only a small amount of labelled data and can use a larger amount of unlabelled data to improve performance.

Finally, the proposed method is validated experimentally on two corpora from different domains and languages. The results show that it outperforms existing rule-based methods and is comparable to other recent data-driven methods, while greatly reducing development cost.

【Abstract】 Spoken dialogue interfaces have attracted extensive attention in both the research community and commercial applications because of their great theoretical and practical value. Spoken Language Understanding (SLU) is one of the key technologies for implementing spoken dialogue systems.

One challenge for spoken language understanding is robustness, since speech recognition errors are inevitable and spoken language is often ungrammatical or ill-formed. The other challenge is portability. Currently, the development of spoken language systems often relies heavily on human effort, which is one of the main bottlenecks for the rapid development of spoken dialogue systems; for example, linguistic experts handcraft the semantic grammar used for parsing. The key issue is therefore to reduce the need for manual work in the development of SLU systems and to automate the whole process as much as possible, which helps to reduce the overall development cost and increases the portability of the spoken dialogue system.

This dissertation proposes a robust and portable approach to spoken language understanding. The advantage of the proposed approach is that it is mainly data-driven and requires only a minimally annotated corpus for training, while keeping both the robustness and the depth of understanding. The research work in this thesis includes the following.

This thesis proposes a novel spoken language understanding approach that consists mainly of two successive classifiers. First, a topic classifier identifies the topic of an input utterance. Then, constrained by the recognized topic, semantic classifiers extract the corresponding slot-value pairs. Both kinds of classifiers can be learned automatically from minimally labelled training sentences. This SLU approach is robust to spoken language while keeping the depth of understanding.

A robust chart-based local parser preprocesses the input utterance to recognize the concepts that are relevant to the application domain. This local parser can skip noise words or rule symbols, which gives the SLU system low-level robustness. To avoid the side effects of this skipping ability, a machine learning system is embedded in the parser for pruning. The preprocessing step not only facilitates the labelling of training sentences but also reduces the number of semantic slot classifiers.

For topic classification, we investigate different kinds of features and compare their performance, and we combine diverse classifiers to improve the precision of topic classification. The slot-filling task is likewise modelled as a classification problem, called semantic slot classification. Initially, literal context features are used for semantic slot classification. Then the consistency of the semantic slots in a sentence is checked; if the slots clash, semantic slot re-classification is carried out to correct the misclassified slots. Two learning algorithms are employed for semantic slot classification: the decision list and the Winnow algorithm.

To further reduce the cost of labelling training utterances, weakly supervised learning techniques are employed to train the topic and semantic classifiers. First, a strategy combining active learning and self-training is adopted to train the topic classifier. Second, a practical method is proposed for bootstrapping the topic-dependent semantic classifiers from a small amount of labelled sentences. These two weakly supervised strategies allow our SLU framework to begin with a small amount of labelled data and to use a larger amount of unlabelled data to improve performance.

The proposed SLU approach has been evaluated in different domains and languages. The experimental results show that our system performs better than the rule-based parser and is comparable to state-of-the-art data-driven SLU systems. Furthermore, our system requires less labelled data and hence significantly reduces the development cost.
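
The abstract names Winnow as one of the two learners used for semantic slot classification from literal context features. As an illustration only, and not the dissertation's implementation, the following Python sketch shows a textbook binary Winnow learner with multiplicative promotion and demotion. The feature vocabulary, the origin/destination example, the threshold (set to the number of features), and the value of alpha are all assumptions made for this sketch.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical literal-context features: words seen next to a [city] concept.
VOCAB: List[str] = ["from", "to", "leaving", "arriving"]
INDEX: Dict[str, int] = {word: i for i, word in enumerate(VOCAB)}


def encode(context_words: Iterable[str]) -> Set[int]:
    """Map context words to the indices of the active Boolean features."""
    return {INDEX[w] for w in context_words if w in INDEX}


class Winnow:
    """Binary Winnow learner over sparse Boolean features."""

    def __init__(self, n_features: int, alpha: float = 2.0) -> None:
        self.alpha = alpha                     # multiplicative update factor (assumed)
        self.threshold = float(n_features)     # common textbook choice (assumed)
        self.weights = [1.0] * n_features      # all weights start at 1

    def predict(self, active: Set[int]) -> bool:
        """Predict True when the summed weight of active features reaches the threshold."""
        return sum(self.weights[i] for i in active) >= self.threshold

    def update(self, active: Set[int], label: bool) -> None:
        """Mistake-driven update: promote on a missed positive, demote on a false positive."""
        if self.predict(active) == label:
            return
        factor = self.alpha if label else 1.0 / self.alpha
        for i in active:
            self.weights[i] *= factor

    def fit(self, data: List[Tuple[Set[int], bool]], epochs: int = 5) -> None:
        for _ in range(epochs):
            for active, label in data:
                self.update(active, label)


# Toy task (assumed): does a [city] concept fill the "origin" slot (True)
# or some other slot such as "destination" (False)?
TRAIN = [
    (["from"], True),
    (["to"], False),
    (["leaving", "from"], True),
    (["arriving", "to"], False),
]

clf = Winnow(len(VOCAB))
clf.fit([(encode(words), label) for words, label in TRAIN])
```

After training on this toy data, `clf.predict(encode(["leaving", "from"]))` returns `True`, so the concept would be assigned to the assumed origin slot. In a two-stage setup like the one the abstract describes, one such learner could be trained per topic and slot, with the topic classifier choosing which set of slot classifiers to apply.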

【Key words】 spoken dialogue system; spoken language understanding; topic classification; weakly supervised learning; active learning; bootstrapping

【Online publication submitter】 Shanghai Jiao Tong University

【Classification number】 TP391.42
【Citation count】 1
【Download count】 138