
Research on Post-Processing Techniques for Speech Recognition

Post-Processing Technique for Speech Recognition

【Author】 Wu Bin

【Supervisor】 Guo Jun

【Author Information】 Beijing University of Posts and Telecommunications, Signal and Information Processing, 2008, PhD

【Abstract】 Research on Mandarin large-vocabulary continuous speech recognition has been under way for more than ten years. Although notable progress has been made, the technology is still far from widespread application. Post-processing in speech recognition is the process of converting the syllable stream produced by the front end into a stream of Chinese characters. Research has shown that post-processing is of great importance to overall system performance. Human listening experiments indicate that people can clearly perceive only about 70% of the syllables in continuous speech; the remaining 30% are inferred from contextual knowledge. Post-processing for speech recognition has therefore attracted wide attention and increasingly in-depth study. This thesis investigates language model adaptation, decoding strategies, and error handling in the post-processing of Mandarin large-vocabulary continuous speech recognition. The main contributions and innovations are as follows:

1. Chinese confusion network algorithms. We first study the minimum Bayes risk decoding criterion and several minimum word error rate decoding methods based on it, such as methods based on N-best lists and on word lattices. Building on this, and taking the characteristics of the Chinese language into account, we propose an algorithm for constructing Chinese word confusion networks that, for long arcs in a Chinese word lattice, quickly and effectively inserts null arcs during forced alignment according to their pronunciation characteristics. Experiments show that decoding with the improved Chinese word confusion network effectively reduces the word error rate of Mandarin large-vocabulary continuous speech recognition compared with MAP (maximum a posteriori) decoding and previous error-rate-minimization algorithms. A Chinese word generally consists of one to four characters, so word durations differ considerably, which causes the constructed word confusion networks to contain a large number of null arcs. This thesis proposes a method of constructing Chinese character confusion networks to obtain the recognition hypothesis with minimum character error rate; this algorithm markedly reduces the number of null arcs in the constructed networks. Experimental results show that decoding with the character confusion network effectively lowers the character error rate of the recognition results.

2. Error detection and correction of decoding results. In Mandarin large-vocabulary continuous speech recognition, recognition errors arise in complex ways and for complex reasons. This thesis first analyzes common errors in recognition results and their causes. On this basis, we use transformation-based learning to learn error-correction rules from confusion networks; experiments show that applying these rules effectively reduces the word error rate. Considering the complexity of Chinese and the limited training corpus for rule learning, which cannot cover all error phenomena, we also use statistical methods for error detection and correction. Specifically, we propose a framework for error detection and correction based on SVMs (support vector machines): an SVM first classifies each character in the hypothesis string as correct or incorrect; for characters classified as errors, candidate character sequences are constructed from the Chinese character confusion network, rescored, and the highest-scoring string is taken as the corrected result. Experimental results show that this method effectively detects and corrects errors in the recognition results and lowers the character error rate.

3. Discriminative language models in speech recognition. Language model adaptation adjusts the probabilities of linguistic phenomena in the language model according to a changing application environment. This thesis applies three algorithms, Boosting, Perceptron, and minimum sample risk, to train N-gram language models for speech recognition, giving them adaptation ability for specific domains. Experimental results show that N-gram models trained with all three algorithms reduce the word error rate in the target domain, with the Perceptron-trained model adapting best. Therefore, for general-domain speech recognition, we take the input speech and the recognized Chinese word confusion networks as training samples, train a discriminative language model with the Perceptron algorithm, and rescore the word confusion networks with it. Experimental results show that this method effectively reduces the word error rate.
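The consensus decoding over a word confusion network described above can be illustrated with a minimal sketch: each confusion set holds competing arcs (including null arcs) with posterior probabilities, and the output keeps the highest-posterior arc of each set unless it is a null arc. This is a generic illustration of confusion network decoding, not the thesis's exact algorithm; the words and posteriors below are invented.

```python
# Minimal sketch of consensus decoding over a word confusion network,
# assuming the network is already built: a list of confusion sets, each
# a dict mapping a word (or None for a null arc) to its posterior.

NULL = None  # a null arc means the slot may be skipped in the output


def consensus_decode(confusion_network):
    """Pick the highest-posterior arc in each confusion set; emit the
    word unless the winning arc is a null arc."""
    hypothesis = []
    for confusion_set in confusion_network:
        best_arc = max(confusion_set, key=confusion_set.get)
        if best_arc is not NULL:
            hypothesis.append(best_arc)
    return hypothesis


# Toy example with invented Mandarin words and posteriors:
cn = [
    {"北京": 0.7, "背景": 0.3},
    {"邮电": 0.6, "有点": 0.3, NULL: 0.1},
    {NULL: 0.8, "的": 0.2},  # null arc wins: this slot is dropped
    {"大学": 0.9, "打削": 0.1},
]
print(consensus_decode(cn))  # ['北京', '邮电', '大学']
```

Note that minimizing the expected error slot by slot in this way is what distinguishes confusion network decoding from MAP decoding, which picks the single highest-probability path through the original lattice.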

【Abstract】 Mandarin large-vocabulary continuous speech recognition has been researched for more than ten years. Although considerable progress has been made, the technology is still far from widespread application. Post-processing in speech recognition converts Pinyin syllables into Chinese characters, and research shows that it is of great significance for improving system performance. Human listening experiments indicate that listeners can clearly hear only about 70% of the syllables in continuous speech and must infer the remaining 30% from contextual knowledge. Post-processing for speech recognition has therefore attracted great attention and increasingly in-depth study. This thesis investigates the post-processing of Mandarin large-vocabulary continuous speech recognition, including language model adaptation, decoding strategy, and error handling. The main contributions and innovations are as follows:

1. Chinese confusion network algorithms. We first study the minimum Bayes risk decoding rule and several minimum word error rate decoding methods. Taking the characteristics of the Chinese language into account, we propose an improved algorithm for constructing Chinese word confusion networks that quickly adds null arcs to a confusion set when a long arc carrying a Chinese character string is forcibly aligned during network construction. The improved algorithm was evaluated on the 2005 HTRDP (863) Evaluation task, where improved word accuracy was observed. Because a Chinese word generally consists of one to four characters, word durations vary considerably; we therefore propose a novel Chinese character confusion network algorithm that decreases the number of null arcs. Experimental results show that this algorithm effectively cuts the character error rate of the recognition results.

2. Error detection and correction of decoding results. Based on an analysis of decoding errors and their causes, we propose using transformation-based learning to learn error-correction rules from Chinese word confusion networks; experiments show significant improvements over the baseline recognition results. Considering the complexity of Chinese and the limited corpus for learning correction rules, we also use statistical methods to detect and correct decoding errors. Specifically, an SVM classifies the decoding results to detect errors, and the Chinese character confusion network is then used to correct them. Experimental results show that this method effectively detects and corrects decoding errors and reduces the character error rate.

3. Discriminative language models in speech recognition. We first study three discriminative methods of language model adaptation, namely the boosting algorithm, the perceptron algorithm, and the minimum sample risk algorithm, and present comparative experimental results on training discriminative language models for speech recognition. We then use the perceptron algorithm, which showed the best discriminative performance, to train a discriminative language model for general-domain Mandarin large-vocabulary continuous speech recognition and rescore the Chinese word confusion networks. Experimental results show that this method effectively reduces the word error rate.
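The perceptron training for a discriminative language model described in point 3 can be sketched as follows. This is a generic structured-perceptron reranker over candidate word sequences (such as paths read off a word confusion network or an N-best list), with simple bigram-count features; the feature set and training data below are illustrative assumptions, not the thesis's configuration.

```python
# Sketch of perceptron training for a discriminative language model used
# to rescore recognition hypotheses. All data here are invented.
from collections import defaultdict


def bigram_features(words):
    """Represent a hypothesis by its bigram counts (a simple feature set)."""
    feats = defaultdict(int)
    for a, b in zip(["<s>"] + words, words + ["</s>"]):
        feats[(a, b)] += 1
    return feats


def score(weights, feats):
    return sum(weights[f] * c for f, c in feats.items())


def perceptron_train(training_data, epochs=5):
    """training_data: list of (hypotheses, reference) pairs, where
    hypotheses is a list of candidate word sequences, e.g. read off a
    word confusion network or an N-best list."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for hypotheses, reference in training_data:
            best = max(hypotheses,
                       key=lambda h: score(weights, bigram_features(h)))
            if best != reference:
                # Standard structured-perceptron update:
                # promote the reference, demote the current best.
                for f, c in bigram_features(reference).items():
                    weights[f] += c
                for f, c in bigram_features(best).items():
                    weights[f] -= c
    return weights
```

After training, each candidate hypothesis in a new confusion network is rescored with `score(weights, bigram_features(h))`, typically interpolated with the baseline acoustic and language model scores, and the highest-scoring candidate is output.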
