A Study of Feature Extraction Algorithm of Speech Recognition Confidence Measure

【Author】 Guo Yujing (国玉晶)

【Author information】 Beijing University of Posts and Telecommunications, Pattern Recognition and Intelligent Systems, 2010, Master's degree

【Abstract】 Large-vocabulary continuous speech recognition has been studied for more than two decades. Significant progress has been made, but the technology is still a considerable distance from widespread application. In working to overcome the deficiencies of the recognition algorithms themselves and to improve recognition performance, researchers gradually introduced the concept of a confidence measure, which quantifies how far the decisions of a speech recognition system can be trusted. In recent years, confidence measures have played an important role in many applications, including speech error detection and correction, unsupervised and semi-supervised training, multi-pass search, and the screening of erroneous data in corpora.

Traditional confidence annotation makes a classification decision based on individual confidence features or combinations of features, and the features in common use are drawn mainly from decoding information. This has two limitations. On the one hand, existing confidence features exploit decoding information only in an isolated and static way, ignoring the relationship between a word and its surrounding environment. On the other hand, acoustic features still dominate, whereas human listening experiments indicate that roughly 30% of the information used in speech understanding comes from syntactic, semantic, and other such knowledge. How to capture the relationship between a word and its environment, and how to extract syntactic and semantic features of the word, so as to improve recognition post-processing, is therefore a question well worth studying in confidence feature extraction.

To this end, this thesis builds a traditional speech recognition confidence annotation system as a baseline and proposes two new kinds of confidence feature. The first is the environment feature, divided into three classes: context environment, dynamic environment, and sentence-level global environment. By reprocessing the decoding information, these features describe the relationship between a word and its environment fairly comprehensively from both spatial and temporal perspectives. The second is TSS (Topic Similarity based Semantic confidence feature extraction algorithm), a semantic-level confidence feature extraction algorithm based on topic similarity. It uses the topic model LDA (Latent Dirichlet Allocation) to compute the topic distribution of each word in the recognition result and the topic distribution of its context, and takes the topic similarity between the two as the word's semantic confidence feature. Experiments show that the two proposed features mine the useful information in the decoding layer more deeply and add a new source of information for confidence features; combined with decoding-layer confidence features, they effectively improve the accuracy of confidence annotation.
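
The topic-similarity idea behind TSS can be illustrated with a minimal Python sketch. The abstract does not say how the topic distributions are inferred or which similarity measure is used, so everything concrete here is an assumption made for illustration: the toy topic-word table `PHI` stands in for a trained LDA model, a word's topic distribution is taken as p(z|w) ∝ p(w|z)p(z), the context distribution is the average over a fixed window of neighbouring words, and cosine similarity is the similarity measure. All function and variable names are hypothetical.

```python
import math

# Toy stand-in for a trained LDA model (assumption): PHI[k][w] = p(w | topic k).
PHI = [
    {"stock": 0.4, "market": 0.4, "price": 0.2},  # a "finance" topic
    {"goal": 0.4, "match": 0.4, "team": 0.2},     # a "sports" topic
]
TOPIC_PRIOR = [0.5, 0.5]  # p(topic), assumed uniform
UNSEEN = 1e-12            # tiny probability for words a topic never emits


def word_topic_posterior(word, phi, topic_prior):
    """Topic distribution of one word: p(z | w) proportional to p(w | z) * p(z)."""
    scores = [topic_prior[k] * phi[k].get(word, UNSEEN) for k in range(len(phi))]
    total = sum(scores)
    return [s / total for s in scores]


def context_topic_distribution(context, phi, topic_prior):
    """Average the per-word topic posteriors over the context words (assumption)."""
    acc = [0.0] * len(phi)
    for w in context:
        for k, p in enumerate(word_topic_posterior(w, phi, topic_prior)):
            acc[k] += p
    n = max(len(context), 1)
    return [a / n for a in acc]


def cosine(p, q):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm > 0 else 0.0


def tss_feature(words, index, phi, topic_prior, window=3):
    """Similarity between a word's topic distribution and that of its context."""
    lo = max(0, index - window)
    context = words[lo:index] + words[index + 1:index + 1 + window]
    word_dist = word_topic_posterior(words[index], phi, topic_prior)
    ctx_dist = context_topic_distribution(context, phi, topic_prior)
    return cosine(word_dist, ctx_dist)


# A recognition hypothesis in which "goal" is topically out of place.
HYPOTHESIS = ["stock", "market", "goal", "price"]
for i, w in enumerate(HYPOTHESIS):
    print(w, round(tss_feature(HYPOTHESIS, i, PHI, TOPIC_PRIOR), 3))
```

In this toy hypothesis the word "goal" scores far lower than its neighbours because its topic distribution disagrees with its context. A confidence classifier could use that low score, alongside decoding-layer features, as evidence of a recognition error.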

【Key words】 confidence measure; environment feature; latent Dirichlet allocation; topic model; semantics

【Online publication submitter】 Beijing University of Posts and Telecommunications

【Classification number】 TN912.34
【Times cited】 1
【Downloads】 177