Research on Whispered Speaker Identification in Channel Mismatch Conditions

【Author】 Gu Xiaojiang

【Author information】 Soochow University, Signal and Information Processing, 2011, Master's

【Abstract】 Whispered speech is an auxiliary mode of human vocal communication that is widely used in daily life, particularly for identity verification in the financial sector and in public security and judicial settings. Speakers often whisper to keep information private, which has made whispered speaker identification a new research topic. Whispered speech is mainly used in mobile phone calls, where the signal is inevitably affected by channel distortion. When the channel conditions of training and testing differ substantially, the recognition rate of conventional models drops sharply, so a robust channel compensation algorithm is needed to strengthen the speaker identification system. To address this problem, this thesis does the following work:

1. Whispered speech from all channels is pooled to train a universal background model (UBM), and speaker models are then obtained from it by maximum a posteriori (MAP) adaptation. The recognition rate of this model is compared with that of a conventional GMM, and the experiments show that the UBM-based model outperforms the ordinary GMM.

2. Joint factor analysis (JFA) is applied to whispered speaker identification. Given the characteristics of the whispered speech database, decoupled estimation is adopted and the residual subspace is omitted. During identification, the speaker factors obtained from the training utterance are combined with the channel factors obtained from the test utterance, so that the speaker model continually adapts to the test channel. The experiments show that the modified JFA greatly improves recognition. In addition, because JFA performs poorly in short-duration identification, a hybrid compensation method is proposed that keeps the speaker factors unchanged in the model domain and applies the channel factors in the feature domain, compensating each frame's feature vector. This compensation is finer-grained than that of JFA. The experiments show that, when training on the HH channel, the average identification rates at 1 s and 2 s improve by 4.36% and 3.89% respectively, and when training on the EP channel, they improve by 4.14% and 2.64% respectively.

3. To exploit the discriminative power of the support vector machine (SVM), speaker supervectors are fed into an SVM, but the resulting system performs worse than the UBM-MAP system. Speaker factor vectors are then fed into the SVM instead. Because speaker factors are low-dimensional and readily linearly separable in the identification system, good recognition results are obtained. Three channel compensation methods are then applied to further remove redundancy and improve robustness, giving identification results comparable to those of JFA.
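The UBM-MAP step described in the first item of the abstract can be illustrated with a minimal sketch. It assumes the commonly used relevance-factor form of MAP adaptation, updating only the component means of a diagonal-covariance GMM. The abstract does not say which parameters the thesis adapts or which relevance factor it uses, so the function name `map_adapt_means`, the default `relevance=16.0`, and the mean-only choice are all assumptions made for illustration.

```python
import numpy as np


def map_adapt_means(ubm_weights, ubm_means, ubm_vars, frames, relevance=16.0):
    """Mean-only relevance-MAP adaptation of a diagonal-covariance UBM.

    Hypothetical helper for illustration, not the thesis implementation.
    ubm_weights: (K,), ubm_means: (K, D), ubm_vars: (K, D), frames: (T, D).
    Returns the adapted speaker means with shape (K, D).
    """
    # Log-density of every frame under every diagonal Gaussian: shape (T, K).
    diff = frames[:, None, :] - ubm_means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / ubm_vars, axis=2)
        + np.sum(np.log(2.0 * np.pi * ubm_vars), axis=1)
    )
    log_post = np.log(ubm_weights) + log_gauss

    # Normalise to component posteriors per frame (stable softmax over K).
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Zeroth- and first-order sufficient statistics of the speaker data.
    n_k = post.sum(axis=0)                                # (K,)
    e_k = (post.T @ frames) / np.maximum(n_k, 1e-10)[:, None]  # (K, D)

    # Data-dependent mixing: components that saw more data move further.
    alpha = (n_k / (n_k + relevance))[:, None]
    return alpha * e_k + (1.0 - alpha) * ubm_means
```

Components that receive little speaker data stay close to the UBM, which is why a UBM trained on pooled multi-channel data can still give usable speaker models from short enrolment material.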

【Key words】 whispered speech; speaker identification; joint factor analysis; hybrid compensation; support vector machine

【Online publication submitter】 Soochow University

【Classification number】 TN912.34
【Times cited】 4
【Downloads】 79