Study of Speech Pitch Period Detection Algorithm and the Application in Speech Synthesis System

【Author】 Li Juan

【Author information】 Taiyuan University of Technology, Signal and Information Processing, 2008, Master's

【Abstract】 The pitch period of a speech signal is one of the key feature parameters describing the excitation source. Detecting it accurately is important for high-quality speech analysis and synthesis, speech compression and coding, and speech recognition. This thesis discusses several common pitch period detection methods together with the wavelet transform and the Hilbert-Huang transform. It proposes a noise-robust pitch period detection algorithm that combines an Autocorrelation Energy Function (ACEF) with a Magnitude Difference Energy Function (MDEF), and it applies the Hilbert-Huang transform to pitch marking in a TD-PSOLA speech synthesis system.

The thesis first reviews several common pitch period detection methods: the autocorrelation function (ACF), the average magnitude difference function (AMDF), and the cepstrum. ACF is suited to noisy environments, but used alone it often returns a fundamental frequency estimate that is twice or half the true pitch frequency. AMDF and the cepstrum give good results in quiet or low-noise conditions, but their accuracy degrades quickly in adverse conditions with a low signal-to-noise ratio (SNR). To address this, the thesis proposes a detection algorithm that combines ACEF and MDEF. The combination suppresses spurious peaks of the autocorrelation function, improves noise robustness, and compensates for the shortcomings of the traditional methods.

Next, the thesis introduces wavelet transform theory, including the continuous wavelet transform, the discrete wavelet transform, multi-resolution analysis, and the Mallat algorithm. It then analyzes experimentally two pitch period detection methods: wavelet decomposition and reconstruction based on the Mallat algorithm (with the high-frequency components set to zero), and the à trous algorithm derived from the Mallat algorithm. Decomposing a speech signal directly with the Mallat algorithm requires downsampling, so the components at each level are half the length of those at the previous level. The à trous algorithm instead interpolates the filter coefficients, so the components at every level have the same length as the original signal, which benefits pitch period extraction.

The thesis then introduces Hilbert-Huang transform theory and applies it to pitch period detection. Compared with traditional methods, the Hilbert-Huang transform does not require a short-time stationarity assumption for the speech signal, offers high detection accuracy and a wide range of applicability, and allows a much longer frame length. Compared with the wavelet transform, the Hilbert-Huang transform decomposes the signal according to the signal's own information and adapts as the signal changes, so it reflects the real physical information contained in the signal and is more adaptive.

Finally, the thesis applies the Hilbert-Huang transform to pitch marking in a TD-PSOLA speech synthesis system, which extends the application scope of the transform. Experiments show that the commonly used autocorrelation method obtains only the average pitch period of each frame and then places the marks within the frame by interpolation, so its accuracy is limited. Pitch marking with the Hilbert-Huang transform detects nearly all pitch peaks in a speech segment and captures the small period variations within each frame, so it is more accurate than the autocorrelation method.
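
The ACF and AMDF methods named in the abstract can be illustrated with a minimal sketch. It assumes NumPy, a synthetic 100 Hz harmonic test signal sampled at 8 kHz, and a 60 to 400 Hz search range. The function names `acf_pitch` and `amdf_pitch` and all parameter values are illustrative choices, not taken from the thesis. The abstract does not define ACEF or MDEF, so the proposed combined algorithm is not reproduced here.

```python
import numpy as np

def _lag_range(fs, f_min, f_max):
    # Convert the assumed pitch search range (Hz) into a lag range (samples).
    return int(fs / f_max), int(fs / f_min)

def acf_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Pitch period in samples: lag at which the short-time ACF is largest."""
    lo, hi = _lag_range(fs, f_min, f_max)
    x = frame - np.mean(frame)
    n = len(x)
    acf = [np.dot(x[:n - k], x[k:]) for k in range(lo, hi + 1)]
    return lo + int(np.argmax(acf))

def amdf_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Pitch period in samples: lag at which the AMDF is smallest."""
    lo, hi = _lag_range(fs, f_min, f_max)
    n = len(frame)
    amdf = [np.mean(np.abs(frame[:n - k] - frame[k:])) for k in range(lo, hi + 1)]
    return lo + int(np.argmin(amdf))

# Synthetic voiced frame (hypothetical test data): 100 Hz fundamental
# plus two harmonics, 40 ms long at 8 kHz.
fs = 8000
f0 = 100.0
t = np.arange(320) / fs
frame = (np.sin(2 * np.pi * f0 * t)
         + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
         + 0.25 * np.sin(2 * np.pi * 3 * f0 * t))

print(acf_pitch(frame, fs), amdf_pitch(frame, fs))  # both lags are in samples
```

On this clean signal both estimators return the true period of 80 samples (8000 / 100). Restricting the lag search range is one simple guard against the double and half frequency errors the abstract mentions, although it does not remove them in general.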

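The length behaviour that the abstract attributes to the Mallat and à trous algorithms can be seen in a small sketch of the low-pass (approximation) branch only. It assumes NumPy and a two-tap averaging filter as a stand-in for the wavelet low-pass filter; the filter, the function names `mallat_approx` and `a_trous_approx`, and the test signal are assumptions for illustration, not the thesis's actual configuration.

```python
import numpy as np

# Two-tap averaging low-pass filter (Haar-like); an assumed stand-in.
H = np.array([0.5, 0.5])

def mallat_approx(x, levels):
    """Decimated scheme: filter, then keep every second sample."""
    a = np.asarray(x, dtype=float)
    out = []
    for _ in range(levels):
        a = np.convolve(a, H, mode="full")[1::2]  # downsample by 2
        out.append(a)
    return out

def a_trous_approx(x, levels):
    """Undecimated scheme: insert zeros between filter taps, no downsampling."""
    a = np.asarray(x, dtype=float)
    out = []
    for j in range(levels):
        h = np.zeros((len(H) - 1) * 2**j + 1)
        h[::2**j] = H                       # taps spaced 2**j samples apart
        a = np.convolve(a, h, mode="same")  # output keeps the input length
        out.append(a)
    return out

# Hypothetical test signal: 256 samples of a sinusoid.
x = np.sin(2 * np.pi * np.arange(256) / 32.0)

print([len(a) for a in mallat_approx(x, 3)])
print([len(a) for a in a_trous_approx(x, 3)])
```

With a 256-sample input, the decimated approximations have lengths 128, 64, and 32, while every à trous approximation keeps all 256 samples. Because each undecimated level stays aligned sample for sample with the original signal, peak positions found at a coarse level map directly back to time positions in the speech waveform.
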
【Key words】 pitch period detection; wavelet transform; Hilbert-Huang transform; speech synthesis

【Online publication submitter】 Taiyuan University of Technology

【Classification number】 TN912.3
【Times cited】 10
【Download count】 649
