低速率语音编码算法研究

Research on Low Bit Rate Speech Coding Algorithm

【Author】 Ji Zhe (计哲)

【Author information】 Tsinghua University, Information and Communication Engineering, 2011, Ph.D.

【Abstract】 Low bit rate speech coding algorithms are widely used in modern communication systems, and speech compression coding at ultra-low bit rates is one of the important research topics in speech signal processing. The sinusoidal excitation linear prediction (SELP) algorithm uses a linear-prediction-based sinusoidal mixed excitation technique and performs very well among speech compression coding algorithms at 2.4 kbps and below. The purpose of this dissertation is to analyze and study the key techniques of speech coding on the basis of the SELP model, and to design and implement a 150 bps ultra-low bit rate speech compression coding algorithm.

The dissertation first proposes efficient quantization algorithms for the characteristic parameters. For scalar quantization of the line spectral frequency (LSF) parameters, it proposes a globally optimal LSF difference quantization algorithm based on dynamic programming and uses multiple codebooks to further improve quantization performance; the algorithm achieves transparent quantization of the LSF parameters at 28 bits per frame. For vector quantization of the pitch parameter, it proposes a perceptually weighted distortion measure that exploits the auditory characteristics of the human ear to improve quantization performance, and it designs an integer optimization of the codeword search that reduces the search error rate for the optimal pitch codeword.

In ultra-low bit rate speech coding, the bits available per frame are severely insufficient for quantizing the characteristic parameters. To address this problem, the dissertation proposes decoder-side recovery algorithms that exploit the correlation between parameters. First, an energy recovery algorithm based on the hidden Markov model (HMM) estimates the trajectory of the energy parameter from the LSF parameters and the sub-band unvoiced/voiced (U/V) parameters. Then, a U/V recovery algorithm based on the Gaussian mixture model (GMM) estimates the probability distribution of the U/V parameter from the LSF parameters and the normalized energy, which saves the bits otherwise needed to quantize it.

On the decoder side, an improved interpolation method for the characteristic parameters is proposed to make the synthesized speech sound more natural during transitions between unvoiced and voiced speech. To improve the vocoder's robustness to consecutive packet loss, a packet loss concealment algorithm based on mode-based linear prediction is proposed, which improves the synthesized speech quality when consecutive packets are lost.

Finally, the research results above are combined to design and implement the 150 bps SELP speech coding algorithm. The synthesized speech has an objective mean opinion score (MOS) of 2.424, the diagnostic rhyme test (DRT) accuracy is 82.9%, the codebook storage is 120 Kwords, and the algorithm delay is 325 ms. Overall, the performance of the 150 bps SELP vocoder exceeds the requirements of the national Eleventh Five-Year Plan special project.
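
The abstract names a dynamic-programming search for LSF difference quantization but gives no details of it. The sketch below is only a toy illustration of the general idea, not the dissertation's algorithm. It assumes a single shared table of integer difference levels, a plain squared-error cost, and a reconstruction that starts from zero; the function names and the numbers are hypothetical. Each LSF is coded as a difference from the previous reconstructed value. A greedy coder picks the locally best level at each step, while the dynamic-programming search keeps the cheapest path into every reachable reconstructed value and so finds the index sequence with the lowest total error.

```python
def greedy_diff_quantize(lsf, levels):
    """Pick, at each step, the level that minimizes the current squared error."""
    prev, indices, cost = 0, [], 0
    for x in lsf:
        k = min(range(len(levels)), key=lambda j: (x - (prev + levels[j])) ** 2)
        prev += levels[k]              # reconstructed value after this step
        cost += (x - prev) ** 2
        indices.append(k)
    return indices, cost


def dp_diff_quantize(lsf, levels):
    """Exhaustive-equivalent search: minimize the total squared error.

    State = reconstructed value so far. The future cost depends only on
    that value, so keeping the cheapest path per state is sufficient.
    Integer levels (an assumption here) keep the state set finite.
    """
    states = {0: (0, [])}              # value -> (accumulated cost, index path)
    for x in lsf:
        nxt = {}
        for prev, (cost, path) in states.items():
            for k, d in enumerate(levels):
                y = prev + d
                c = cost + (x - y) ** 2
                if y not in nxt or c < nxt[y][0]:
                    nxt[y] = (c, path + [k])
        states = nxt
    cost, path = min(states.values(), key=lambda s: s[0])
    return path, cost


# Hypothetical toy input: three "LSF" values and two difference levels.
lsf = [240, 800, 900]
levels = [100, 400]
print(greedy_diff_quantize(lsf, levels))   # ([0, 1, 1], 109600)
print(dp_diff_quantize(lsf, levels))       # ([1, 1, 0], 25600)
```

In this toy case the greedy coder takes the smaller first step because it is locally closer, and then cannot catch up with the second value. The dynamic-programming search accepts a slightly larger first error in exchange for a much smaller total.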

【Key words】 speech coding; ultra-low bit rate; quantization of characteristic parameters; parameter recovery in the decoder

【Online publication contributor】 Tsinghua University

【Classification number】 TN912.3
【Times cited】 4
【Downloads】 487