Research on Polyphonic Music Pitch Estimation (多音音乐音高估计研究)

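【Author】 Zhiyao Duan (段志尧)

【Advisor】 Changshui Zhang (张长水)

【Author information】 Tsinghua University, Control Science and Engineering, 2008, Master's degree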

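【Abstract (translated from the Chinese)】 Multi-pitch estimation (fundamental frequency estimation) is one of the most important and difficult research directions in music information retrieval. Its basic task is to estimate, at every moment of a piece of polyphonic music, the pitch (fundamental frequency) of each sounding note and the number of notes. Estimating note onset and offset times is sometimes included as well. Starting from the broader context of music information retrieval, this thesis introduces the basic tasks of polyphonic pitch estimation, its research value, and its relation to other research, and then reviews a number of representative pitch estimation algorithms fairly systematically. On this basis, it proposes two new algorithms.

The first algorithm is a single-frame multi-pitch estimation algorithm based on maximum likelihood spectral modeling. Unlike earlier methods that model the whole spectrum, it reduces the spectrum of the signal to the peaks and the non-peak region of the amplitude spectrum, and further reduces each peak to its frequency and amplitude. The maximum likelihood model is accordingly split into two parts: the peak likelihood and the non-peak-region likelihood. In modeling the peak likelihood, we account for the errors of the peak detection algorithm by introducing the notions of “true” and “false” peaks and modeling them separately. In modeling the non-peak-region likelihood, we use as the likelihood function the probability that no peak generated by a harmonic is detected in that region. The two parts focus on different aspects and complement each other. The model parameters are learned from monophonic training data, because in monophonic data “true” and “false” peaks can be told apart fairly reliably. We also use a weighted Bayesian information criterion to estimate the number of notes. Finally, the algorithm is tested on random chords and musical chords synthesized from real instrument notes, with good results.

The second algorithm is a multi-frame multi-pitch estimation algorithm based on computational auditory scene analysis. In this algorithm we imitate the sound perception rules of the human brain to group the time-frequency components of the signal spectrum. Specifically, we define the concept of a partial event over the successive spectra of the signal; each partial event is a four-tuple (frequency, amplitude, onset time, offset time). For the music to be processed, we extract all of its partial events into a set, and every event in the set is a candidate fundamental frequency event. We design a support transfer algorithm in which the partial events vote for one another, and the events with the highest support are selected as fundamental frequencies. The algorithm is tested on random chords synthesized from real instrument notes and on computer-synthesized ensemble music, with good results.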
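The reduction of a single-frame spectrum to peaks and a non-peak region, as used by the first algorithm, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the local-maximum peak picker, the half-semitone tolerance, and the toy spectrum are all assumptions made for illustration, and the "explained"/"unexplained" split only approximates the idea of "true" and "false" peaks relative to a hypothesised fundamental frequency. No likelihood model or weighted BIC is included.

```python
import math

def find_peaks(freqs, amps, min_amp):
    """Return (frequency, amplitude) pairs at local maxima at or above min_amp."""
    peaks = []
    for i in range(1, len(amps) - 1):
        if amps[i] >= min_amp and amps[i] > amps[i - 1] and amps[i] >= amps[i + 1]:
            peaks.append((freqs[i], amps[i]))
    return peaks

def harmonic_number(freq, f0, tol_semitones=0.5):
    """Return h if freq lies within the tolerance of the h-th harmonic of f0, else None."""
    h = max(1, round(freq / f0))
    deviation = abs(12 * math.log2(freq / (h * f0)))
    return h if deviation <= tol_semitones else None

def split_peaks(peaks, f0s):
    """Split peaks into those near a harmonic of some hypothesised F0 and the rest."""
    explained, unexplained = [], []
    for freq, amp in peaks:
        if any(harmonic_number(freq, f0) is not None for f0 in f0s):
            explained.append((freq, amp))
        else:
            unexplained.append((freq, amp))
    return explained, unexplained

# Toy amplitude spectrum on a 50 Hz grid (hypothetical data, not from the thesis).
freqs = [50 * k for k in range(21)]
amps = [0.0] * 21
amps[4], amps[8], amps[12], amps[17] = 1.0, 0.6, 0.3, 0.2  # 200, 400, 600, 850 Hz

peaks = find_peaks(freqs, amps, min_amp=0.1)
explained, unexplained = split_peaks(peaks, f0s=[200.0])
```

With the hypothesis F0 = 200 Hz, the peaks at 200, 400, and 600 Hz fall on harmonics, while the 850 Hz peak does not and would have to be accounted for as a spurious detection or by another note.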

【Abstract】 Multi-Pitch Estimation (MPE), also called Multiple Fundamental Frequency (F0) Estimation, is one of the most important and difficult problems in Music Information Retrieval (MIR). Its main tasks are to estimate the pitches (F0s) and their number (polyphony) at any time. Sometimes the note onsets and offsets should also be estimated.

This thesis starts by introducing the background of MIR research, then presents the main tasks of MPE, its research value, and its relations to other research. After that, it systematically reviews some typical pitch estimation algorithms. Finally, it proposes two new algorithms.

The first algorithm is a single-frame MPE algorithm based on maximum likelihood spectral modeling. Unlike traditional methods that model the whole spectrum, this algorithm reduces the frequency spectrum to peaks and non-peak areas in the amplitude spectrum, and the peaks are further reduced to their frequencies and amplitudes. Along with these reductions, the maximum likelihood model is split into two parts: the peak likelihood and the non-peak area likelihood. In modeling the peaks, the concepts of “true” and “false” peaks are proposed and modeled separately, to cope with errors in the peak detection method. In modeling the non-peak area likelihood, the probability that peaks generated by the harmonics are not detected is used as the likelihood function. The two parts of the likelihood function model different aspects and are complementary. Their parameters are learned from monophonic training data, where the “true” and “false” peaks are easy to discriminate. A weighted Bayesian Information Criterion (BIC) is employed to estimate the polyphony. Finally, the algorithm is tested on random chords and musical chords, both generated from real instrumental notes. The experimental results are promising.

The second algorithm is a multiple-frame MPE algorithm based on Computational Auditory Scene Analysis (CASA). In this algorithm, we simulate the auditory cues of human perception to group the time-frequency components. More concretely, the concept of a partial event is defined. Each partial event is a four-element vector (frequency, amplitude, onset, and offset). For a piece of music to be processed, all its partial events are extracted to form a set, in which each event is a candidate F0 event. Then a support transfer algorithm is designed to make the events vote for each other, electing the ones with the highest support as F0s. The proposed algorithm is tested on random chords generated from real instrumental notes, and on computer-synthesized chamber music. The results are promising.
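The partial-event representation and the voting idea of the second algorithm can be sketched in Python as follows. This is a deliberately simplified stand-in, not the thesis's support transfer algorithm: the `PartialEvent` type, the `supports` rule (a voter backs a candidate when it sits near an integer multiple of the candidate's frequency and starts at about the same time), the tolerances, and the amplitude-weighted score are all assumptions chosen for illustration.

```python
from typing import NamedTuple

class PartialEvent(NamedTuple):
    frequency: float  # Hz
    amplitude: float
    onset: float      # seconds
    offset: float     # seconds

def supports(voter, candidate, freq_tol=0.03, onset_tol=0.05):
    """True if voter looks like a harmonic (h >= 2) of candidate with a similar onset."""
    ratio = voter.frequency / candidate.frequency
    h = round(ratio)
    if h < 2:
        return False
    close_in_freq = abs(ratio - h) <= freq_tol * h
    close_in_time = abs(voter.onset - candidate.onset) <= onset_tol
    return close_in_freq and close_in_time

def vote(events):
    """Return (score, event) pairs sorted by amplitude-weighted support, highest first."""
    scored = []
    for cand in events:
        votes = sum(v.amplitude for v in events if v is not cand and supports(v, cand))
        scored.append((cand.amplitude + votes, cand))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical events: a 220 Hz note with two harmonics, plus a later 330 Hz partial.
events = [
    PartialEvent(220.0, 1.0, 0.00, 1.0),
    PartialEvent(440.0, 0.5, 0.00, 1.0),
    PartialEvent(660.0, 0.3, 0.01, 1.0),
    PartialEvent(330.0, 0.4, 0.50, 1.0),
]
ranked = vote(events)
```

In this toy set the 220 Hz event collects votes from the 440 Hz and 660 Hz events and ranks first; the 660 Hz event does not vote for the 330 Hz event because their onsets differ by about half a second.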

  • 【Online publication contributor】 Tsinghua University
  • 【Online publication issue】 2009, Issue 09
  • 【Classification codes】 TP391.3; TN912.3
  • 【Times cited】 2
  • 【Downloads】 241