基于音频指纹和版本识别的音乐检索技术研究
The Music Retrieval Technology Based on Audio Fingerprint and Version Identification
【Author】 郭永帅;
【Advisor】 郑贵滨;
【Author information】 Harbin Institute of Technology, Computer Science and Technology, 2014, Master's
【Abstract】 Content-based music retrieval is a hot topic in audio retrieval, and its practical value keeps growing as the volume of online music increases. At the same time, users' retrieval needs are changing: they are often no longer satisfied with getting only the song identical to the query, and also want multiple versions of the target music, such as versions sung by different singers or on different occasions. With the development of we-media and the popularity of amateur covers, this demand is becoming more and more evident.

Content-based music retrieval extracts features from the query music and from the sample music, and then retrieves the samples identical to the query by feature matching. The features used in sample retrieval are usually called audio fingerprints. They aim for a compact format and tend to match music clips with the same content. Music version features, by contrast, are more complex and tend to match clips that share the same version features, even though the content is not necessarily the same. This thesis therefore handles the two separately: music version identification can be done offline on the curated sample library, while retrieval based on audio fingerprints runs in real time. For a sample hit by fingerprint retrieval, the version identification results can immediately supply the related samples (that is, other versions of the same song).

Because human hearing performs well, this thesis builds the audio fingerprint from features based on the auditory mechanism. After analyzing the physiological characteristics of the human ear, it uses a cosine basis and a firing function to simulate how the cochlea processes sound, and then obtains the feature coefficients by sparse decomposition. To overcome the high time cost of the decomposition, it proposes a fast feature extraction method based on the matching pursuit algorithm.

Because the sparse feature based on the auditory mechanism is complex in form, it is not suitable for direct use in retrieval, so this thesis compresses and converts it into an audio fingerprint. The main methods are MinHash, which reduces the dimension of the high-dimensional binary sequence feature, and locality-sensitive hashing, which enables fast retrieval, followed by the corresponding candidate confirmation and sample detection methods. Experiments show that the fingerprint feature has good retrieval efficiency and expressive ability, and is robust to slight noise and to global changes in the time domain, but is less robust to local changes in the time domain.

For music version identification, this thesis first analyzes the basic definitions, main problems and general processing methods of the field. By reviewing the identification pipeline and comparing the various methods, it constructs a complete music version identification method. The commonly used harmonic pitch class profile (HPCP) feature is improved by adding beat and transposition information, and the improved HPCP serves as the core feature for version identification. Necessary preprocessing steps are applied before feature calculation, including peak estimation, beat estimation and reference frequency estimation. Experimental results show that the version identification method constructed in this thesis is effective.
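The MinHash and locality-sensitive hashing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the implementation from the thesis. The hash family `(a*x + b) mod p`, the number of hash functions, the band layout, and every function name below are assumptions chosen for illustration. A binary feature is represented as the set of indices where its bits are 1.

```python
import random
from collections import defaultdict

# Assumed modulus: the Mersenne prime 2^31 - 1. Feature indices must be smaller than it.
PRIME = 2_147_483_647


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for the assumed hash family h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(active_bits, params):
    """Reduce a non-empty set of 1-bit indices to a short tuple of minimum hash values."""
    return tuple(min((a * x + b) % PRIME for x in active_bits)
                 for a, b in params)


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def band_keys(sig, bands, rows):
    """Split a signature into `bands` consecutive slices of `rows` values each."""
    return [(band, sig[band * rows:(band + 1) * rows]) for band in range(bands)]


def build_lsh_index(signatures, bands, rows):
    """Map each band slice to the set of item ids whose signature contains it."""
    index = defaultdict(set)
    for item_id, sig in signatures.items():
        for key in band_keys(sig, bands, rows):
            index[key].add(item_id)
    return index


def query_candidates(index, sig, bands, rows):
    """Return ids that share at least one identical band slice with the query."""
    candidates = set()
    for key in band_keys(sig, bands, rows):
        candidates |= index.get(key, set())
    return candidates
```

In this sketch, a call such as `build_lsh_index(signatures, bands=8, rows=4)` expects signatures of length 32, and `query_candidates` returns only a candidate set. A separate confirmation step, for example one based on `estimated_jaccard`, would still be needed to accept or reject each candidate.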
【Key words】 audio fingerprint; version; sparse decomposition; locality-sensitive hashing; harmonic pitch class profile
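- 【Online publication contributor】 Harbin Institute of Technology 【Online publication year and issue】 2015, Issue 02
- 【Classification number】 TN912.34
- 【Downloads】 103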