
歌词识别辅助的音乐检索研究

A Study of Lyric Recognition-Assisted Music Information Retrieval

【Author】 Guo Zhiyuan (郭芝源)

【Supervisor】 Guo Jun (郭军)

【Author Information】 Beijing University of Posts and Telecommunications, Signal and Information Processing, 2013, PhD

【摘要】 With the rapid development of digital technology and the widespread adoption of the Internet and wireless networks, digital music has become extremely easy to obtain. How to retrieve the music a user wants from a massive collection of digital music has become a pressing problem. Content-based music retrieval, such as query-by-example and query-by-humming, searches music using features of the music itself; it requires little manual annotation, is convenient for users, and has become the mainstream research direction. Existing music retrieval systems usually search using melody features alone, so retrieval easily fails when the singer hums incorrectly. Lyrics are another important component of a song besides the melody; they occur in speech or in music, and in many cases they can complement melody features to improve retrieval accuracy. Centered on how lyrics can assist music retrieval, this dissertation studies several key problems in depth: recognition of spoken lyrics, music retrieval based on spoken lyrics, lyric recognition in a-cappella singing, and query-by-humming based on both lyrics and melody. The main contributions and innovations are as follows:

1. A word-activation-force-based class language model. Data sparsity in the language model is a prominent problem in spoken-lyric recognition, and this dissertation studies it with the goal of improving recognition accuracy. Interpolating a class-based language model with a word-based language model is a common remedy for sparsity, but the performance of a class-based model depends on the quality of the word classes. The word-activation-force (WAF) based affinity measure has proved effective for describing word similarity; this dissertation uses it to cluster words and trains a class language model on the resulting clusters, called the WAF-based class language model. Because words within a class are strongly similar, this model outperforms classic class-based language models. Experiments show that the interpolation of the WAF-based class model with a word-based model performs well on spoken-lyric recognition.

2. A multilayer-filtering retrieval algorithm. After spoken lyrics are recognized, quickly and accurately locating the target lyrics is the key problem in spoken-lyric-based music retrieval. This dissertation therefore proposes a multilayer-filtering retrieval algorithm. The algorithm first performs query expansion on the recognition result. For correctly recognized results, the first-layer filter matches the target song quickly via an index; for results containing recognition errors, the second-layer filter narrows the search to a small candidate set; the third-layer filter then applies acoustic-similarity-based fuzzy matching to match the candidates precisely against the recognition result. Experiments show that the proposed algorithm significantly improves the performance of spoken-lyric-based music retrieval.

3. A lyric-recognition-assisted query-by-humming algorithm. Using lyric features to assist query-by-humming is a difficult problem worth studying. Existing methods apply continuous speech recognition directly to the lyrics in the sung query, but because the recognized lyrics are inaccurate, the resulting gains are small. This dissertation proposes a lyric-recognition-assisted query-by-humming algorithm: it first uses melody features to find multiple candidate music segments, then builds a recognition network from the candidates' lyrics, performs lyric recognition with isolated-word recognition, and finally ranks the songs by combining the melody-matching and lyric-matching results. By using melody retrieval to sharply narrow the scope of lyric recognition, the algorithm greatly raises recognition accuracy. Experiments show that it effectively exploits the lyric information in music and significantly improves query-by-humming performance.

【Abstract】 With the rapid development of digital technology and the popularization of networks, it has become very easy to access large quantities of digital music. At the same time, music information retrieval (MIR), which aims to search for music in a large-scale music database, has become an important and challenging research topic. Recently developed content-based MIR systems, which work on content features such as melody and rhythm, provide users with richer retrieval methods and have become a very popular research direction.

However, most MIR systems use only melody to match music. Since most users are non-professional singers, input queries are very likely to contain melody errors, in which case melody-only MIR systems may fail to retrieve the target. Lyrics, which such systems do not take into account, provide complementary information for song identification. This dissertation tries to improve MIR systems by adding lyrics, focusing on two key problems: extracting lyrics from spoken or sung queries, and the searching method. The main contributions and innovations are as follows:

1. Word-activation-force-based language models. Data sparsity is a prominent issue when constructing n-gram language models for lyric recognition of spoken queries, and this dissertation addresses it to improve recognition accuracy. Class-based language models offer an appealing way to handle sparse data, but their performance depends on the word classes. The word-activation-force (WAF) based affinity measure has proved effective for measuring the similarity between two words. We first apply the affinity measure to word pairs and then employ normalized spectral clustering to group words into classes. From the word classes we obtain a class-based language model, which we finally interpolate with a classic word-based n-gram model. Experimental results show the effectiveness of the interpolated model.

2. A multilayer-filter-based searching method for a MIR system using spoken queries. This dissertation proposes a multilayer-filter-based method for searching the target lyrics quickly and accurately in the lyric database. The method matches multiple hypotheses from the recognition output. For each hypothesis, if it is recognized correctly, the level-1 filter quickly finds the target songs using indexes; if the level-1 filter finds no matched songs, level-2 filtering pre-selects the probable lyric candidates; the acoustic similarity between each lyric candidate and its corresponding hypothesis is then computed by level-3 filtering. Experimental results show the effectiveness of the proposed method.

3. A lyric-recognition-assisted query-by-singing/humming (QBSH) method. Adding lyrics to help QBSH systems is intuitive but challenging. Existing methods use a large-vocabulary continuous speech recognizer (LVCSR) for lyric recognition of sung queries, but the extracted lyrics are inaccurate. This dissertation proposes a lyric-recognition-assisted QBSH method. Before lyric recognition, we pre-select candidates using melody matching; we then build a recognition network from the lyrics of the candidates, score lyrics with the isolated-word recognition technique, and finally rank the candidates by combining their melody-matching and lyric-scoring results. In our experiments, the proposed method achieves a significant improvement.
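The interpolation of a class-based model with a word-based model described in contribution 1 can be sketched as follows. This is a minimal illustration on toy counts, not the dissertation's implementation: the word classes are assumed to be given (in the dissertation they come from WAF-based spectral clustering), the corpus and class assignments below are invented, and the fusion weight `lam` is a hypothetical value.

```python
from collections import Counter

def train_interpolated_bigram(corpus, word2class, lam=0.7):
    """Sketch of interpolating a word bigram with a class bigram:

        P(w2|w1) ~ lam * P_word(w2|w1)
                 + (1-lam) * P_class(c2|c1) * P(w2|c2)

    corpus:     list of tokenized sentences
    word2class: mapping word -> class label (assumed given)
    """
    word_big, word_uni = Counter(), Counter()
    class_big, class_uni = Counter(), Counter()
    class_words = Counter()  # (class, word) counts for the emission P(w|c)

    for sent in corpus:
        for w1, w2 in zip(sent, sent[1:]):
            c1, c2 = word2class[w1], word2class[w2]
            word_big[(w1, w2)] += 1
            word_uni[w1] += 1
            class_big[(c1, c2)] += 1
            class_uni[c1] += 1
            class_words[(c2, w2)] += 1

    def prob(w1, w2):
        c1, c2 = word2class[w1], word2class[w2]
        p_word = word_big[(w1, w2)] / word_uni[w1] if word_uni[w1] else 0.0
        p_cc = class_big[(c1, c2)] / class_uni[c1] if class_uni[c1] else 0.0
        # emission probability: fraction of class-c2 tokens that are w2
        c2_total = sum(n for (c, _), n in class_words.items() if c == c2)
        p_wc = class_words[(c2, w2)] / c2_total if c2_total else 0.0
        return lam * p_word + (1 - lam) * p_cc * p_wc

    return prob
```

The point of the interpolation is visible on unseen bigrams: a word pair never observed in training still receives nonzero probability through its class transition, which is exactly how class models relieve data sparsity.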
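The three-level filtering of contribution 2 can be sketched as a cascade that only falls through to more expensive matching when the cheaper level fails. Everything here is illustrative: the substring index, the character-bigram pre-selection, and the use of `difflib` edit-distance ratio are stand-ins (the dissertation's level-3 filter uses acoustic similarity, not string similarity), and `min_overlap` is a hypothetical threshold.

```python
from difflib import SequenceMatcher

def multilayer_search(query, lyric_db, topk=3, min_overlap=0.3):
    """Three-level filtering sketch over a {song_id: lyric} database.

    Level 1: exact lookup (stands in for an index over the lyrics).
    Level 2: coarse pre-selection by character-bigram overlap.
    Level 3: fine ranking of the candidates (edit-distance ratio
             stands in for acoustic-similarity fuzzy matching).
    """
    # Level 1: a correctly recognized hypothesis hits an exact match
    exact = [sid for sid, lyric in lyric_db.items() if query in lyric]
    if exact:
        return exact

    # Level 2: narrow a mis-recognized hypothesis to a small candidate set
    def bigrams(s):
        return {s[i:i + 2] for i in range(len(s) - 1)}

    q = bigrams(query)
    cand = [sid for sid, lyric in lyric_db.items()
            if q and len(q & bigrams(lyric)) / len(q) >= min_overlap]

    # Level 3: precise fuzzy matching of the surviving candidates
    scored = sorted(cand,
                    key=lambda sid: SequenceMatcher(
                        None, query, lyric_db[sid]).ratio(),
                    reverse=True)
    return scored[:topk]
```

The cascade structure is the design point: most correctly recognized queries are answered by the cheap level-1 lookup, and the costly fuzzy matching runs only on the small candidate set left by level 2.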
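The final ranking step of contribution 3 fuses the melody-matching scores of the pre-selected candidates with their lyric scores from isolated-word recognition. The linear fusion below and the weight `alpha` are assumptions for illustration; the dissertation does not specify this particular combination rule.

```python
def rerank_candidates(melody_scores, lyric_scores, alpha=0.6):
    """Fuse melody matching and lyric scoring (sketch).

    melody_scores: {song_id: score} from the melody pre-selection stage
    lyric_scores:  {song_id: score} from isolated-word recognition over
                   a network built from the candidates' lyrics only
    alpha:         hypothetical fusion weight between the two cues

    Returns song ids ranked by the fused score, best first.
    """
    fused = {sid: alpha * melody_scores[sid]
                  + (1 - alpha) * lyric_scores.get(sid, 0.0)
             for sid in melody_scores}
    return sorted(fused, key=fused.get, reverse=True)
```

Because the recognition network contains only the lyrics of the melody candidates, the lyric cue can promote the true song even when melody matching alone ranks it second, which is the effect the dissertation reports.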
