Research on Content-Based Audio Information Retrieval Technology (基于内容的音频信息检索技术研究)
【Author】 Zheng Guibin (郑贵滨)
【Advisor】 Han Jiqing (韩纪庆)
【Author information】 Harbin Institute of Technology, Computer Application Technology, 2006, PhD
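【Abstract (translated from Chinese)】 With the rapid development of modern information technology, multimedia technology, and network technology, the volume of multimedia data has grown sharply. To make full use of existing audio resources, content-based audio information retrieval has attracted increasing attention. Audio data exists in static and dynamic forms, and retrieval can operate at the expression level or the semantic level. Different forms of audio data and different retrieval levels call for different retrieval methods. Despite extensive related research, many problems in audio retrieval still urgently need to be solved. The main ones are: the performance of most retrieval algorithms degrades markedly in the presence of noise; audio data is high-dimensional and sequential in time, which makes building an index very difficult; research on dynamic audio retrieval is lacking; and for music in audio form, semantic information is hard to obtain, so research on semantic-level retrieval is difficult and progresses slowly. Overall, audio retrieval technology is still at the stage of experimental exploration and lacks practical techniques and systems. Addressing these problems, this dissertation studies audio retrieval in the following aspects:
1. For expression-level retrieval of static audio, a fuzzy histogram audio retrieval method based on the principal loudness component feature is proposed. In designing the histogram model, the model is optimized according to the statistical distribution of the loudness data. A fuzzy histogram is then used to further improve the robustness of the histogram model to noise and to perturbations in the loudness values. During retrieval, the active search algorithm is used to increase retrieval speed. Experimental results show that the method has good robustness to noise.
2. For expression-level indexing of static audio, an indexing method based on the fuzzy histogram of the principal loudness component is proposed. Once audio data is represented by this fuzzy histogram, the histogram similarity of two audio clips of different lengths correctly reflects the containment relationship between them, provided the ratio of their lengths does not exceed a certain limit. Based on this property, an indexing method that combines a binary tree with linked lists is proposed. During retrieval, a suitable range of search levels in the index is selected according to the length of the retrieval target and the upper limit on the length ratio. Experimental results show that the index greatly increases retrieval speed.
3. For expression-level retrieval of dynamic audio, a segment-based real-time audio retrieval method is proposed. The method divides the retrieval target into a sequence of segments and uses a retrieval window to control which segments take part in retrieval. The issues studied include flexible criteria for deciding that a target has been detected, fast retrieval control strategies, a mathematical model for estimating the retrieval response lag, and a fast multi-target retrieval method based on audio classification. Experimental results show that the method is fast and highly controllable, has a small retrieval response delay, and is robust both to partially missing targets and to noise.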
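As an illustration of item 1 only, the following is a minimal sketch of how a fuzzy histogram and an active-search style scan could fit together. It is not the dissertation's implementation. Everything concrete here is an assumption made for the example: the per-frame feature is a plain number in [0, 1] standing in for the principal loudness component, the fuzzy membership is triangular over the two nearest bin centres, similarity is histogram intersection, and the skip rule is the bound commonly used with histogram-based active search (sliding the window by one frame can raise the intersection by at most 1/N, where N is the window length in frames). The names `fuzzy_histogram`, `similarity`, and `active_search` are hypothetical.

```python
import math


def fuzzy_histogram(values, lo, hi, n_bins):
    """Soft histogram: each value splits a total weight of 1 between the
    two nearest bin centres (triangular membership, an assumption)."""
    hist = [0.0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        clipped = min(max(v, lo), hi)
        # Position measured in bins, relative to the centre of bin 0.
        pos = (clipped - lo) / width - 0.5
        pos = min(max(pos, 0.0), n_bins - 1.0)
        left = int(math.floor(pos))
        right = min(left + 1, n_bins - 1)
        frac = pos - left
        hist[left] += 1.0 - frac
        hist[right] += frac
    return hist


def similarity(h_query, h_window, n_frames):
    """Histogram intersection, normalised to [0, 1] by the frame count."""
    return sum(min(a, b) for a, b in zip(h_query, h_window)) / n_frames


def active_search(stream, query, theta, lo=0.0, hi=1.0, n_bins=8):
    """Return (offset, similarity) pairs whose similarity reaches theta.

    When a window scores s < theta, no window closer than
    n * (theta - s) frames ahead can reach theta, so those are skipped.
    """
    n = len(query)
    h_query = fuzzy_histogram(query, lo, hi, n_bins)
    hits = []
    t = 0
    while t + n <= len(stream):
        h_window = fuzzy_histogram(stream[t:t + n], lo, hi, n_bins)
        s = similarity(h_query, h_window, n)
        if s >= theta:
            hits.append((t, s))
            t += 1
        else:
            # Small epsilon guards against rounding up past a valid offset.
            t += max(1, math.ceil(n * (theta - s) - 1e-9))
    return hits


if __name__ == "__main__":
    query = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4]
    stream = [0.5] * 20 + query + [0.5] * 20
    print(active_search(stream, query, theta=0.95))
```

The soft assignment is what gives the histogram its tolerance to small perturbations: a value that drifts slightly across a bin boundary shifts only a small fraction of its weight instead of moving a whole count. For brevity the sketch recomputes each window's histogram from scratch; an incremental update would be the natural refinement.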
【Abstract】 The rapid development of modern information technology, multimedia technology, and network technology has produced large and ever-growing stores of multimedia data. To make full use of existing audio resources, content-based audio information retrieval (CBAIR) has been attracting more and more attention. Audio data can be static or dynamic, and retrieval can operate at the expression level or the semantic level. Different forms of audio and different retrieval levels require different retrieval methods. Although much related work has been done, many difficulties in CBAIR remain unsolved. The major ones are the following: the performance of most existing retrieval methods deteriorates dramatically under noise; audio data is high-dimensional and sequential in time, which makes it very difficult to index; little work has been done on dynamic audio retrieval; and research on semantic-level retrieval of audio music is hard and progresses slowly because semantic information is difficult to extract from audio music. Generally speaking, CBAIR is still at the experimental stage and lacks practical techniques and systems. Centering on these problems, this dissertation studies the following:
1. For expression-level retrieval of static audio, a fuzzy histogram audio retrieval method based on the principal loudness component is developed. In designing the histogram model, the statistical distribution of loudness is used to optimize the model. A fuzzy histogram is then used to improve the robustness of the histogram model to noise and to small changes in loudness values. The active search method is used during histogram retrieval. Experimental results show that the method has good robustness to noise.
2. For expression-level indexing of static audio, a novel indexing method based on the fuzzy histogram of the principal loudness component is presented. When audio data is represented by the fuzzy histogram of the principal loudness component, the similarity between their histograms can correctly reflect
【Key words】 Audio information retrieval; Fuzzy histogram; Segmentation-based retrieval; Multi-target retrieval; Multi-pitch estimation