Research and Implementation of Audio Techniques for Assisting Video Content Analysis

【Author】 Cheng Jie (程捷)

【Advisor】 Wu Lingda (吴玲达)

【Author information】 National University of Defense Technology (中国人民解放军国防科学技术大学), Electronic and Information Engineering, 2003, Master's thesis

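【Abstract (translated from Chinese)】 Across multimedia research, automatic content processing is becoming increasingly important. Automatic video content segmentation, video genre recognition, and self-organizing video libraries are among the directions researchers favor most. At present, however, extracting video content from video data alone is very difficult. Existing video processing techniques on their own cannot accurately segment video scenes or classify video content. Using the corresponding audio and text information to assist video stream analysis addresses scene segmentation and content classification considerably better. Moreover, audio processing technology has developed rapidly in recent years, and speech recognition has matured, achieving high recognition rates for large-vocabulary continuous speech. Applying speech recognition and natural language processing to the speech segments of an audio stream makes it possible to extract and classify audio content, which facilitates retrieval and in turn allows the corresponding video segments to be classified by content. These developments created the conditions for this research. This thesis proposes a practical and efficient content-based technique for audio-assisted video content analysis, and implements fairly complete audio-assisted scene segmentation and content classification for news video. 1. Through a study of audio properties, the physical and biological characteristics of audio are analyzed in detail, and a series of audio feature extraction methods is proposed on the basis of classic short-time analysis. 2. Through a study of the characteristics of audio data, and according to the needs of assisting video analysis, a set of content-based audio analysis methods is proposed, including classification and segmentation of long audio streams, speaker change detection, and extraction of the text content of speech. 3. Building on these audio analysis methods, a set of audio-assisted video analysis methods is proposed that combines video and audio information to achieve better video scene detection and story segmentation. 4. Using text classification, content-based classification of video data is implemented, which improves content-based browsing and retrieval of video media.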

【Abstract】 Within multimedia research, automatic content processing of multimedia data is becoming more and more important. Automatic cut detection in the video domain, genre recognition, and automatic creation of digital video libraries are key topics addressed by researchers. At present, however, video content analysis struggles to achieve good results using video data alone, and it is poorly suited to scene segmentation and video content classification. Using the corresponding audio and text information to assist video analysis can address this problem. Furthermore, in recent years audio processing technology has developed rapidly, and speech recognition technology has matured and achieves high accuracy in large-vocabulary continuous speech recognition. Speech recognition and natural language processing technology can extract and classify audio content. This is the first step toward retrieving entire videos and classifying video content. In this thesis, we address the problem of segmenting news video data into semantically coherent scenes using audio and video data, as well as the classification of news video content. 1. Based on a study of audio, we analyze its physical and biological characteristics and extract a series of audio features on the basis of classic short-time analysis. 2. We develop a series of audio content analysis algorithms for audio classification and segmentation, speaker change detection, and speech recognition. 3. Building on these audio analysis methods, we develop an algorithm for video structure analysis. By fusing the video and audio analysis results, we implement a method for video scene detection and story segmentation. 4. Using text classification, we develop a content-based classification algorithm for video data. In this way we obtain more useful results for video browsing and retrieval.

  • 【Classification code】TN912.3
  • 【Citation count】2
  • 【Download count】168