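Tennis Video Retrieval Based on Audio-Visual Fusion

【Author】 Dong Qing (董晴)

【Advisor】 Wang Jianyu (王建宇)

【Author Information】 Nanjing University of Science and Technology, Detection Technology and Automatic Equipment, 2010, Master's thesis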

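【Summary】 This thesis takes tennis video as its research object and detects exciting events in tennis match video, such as aces and net approaches. It proposes a framework for detecting exciting events in tennis video, implemented in three parts: semantic analysis of the video stream, semantic analysis of the audio stream, and exciting-event detection based on fused audio and visual features. Video-stream analysis covers shot classification, player detection, and player tracking. Shot classification is the basis of tennis match video analysis and directly determines the accuracy of exciting-event detection. Building on existing shot classification methods and the characteristics of tennis match video, the thesis proposes a shot classification method based on Hough line detection that divides shots into game shots and non-game shots. Within game shots, frame differencing extracts the region where each player is located, and the Camshift algorithm tracks the players. Audio-stream analysis covers frame-based feature extraction and segment-based audio classification. The audio stream is first divided into segments and each segment is split into frames; feature parameters are then extracted from each frame, including short-time average energy, short-time zero-crossing rate, MFCC, and differential MFCC. A continuous hidden Markov model classifies each audio segment into one of five classes: ball-hit sounds, cheers, excited commentary, calm commentary, and background noise. Finally, ace events, baseline rally events, and net-approach events are detected from features such as game-shot length, player position, changes in player movement, ball-hit sounds, and cheers. In summary, the thesis uses audio-visual fusion to automatically analyze and extract exciting events from tennis matches. A prototype system for automatic tennis video analysis was implemented on Visual C++ 6.0 and MATLAB 7.0 using the Intel OpenCV Library. Experiments show that the proposed tennis video semantic analysis algorithms give satisfactory results.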

【Abstract】 This thesis takes tennis video as its research object and detects exciting events in tennis match video, such as aces and net approaches. It proposes a highlight detection framework for tennis video with three parts: visual semantic analysis, audio semantic analysis, and highlight detection based on the fusion of audio-visual information. Visual semantic analysis includes shot classification, player detection, and player tracking. Shot classification is the foundation of tennis video analysis and directly determines the accuracy of exciting-event detection. Considering existing shot classification methods and the characteristics of tennis match video, a shot classification method based on Hough line detection is proposed, which divides shots into game shots and non-game shots. Within game shots, player locations are extracted by frame differencing, and the players are tracked with the Camshift algorithm. Audio semantic analysis includes frame-level audio feature extraction and audio clip recognition. The thesis designs an algorithm to extract short-time average energy, short-time zero-crossing rate, MFCC, and differential MFCC. An audio classifier based on a continuous hidden Markov model then divides the audio of a tennis match into five classes: ball hits, cheers, excited commentary, normal commentary, and background noise. Finally, ace, baseline rally, and net-approach events are detected from the length of the game shot, player position, player movement, ball-hit sounds, and cheers. In summary, the thesis proposes an algorithm that automatically analyzes and extracts tennis match highlights based on the fusion of audio-visual features. A prototype system for automatic tennis video semantic analysis was implemented with Visual C++ 6.0 and MATLAB 7.0, using the Intel OpenCV Library. Experiments show that the proposed methods are effective.
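
As an illustration of the frame-level features named in the abstract, the sketch below computes short-time average energy and short-time zero-crossing rate for each frame of an audio segment. It is a minimal sketch, not the thesis implementation: the function names, the frame length, the hop size, and the normalization of both features are assumptions chosen for the example, and MFCC extraction is omitted.

```python
def split_frames(samples, frame_len, hop):
    """Split a sample sequence into fixed-length frames (overlapping if hop < frame_len).

    frame_len and hop are illustrative parameters, not values from the thesis.
    Trailing samples that do not fill a whole frame are dropped.
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Average energy of one frame: mean of the squared sample values."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Zero-valued samples are treated as non-negative (an assumption).
    """
    signs = [1 if x >= 0 else -1 for x in frame]
    crossings = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return crossings / (len(frame) - 1)


def frame_features(samples, frame_len=256, hop=128):
    """Return one (energy, zero-crossing rate) pair per frame of the segment."""
    return [(short_time_energy(f), zero_crossing_rate(f))
            for f in split_frames(samples, frame_len, hop)]
```

Under these assumptions, a short impulsive sound such as a ball hit would show up as a few frames of high energy, whereas steady background noise would tend toward low energy with a comparatively high zero-crossing rate. In a pipeline like the one the abstract describes, such per-frame values would form part of the observation vectors passed to the segment classifier.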

  • 【Classification No.】TP391.41
  • 【Times Cited】2
  • 【Downloads】143