Research on Semantic Content Analysis Techniques for Sports Video

Sports Video Semantic Content Analysis

【Author】 Chen Jianyun (陈剑赟)

【Advisor】 Wu Lingda (吴玲达)

【Author information】 National University of Defense Technology, Management Science and Engineering, 2005, Ph.D.

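【Summary】 Traditional video content analysis extracts objectively present perceptual features, whereas what users consume is usually semantic content. This creates a mismatch between automatic computer analysis and user needs, which experts in multimedia information systems call the semantic gap. The semantic gap is a bottleneck for the next generation of video applications. Taking sports video as its subject, this thesis systematically studies the association between low-level video features and high-level semantics, covering the conceptual model, the technical framework, and the analysis methods, in order to bridge the semantic gap and obtain the semantic content of sports video.

Based on the domain rules of sports competitions and on video shooting and editing practices, the thesis defines the Basic Semantic Unit (BSU) of sports video, the basic unit that represents its semantic content. Around the BSU, the thesis proposes a BSU-based framework for sports video semantic content analysis, then focuses on semantic content analysis for the kinds of BSU within that framework, including the accompanying audio-track BSU, the scene BSU, and the event BSU, and designs and implements SCASP (Sports video Semantic Content Analysis and Summarization Platform). The main contributions are as follows:

● A BSU-based framework for sports video semantic content analysis. The framework has two parts. The first is a BSU-based conceptual model, BSUCN (Basic Semantic Unit Composite Network): the relations between basic semantic units are defined as BSURelation, and BSUCN is the network for sports video semantic content analysis formed by BSUs and BSURelations. BSUCN turns the tangled problem of semantic understanding into the well-defined task of classifying and recognizing BSUs. The second is a technical framework based on probabilistic-statistical association models. It sets out the technical approach and basic methodology of sports video semantic content analysis, and points out that BSU semantic content analysis is a classification and recognition problem under uncertainty, so probabilistic-statistical models are needed to associate low-level features with high-level semantics.

● A method for semantic content analysis of the accompanying audio-track BSU based on the Gaussian mixture model. Within the BSU-based framework, Gaussian mixture models are used to model the semantic types of the accompanying audio track of sports video, turning audio-track BSU analysis into semantic classification and segmentation of audio.

● A method for semantic content analysis of the scene BSU based on the hidden Markov model. Within the BSU-based framework, hidden Markov models are used to model the statistical temporal relations between views and scenes in sports video, turning scene BSU analysis into semantic classification and segmentation of scenes.

● A method for semantic content analysis of the event BSU based on Bayesian networks. Within the BSU-based framework, Bayesian networks are used to model the multi-feature fusion relations of semantic events in sports video, turning event BSU analysis into fusion analysis based on a probabilistic-statistical model.

● The design and implementation of SCASP, a platform for sports video semantic content analysis and summarization, which applies and validates the BSU-based framework and the related techniques.

In summary, the thesis proposes concepts, a framework, and methods for sports video semantic content analysis, and validates its approach by designing and implementing SCASP. This work offers one route toward resolving the video semantic gap. As video semantic content analysis techniques continue to develop and mature, they will play an increasingly large role in areas such as the management and sharing of information resources.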
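As an illustration of the audio-track contribution, the following is a minimal Python sketch of classifying an audio segment with one Gaussian mixture model per semantic class. It is not the thesis's implementation: the class labels ("speech", "cheering"), the two-dimensional features, the diagonal covariances, and every parameter value are hypothetical, and the models are assumed to be already trained.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log p(x) under a mixture given as (weight, mean, var) triples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    # Log-sum-exp keeps the sum of small probabilities numerically stable.
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def classify_segment(frames, models):
    """Pick the class whose GMM gives the highest total frame log-likelihood."""
    scores = {
        label: sum(gmm_log_likelihood(f, gmm) for f in frames)
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical, hand-set models; real ones would be estimated from labelled audio.
MODELS = {
    "speech": [
        (0.6, (0.2, 0.1), (0.05, 0.05)),
        (0.4, (0.4, 0.3), (0.05, 0.05)),
    ],
    "cheering": [
        (0.5, (0.8, 0.9), (0.05, 0.05)),
        (0.5, (1.0, 0.7), (0.05, 0.05)),
    ],
}

# Hypothetical per-frame features for one audio segment.
segment = [(0.85, 0.80), (0.95, 0.75), (0.90, 0.85)]
label, scores = classify_segment(segment, MODELS)
print(label)
```

Segmentation could then follow by classifying consecutive windows and merging neighbours that receive the same label; that step is likewise an assumption here, not a description of the thesis's procedure.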

【Abstract】 One of the major challenges facing content-based video analysis and its applications is the so-called "semantic gap" between the rich high-level semantics that users want and the shallow low-level features that automatic algorithms can extract from the media. In this thesis, we systematically explore how to bridge this gap for sports video. Drawing on domain-specific knowledge of sports video, we first define the periodic or semi-periodic important semantic parts of sports programs as the "Basic Semantic Unit" (BSU); BSUs include AudioBSU, SceneBSU, EventBSU, and others. We then present a general BSU-based framework for sports video semantic content analysis. Within this framework, we develop BSU semantic content analysis methods that map low-level features to high-level semantics. Finally, we validate the framework and methods by designing and implementing the Sports video Semantic Content Analysis and Summarization Platform (SCASP). The main contributions of this thesis are as follows:

· We propose a novel unified BSU-based framework for sports video semantic content analysis, composed of two parts: the concept model BSUCN (Basic Semantic Unit Composite Network) and the probabilistic technical framework. BSUCN defines the relations among BSUs as "BSURelation" and models the semantic content of sports video; to extract semantics from sports video, we convert the video indexing and understanding problem into a pattern classification and recognition problem. The technical framework clarifies the appropriate approach and methodology for this domain. Unlike previous approaches, we seek a feasible, general, and effective technique for developing these stochastic models rather than fine-tuning signal-based analytical procedures.

· We address AudioBSU semantic content analysis with a method based on the Gaussian Mixture Model (GMM). We model three kinds of AudioBSU in sports video using GMMs and treat AudioBSU semantic content analysis as audio classification and segmentation.

· We develop a method of SceneBSU semantic content analysis based on the Hidden Markov Model (HMM). We model the statistical temporal relations of views and scenes in sports video using HMMs and treat SceneBSU semantic content analysis as scene classification and segmentation.

· We devise a method of EventBSU semantic content analysis based on Bayesian networks. We model the combined relations of low-level evidence within an event and treat EventBSU semantic content analysis as fusion analysis in event detection.

· We design and implement SCASP, which provides sound support for the above framework and methods of sports video semantic content analysis.

In summary, this thesis provides an in-depth investigation into the concepts, framework, and methods of sports video semantic content analysis. The framework and methods are flexible and generic and can therefore be applied in areas such as multimedia management and human-computer interaction.
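To make the idea of modelling temporal relations between views and scenes concrete, here is a minimal Python sketch of Viterbi decoding over a two-state HMM. It is an illustration under stated assumptions, not the thesis's model: scenes are taken as hidden states and per-shot view types as observations, and the state names ("play", "break"), view names ("global", "closeup"), and all probabilities are hypothetical.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state sequence (log-domain Viterbi)."""
    first = observations[0]
    # best[t][s] = (log probability of the best path ending in s, predecessor)
    best = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
        for s in states
    }]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            score, arg = max(
                (prev[r][0] + math.log(trans_p[r][s]), r) for r in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), arg)
        best.append(column)
    # Backtrack from the best final state to recover the full path.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path

# Hypothetical scene states, view observations, and probabilities.
STATES = ("play", "break")
START = {"play": 0.5, "break": 0.5}
TRANS = {
    "play": {"play": 0.8, "break": 0.2},
    "break": {"play": 0.2, "break": 0.8},
}
EMIT = {
    "play": {"global": 0.8, "closeup": 0.2},
    "break": {"global": 0.2, "closeup": 0.8},
}

views = ["global", "global", "closeup", "closeup", "closeup", "global", "global"]
scenes = viterbi(views, STATES, START, TRANS, EMIT)
print(scenes)
```

With these hypothetical parameters, a run of three close-up views is decoded as a "break" scene, while a single isolated close-up inside a run of global views stays labelled "play", because the transition probabilities penalise rapid switching. Scene boundaries then fall wherever the decoded state changes.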
