
基于内容的视频结构挖掘方法研究

Research on Method of Video Structure Mining Based on Content

【Author】 付畅俭 (Fu Changjian)

【Advisor】 李国辉 (Li Guohui)

【Author information】 National University of Defense Technology, Control Science and Engineering, 2008, Ph.D.

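【Abstract (translated from the Chinese)】 The rapid development of multimedia technology has produced large volumes of video data, creating an urgent need for effective techniques to manage, interpret, and exploit them. Drawing on ideas from data mining, this thesis explores high-level structural knowledge of video from the two perspectives of syntax and semantics. It mines the valuable, comprehensible semantic information and pattern knowledge contained in video structure, for use in organizing and managing video databases, content-based personalized video recommendation, and video summarization based on structure syntax and semantics. The main research content and contributions are as follows: (1) Theoretical research on the concepts and methods of video structure mining. Building on traditional data mining and multimedia data mining, the thesis explicitly proposes video structure mining, establishes its conceptual framework, and gives formal definitions of video basic structure mining, structure syntax mining, and structure semantics mining. It establishes the system structure of video structure mining, which consists of five parts: video data preprocessing, video database construction, multi-dimensional analysis of video data, the video mining function module, and the video mining interface. It also establishes the functional structure of video structure mining, which consists of six parts: data preprocessing, basic structure mining, structure syntax mining, structure semantics mining, pattern evaluation, and knowledge representation. Basic structure mining is the foundation of structure syntax mining and structure semantics mining, and the latter two complement and reinforce each other. (2) Research on content-based methods for video basic structure mining. For the two core tasks of basic structure mining, a shot segmentation algorithm and a scene segmentation algorithm are proposed. They yield the hierarchical structure of a video (frames, shots, scenes, and the program itself), thereby structuring the video and providing a solid basis for mining the structure syntax and structure semantics hidden in the basic structure. A framework for video basic structure mining is established, whose main components are shot segmentation, key frame extraction, shot feature extraction, and scene segmentation. Using non-uniform quantization of the HSV color space, an adaptive shot segmentation algorithm based on two histograms and two rounds of discrimination is proposed. Using the HSV color histogram, the homogeneous texture descriptor (HTD), and the edge histogram descriptor (EHD) to compute the similarity between shots, video scene construction methods are proposed from the two directions of merging and splitting: one based on multi-feature shot clustering and one based on force competition. Audio assistance in video structure mining is also discussed, and a method is proposed for segmenting news story units using voiceprint features in news video. (3) Research on content-based methods for video structure syntax mining. A framework for video structure syntax mining is established. Building on shot segmentation, an improved FSCL algorithm is proposed for unsupervised shot clustering, converting the video stream into a symbol sequence. Because the items in video association rules are order-dependent and time-dependent and have no explicit transaction concept, the traditional Apriori algorithm is improved: a video association rule mining algorithm is proposed that computes support over a time-based window, and the frequent sets of the association rules are used to explore periodic or semi-periodic structure syntax patterns in video. The two common approaches to syntactic pattern recognition are string matching and string parsing. To address the limitations of string matching, an HMM-based pattern mining method is proposed to parse high-level video events, and it is used to recognize and locate free-throw events in basketball video. (4) Research on content-based methods for video structure semantics mining. A video structure semantics model with three levels and two mappings is proposed, and it is used to explore ways of bridging the "semantic gap" between low-level video features and high-level semantics (user needs). Shot-level semantic concepts are added between the low-level features and the user needs, forming three levels. Combined with a semantic concept network model, a multi-concept discriminative random field model for video shots is built to map low-level features to shot-level semantic concepts; it exploits the interactions among concepts to improve the precision of shot-level concept annotation. Using the syntactic structure knowledge obtained from structure syntax mining, and taking shot-level semantic concept cues as observations, an HHMM model is built to map shot-level semantic concepts to high-level video semantic events by event inference. In summary, the thesis concentrates on content-based video structure mining. It establishes the theory and framework of video structure mining and explores video mining methods and applications at three levels: video basic structure, structure syntax, and structure semantics, with results in both theory and application. These results have practical value and are also expected to have a positive influence on multimedia data mining.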
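Item (3) of the abstract describes computing support over a time-based window because a sequence of shot symbols has no explicit transactions. The following Python sketch illustrates that general idea only. It is not the thesis's algorithm: defining support as the fraction of sliding windows that contain a pattern in order, the function names `occurs_in_order` and `window_support`, and the sample label sequence are all assumptions made here for illustration.

```python
from typing import Sequence


def occurs_in_order(pattern: Sequence[str], window: Sequence[str]) -> bool:
    """Return True if `pattern` appears in `window` as an ordered subsequence."""
    it = iter(window)
    # `symbol in it` advances the iterator, so later symbols must occur later.
    return all(symbol in it for symbol in pattern)


def window_support(sequence: Sequence[str], pattern: Sequence[str], width: int) -> float:
    """Fraction of sliding windows of length `width` that contain `pattern` in order.

    Assumption: each window plays the role of a "transaction" in classic Apriori.
    """
    if width <= 0 or width > len(sequence):
        return 0.0
    n_windows = len(sequence) - width + 1
    hits = sum(
        occurs_in_order(pattern, sequence[start:start + width])
        for start in range(n_windows)
    )
    return hits / n_windows


# Hypothetical shot-cluster labels, e.g. A = anchor shot, B and C = report shots.
shots = list("ABCABCABBC")
print(window_support(shots, "AB", width=3))  # 0.625 (5 of 8 windows)
print(window_support(shots, "BA", width=3))  # 0.25  (order matters)
```

Because the pattern is matched as an ordered subsequence, "AB" and "BA" receive different support, which reflects the order dependence the abstract mentions. A pattern whose support exceeds a chosen minimum would be kept as frequent and extended, as in Apriori.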

【Abstract】 Advances in multimedia technologies have yielded a vast amount of video data, which calls for efficient and flexible methodologies to annotate, organize, store, and access video resources. Video mining, which has attracted much research interest in recent years, is defined as the process of discovering implicit and previously unknown knowledge or interesting patterns in a massive set of video data. Using data mining techniques, this thesis explores the higher-level structural knowledge of video from two aspects, syntax and semantics. The main content and contributions are as follows: (1) Theoretical research on the concepts and methods of video structure mining. Building on the theories of traditional data mining and multimedia data mining, the thesis explicitly defines the concepts of video structure mining, which mainly comprises basic structure mining, structure syntax mining, and structure semantics mining. Basic structure mining is the foundation of structure syntax mining and structure semantics mining, which in turn complement each other. A system structure for video structure mining is proposed, consisting of video data preprocessing, video database construction, multi-dimensional analysis of video data, the video mining function module, and the video mining interface. A functional structure is also proposed, consisting of data preprocessing, basic structure mining, structure syntax mining, structure semantics mining, pattern evaluation, and knowledge representation. (2) Research on content-based methods for video basic structure mining. To obtain the hierarchical structure of a video, which comprises frames, shots, scenes, and the video program itself, a framework for video basic structure mining is proposed. 
This framework includes shot boundary detection, key frame selection, shot feature extraction, and scene segmentation. The thesis focuses on the algorithms for shot boundary detection and scene segmentation, so as to structure the video stream. Using non-uniform quantization of the HSV (hue, saturation, value) color space, a shot boundary detection algorithm with an adaptive threshold, based on two histograms and two rounds of discrimination, is proposed to partition a video into shots. Based on the similarities between shots in terms of the HSV color histogram, the homogeneous texture descriptor (HTD), and the edge histogram descriptor (EHD), two scene construction methods are developed: a shot clustering approach based on multiple features and a shot segmenting approach based on force competition. The thesis also discusses audio assistance in video structure mining and presents a method for segmenting news story units in news video by identifying speakers from their voice features. (3) Research on content-based methods for video structure syntax mining. The thesis presents a framework for video structure syntax mining. Building on shot segmentation, an improved frequency sensitive competitive learning (FSCL) method is put forward to cluster shots without supervision and transform the video stream into a symbol sequence. Because items in a video symbol sequence are order-dependent and time-dependent and have no explicit transaction concept, the traditional Apriori algorithm is improved by calculating support over a temporal window. The resulting video association rule mining algorithm uses frequent sets from the transformed cluster sequence to explore periodic or semi-periodic structure syntax patterns in videos. (4) Research on content-based methods for video structure semantics mining. 
This thesis presents a video structure semantics model composed of three semantic levels and two inter-level mappings to bridge the semantic gap between low-level features and high-level semantics. Between the low-level features and the high-level user demands, the model adds a layer of shot-level semantic concepts. For the mapping from low-level features to shot-level semantic concepts, the thesis applies the discriminative random field (DRF) model to multi-concept shot annotation and puts forward multi-concept discriminative random field (MDRF) and generalized MDRF models to detect semantic concepts in video shots. In the proposed framework for mining higher-level semantic events, structure syntax knowledge extracted by video structure syntax mining determines the model structure, shot-level semantic concept cues are treated as the model's observations, and hierarchical hidden Markov models (HHMMs) are built and trained to infer events from the cues. Through this event reasoning, the mapping from shot-level semantic concepts to higher-level video semantic events is achieved. This thesis focuses on content-based video structure mining. It establishes the theory and framework of video structure mining and explores its methods and applications at three levels: video basic structure mining, video structure syntax mining, and video structure semantics mining. The results are expected to have a positive influence on multimedia data mining and to offer theoretical and practical value for related research.
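The shot segmentation step in item (2) rests on comparing color histograms of consecutive frames against an adaptive threshold. The Python sketch below shows that general idea in a much simplified form and should not be read as the thesis's two-histogram, two-pass algorithm. It assumes each frame has already been reduced to one quantized color histogram (the non-uniform HSV quantization is omitted), and it uses a local mean-plus-`k`-standard-deviations threshold. The names `hist_diff` and `detect_cuts` and the parameters `k`, `radius`, and `min_diff` are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def hist_diff(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Normalized L1 distance between two histograms with the same bins (0..1)."""
    total = sum(h1) + sum(h2)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(h1, h2)) / total


def detect_cuts(histograms: Sequence[Sequence[float]],
                k: float = 3.0, radius: int = 5, min_diff: float = 0.2) -> List[int]:
    """Return frame indices i where a cut is declared between frames i-1 and i.

    A frame difference is flagged when it exceeds both `min_diff` and
    mean + k * std of the other differences within +/- `radius` positions.
    """
    diffs = [hist_diff(a, b) for a, b in zip(histograms, histograms[1:])]
    cuts = []
    for i, d in enumerate(diffs):
        lo, hi = max(0, i - radius), min(len(diffs), i + radius + 1)
        neighbours = diffs[lo:i] + diffs[i + 1:hi]  # exclude the candidate itself
        if not neighbours:
            continue
        threshold = mean(neighbours) + k * pstdev(neighbours)
        if d > threshold and d > min_diff:
            cuts.append(i + 1)
    return cuts


# Hypothetical 3-bin histograms: four "red" frames followed by three "blue" frames.
frames = [[9, 1, 0], [10, 0, 0], [9, 1, 0], [10, 0, 0],
          [0, 1, 9], [0, 0, 10], [0, 1, 9]]
print(detect_cuts(frames))  # [4]
```

Excluding the candidate difference from its own neighbourhood keeps a single large jump from inflating the threshold it is compared against. Gradual transitions such as fades and dissolves would need additional handling that this sketch does not attempt.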
