面向观众的电影情感内容表示与识别方法研究

Research on the Audience Oriented Affective Content Representation and Recognition Methods in Film

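【Author】 Sun Kai (孙凯)

【Advisors】 Lu Zhengding (卢正鼎); Yu Junqing (于俊清)

【Author information】 Huazhong University of Science and Technology (华中科技大学), Computer Application Technology, 2009, Ph.D.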

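【Abstract (translated from the Chinese)】 With the explosive growth of digital audio and video data, automatically extracting semantically rich content from such unstructured data has become a pressing challenge, and the resulting wave of research gave rise to content-based video retrieval (CBVR). Affective content is an important factor in how people understand video, yet researchers often overlook it. As an emerging direction within CBVR, video affective computing draws on the theories of CBVR and affective computing to understand video affective content. However, because of the wide "affective gap" between human emotion and low-level features, a unified theoretical framework for understanding video affective content is still lacking. Against this background, this thesis takes film as its object of study and proposes an audience-oriented method for representing and recognizing film affective content.

To represent film affective content effectively while reflecting each viewer's individual emotional characteristics, an audience-oriented method for modeling a film emotion space is proposed. By introducing the concept of typical fuzzy emotion subspaces, the model unifies the two major schools of psychological emotion models, discrete and continuous. The model partitions the emotion space with the fuzzy C-means clustering algorithm and uses Gaussian mixture models to determine the affective membership functions of the resulting typical fuzzy emotion subspaces. The film emotion space reflects viewers' personalized affective experience, allows regions of typical affective states to be defined within the space, and makes it easy to compute the intensity of each affective state. Experimental results validate the modeling method and show that the space can represent film affective content in an audience-oriented way.

To bridge the "affective gap" between low-level features and film affective content, a set of film affective features is designed, extracted and selected according to theories of the psychology of emotion and of filmmaking. The Whitney feature selection algorithm is used to select two film affective feature vectors, one describing affective valence and the other describing affective arousal. Experimental results show that the proposed feature vectors outperform existing results in distinguishing positive from negative valence and arousal.

To detect film affective content effectively, a multilevel film summary generation algorithm based on the arousal curve and the film affective tree is proposed. With this algorithm, the parts of the original film that carry the most salient affective semantics can be detected. The arousal curve measures how viewers' emotional excitement rises and falls with the affective content of the film. Film affective units of different affective granularity are first located with the arousal curve, and these units are then organized level by level according to granularity to build the film affective tree. The nodes at each level of the tree correspond to one video summary of the original film.

To recognize the affective content of each affective unit in a summary, two recognition methods are proposed: one based on a combined genetic algorithm and hidden Markov model (GA-HMM), and one based on the emotion space. The GA-HMM recognizer can be used to recognize viewers' basic emotional events. Experimental results show that, compared with a conventional hidden Markov model, GA-HMM achieves a higher recognition rate with less computation. The emotion space based method uses a multilayer perceptron and multiple linear regression to compute the emotion coordinates of film affective units. Based on these coordinates and the affective membership functions of the film emotion space, the method proposes a "maximum membership principle" and a "threshold principle" to represent and recognize viewers' personalized affective experience while watching a film. Experimental results show that the method can effectively represent and recognize personalized film affective content.

Many problems in film affective content representation and recognition remain open. In modeling the film emotion space, the current method relies entirely on affective ratings annotated by the viewers themselves, which places a heavy burden on users; how to use the existing affective data of other users to serve a new user is a key direction for future research. Because the intrinsic links between human emotion and vision and hearing are still unclear, the current feature vectors are not yet accurate enough at distinguishing positive from negative valence, and more suitable affective feature vectors must be designed with the help of domain knowledge. In addition, the group of viewers considered when building the audience-oriented film emotion space is still limited. To describe viewers' individual affective information more accurately, future work needs to collect affective data from a larger group of viewers.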
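
The membership-based recognition step mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a two-dimensional (valence, arousal) plane, replaces the Gaussian mixture membership functions with a single isotropic Gaussian per subspace, and uses made-up subspace names, centers and spreads. The two decision rules are one plausible reading of the "maximum membership principle" and the "threshold principle", whose exact definitions the abstract does not give.

```python
import math

# Hypothetical emotion subspaces in a 2-D (valence, arousal) plane.
# Names, centers and spreads are illustrative values, not taken from the thesis.
SUBSPACES = {
    "joy":     {"center": (0.7, 0.6),   "sigma": 0.3},
    "fear":    {"center": (-0.6, 0.7),  "sigma": 0.3},
    "sadness": {"center": (-0.6, -0.5), "sigma": 0.3},
    "calm":    {"center": (0.5, -0.6),  "sigma": 0.3},
}

def membership(point, center, sigma):
    """Isotropic Gaussian membership in (0, 1], equal to 1 at the center."""
    d2 = (point[0] - center[0]) ** 2 + (point[1] - center[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

def memberships(point):
    """Membership of an emotion coordinate in every subspace."""
    return {name: membership(point, s["center"], s["sigma"])
            for name, s in SUBSPACES.items()}

def max_membership(point):
    """Assumed maximum membership rule: return the single best-matching label."""
    m = memberships(point)
    return max(m, key=m.get)

def threshold_labels(point, threshold=0.5):
    """Assumed threshold rule: return every label whose membership reaches the threshold."""
    return sorted(name for name, v in memberships(point).items() if v >= threshold)
```

Under these assumptions, an affective unit whose emotion coordinates fall near a subspace center receives that subspace's label from both rules, while a unit far from every center receives a label from the maximum rule but none from the threshold rule.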

【Abstract】 With the proliferation of digital audiovisual data, the challenge of extracting meaningful content from such data sets has led to research and development in the area of content-based video retrieval (CBVR). An important and often overlooked aspect of human interpretation of video data is the affective dimension. To address it, video affective computing has been proposed: one of the newest research areas in CBVR, it draws on both CBVR and affective computing theories to understand video affective content. However, owing to the inscrutable nature of human emotions and the seemingly broad "affective gap" between them and low-level features, a unified theoretical framework for understanding video affective content is still lacking. Taking film as the object of study, this thesis presents a solution for audience-oriented affective content representation and recognition.

To represent film affective content effectively and to describe the individual characteristics of each audience faithfully, an audience-oriented film emotion space is proposed. By introducing the typical fuzzy emotion subspace, it unifies the discrete and dimensional emotion models. The fuzzy C-means clustering algorithm is adopted to divide the emotion space, and a Gaussian mixture model is used to determine the membership functions of the typical affective subspaces. At every step of modeling the space, the inputs rely completely on the affective experience recorded by the audience. The advantages of the audience-oriented film emotion space are its personalization, its ability to define areas of typical affective states within the space, and the convenience of explicitly expressing the intensity of each affective state. The experimental results validate the model and show that it can be used as an audience-oriented emotion space for film affective content representation.

To bridge the "affective gap" between low-level features and film affective content, a set of film affective features is designed, extracted and selected. These features are designed according to the theories of emotional psychology and filmmaking. The Whitney feature selection algorithm is implemented and two film affective feature vectors are formulated, one describing affective valence and the other describing affective arousal. Comparative experiments show that the proposed film affective features outperform those of existing studies in classifying positive and negative affective valence and arousal.

To recognize film affective content, the affective highlights in the film must first be detected. A multilevel film summary is proposed based on the arousal curve and the film affective tree (FAT). The arousal curve indicates how the intensity of the emotional load changes along a film, and depicts the expected changes in the audience's arousal intensity while watching that film. Film affective units (AUs) of different granularities are first located with the arousal curve, and the selected units are then used to construct the FAT. The AUs at each level of the FAT can be organized as a film summary.

Two methods are proposed to recognize the affective content of the AUs in a summary: affective content recognition based on a genetic algorithm combined with a hidden Markov model (GA-HMM), and affective content recognition based on the emotion space. The first method can be used to recognize the basic emotional events of the audience. The experimental results show that GA-HMM achieves a higher recognition rate with less computation than the classic HMM. The second method adopts a multilayer perceptron and multiple linear regression to compute the emotion coordinates of the AUs in a film summary. Based on the affective membership functions and the emotion coordinates, the maximum membership principle and the threshold principle are introduced to represent and recognize the emotional preferences of the audience. Experimental results demonstrate that this method can effectively represent and recognize personalized film affective content.

Many research issues remain in audience-oriented affective content representation and recognition. The proposed film emotion space depends entirely on affective evaluations supplied by the audience, which is a tedious and heavy burden on the user. How to make good use of existing data to serve other users is an important issue for future research. Because the relation between emotion and audiovisual signals is still unclear, the selected film affective feature vectors are not yet adequate for classifying positive and negative emotional valence. Domain knowledge should be investigated further to design more reasonable film affective features. Furthermore, the emotional information collected from audiences is not comprehensive enough. To describe the audience's emotional personality more accurately, the coverage of the audience should be further expanded.
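
The idea of locating affective units of different granularities on an arousal curve can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the arousal curve is computed or how units are located, so the sketch assumes the curve is already available as a list of per-shot values and simply thresholds it at several levels. The function names `moving_average`, `affective_units` and `multilevel_summary` are hypothetical.

```python
def moving_average(values, window):
    """Centered moving average; the window shrinks at the sequence edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def affective_units(curve, threshold):
    """Return (start, end) index pairs, end exclusive, of maximal runs
    in which the curve stays at or above the threshold."""
    units, start = [], None
    for i, v in enumerate(curve):
        if v >= threshold and start is None:
            start = i                      # a run begins
        elif v < threshold and start is not None:
            units.append((start, i))       # the run ends before index i
            start = None
    if start is not None:
        units.append((start, len(curve)))  # run reaches the end of the curve
    return units

def multilevel_summary(curve, thresholds):
    """One list of units per threshold, highest threshold (coarsest summary) first."""
    return [affective_units(curve, t) for t in sorted(thresholds, reverse=True)]

# Example with made-up arousal values for nine shots.
curve = [0.1, 0.2, 0.8, 0.9, 0.3, 0.1, 0.6, 0.7, 0.2]
levels = multilevel_summary(curve, [0.5, 0.75])
```

Because every run above a higher threshold lies inside a run above a lower one, the units at successive levels nest, which is what allows them to be arranged as a tree with one summary per level.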
