
基于语义的视频内容检索中模糊不确定性问题研究

Research on Fuzzy Uncertainty Problems of Semantic-based Video Retrieval

【Author】 Chang Jun (常军)

【Advisor】 Hu Ruimin (胡瑞敏)

【Author information】 Wuhan University, Computer Applications, 2011, PhD

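【Abstract, translated from the Chinese】 Over the past decade or so, rapid advances in computer technology, network communication, and multimedia technology have greatly changed the theory, methods, and application modes of video information processing, and video content retrieval has become one of the active directions in multimedia research and application. Video semantics contain many conceptual and subjective components and are rich in content, yet the digital representation of video images does not directly convey their meaning. The key stages of extracting, understanding, and retrieving video semantic information are all characterized by diversity and fuzziness. Limited by the current state of image understanding technology and by how far the principles of human cognition have been uncovered, uncertainty remains an unavoidable key difficulty in semantic-based video content retrieval. Linking uncertainty with certainty, so that video semantic features (largely subjective and qualitative) can be mapped to and from video visual features (largely objective and quantitative), still poses many challenges.

This dissertation studies typical fuzzy uncertainty problems that arise in the key stages of semantic-based video content retrieval. It first examines the lack of prior constraints during video semantic feature extraction and classification, and the interference of noise samples and isolated samples with the intelligent classification of semantic objects. It then examines conflicts and inconsistencies among complex multi-cue, multi-feature semantic classification inference rules, and the conversion from qualitative to quantitative reasoning. Building on this work, it examines relation matching among multi-granularity, multi-level semantic concepts in video semantic matching, and the interference of semantic correlation with retrieval. The results and contributions in basic theory and applied techniques are as follows:

(1) Multi-class fuzzy support vector machine based on rough set attribute reduction. Video semantic extraction relies on low-level image features and prior constraints for intelligent classification and recognition. Because video data are complex and multidimensional in space and time, the many interference factors in video images can leave the prior constraints needed for classification missing, which makes many current classification methods inaccurate. In addition, classifier training samples contain both interfering data, such as noise samples and isolated samples, and redundant data that contribute little to the classification boundary. To address these problems, the dissertation combines rough set attribute reduction with the fuzzy support vector classifier. After analyzing the mathematical properties of fuzzy linearly separable, approximately fuzzy linearly separable, and fuzzy nonlinear fuzzy support vector classifiers, it establishes the corresponding classification function models and constructs membership functions. Applying attribute reduction to the training set weakens the interference of noise and isolated points, shortens classifier training time, and improves multi-class classification accuracy. In simulation experiments on 8 typical test datasets from the UCI repository, the method was compared with 1-r-1 SVM, K-SVM, CS-SVM, and FSVM; training time was shortened by 4.4%-34% on average, and classification accuracy improved by 0.5%-7.65%.

(2) Quantitative fuzzy reasoning based on possibility and necessity measures. Understanding and building video semantic concepts requires domain knowledge matching and uncertain reasoning in a quantitative form that a computer can carry out, and it must also cope with conflicts and inconsistencies among complex multi-cue, multi-feature semantic classification inference rules. To address these problems, the dissertation starts from the possibility measure and the necessity measure, gives a mathematical definition and description of quantitative uncertain reasoning, and proves and derives the relevant properties, so that a given proposition can be converted to numerical form and reasoned about by computation. Using inclusion degree and similarity degree as a basis, it also establishes a quantitative evaluation of the degree of consistency between inference rules, which provides a theoretical method for identifying conflicts among complex knowledge rules and thereby removing contradictions during reasoning.

(3) WordNet-based image semantic similarity measure. Abstracted at different granularities and levels, the semantics of video images can carry several kinds of semantic concept relations, mainly part-whole relations, hypernym-hyponym relations, and synonymy. When semantic matching is based on keywords, the keywords themselves can hardly express these relations directly. To address this problem, the dissertation draws on the tree-shaped concept hierarchy of WordNet and proposes organizing annotation keywords and query keywords according to the relations between lexical semantic concepts. Because there is exactly one path between any two nodes in a concept tree, the length of that path can serve as a measure of the semantic similarity of the two concepts. The semantic similarity between two images is thereby converted into the path distance between lexical concepts in WordNet, so that the relations between image semantic concepts are expressed and semantic relation matching becomes possible.

(4) Latent semantic correlation analysis for video documents. Correlations of many kinds exist between the frames of a video clip and between the levels of the semantic structure. When semantic concepts are represented and processed as vectors, synonymy and polysemy interfere with retrieval results, because the lexical data that express the semantic concepts are correlated as vectors. The redundant correlations need to be removed while the core semantic content is kept. To address this problem, the dissertation improves and extends latent semantic analysis and proposes a method for constructing a video feature dictionary space. Based on the video feature dictionary, it builds a video document collection matrix that reflects the structural features of video content, then reduces the latent correlated vectors in that matrix while keeping the characteristic values of the core content structure, which removes the correlation interference. In simulation experiments on the TRECVID dataset, the method was compared with a K-NN algorithm on 17 typical semantic items: it outperformed K-NN on 70.59% of the items and performed roughly on par with it on 17.65%.

In summary, with semantic-based video content retrieval as its research background, the dissertation studies several typical fuzzy uncertainty problems in the key stages of video semantic extraction, domain knowledge reasoning, and semantic concept matching. In both basic theory and applied techniques it opens avenues for video semantic retrieval that merit further exploration and development, and it has significant theoretical and practical value.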
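
The possibility and necessity measures that the quantitative fuzzy reasoning contribution starts from can be illustrated with a minimal sketch. It uses the standard textbook definitions over a finite universe, Pos(A) = max over x of min(mu_A(x), pi(x)) and Nec(A) = 1 - Pos(not A); the dissertation's own definitions are not given in this record and may differ. The scene labels, the distribution `pi`, and the fuzzy set `mu_outdoor` are hypothetical values chosen for illustration.

```python
def possibility(membership, distribution):
    """Pos(A): the best degree to which some x is both in A and possible."""
    return max(min(membership[x], distribution[x]) for x in distribution)


def necessity(membership, distribution):
    """Nec(A) = 1 - Pos(not A): the degree to which A is certain."""
    return min(max(membership[x], 1.0 - distribution[x]) for x in distribution)


# Hypothetical possibility distribution over scene labels for one video shot.
pi = {"indoor": 0.25, "street": 1.0, "beach": 0.5}

# Hypothetical fuzzy set for the proposition "the shot is an outdoor scene".
mu_outdoor = {"indoor": 0.0, "street": 1.0, "beach": 1.0}

pos = possibility(mu_outdoor, pi)  # 1.0: an outdoor scene is fully possible
nec = necessity(mu_outdoor, pi)    # 0.75: "indoor" is still possible to degree 0.25
print(pos, nec)
```

The pair (Nec, Pos) brackets how strongly the evidence supports a proposition, which is one way a qualitative statement can be turned into numbers that rules can compare.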

【Abstract】 Over the last decade, the rapid development of computer technology, network communication, and multimedia technology has brought enormous changes to the theory, methods, and application modes of visual information processing. Content-based video retrieval has become an active area of multimedia research and application. The semantics of video content contain a large number of conceptual and subjective components, but the digital representation of a video image does not convey its meaning directly. Essential processing steps such as video semantic information extraction, understanding, and retrieval are characterized by diversity and fuzziness. Uncertainty is still a key difficulty that cannot be avoided in content-based video semantic retrieval. Building the relationship between uncertainty and certainty, and realizing the mapping between video semantic features and visual features, still poses many challenges.

The main achievements of this dissertation are summarized as follows:

(1) Multi-category fuzzy support vector machine method based on rough set attribute reduction. Video semantic extraction requires low-level image features and a priori constraints for intelligent classification and identification. Various interference factors in video images lead to inaccurate classification. In addition, classifier training sets contain not only noise samples and isolated samples but also redundant data that are of no use in identifying the classification boundary. To solve this problem, the mathematical characteristics of fuzzy linearly separable, approximately fuzzy linearly separable, and fuzzy nonlinear fuzzy support vector classification are analyzed, the classification function models are established, and the membership functions are constructed. Processing the training set by attribute reduction reduces the interference of noise and isolated points on classification, reduces classifier training time, and improves classification accuracy. Experiments show that training time is reduced by 4.4%-34% on average and that classification accuracy increases by 0.5%-7.65%.

(2) Quantitative fuzzy reasoning method based on the possibility measure and the necessity measure. In understanding and establishing video semantic concepts, conflicts and inconsistencies between complicated reasoning rules have to be faced and resolved. These rules govern semantic classification based on multiple cues and multiple features. To solve this problem, the dissertation defines and describes quantitative uncertainty reasoning and theoretically proves and derives its related properties.

(3) WordNet-based image semantic similarity measure. At different granularities and levels of abstraction, video image semantics can contain various kinds of semantic relations. These mainly include part-whole relations, hypernym-hyponym relations, and synonymy. In keyword-based semantic matching, the keywords themselves can hardly reflect the various relations between concepts directly. To solve this problem, the dissertation presents a method that organizes annotation keywords and retrieval keywords according to the relations between lexical semantic concepts. Because there is only one path between two nodes in the semantic concept tree, the path length between two concepts can be used as a measure of their semantic similarity. The semantic similarity between two images can therefore be converted into the path length between concepts in the WordNet tree.

(4) Video latent semantic correlation analysis method. A variety of correlations exist between video clips and between the levels of the semantic structure. These correlations may distort the retrieval result, so the redundant correlations need to be eliminated while the essential semantic content is retained. To solve this problem, the dissertation presents a method that builds a video feature dictionary space. The video feature dictionary is used to describe the structure of the video content; latent correlation vectors are then removed and the essential features of the video content are retained, which eliminates the correlation interference. Experiments show that this method outperformed the comparison algorithm on 70.59% of the semantic items and performed on par with it on 17.65% of them.
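
The idea of using the unique path between two nodes of a concept tree as a similarity measure can be sketched as follows. This is an illustration only: the toy tree `PARENT` is a hypothetical stand-in for a WordNet hierarchy, and the conversion `1 / (1 + path length)` is one common convention that is assumed here, not a formula taken from the dissertation.

```python
# Hypothetical toy concept tree, stored as child -> parent (hypernym) links.
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def ancestors(node):
    """Return the chain [node, parent, ..., root]."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_length(a, b):
    """Number of edges on the unique tree path between concepts a and b."""
    steps_from_a = {n: i for i, n in enumerate(ancestors(a))}
    for steps_from_b, n in enumerate(ancestors(b)):
        if n in steps_from_a:  # first shared ancestor is the lowest one
            return steps_from_a[n] + steps_from_b
    raise ValueError("concepts are not in the same tree")


def path_similarity(a, b):
    """Assumed convention: similarity decays as 1 / (1 + path length)."""
    return 1.0 / (1.0 + path_length(a, b))


print(path_length("dog", "cat"))  # 2 edges: dog -> animal -> cat
print(path_length("dog", "car"))  # 4 edges: via the root "entity"
```

With such a measure, a query keyword and an annotation keyword that differ as strings but sit close together in the tree can still be matched, which plain keyword comparison cannot do.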

【Keywords, translated from the Chinese】 video content retrieval; video semantics; uncertainty
【Key words】 Video Retrieval; Semantic-based; Uncertainty
  • 【Online publication contributor】 Wuhan University
  • 【Online publication issue】 2012, Issue 10