
Study on Key Techniques in Automatically Extracting Semantic Information from Visual Media

【Author】 Jiang Shuqiang (蒋树强)

【Advisor】 Gao Wen (高文)

【Author Information】 Graduate School of the Chinese Academy of Sciences (Institute of Computing Technology), Computer Applications, 2005, Ph.D.

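【Chinese Abstract】 In recent years, with the development of computer and network technology, digital video and images have been appearing in ever greater numbers, and an information era built on multimedia information services is arriving. Demand for visual media content such as video and images is growing in both volume and breadth, which calls for effective techniques to meet users' varied needs. The “semantic gap”, however, is a major obstacle to harmonious interaction between humans and computers, because the criteria by which the human brain judges visual media differ greatly from those used by computer systems. Although there is already much research on semantic analysis and understanding of visual media, this much-watched technology is still far from meeting users' general needs: users need to exploit more automatically extracted semantic information.

This thesis studies several key techniques in automatically extracting semantics from visual media. It proposes a four-layer technical framework for semantic extraction, consisting of an object semantic layer, a scene semantic layer, a knowledge and emotion semantic layer, and a semantic application layer, and investigates a set of key techniques: object detection, scene classification, high-level semantic concept extraction, and ontology-based semantic applications. Because a universally applicable semantic extraction technique is very hard to find, specific visual media content is usually analyzed and understood automatically with a divide-and-conquer strategy tailored to a given application and informed by domain knowledge. Sports video analysis and understanding has become a popular research direction in recent years because of its broad user base and large market potential, and with the Beijing Olympic Games approaching, semantic analysis and understanding of sports video has particular practical significance for China. On the other hand, using computers to analyze digitized art images and extract semantic information such as their category, style, and content is an important and pressing problem that is attracting growing attention; traditional Chinese painting is a treasure of Chinese art, and research on digitized art images such as Chinese paintings is also an important problem. This thesis therefore addresses both kinds of visual media, studying semantic extraction in sports video and in art images respectively. Finally, it presents a scene classification method for night-scene images, a technique that also has important application value. Specifically, the main research results of the thesis include:

1) First, a macro-level analysis of the system framework for automatic semantic extraction from visual media. This is necessary: on the one hand it gives a global view of the whole problem, and on the other it guides the implementation of specific semantic extraction techniques. The thesis identifies the semantic information at each level and systematically analyzes both the application framework and the solutions for visual media semantic extraction.

2) A robust playfield segmentation and detection method for sports video. In the automatic analysis of many kinds of sports video, the playfield region plays a crucial, foundational role, and many semantic cues can be derived from the playfield segmentation result. A color model of the playfield region is built with Gaussian mixture models (GMMs), because GMMs can model complex, nonlinear color distributions and therefore generalize well enough for playfield pixel detection. After GMM-based pixel detection, region analysis, mainly morphological operations and region growing, connects the detected pixels into regions to produce the final segmentation. Experiments show that the proposed method effectively detects the playfield region across different sports videos. The thesis also studies sports video scene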
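The playfield detection pipeline summarized above (a GMM color model for pixel detection, followed by region analysis with morphological operations and region growing) can be illustrated with a minimal pure-Python sketch. Everything concrete in it is an assumption for illustration, not the thesis's implementation: the hypothetical `GRASS` mixture parameters (in practice a GMM would be fitted, for example with EM, to labeled playfield pixels), the diagonal covariances, the log-likelihood threshold of -20, the 3×3 structuring element, and the rule of keeping the largest 4-connected region as the playfield. Function names such as `segment_playfield` are likewise hypothetical.

```python
import math
from collections import deque

# HYPOTHETICAL playfield color model: (weight, RGB mean, RGB variance) per
# component, i.e. a GMM with diagonal covariances. Real parameters would be
# estimated from labeled playfield pixels (e.g. with EM); these are made up.
GRASS = [
    (0.6, (60, 140, 60), (200.0, 200.0, 200.0)),
    (0.4, (90, 170, 80), (300.0, 300.0, 300.0)),
]


def gmm_log_likelihood(pixel, components):
    """log sum_k w_k * N(pixel; mean_k, diag(var_k)), computed via log-sum-exp."""
    log_terms = []
    for weight, mean, var in components:
        log_n = 0.0
        for x, m, v in zip(pixel, mean, var):
            log_n -= 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        log_terms.append(math.log(weight) + log_n)
    top = max(log_terms)  # subtract the max for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def detect_pixels(image, components, threshold):
    """Pixel detection: True where the GMM log-likelihood exceeds the threshold."""
    return [[gmm_log_likelihood(px, components) > threshold for px in row]
            for row in image]


def _morph(mask, erode):
    """3x3 erosion (all neighbors set) or dilation (any neighbor set), clipped at borders."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = all(window) if erode else any(window)
    return out


def opening(mask):
    """Erode then dilate: removes isolated false detections."""
    return _morph(_morph(mask, True), False)


def closing(mask):
    """Dilate then erode: fills small holes inside the detected area."""
    return _morph(_morph(mask, False), True)


def largest_region(mask):
    """Region growing: grow 4-connected regions from foreground seeds, keep the largest."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = set()
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            region, queue = set(), deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    return best


def segment_playfield(image, components, threshold):
    """Pixel detection, morphological clean-up, then region growing."""
    mask = closing(opening(detect_pixels(image, components, threshold)))
    return largest_region(mask)


if __name__ == "__main__":
    # 5x5 grass patch with one white pixel (e.g. a line marking) in the middle.
    image = [[(60, 140, 60)] * 5 for _ in range(5)]
    image[2][2] = (255, 255, 255)
    print(len(segment_playfield(image, GRASS, threshold=-20.0)))  # 25
```

In this sketch opening runs before closing, so isolated false detections are discarded before small holes (such as the white pixel in the usage example) are filled; that ordering, like the choice of keeping only the largest region, is a design assumption rather than something the abstract specifies.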

【Abstract】 In the past few years, computer and Internet technologies have advanced very rapidly. As a result, the amount of image and video content has increased drastically, and more and more people can conveniently use various devices to obtain the visual information they want. An information era centered on multimedia information services is arriving. Techniques for processing and analyzing visual information are needed to meet users' varied application demands on images and video clips. However, the “semantic gap” is a major obstacle to harmonious interaction between humans and computers: the low-level features used by computers cannot always be mapped to the high-level concepts that humans commonly use. Although there is a great deal of research on semantic analysis and understanding of visual media, this important area is still far from satisfactory, because users need more automatically extracted semantic information.

In this thesis, we study several key techniques in automatically extracting semantic information from visual media. We propose a four-level technical framework for semantic extraction, comprising an object semantic layer, a scene semantic layer, a knowledge and emotion semantic layer, and a semantic application layer. Four kinds of key techniques are investigated: object detection, scene classification, high-level concept extraction, and ontology-based semantic application. It is hard to provide a general solution that extracts all semantic concepts from visual media; the problem is best approached with a divide-and-conquer strategy. Sports video appeals to large audiences, and automatically extracting useful semantic information from it to facilitate retrieval and organization is an important problem that has recently emerged as a hot research area. With the Beijing 2008 Olympic Games approaching, research on semantic understanding of sports video has special significance for China.
On the other hand, automatically analyzing and understanding digitized art images and extracting their type, style, and other semantic information is an important and pressing problem. Traditional Chinese painting is a gem of traditional Chinese art, and research on this kind of art image is also an important problem. This thesis investigates semantic extraction from visual media, covering both video and image content; in particular, we concentrate on sports video and art images. Finally, we propose a technique for classifying night-scene images, which is also an important problem in semantic image processing. The contributions of the thesis are as follows: 1) First, we perform a global analysis of the system framework for automatically extracting semantic information from visual media. This is necessary work, as on
