多媒体数据语义建模及应用研究

Research on Multimedia Semantic Modeling and Applications

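【Author】 栾悉道 (Luan Xidao)

【Advisor】 吴玲达 (Wu Lingda)

【Author Information】 National University of Defense Technology (国防科学技术大学), Control Science and Engineering, 2008, Ph.D.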

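【Abstract, translated from the Chinese】 Multimedia data, with its rich audiovisual content, plays a growing part in today's user-centered information services. However, the diversity of the data, the complexity of extracting and representing its semantics, the hard-to-bridge semantic gap, and the question of reusability have increasingly become serious obstacles to multimedia research and applications. From a cognitive perspective, this thesis examines the difficulties encountered in multimedia data services, in particular the semantic representation, modeling, and retrieval of multimedia data. Drawing on cognitive psychology, it analyzes in depth the information needs of users and the semantic gap, and carries out theoretical and applied research on a semantic content model, a semantic concept model, and semantic modeling for multimedia data. The main contributions are as follows:

1. Based on the cognitive-psychological characteristics and needs of information users, a retrieval behavior model of information users is proposed, and the semantic gap problem is analyzed and extended. The model is user-centered: it describes how a user, under external stimuli and according to the user's information psychology and cognitive structure, analyzes and recognizes an information need, selects an information system, and then interacts with, filters, and revises the retrieval results. The semantic gap is further divided into the gap between thought and natural language, the human-computer interaction gap, the feature extraction gap, the entity semantic gap, and the abstract semantic gap. This extension and refinement helps locate the crux of the problems that arise when multimedia data is analyzed, processed, and used. By analyzing the nature of each gap, the thesis also discusses in depth how the semantic gap might be resolved and what conditions a solution requires.

2. By studying and summarizing the process of acquiring multimedia data, the thesis argues that the root of current multimedia semantic problems lies in the way the data is acquired. In this mode the creator and the user of the data are separated, and convenient "acquisition" up front comes at the cost of convenient "use" later. The thesis proposes generating or reproducing the audiovisual scene embodied in a script from the script's description, combined with object models, rule models, and other data; that is, generating and acquiring multimedia data on the basis of scripts. It also proposes a content space and representation model for multimedia data. The model can formally represent multimedia processing tasks such as summarization and object detection, and it represents the changing relationships of the objects, scenes, and events that appear in multimedia data along three dimensions: time, space, and representation granularity. The model can abstract and summarize a story as a whole while still showing the details of specific objects; the evolution of events and scenes is clear at a glance; and the meaning and content of a scene can be examined at different levels of granularity as needed.

3. A semantic hierarchy model of multimedia data is proposed. It consists of the R axis (rules), the raw data layer, the low-level semantic layer, and the high-level semantic layer. A concept-based semantic representation model for multimedia data, the Concept Hierarchy Net, is proposed. The thesis studies extensions of the Concept Hierarchy Net, using it to represent spatial relationships between concepts in order to support retrieval based on concepts and on concept distribution (including the retrieval of some abstract semantics), and it addresses the storage, generation, representation, and consistency checking of the Concept Hierarchy Net.

4. Exploiting the effective reasoning capability of description logic, the thesis studies how to represent the concepts of the multimedia domain and their semantic relations in description logic, bridging the theoretical model and practical applications. The work includes definitions for description-logic-based semantic modeling of multimedia data; a series of algorithms, based on the description logic SHOQ(D), that implement semantic layering, namely the construction and adjustment of the Concept Hierarchy Net; and description-logic-based semantic matching and retrieval of multimedia data.

5. In connection with the National 863 Program project "Digital Media Semantic Feature Analysis Platform" (数字媒体语义特征分析平台), applied research is carried out in three areas: (1) analysis, annotation, and retrieval of digital video; (2) semantic annotation and retrieval of images; (3) script-based generation of animation scenes. The platform is used to validate the semantic hierarchy model, the concept-based semantic representation and retrieval model, and the description-logic-based semantic modeling proposed in the thesis.

In summary, the thesis addresses the semantic problems of multimedia data. It examines their underlying causes, including the semantic gap, semantic extraction and representation, and semantic extension and retrieval; proposes a semantic content model, a semantic concept model, and related models for multimedia; models multimedia semantics with description logic; and validates these models within the National 863 Program project "Digital Media Semantic Feature Analysis Platform".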

【Abstract】 The user-centered service mode has become a key weapon in today's competition among information services. Based on users' cognitive psychology and environment, this thesis pays particular attention to users' underlying information needs, to the ways those needs are expressed, and to providing users with richer, more convincing, and more efficient information services.

Multimedia, such as video and images, has become one of the main data types in today's information systems and services. However, difficulties remain, for example the variety of data types, the complexity of extracting and expressing semantics, and the well-known "semantic gap".

The thesis discusses these difficulties in multimedia services from the viewpoint of cognitive psychology, especially the extraction, expression, and retrieval of multimedia semantics. It discusses the existing modes of acquiring multimedia data and extracting semantic content, and it extends and refines the customary notion of the semantic gap. The thesis proposes a concept-based, layered semantic model and a retrieval model for multimedia, and it constructs a directed acyclic graph over the concept set using the description logic SHOQ(D).

The original contributions of this thesis include the following:

Based on the cognitive-psychological characteristics of information users when they use multimedia systems, a retrieval psychology-behavior model is proposed. The model is user-centered. It describes how users, stimulated by their environment, choose a search engine, analyze and express their information needs, interact with the search engine, filter the retrieval results, and adjust their retrieval behavior.

Smeulders defines the semantic gap as the "lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation". This thesis extends this definition and divides the semantic gap into several layers: the gap between thought and natural language, the gap between people and computers, the feature extraction gap, the entity semantic gap, and the abstract semantic gap. This extension and refinement helps locate the sticking points in multimedia data analysis and retrieval. The thesis also discusses the properties of these gaps and their related characteristics.

An analysis of the process of acquiring multimedia data shows that the semantic difficulties of multimedia originate in the way the data is created. In the traditional mode, the data creator and the user are separated. The creator can create and acquire multimedia data easily, but at the expense of the end user's ability to use the data easily. The thesis proposes a script-based method of acquiring multimedia data, which means creating multimedia data according to the content of a script. The method combines models of objects with rules describing how the world operates in order to re-create or reproduce the audiovisual content recorded in the script; the semantic content of the data can then be obtained by analyzing the corresponding script.

The thesis proposes a model for expressing the content of multimedia data, which can describe multimedia processing tasks such as video summarization and object detection. The model describes the objects, scenes, and events that appear in multimedia data along three dimensions: time, space, and granularity. From the model, a user can learn both the details of a story and its overall situation.

A concept-based multimedia semantic representation model is proposed. The thesis designs and extends the Concept Hierarchy Net to express concepts, their relationships, and their distributions. The net is also used to define, judge, and retrieve some abstract semantic content. Because description logic is a formal tool for knowledge representation and reasoning, the thesis adopts it to reason about, construct, and adjust the multimedia Concept Hierarchy Net through concept subsumption, which forms the basis of multimedia semantic matching and retrieval.

The thesis applies the models mentioned above to the annotation and retrieval of digital video and images within the National Project "Platform of Analyzing Multimedia Semantic Character". This is followed by a script-based cartoon creation system, which serves to demonstrate the new way of creating multimedia data proposed in the thesis.

In short, this thesis focuses on the semantic content of multimedia data and on the sticking points in extracting, expressing, and retrieving it. It proposes a top-shaped hierarchical semantic model for multimedia data, and it realizes and validates these concept-based models using description logic. It explores a new way to tackle the semantic difficulties of multimedia data.
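
The abstract does not reproduce the SHOQ(D) algorithms themselves. As a rough illustration of the general idea of organizing concepts in a directed acyclic graph and retrieving annotated media through concept subsumption, the following Python sketch may help. It is not the thesis's method: plain graph reachability stands in for description-logic reasoning, and every name in it (`ConceptNet`, `add_is_a`, `subsumes`, `retrieve`, the sample concepts and media identifiers) is hypothetical.

```python
class ConceptNet:
    """Toy concept hierarchy stored as a DAG: each concept maps to its direct parents.

    Assumption: only explicitly asserted "is-a" edges exist; no description-logic
    constructors (roles, number restrictions, datatypes) are modeled.
    """

    def __init__(self):
        self.parents = {}

    def subsumes(self, general, specific):
        """Return True if `general` equals `specific` or is one of its ancestors."""
        stack, seen = [specific], set()
        while stack:
            node = stack.pop()
            if node == general:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.parents.get(node, ()))
        return False

    def add_is_a(self, child, parent):
        """Assert child is-a parent, rejecting edges that would create a cycle."""
        if self.subsumes(child, parent):
            raise ValueError(f"{child} is-a {parent} would create a cycle")
        self.parents.setdefault(child, set()).add(parent)
        self.parents.setdefault(parent, set())

    def retrieve(self, query, annotations):
        """Return the media items annotated with any concept subsumed by `query`."""
        return sorted(
            item
            for item, concepts in annotations.items()
            if any(self.subsumes(query, c) for c in concepts)
        )


# Hypothetical sample data: three is-a assertions and three annotated media items.
net = ConceptNet()
net.add_is_a("dog", "animal")
net.add_is_a("puppy", "dog")
net.add_is_a("car", "vehicle")

annotations = {
    "clip1": {"puppy"},
    "clip2": {"car"},
    "img3": {"dog", "car"},
}

animal_hits = net.retrieve("animal", annotations)
vehicle_hits = net.retrieve("vehicle", annotations)
```

In this sketch, a query for the general concept "animal" also finds items annotated only with the more specific concepts "dog" or "puppy", which is the kind of concept-level matching the abstract attributes to subsumption over the Concept Hierarchy Net.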
