Research on Key Technologies for Efficient Annotation and Interpretation of Web Images (互联网图像高效标注和解译的关键技术研究)

Research of Large-Scale Web Image Annotation and Interpretation

【Author】 Xia Dingyin (夏丁胤)

【Advisor】 Zhuang Yueting (庄越挺)

【Author Information】 Zhejiang University, Computer Science and Technology, 2010, Ph.D.

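【Abstract (translated from the Chinese)】 As an effective and practical way to support large-scale image retrieval on the web, automatic annotation and understanding of web images has become a hot topic studied intensively in both academia and industry. This dissertation studies several key problems: mining the latent associations between the visual content of an image and the semantics of its accompanying text, image interpretation, large-scale data clustering, and deep learning of visual image features. The main contributions are as follows.

First, a data-driven framework for automatic web image annotation and understanding (Automatic Web Image Annotation and Interpretation, AWIAI) is proposed. During annotation, AWIAI first computes the visibility attribute of each word in an image's accompanying text to build an "image-word" relation matrix, then applies latent grammar analysis (隐性文法分析) to that matrix to expand the set of candidate annotation words, and finally obtains the annotation result through unsupervised learning of the image's visual content together with analysis and ranking of pairwise word co-occurrence.

Second, building on the annotation results, the concept of image interpretation and a concrete method for it are proposed. Existing automatic annotation methods do not analyze the grammatical relations among annotation words, so their output is a set of discrete words that cannot describe the rich semantics of an image in natural language (for example, producing "a panda eats bamboo" for an image). After AWIAI yields the annotation words, the method analyzes the sentence-level relations among them, forms syntactic groups, and interprets the content of the target image in natural language.

Third, for large-scale data with dense similarity relations, two improved affinity propagation clustering methods are proposed. One speeds up global message passing by first passing messages locally during clustering. The other passes messages over a local sample of the data and then embeds the remaining data to obtain a fast global approximation. Because AWIAI is data-driven at its core, efficient clustering of large-scale data is a difficulty it has to solve.

Fourth, for image understanding in AWIAI, a deep learning method that combines model-driven and data-driven learning (Deep Model-based and Data-driven, DMD) is proposed to extract the most discriminative visual features. Recent theoretical results in neuroscience suggest that the brain perceives external visual information through a layer-by-layer learning process. DMD extracts visual features through a simple-to-complex deep learning pipeline: it first obtains features by unsupervised learning and sparsifies them, then performs semantic understanding and annotation of images by supervised learning.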
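
The last step of AWIAI annotation described above ranks candidate words by their pairwise co-occurrence. The abstract does not give the scoring formula, so the following is only a minimal sketch under an assumed rule: each candidate is scored by how often it co-occurs with the other candidates in a reference collection of tag sets. The function names `cooccurrence_counts` and `rank_candidates` and the toy data are hypothetical, and the visual-content learning that the dissertation combines with this step is not modeled here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tag_sets):
    """Count how often each unordered word pair appears in the same tag set."""
    counts = Counter()
    for tags in tag_sets:
        # sorted() gives every pair a canonical (a, b) order with a < b
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


def rank_candidates(candidates, counts):
    """Order candidates by total co-occurrence with the other candidates.

    Assumed scoring rule (not taken from the dissertation): a word that
    co-occurs often with the other candidate words is more likely to be a
    coherent annotation. Ties are broken alphabetically.
    """
    def score(word):
        return sum(
            counts[tuple(sorted((word, other)))]
            for other in candidates
            if other != word
        )

    return sorted(candidates, key=lambda w: (-score(w), w))


if __name__ == "__main__":
    # Hypothetical reference tag sets, e.g. from already annotated images
    reference = [
        ["panda", "bamboo"],
        ["panda", "bamboo", "zoo"],
        ["panda", "zoo"],
        ["car", "road"],
    ]
    counts = cooccurrence_counts(reference)
    print(rank_candidates(["car", "panda", "bamboo", "zoo"], counts))
```

In this toy setting a word such as "car", which never co-occurs with the other candidates, falls to the end of the ranking and could be dropped by a cutoff.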

【Abstract】 As a practical and effective approach to large-scale web image retrieval, automatic web image annotation and understanding has become a hot topic in both academic and industrial research. This dissertation focuses on mining the relevance between visual features and surrounding text, image interpretation, large-scale data clustering, and deep learning of image features.

To address these issues, this dissertation proposes a data-driven framework for automatic web image annotation and understanding (Automatic Web Image Annotation and Interpretation, AWIAI). To annotate images with suitable words, AWIAI first calculates the visibility of words in the surrounding text to build the "image-word" matrix, then extends the initial annotation result by latent visual and semantic analysis, and finally obtains the annotated words by unsupervised learning of visual correlation and the co-occurrence of annotation words.

Current image annotation approaches describe image semantics with only a few discrete words, because they neglect the statement-level syntactic correlation among the annotated words. As a result, they are unable to render natural language interpretations of images such as "pandas eat bamboo". To solve this problem, this dissertation proposes "Image Interpretation". The basic idea is to discover the statement-level syntactic correlation among annotated words and to produce interpretation results in natural language.

The AWIAI framework is a data-driven pipeline for image processing, and it often encounters the problem of large-scale data clustering. This dissertation presents two clustering approaches for large-scale data with a dense similarity matrix. Partition Affinity Propagation (PAP) first passes messages within subsets of the data and then merges all the data together, which effectively reduces the number of clustering iterations. Landmark Affinity Propagation (LAP) first passes messages among the landmark data and then clusters the remaining data. LAP is a global approximation method that speeds up clustering of large-scale data.

Recent advances in neuroscience indicate that the human brain perceives the outside world through a hierarchical learning process. Motivated by this research, a model-based and data-driven hybrid architecture (DMD) is proposed in AWIAI to boost image annotation by learning discriminative features. DMD first uses a deep learning pipeline to progressively learn visual features from simple to complex. It then integrates the deep model-based and data-driven learning pipelines. After discriminative image representations are obtained from both pipelines by sparse regularization in an unsupervised way, a supervised learning algorithm is applied to predict the objects in images.
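
The landmark idea behind LAP can be illustrated with a short sketch. It runs standard affinity propagation message passing on a sampled landmark subset only, then assigns every remaining point to its most similar exemplar. This is not the dissertation's implementation: the uniform random sampling, the negative squared Euclidean similarity, the median preference, and the names `affinity_propagation` and `landmark_affinity_propagation` are all assumptions made for illustration.

```python
import numpy as np


def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)


def affinity_propagation(S, damping=0.5, iters=200):
    """Plain affinity propagation on similarity matrix S (diagonal = preference).

    Returns the indices of the points chosen as exemplars.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities
    A = np.zeros((n, n))  # availabilities
    rows = np.arange(n)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[rows, idx]
        AS[rows, idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, idx] = S[rows, idx] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new
    return np.flatnonzero(np.diag(A + R) > 0)


def landmark_affinity_propagation(X, n_landmarks, seed=0):
    """Cluster X by passing messages among sampled landmarks only."""
    rng = np.random.default_rng(seed)
    # Assumption: landmarks are drawn uniformly at random
    landmark_idx = rng.choice(len(X), size=n_landmarks, replace=False)
    S = -sq_dists(X[landmark_idx], X[landmark_idx])
    # Assumption: preference = median off-diagonal similarity (a common default)
    off_diag = S[~np.eye(n_landmarks, dtype=bool)]
    np.fill_diagonal(S, np.median(off_diag))
    local = affinity_propagation(S)
    if local.size == 0:
        raise ValueError("message passing produced no exemplars")
    exemplar_idx = landmark_idx[local]
    # Embed all data: each point joins its most similar (nearest) exemplar
    labels = exemplar_idx[sq_dists(X, X[exemplar_idx]).argmin(axis=1)]
    return exemplar_idx, labels


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(100, 1, (20, 2))])
    exemplars, labels = landmark_affinity_propagation(X, n_landmarks=20)
    print(len(exemplars), "exemplars found")
```

In this sketch, message passing costs O(m²) per iteration for m landmarks instead of O(n²) for all n points, which is the source of the speed-up. The price is that exemplars can only be chosen from the landmark set.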

  • 【Online Publication Contributor】 Zhejiang University
  • 【Online Publication Issue】 2011, Issue 08