Study on Image Annotation Based on Web Training Data

【Author】 荚济民

【Supervisor】 俞能海

【Author Information】 University of Science and Technology of China, Signal and Information Processing, 2009, Ph.D.

【Abstract】 With the rapid development and popularization of digital devices and the Internet, Web image resources have grown increasingly abundant. Because Web data are diverse, complex, and irregular, quickly and accurately finding images of interest within such massive resources is a very challenging task. One important strategy for addressing this problem is automatic image annotation, which builds a bridge between low-level visual content and high-level semantics so that images can be indexed by their annotation words. In recent years, the rise and prosperity of image-sharing communities such as Flickr have given image annotation new life in the Web 2.0 environment. Automatic annotation also has wide applications in personal album management, medical image retrieval, trademark retrieval, and face recognition. Given the enormous number of images, manual annotation is prohibitively expensive and can no longer meet practical needs.

Viewed in terms of the training data used, automatic image annotation has evolved through two stages. The first stage is annotation based on limited training data, which links low-level features to high-level semantics using traditional machine learning and object recognition techniques, such as classifier-based methods, cross-media relevance methods, translation-model methods, and latent-variable generative models. The second stage is annotation based on training data from the Web, which focuses more on the annotation framework and its efficiency, exploits the Web's abundant resources, and greatly extends the scale of the training set. The latter strategy better fits the practical needs of annotation in the Web environment and has become a research hotspot in recent years.

This dissertation studies several key problems in image annotation based on Web training data. Its main contributions are as follows:

1) It discusses the importance of building an annotation vocabulary from Web resources, studies how to select a suitable set of annotation words from the vast vocabulary of the Web, and analyzes the conditions those words must satisfy. Based on the statistical properties of words in image-sharing communities, a random-walk model is proposed to measure the importance of annotation words from users' annotation histories and the relationships among words; the vocabulary is then built by ranking words by importance. In addition, using the rich semantic resources provided by image-sharing communities, the dissertation studies semantic disambiguation of Web images that carry initial keywords: by matching the query image to a suitable semantic class in the sharing community, the effect of the "semantic gap" is reduced and the learned annotations become more semantically coherent.

2) It proposes image annotation based on multimodal mutual reinforcement. Given a single image, initial annotations are first obtained with a basic annotation model such as CMRM or CRM. Then, within a random-walk annotation refinement framework, the annotation correlation graph and the visual content correlation graph mutually reinforce each other, and the correlations at the stationary state are used for refinement. This better ties the final annotations to the image content while preserving semantic consistency among the annotation words. Since the webpages hosting Web images provide rich semantic information, the dissertation further proposes mutually reinforcing the similarity between webpage documents and the named entities in their text, which represents document similarity better.

3) It proposes a joint annotation framework for personal albums based on Web training data. Unlike single-image annotation, the correlations among images within an album are exploited to annotate multiple images jointly. The album is first clustered, initial annotations for each cluster are learned from Web images, and the results are then refined in a semi-supervised learning framework that combines visual content correlations, annotation correlations, and temporal correlations.

4) It proposes P-DCMRM, a personalized annotation recommendation model based on cross-media relevance. The model jointly considers the visual content space, the annotation space, and the user space. P-DCMRM overcomes the shortcoming of existing recommendation systems that neglect image visual content, and extends DCMRM with a user space. In model estimation, both the global statistics of the training set and the local statistics of each user's space are considered. For an uploaded image, the system can recommend different annotation words to different users according to their annotation histories and interests.
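The random-walk modeling of annotation-word importance can be sketched as a PageRank-style iteration over a tag co-occurrence graph. The following is a minimal illustration, not the dissertation's exact formulation; the function name and the toy co-occurrence counts are hypothetical.

```python
def tag_importance(co, damping=0.85, tol=1e-9, max_iter=200):
    """PageRank-style random walk over a tag co-occurrence graph.

    co[i][j] counts how often tags i and j appear together in users'
    annotation histories; important tags are those frequently reached
    from other important tags. Returns a normalized importance vector.
    """
    n = len(co)
    # Column sums normalize each column into a transition distribution.
    col_sums = [sum(co[i][j] for i in range(n)) or 1.0 for j in range(n)]
    rank = [1.0 / n] * n  # uniform initial importance
    for _ in range(max_iter):
        new = [
            (1.0 - damping) / n
            + damping * sum(co[i][j] / col_sums[j] * rank[j] for j in range(n))
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            rank = new
            break
        rank = new
    total = sum(rank)
    return [r / total for r in rank]

# Hypothetical counts for three tags: tags 0 and 1 (e.g. "beach", "sea")
# co-occur often, tag 2 is rare and weakly connected.
scores = tag_importance([[0, 8, 1],
                         [8, 0, 1],
                         [1, 1, 0]])
```

A ranked cut of such scores would give the annotation dictionary; in practice the graph would be built from community tagging statistics rather than a hand-written matrix.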

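The mutual-reinforcement refinement can be illustrated with a small sketch: candidate image-tag scores are repeatedly smoothed over both an image-similarity graph and a tag-correlation graph, then anchored back to the initial annotations. This is a simplified stand-in for the dissertation's random-walk framework; the matrices, weights, and tag names below are toy assumptions.

```python
def refine(scores, img_sim, tag_sim, alpha=0.8, iters=50):
    """Mutual-reinforcement refinement (sketch): iterate
    R <- alpha * P R Q^T + (1 - alpha) * R0, where P and Q are
    row-normalized image and tag similarity matrices. Tags supported
    by similar images and by correlated tags gain weight, while the
    (1 - alpha) term keeps scores anchored to the initial annotations.
    """
    def row_norm(mat):
        return [[v / (sum(row) or 1.0) for v in row] for row in mat]

    P, Q = row_norm(img_sim), row_norm(tag_sim)
    n, m = len(scores), len(scores[0])
    R0 = [row[:] for row in scores]
    R = [row[:] for row in scores]
    for _ in range(iters):
        # Smooth over similar images, then over correlated tags.
        PR = [[sum(P[i][k] * R[k][j] for k in range(n)) for j in range(m)]
              for i in range(n)]
        R = [[alpha * sum(PR[i][k] * Q[j][k] for k in range(m))
              + (1.0 - alpha) * R0[i][j] for j in range(m)]
             for i in range(n)]
    return R

# Toy setup: images 0 and 1 are visually similar, image 2 is not;
# tags 0 ("beach") and 1 ("sea") are correlated, tag 2 ("tree") is not.
img_sim = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
tag_sim = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
initial = [[1.0, 0.0, 0.0],   # image 0: strong "beach"
           [0.0, 0.0, 0.5],   # image 1: only a weak "tree" guess
           [0.0, 0.0, 1.0]]   # image 2: "tree"
refined = refine(initial, img_sim, tag_sim)
```

After refinement, "beach" gains a nonzero score for image 1 (propagated from its similar neighbor), while image 2, which is isolated in the similarity graph, keeps its original annotation.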