

Research and Implementation on Image Annotation Using Probability Modeling

【Author】 Ding Lei (丁雷)

【Advisor】 Xu De (须德)

【Author information】 Beijing Jiaotong University, Computer Science and Technology, 2010, Master's degree

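【Abstract (translated from the Chinese)】 Automatic image annotation is a challenging task that aims to remove the need for manual annotation by building a bridge between high-level semantic features and low-level visual features. With the continuing development of machine learning theory in particular, researchers have designed many different learning models, which fall roughly into two categories: image annotation based on probability modeling and image annotation based on classifiers. This thesis first studies two representative annotation algorithms based on probability modeling, the co-occurrence model and the translation model. The co-occurrence model divides an image into regular regions and annotates the image according to the co-occurrence probability of image regions and keywords, that is, the joint probability of observing a keyword together with an image region. The translation model improves on the co-occurrence model and introduces a new concept for describing images, the visual token (blob). Visual tokens are obtained by clustering image features, so each image contains a set of visual tokens, and image annotation can be viewed as the process of "translating" visual tokens into keywords. Combining the ideas of the co-occurrence model and the translation model, this thesis designs an improved relevance model. Given an annotated training image set, segmentation and clustering yield its set of visual tokens, so each image can be represented jointly by two sets: visual tokens and keywords. For a test image, a generative language modeling approach assumes that there exists an underlying probability distribution, the relevance model, which contains all keywords and visual tokens that may appear in the image; annotation then amounts to random sampling from this distribution. The joint distribution can be estimated approximately from the training set, and the keywords with the highest sampling probabilities are extracted as the most representative annotations for the image. The improved relevance model can make effective use of large annotated training image sets to achieve better annotation results. Finally, experiments on the Corel dataset confirm the effectiveness of the model.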
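The co-occurrence idea summarized above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes that each grid region has already been quantized to an integer cluster id, that every region inherits all keywords of its image, and that a test image is annotated by averaging P(keyword | cluster) over its regions. The function names `train_cooccurrence` and `annotate` are hypothetical.

```python
from collections import Counter, defaultdict


def train_cooccurrence(training_images):
    """Estimate P(keyword | region cluster) from co-occurrence counts.

    training_images: list of (cluster_ids, keywords) pairs, where
    cluster_ids are the quantized grid regions of one image (assumed input).
    """
    pair_counts = defaultdict(Counter)  # cluster id -> keyword counts
    for cluster_ids, keywords in training_images:
        for c in cluster_ids:
            # Assumption: every region inherits all keywords of its image.
            pair_counts[c].update(keywords)

    cond = {}
    for c, counts in pair_counts.items():
        total = sum(counts.values())
        cond[c] = {w: n / total for w, n in counts.items()}
    return cond


def annotate(cond, cluster_ids, top_k=2):
    """Average P(keyword | cluster) over the regions and keep the top_k."""
    scores = Counter()
    for c in cluster_ids:
        for w, p in cond.get(c, {}).items():
            scores[w] += p / len(cluster_ids)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_k]]


# Toy data (invented for illustration, not from the Corel experiments).
toy_training = [
    ([0, 0, 1], ["sky", "sea"]),
    ([0, 2, 2], ["sky", "grass"]),
    ([1, 1, 2], ["sea", "grass"]),
]
toy_cond = train_cooccurrence(toy_training)
print(annotate(toy_cond, [0, 0]))
```

Because every region inherits every keyword of its image, frequent keywords tend to dominate the estimates; this is one motivation for the translation and relevance models that follow.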

【Abstract】 Automatic image annotation is a challenging task that addresses the problems of manual annotation; it tries to build a bridge between high-level semantic features and low-level visual features. With the development of machine learning theory in particular, many researchers have designed different learning models for automatic annotation, which can generally be divided into two categories: probability-based models and classifier-based models. This thesis first studies two representative annotation algorithms that use probability models: the Co-occurrence Model and the Translation Model. The Co-occurrence Model observes the co-occurrence of keywords with image regions created by a regular grid and annotates images according to the resulting association probabilities. The Translation Model is a substantial improvement on the Co-occurrence Model. It introduces a new way of describing images using a vocabulary of blobs, which are generated by clustering image features. Each image is represented by a certain number of these blobs, and image annotation is viewed as the process of translating from a vocabulary of blobs to a vocabulary of keywords. Building on the Co-occurrence Model and the Translation Model, the thesis proposes an improved relevance model. Each image in the training set has a dual representation in terms of both keywords and blobs. Given a test image, we adopt a generative language modeling approach and assume that there exists an underlying probability distribution, referred to as the relevance model. The model can be thought of as a set that contains all blobs that could appear in the image, as well as all words, so annotation is the result of random sampling from it. The training set is used to estimate the conditional probabilities relating words and blobs. The model achieves significantly better performance on large sets of annotated images. Experiments on the Corel image database show the effectiveness and efficiency of the proposed approach.
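The abstract does not spell out the estimation formulas, so the following is only a rough sketch of how a relevance model of this kind might score keywords for a test image. It assumes a cross-media relevance formulation in which the joint probability of a word and the test blobs is a sum over training images J of P(J) P(w | J) ∏ P(b_i | J), with a uniform prior P(J) and simple interpolated smoothing against collection-wide counts. The function name `relevance_scores` and the weights `alpha` and `beta` are hypothetical, and the thesis's improved variant may differ.

```python
from collections import Counter


def relevance_scores(training, test_blobs, alpha=0.1, beta=0.9):
    """Return an estimate of P(word | test_blobs) for every training keyword.

    training: list of (blobs, keywords) pairs, each non-empty (assumed input).
    alpha, beta: smoothing weights for words and blobs (illustrative values).
    """
    # Collection-level counts serve as the smoothing background.
    all_words = Counter(w for _, words in training for w in words)
    all_blobs = Counter(b for blobs, _ in training for b in blobs)
    n_words = sum(all_words.values())
    n_blobs = sum(all_blobs.values())

    scores = Counter()
    for blobs, words in training:
        blob_counts, word_counts = Counter(blobs), Counter(words)

        # P(b_1..b_m | J): product of smoothed per-blob probabilities.
        p_blobs = 1.0
        for b in test_blobs:
            p_blobs *= ((1 - beta) * blob_counts[b] / len(blobs)
                        + beta * all_blobs[b] / n_blobs)

        for w in all_words:
            # P(w | J): smoothed relative frequency of w in image J.
            p_w = ((1 - alpha) * word_counts[w] / len(words)
                   + alpha * all_words[w] / n_words)
            # Uniform prior P(J) = 1 / number of training images (assumption).
            scores[w] += p_w * p_blobs / len(training)

    total = sum(scores.values())
    if total == 0:
        return {}  # a test blob never seen in training gives no evidence
    return {w: s / total for w, s in scores.items()}


# Toy data (invented for illustration, not from the Corel experiments).
toy_training = [
    ([0, 0, 1], ["sky", "sea"]),
    ([0, 2, 2], ["sky", "grass"]),
    ([1, 1, 2], ["sea", "grass"]),
]
toy_scores = relevance_scores(toy_training, [2, 2])
print(sorted(toy_scores, key=toy_scores.get, reverse=True))
```

Taking the highest-scoring keywords from the returned distribution corresponds to the annotation step; how many keywords to keep is a design choice the abstract leaves open.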

