Large-scale Visual Pattern Learning for High-performance Image Representation
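【Author】 Li Darui (李大瑞)
【Advisor】 Zhang Hongjiang (张宏江)
【Author Information】 University of Science and Technology of China, Information Security, 2014, Ph.D.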
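【Chinese Abstract】 With the spread of digital media devices and smartphones, and the popularity of social networks and online sharing, the amount of image data on the web keeps growing, and so does the demand for recognizing it. Large-scale image data brings more challenges, and also more opportunities, to problems in the image recognition field such as recognition, classification, and retrieval. In the past few years, object retrieval has been a hot topic in large-scale image retrieval. Sparse image representations produced by large vocabularies make fast querying possible, while high-performance image representations determine retrieval accuracy. By studying visual pattern learning in the local feature space and image representation, this thesis generates high-performance image representations quickly and thereby improves large-scale image retrieval systems. To address recognition over large-scale image collections, visual attributes and mid-level image representations have become active research topics in recent years. By studying attribute learning and the generation of mid-level representations, this thesis learns large numbers of visual attributes quickly and produces high-performance representations for both recognition and retrieval. The main contributions and innovations of this thesis are as follows.
(1) A fast algorithm for constructing high-performance large-scale visual vocabularies. The algorithm targets a performance bottleneck of current large-scale image retrieval systems. Such systems rely on large visual vocabularies to produce sparse representations that enable fast and accurate search, but the best existing approximate algorithms cannot deliver both speed and quality when building large vocabularies. Exploiting the inheritance of visual patterns between iterations of the approximate algorithm, we propose a robust approximate algorithm with guaranteed fast convergence, at almost no extra time or memory cost. Theoretical analysis shows that the algorithm converges, within a finite number of iterations, to a converged solution of the exact algorithm. Experiments show that it needs one tenth of the time of the best existing algorithm to produce a vocabulary of the same quality. With it, large-scale image retrieval systems can quickly build even larger high-performance vocabularies, which supports both their speed and their accuracy. The algorithm also applies to other visual pattern discovery tasks that require building large sets of visual patterns quickly.
(2) An algorithm for generating high-performance image representations from a given large vocabulary. For the problem of generating image representations once a large vocabulary is fixed, we propose a high-performance, parameter-robust algorithm that quantizes local features and produces image representations. We analyze how multiple quantization (assigning each local feature to several visual words) improves sparse representations in large-scale image retrieval, evaluate different pooling methods in the pooling stage, and compare the existing quantization algorithms used in retrieval and in recognition. Starting from the scale selectivity of the Gaussian kernel, we propose an algorithm that minimizes the reconstruction error in the kernel space. The algorithm has a clear rationale, a simple objective, and an easy solution, and it produces better representations in experiments. It makes better use of more neighbors to produce high-performance sparse image representations, and the learned multiple-quantization weights exploit the local information in the distances, so that the representation is more robust to changes in the number of neighbors.
(3) A method for quickly generating high-performance linear representations. For general image representation, and starting from linear mid-level representations, we propose a method that indirectly and quickly learns a large number of latent visual attributes and produces high-performance representations. Most existing work on attribute-based mid-level representations directly concatenates the outputs of the attribute models into a long vector. In this form, the mid-level representation is a linear mapping realized by the model outputs, and it is therefore invariant under linear transformations. Taking this as the starting point, we propose to learn visual attributes indirectly by learning the corresponding semantic subspace. A subspace learning algorithm can quickly learn a semantic subspace that contains a large number of latent visual attributes. Such a subspace not only yields high-performance mid-level representations of variable dimensionality through linear projection; its projection directions are also strongly semantic and can be named through manual annotation.
(4) A scheme for generating high-performance nonlinear representations. In general image representation, all linear-form representations share the drawback that they cannot fully exploit the information in the attribute models. Inspired by work on nonlinear representations in other problems, we propose an attribute-based nonlinear mid-level representation scheme. The scheme places a requirement on each of three stages, namely attribute definition, attribute model learning, and representation generation: each attribute is defined as a highly biased binary classification problem, a locally effective support vector machine (SVM) model is learned for it, and the mid-level representation is produced by a nonlinear mapping with a suitable scale parameter. The nonlinear mapping makes better use of the offset and scale information of the attribute models and therefore performs better; locally effective attribute models imply that their outputs contain some redundancy, which makes subsequent compression possible; and highly biased binary problems make it easy to define a large number of attributes, each acting only on a local region of the feature space, which provides a solid basis for sparse representations. Experiments confirm that the nonlinear representation improves performance significantly.
Together, the first two contributions provide a complete scheme for quickly building high-performance sparse representations, effectively relieving the bottleneck of current large-scale image retrieval systems and allowing them to generate higher-performance high-dimensional sparse representations quickly. The last two contributions systematically study visual attributes and general mid-level image representations from both the linear and the nonlinear perspective. The proposed method for quickly generating linear representations and the proposed scheme for generating high-performance nonlinear representations provide a solid foundation for further research on visual attributes and high-performance mid-level representations. In particular, the final nonlinear mid-level representation readily yields sparse representations and has the potential to be applied in large-scale image retrieval systems to retrieve objects of the same category. This research shows that starting from the visual space, studying the characteristics of its visual patterns, and learning concrete models can produce better image representations and provides a solid foundation for better understanding image content. Image representation is the bridge between an image's visual appearance and its semantic meaning; only high-performance representations provide a solid basis for high-performance recognition and retrieval results, on which improvements to the other parts of the system can then build to advance the whole field.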
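The abstracts do not spell out the vocabulary-construction algorithm of contribution (1). The sketch below is one plausible reading, under our own assumptions: approximate k-means in which each point, at every assignment step, compares the center returned by an approximate nearest-neighbor search with the center it held in the previous iteration and keeps the closer one (one way to "inherit" visual patterns across iterations). The random-probe search in `approx_nearest` is a toy stand-in for a real approximate index such as randomized k-d trees, and all function and parameter names are hypothetical. Because a point's assignment can then never get worse, and the mean update never increases the within-cluster error, the k-means objective in this sketch is non-increasing from one iteration to the next.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def approx_nearest(x, centers, probes, rng):
    """Toy approximate nearest-center search (hypothetical stand-in for a
    randomized k-d tree): only `probes` randomly chosen centers are examined,
    so the answer is not guaranteed to be the true nearest center."""
    candidates = rng.sample(range(len(centers)), min(probes, len(centers)))
    return min(candidates, key=lambda j: sq_dist(x, centers[j]))


def robust_approx_kmeans(points, k, iters=20, probes=2, seed=0):
    """Approximate k-means where each point keeps (inherits) its previous
    center unless the approximate search finds a strictly closer one.
    This is an assumed reading of the thesis, not its published method."""
    rng = random.Random(seed)
    dim = len(points[0])
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [None] * len(points)
    history = []  # k-means objective after each full iteration
    for _ in range(iters):
        # Assignment step with inheritance from the previous iteration.
        for i, x in enumerate(points):
            j = approx_nearest(x, centers, probes, rng)
            prev = assign[i]
            if prev is not None and sq_dist(x, centers[prev]) <= sq_dist(x, centers[j]):
                j = prev  # the inherited center is at least as close
            assign[i] = j
        # Update step: each non-empty cluster's center moves to its mean.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for x, j in zip(points, assign):
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += x[d]
        for j in range(k):
            if counts[j]:
                centers[j] = [s / counts[j] for s in sums[j]]
        history.append(sum(sq_dist(x, centers[j]) for x, j in zip(points, assign)))
    return centers, assign, history
```

Calling `robust_approx_kmeans(points, k=4)` returns the centers, the final assignment, and the per-iteration objective, which this sketch keeps non-increasing up to floating-point rounding. It does not attempt to reproduce the finite-iteration convergence result or the speed-up stated in the abstract.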
【Abstract】 With the spread of digital devices and smartphones, and the popularity of social networks and online photo sharing, the volume of web images keeps growing, and so does the demand for applications built on them. Large-scale image data and its applications pose great challenges, and also offer good opportunities, for research topics in image recognition such as object detection, image classification, and image retrieval. In the past few years, object retrieval has been a hot topic in image retrieval. Sparse image representations generated with a large vocabulary enable fast search in image retrieval. By studying visual pattern learning in the local feature space and image representation, we can generate high-performance image representations rapidly and thereby build better image retrieval systems.
To perform recognition on large-scale image collections, visual attribute learning and mid-level image representation have become hot research topics in recent years. We study the learning of visual attributes and the generation of mid-level representations, in order to learn large numbers of attributes rapidly and to generate high-performance mid-level representations for recognition and retrieval. Our contributions are summarized as follows.
(1) To address the bottleneck of existing large-scale image retrieval systems, we propose an algorithm for rapidly constructing high-performance visual vocabularies. Large-scale image retrieval systems depend on large vocabularies to generate sparse representations, indexed in an inverted file, for fast and accurate search. Exploiting the inheritance of visual patterns across the iterations of the approximate algorithm, we propose a robust approximate algorithm with guaranteed fast convergence. The proposed algorithm requires almost no extra time or memory.
Theoretical analysis proves that the algorithm converges, within a finite number of iterations, to a converged solution of the exact algorithm. Experimental results show that our algorithm is about 10 times as fast as the existing state-of-the-art algorithm at generating equivalent vocabularies. With it, large-scale image retrieval systems can easily generate even larger high-performance vocabularies, which effectively supports both the search speed and the accuracy of the retrieval system. The proposed algorithm can also be used in other visual pattern discovery tasks to construct large sets of visual patterns rapidly.
(2) For generating image representations in large-scale image retrieval systems, we propose a high-performance, parameter-insensitive algorithm that quantizes local features and generates image representations. Based on the locality of the Gaussian kernel function, the algorithm minimizes the reconstruction error in the kernel space. It makes better use of more neighbors to generate high-performance sparse image representations, and the learned quantization weights extract more information from the distances, so that the representation is less sensitive to the neighbor-number parameter.
(3) For the representation of general images, we propose an indirect method, motivated by linear representations, that rapidly learns large numbers of latent visual attributes and generates high-performance image representations. In attribute-based mid-level representation, most existing works concatenate the outputs of attribute models into a long vector as the representation. We propose to learn visual attributes indirectly by learning a semantic subspace. The subspace learning algorithm can rapidly learn a semantic subspace that contains large numbers of latent visual attributes. The semantic subspace is rich in semantic concepts, so the linear representation generated by linear projections performs well.
In addition, the linear projections are semantically meaningful and can be manually labeled with descriptions.
(4) For the representation of general images, we propose a nonlinear representation based on visual attributes. All representations in linear form share the shortcoming that they cannot fully use the information in the attribute models. Motivated by nonlinear representations in other problems, the proposed scheme places requirements on three stages, namely attribute definition, attribute model learning, and representation generation: each attribute is defined as a highly biased binary classification problem; the attribute models are learned as locally effective support vector machines; and the representation is generated by a nonlinear mapping with a proper scale parameter. Experiments show that the nonlinear representation improves performance significantly.
With the first two contributions, we provide a scheme for generating high-performance sparse representations, which ensures that large-scale image retrieval systems can generate high-dimensional sparse representations rapidly. The last two contributions study visual attributes and mid-level representations from both the linear and the nonlinear perspective. The proposed method for rapidly learning linear representations and the proposed scheme for generating high-performance nonlinear representations lay a foundation for future work on visual attributes and high-performance mid-level representations.
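Contribution (2) is described only as minimizing a reconstruction error in the Gaussian kernel space. A minimal sketch, assuming the unconstrained objective ||phi(x) - sum_i w_i phi(c_i)||^2 over the `knn` nearest visual words c_i of a descriptor x: expanding it with the kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) gives k(x, x) - 2 w^T k_x + w^T K w, which is minimized by solving K w = k_x, where K holds the kernels between the neighbors and k_x the kernels between x and each neighbor. The small ridge term `lam` added to K is our own choice for numerical stability, and the thesis may impose further constraints (for example non-negative weights) that this sketch omits; `kernel_quantize`, `knn`, `sigma`, and `lam` are hypothetical names.

```python
import math


def gauss(a, b, sigma):
    """Gaussian kernel exp(-||a - b||^2 / (2 sigma^2))."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def kernel_recon_error(x, centers, w, sigma):
    """||phi(x) - sum_i w_i phi(c_i)||^2, expanded with the kernel trick."""
    err = gauss(x, x, sigma)
    err -= 2.0 * sum(wi * gauss(x, c, sigma) for wi, c in zip(w, centers))
    err += sum(wi * wj * gauss(ci, cj, sigma)
               for wi, ci in zip(w, centers) for wj, cj in zip(w, centers))
    return err


def kernel_quantize(x, codebook, knn=3, sigma=1.0, lam=1e-6):
    """Sparse code {visual-word index: weight} for descriptor x (assumed
    unconstrained formulation plus a small ridge term)."""
    idx = sorted(range(len(codebook)),
                 key=lambda j: sum((a - b) ** 2 for a, b in zip(x, codebook[j])))[:knn]
    nbrs = [codebook[j] for j in idx]
    K = [[gauss(ci, cj, sigma) + (lam if r == c else 0.0)
          for c, cj in enumerate(nbrs)] for r, ci in enumerate(nbrs)]
    kx = [gauss(x, c, sigma) for c in nbrs]
    return dict(zip(idx, solve(K, kx)))
```

The returned dictionary maps visual-word indices to weights, i.e. one descriptor's contribution to the image's sparse bag-of-words vector before pooling. Since hard assignment (weight 1 on the nearest word) is one feasible choice of weights, the learned weights never reconstruct x worse in kernel space than hard assignment, up to the tiny ridge term.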