Machine Learning Based Object Recognition (基于机器学习的物体识别)

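【Author】 Liu Guangcan (刘光灿)

【Advisors】 Yu Yong (俞勇); Lin Zhoucheng (林宙成)

【Author information】 Shanghai Jiao Tong University, Machine Learning and Computer Vision, 2013, PhD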

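【Abstract (translated from the Chinese)】 Computer vision is one of the core problems of artificial intelligence. Its goal is to give computers human visual ability, that is, to let computers understand real-world images the way people do. Computer vision is widely applied in medicine, industry, the military, aerospace and other fields. However, on the grounds that human vision occupies at least 60% of the brain's resources, academia regards computer vision as an "AI-complete" problem, or at least an "AI-difficult" one. Among the many vision problems, general object recognition, i.e., recognizing arbitrary objects in arbitrary environments, is one of the most central. Broadly speaking, object recognition means having a computer automatically classify the objects in an image. It is a very challenging problem and the most pressing bottleneck for many applications, such as image search. Although many strong research institutions, including MIT, Stanford, Yale, Cambridge and Princeton, have studied it for years, general object recognition is still far from solved.

From the viewpoint of machine learning, however, object recognition is feasible at least to some extent. More precisely, an object recognition system that meets practical needs can be built as long as one can extract image features appropriately, describe objects appropriately, and find a suitable classification model. This thesis presents a prototype of a machine learning based object recognition system. The prototype has three parts: an object segmentation sub-system, an object description sub-system, and a classifier. For each part we propose our own method: an object segmentation algorithm based on the hybrid graph model (HGM), an object description algorithm based on the Radon representation (RRFD), and a classification algorithm called the neural coding classifier (NCC). We then improve on this basic prototype with an image clustering algorithm based on low-rank representation (LRR), a multi-label classification algorithm based on locally linear transformation (LLT), and a large-scale similar-image search technique based on feedback embedding (FE).

Specifically, the contributions of this thesis are:

  • We propose HGM (Hybrid Graph Model) for general semi-supervised classification and build on it an effective method for automatic object segmentation. To the best of our knowledge, we are the first to introduce hybrid graphs into machine learning. Unlike traditional object segmentation methods, our HGM-based method is automatic: it needs no manually segmented training data. This makes our object recognition system more practical.
  • We propose an object description algorithm based on the Radon transform, called RRFD (Radon Representation Based Feature Description). Once an object has been segmented from the image, RRFD integrates its shape, color, texture and other information into a feature vector of relatively low dimension, which enables accurate object recognition. RRFD can also serve as a general feature descriptor for an arbitrary image region.
  • The last step of object recognition is classifying the feature vectors. We propose a classifier based on neural coding, called NCC (Neural Coding Classifier). Compared with traditional classifiers such as SVM, NCC handles the case where test and training data share a distribution well, and handles the case where their distributions differ better. Experiments show that NCC's classification accuracy slightly exceeds that of SVM when the test and training distributions are the same, and significantly exceeds it when they differ.
  • When an image may contain objects of several classes, the corresponding classification problem is an MLC (Multi-Label Classification) problem, which can be handled with an MOR (Multi-Output Regression) model. We propose the LLT (Locally Linear Transformation) mechanism for defining the loss function in regression analysis and, within the SVR (Support Vector Regression) framework, a multi-output regression algorithm that combines LLT and SVR, called LLT-SVR. LLT-SVR is both a good tool for multi-output regression analysis and an effective multi-label classifier for our object recognition system.
  • To make the object recognition system more practical, we need an effective image clustering mechanism. We are the first to propose LRR (Low-Rank Representation) for processing matrix-valued data. LRR is a new compressed sensing technique. Compared with traditional SR (Sparse Representation), LRR describes the global structure of the data better, which gives it a clear advantage in data clustering problems such as image clustering. Based on LRR, we propose an effective image clustering algorithm. Beyond image clustering, the LRR subspace segmentation algorithm is also a basic data clustering method. More importantly, LRR is the first to put forward the "low rank" criterion. LRR has had a large theoretical impact on machine learning and is widely applied in computer vision and image processing.
  • To speed up the object recognition system, we need a fast technique for finding similar images. We propose a dimensionality reduction algorithm called FE (Feedback Embedding). Based on FE, we can design an effective semantic hashing algorithm and thereby achieve fast similar-image search in large-scale object recognition systems.

Besides object recognition and related machine learning problems (such as classification, clustering and dimensionality reduction), this thesis also discusses some fundamental scientific questions. For example, we investigate how the brain processes visual signals and propose a novel neural coding hypothesis: the brain processes signals by reconstructing them.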
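For orientation, the comparison between low-rank representation (LRR) and sparse representation (SR) drawn in the abstract above can be written as two optimization problems. The abstract itself gives no formulas, so everything below is an assumption: the notation (data matrix $X$ whose columns are samples, coefficient matrix $Z$) and the noise-free form follow the way these models are usually stated in the subspace clustering literature, and the thesis may use a different or more general form.

```latex
% Low-rank representation: penalize the nuclear norm (sum of singular values) of Z
\min_{Z} \; \|Z\|_{*} \quad \text{s.t.} \quad X = XZ

% Sparse representation: penalize the entrywise l1 norm of Z
\min_{Z} \; \|Z\|_{1} \quad \text{s.t.} \quad X = XZ, \;\; \operatorname{diag}(Z) = 0
```

Under these assumptions the $\ell_1$ objective separates into one independent problem per column of $Z$, whereas the nuclear norm depends on all columns jointly. That is one way to read the abstract's statement that LRR describes the global structure of the data better than SR.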

【Abstract】 Computer vision is one of the core problems of artificial intelligence. Its ultimate goal is to give computers the visual ability of humans, i.e., to see and interpret visual scenes the way humans do. Computer vision has wide applications in medicine, industry, the military, aerospace and other fields. However, since human vision is known to occupy at least 60 percent of the human brain, it is generally accepted that computer vision may be an "AI-complete" problem, or is at least an "AI-difficult" problem. Among the various vision problems, classifying the objects pictured in images into classes, known as object recognition, is one of the most fundamental. It is a very challenging problem and a crucial bottleneck that blocks the advance of many important applications such as image search. Although this problem has been explored for many years by the world's most competitive academic institutions, such as MIT, Stanford, Yale, Cambridge and Princeton, it is still not well solved. However, from the viewpoint of machine learning, object recognition is feasible, at least to some extent. Namely, it is possible to implement a practical object recognition system that meets the requirements of real applications, provided that one can appropriately extract features from images, appropriately represent the objects, appropriately represent the object classes, and establish an appropriate mechanism to classify the objects.

In this thesis, we first introduce a prototype of a machine learning based object recognition system, which consists of an object segmentation sub-system, an object representation sub-system and a classifier. We devise novel algorithms to establish these sub-systems, including an HGM-based object segmentation method, an object representation approach named RRFD and a classifier named NCC. To improve the performance of the object recognition system, we propose the models of LRR, LLT and Feedback Embedding for image clustering, multi-label classification and fast similarity search, respectively. To be precise, the innovations of this thesis include:

  • We propose HGM (Hybrid Graph Model) for semi-supervised data clustering. To the best of our knowledge, we are the first to introduce the hybrid graph into machine learning. Based on HGM, we devise an efficient and effective system for automatically segmenting objects without annotated training images. This automatic object segmentation approach makes our object recognition system more appealing.
  • We propose a new feature descriptor based on the Radon transform, called RRFD. Given images in which the objects have been separated from the background, RRFD converts the objects into a feature vector that encodes their shape, texture and color. Moreover, RRFD can also be used as a general feature descriptor to generate feature vectors for an arbitrary image region.
  • To recognize object categories, we need to classify the feature vectors into their respective classes. Based on a neural coding hypothesis, we devise a new classification algorithm, called NCC. In comparison with the widely used SVM method, NCC performs much better when the training and testing data follow different distributions. When the testing data is sampled from the same distribution as the training data, NCC also slightly outperforms SVM.
  • When a single image can contain objects of multiple classes, the classification problem becomes an MLC (Multi-Label Classification) problem. We propose a novel mechanism, called LLT, for defining the loss functions in regression frameworks. Based on the well-established SVR framework, we implement an effective MOR (Multi-Output Regression) algorithm, called LLT-SVR. LLT-SVR also provides an effective way to perform multi-label classification, so it can extend our system from a single object class per image to multiple ones.
  • To improve the practicability of the object recognition system, we need a mechanism to group images into their respective topics. We establish the criterion of low rankness and propose a new method named LRR (Low-Rank Representation). To the best of our knowledge, we are the first to introduce the low-rank criterion into machine learning. Based on LRR, we have established an effective algorithm for image clustering.
  • To achieve fast recognition in large-scale databases, we devise a new semantic hashing indexing structure. The core of this structure is a new dimensionality reduction algorithm, called FE (Feedback Embedding). Compared with previous methods such as LLE (Locally Linear Embedding), FE provides a more convincing mechanism for dimensionality reduction.

Besides object recognition and some related machine learning problems, in this thesis we also explore some essential issues of science. For example, we try to answer the question of how the human brain processes visual signals. Namely, we put forward a new neural coding hypothesis that reveals the reconstruction mechanism in the human brain.
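The semantic hashing idea mentioned in the abstract (map each image's feature vector to a short binary code, then find similar images by comparing codes) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe how FE produces its embedding, so the sketch substitutes random-hyperplane projections for FE, and the function names `make_hyperplanes`, `hash_code`, `hamming` and `nearest` are hypothetical, not taken from the thesis.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes in dim dimensions (a stand-in for FE, not FE itself)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, hyperplanes):
    """Pack one bit per hyperplane into an int: 1 if vec lies on its positive side."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def nearest(query_code, db_codes, k=1):
    """Indices of the k database codes closest to the query in Hamming distance."""
    order = sorted(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))
    return order[:k]


if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, n_bits=16)
    database = [[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0], [4.0, 3.0, 2.0, 1.0]]
    codes = [hash_code(v, planes) for v in database]
    query = hash_code([1.1, 2.0, 2.9, 4.2], planes)
    print(nearest(query, codes, k=2))
```

The point of the design is that comparing codes costs one XOR and a bit count per database item, regardless of the original feature dimension. A learned embedding such as the FE algorithm named in the abstract would replace `make_hyperplanes`, with the aim that nearby codes reflect semantic rather than purely geometric similarity.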

  • 【Classification codes】 TP391.41; TP181
  • 【Citation count】 3
  • 【Download count】 2363