
Research on Computer Vision Feature Representation and Learning for Image Classification and Recognition

【Author】 杨钊 (Yang Zhao)

【Advisor】 金连文 (Lianwen Jin)

【Author Information】 South China University of Technology, Information and Communication Engineering, 2014, PhD

【Abstract (translated from the Chinese)】 Visual feature extraction is a key step in image classification and recognition. Well-designed features reduce the dependence on subsequent machine learning algorithms, and feature quality directly constrains the performance of the whole vision system. Feature research has therefore long been an important direction in computer vision. Over the years, researchers have proposed a variety of feature extraction methods to solve specific classification problems, including basic color features, texture features, local features, and global features, which have been applied with good results across image classification and recognition tasks. However, these traditional methods have two problems. First, as vision tasks grow in scale and complexity, classifying directly on such basic features is often inadequate. Researchers therefore introduced "feature representation", which applies vector quantization, sparse coding, or other encodings on top of the basic features to form an image's final feature. The most typical representation is the "Bag of Words" (BoW) model, which statistically re-aggregates an image's basic features into the final representation; methods based on this idea have been widely studied and applied since 2006 and achieve very good performance on image classification and recognition. Second, for a given vision problem, obtaining satisfactory features usually requires strong prior knowledge, or trials of different features and parameter choices, which complicates the whole classification problem. Hence "feature learning" has emerged since 2007: starting from raw pixels, it automatically discovers the patterns hidden in images through particular neural network structures in order to learn effective features. Typical approaches are feature learning with single-layer networks and with deep architectures, and both have been applied successfully to image classification and recognition. Against this background, this dissertation aims at extracting effective visual features and focuses on visual feature representation and learning for image classification and recognition. Building on an analysis of current feature extraction methods, it proposes new representation and learning methods and applies them to concrete vision problems. The main research content and innovations are as follows:

1. A feature extraction method for Kinect images based on locality-constrained coding: dense SIFT features are extracted from the RGB image and the depth image and encoded with locality-constrained linear coding to form the representation of a Kinect image pair, applied to scene classification and object classification; the effectiveness of the representation is verified on the NYU Depth and B3DO datasets.

2. A comparative study of feature extraction methods for person re-identification, and a method that applies locality-constrained linear coding to HSV and Lab color statistical histograms to improve re-identification rates. To address the complexity of current feature extraction in person re-identification, an Object-Centric Coding (OCC) appearance model is proposed: Stel Component Analysis (SCA) is applied to the pedestrian image to extract the silhouette region, which is then encoded with locality-constrained coding, effectively reducing the influence of cluttered backgrounds. We conduct re-identification experiments on the VIPeR dataset and evaluate OCC under different metric learning methods. The results show that OCC greatly improves re-identification accuracy, with consistently high rates across metrics.

3. An analysis and comparison of several common single-layer-network feature learning methods, and an L2-regularized sparse filtering algorithm that constrains the feature-mapping weight matrix while preserving the sparsity of the learned features, strengthening the algorithm's generalization ability. Comparative experiments on four datasets (STL-10, CIFAR-10, Small NORB, and offline handwritten Chinese characters) show that the method outperforms standard sparse filtering.

4. A study of feature learning based on deep learning: since similar handwritten Chinese character recognition (SHCCR) in traditional two-stage recognition systems is limited by hand-crafted feature extraction, convolutional neural networks (CNN) are adopted to learn effective features automatically and perform recognition, with big data from a handwriting cloud platform used for training to further raise accuracy. Experiments show substantial gains over the traditional gradient-feature-based support vector machine (SVM) and nearest-neighbor (1-NN) methods.

These results show that effective feature representation greatly improves the performance of image classification and recognition, and that feature learning with single-layer networks and deep architectures can learn effective features from raw image data, avoiding the complexity of hand-crafted feature design; it is a cutting-edge research direction with broad application prospects.
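The "Bag of Words" representation described above, quantizing local descriptors against a visual dictionary and pooling the counts into one image-level histogram, can be sketched as follows. This is a minimal illustration with numpy only; the descriptors are random stand-ins for dense SIFT, and the dictionary here is random rather than learned with k-means as it would be in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 200 local descriptors (e.g. dense SIFT, 128-D) from one image,
# and a dictionary of 50 visual words (in practice learned by k-means).
descriptors = rng.normal(size=(200, 128))
dictionary = rng.normal(size=(50, 128))

def bow_histogram(desc, dico):
    """Hard vector quantization + sum pooling -> L1-normalized histogram."""
    # Squared distance of every descriptor to every visual word.
    d2 = ((desc[:, None, :] - dico[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                          # nearest word per descriptor
    hist = np.bincount(words, minlength=len(dico)).astype(float)
    return hist / hist.sum()                           # image-level feature

feature = bow_histogram(descriptors, dictionary)
print(feature.shape)   # (50,) -> one fixed-length vector per image
```

The key property is that a variable number of local descriptors becomes one fixed-length vector, which any standard classifier can then consume.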

【Abstract】 Visual feature extraction is critical for image classification and recognition. Features with good performance reduce the dependence on complex machine learning algorithms and directly influence the performance of the whole vision system. Feature extraction is therefore an important research direction in the field of computer vision. Over the years, feature extraction methods such as color, texture, local, and global features have been proposed to solve specific problems, often with good results. However, traditional feature extraction has two problems.

First, with the increasing complexity of vision tasks, basic features often fail to give satisfactory results when used directly for classification. Feature representation methods were therefore proposed to improve the features: vector quantization, sparse coding, or other encodings are applied to obtain a final representation of the input image. The most typical method is the "Bag of Words" (BoW) model, which statistically aggregates the basic features against a dictionary. BoW-based methods have been extensively studied since 2006 and give strong results on image classification and recognition.

Second, for a given vision task, much prior knowledge or complex parameter selection is generally required to reach a satisfactory result, increasing the difficulty of the classification problem. To address this, "feature learning" has been studied since 2007: features are learned automatically from the raw pixels through a neural network structure. Two types of networks serve this purpose, single-layer neural networks and deep neural networks, and both have been applied successfully to image classification and recognition.

Given these observations, this dissertation is dedicated to feature representation and learning for image classification and recognition, with the goal of obtaining effective visual features. After analyzing current representation and learning methods, we propose new approaches and apply them to specific vision tasks. The main work and innovations of this dissertation are as follows:

1. We propose a feature extraction method for Kinect images based on locality-constrained linear coding (LLC). Specifically, we extract dense SIFT features from the RGB and depth images of a Kinect image pair and encode each with LLC. The features are used for Kinect scene classification and object classification, and experiments on the NYU Depth and B3DO datasets demonstrate their performance.

2. We carry out a comparative study of feature extraction methods for person re-identification and propose a new method that combines LLC with HSV and Lab color histograms. Further, given the complexity of feature extraction methods in the recent literature, we propose a new appearance model, Object-Centric Coding (OCC), for person re-identification. Under the OCC framework, the silhouette of a pedestrian is first extracted via Stel Component Analysis (SCA); dense SIFT features are then extracted and LLC-coded, so that the descriptor focuses on the genuine body and eliminates the influence of the background. Comparative experiments against existing approaches show that the OCC model significantly improves person re-identification rates, evaluated under several metric learning methods.

3. We analyze several feature learning methods based on single-layer networks and propose L2-regularized sparse filtering. The method preserves the sparsity of the learned feature distribution while gaining better generalization ability. Classification experiments on four datasets (STL-10, CIFAR-10, Small NORB, and subsets of CASIA-HWDB1.0 handwritten characters) show improved performance over standard sparse filtering.

4. We also investigate feature learning methods based on deep learning. Because the recognition rates for Similar Handwritten Chinese Character Recognition (SHCCR) in traditional two-level classification systems are limited by hand-crafted feature extraction, we propose a method based on Convolutional Neural Networks (CNN) to learn effective features automatically and perform recognition. In addition, we use big data from a handwriting cloud platform to train the network and further improve accuracy. The final experiments show that the proposed method outperforms the Support Vector Machine (SVM) and Nearest Neighbor (1-NN) classifiers based on gradient features.

In conclusion, effective feature representation methods greatly improve the performance of image classification and recognition, and feature learning based on single-layer networks and deep architectures can learn features from raw image data, avoiding the complexity of hand-crafted feature design. Feature learning is a frontier research direction with wide application prospects.
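Locality-constrained linear coding (LLC), which points 1 and 2 build on, replaces the hard vector quantization of BoW with a reconstruction of each descriptor from its k nearest codebook atoms under a sum-to-one constraint. A minimal sketch of the commonly used approximated closed-form solution follows; the codebook is random here rather than k-means-trained, and the descriptor dimensions, codebook size, k, and the regularizer eps are illustrative choices, not values from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 64))    # local descriptors (stand-ins for dense SIFT)
B = rng.normal(size=(256, 64))    # codebook of 256 atoms
k = 5                             # nearest atoms used per descriptor

def llc_encode(X, B, k, eps=1e-4):
    codes = np.zeros((len(X), len(B)))
    for i, x in enumerate(X):
        # k nearest codebook atoms to this descriptor (the locality constraint)
        idx = np.argsort(((B - x) ** 2).sum(axis=1))[:k]
        z = B[idx] - x                         # shift the atoms to the origin
        C = z @ z.T                            # local covariance
        C += eps * np.trace(C) * np.eye(k)     # regularize for stability
        w = np.linalg.solve(C, np.ones(k))
        codes[i, idx] = w / w.sum()            # enforce the sum-to-one constraint
    return codes

codes = llc_encode(X, B, k)
image_feature = codes.max(axis=0)              # max pooling over the image
print(image_feature.shape)                     # (256,)
```

Each code is sparse by construction (at most k nonzeros), and max pooling over all descriptors yields the final image-level feature, as in the dissertation's Kinect and re-identification pipelines.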
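The L2-regularized sparse filtering of point 3 augments the standard sparse-filtering objective (the L1 norm of row- and column-normalized soft-absolute feature activations) with a penalty on the feature-mapping weight matrix. The sketch below evaluates that objective only; the weight lambda and the soft-absolute epsilon are illustrative assumptions, and a real implementation would minimize this objective with an off-the-shelf optimizer such as L-BFGS rather than stop at evaluation.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(64, 500))     # data: 64-D inputs, 500 examples
W = rng.normal(size=(32, 64))      # weights mapping inputs to 32 features

def l2_sparse_filtering_objective(W, X, lam=0.1, eps=1e-8):
    F = np.sqrt((W @ X) ** 2 + eps)                       # soft-absolute activations
    F = F / np.sqrt((F ** 2).sum(axis=1, keepdims=True))  # normalize each feature row
    F = F / np.sqrt((F ** 2).sum(axis=0, keepdims=True))  # then each example column
    sparsity = np.abs(F).sum()                            # L1 sparsity term
    return sparsity + lam * (W ** 2).sum()                # + L2 weight regularizer

obj = l2_sparse_filtering_objective(W, X)
print(obj > 0)   # True
```

Setting lam to zero recovers the standard sparse-filtering objective; the added term shrinks the weight matrix, which is the mechanism the dissertation credits for the improved generalization.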
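Point 4 replaces hand-crafted gradient features with a CNN trained end to end on character images. As a structural illustration only, the following is a toy forward pass in numpy with random weights (one convolution, ReLU, 2x2 max pooling, and a softmax classifier); the dissertation's actual architecture, input resolution, and training procedure are not specified here, and the 28x28 input and 10-class output are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)

def conv2d(img, kernels):
    """Valid 2-D correlation of one image with a bank of kernels."""
    kh, kw = kernels.shape[1:]
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((len(kernels), oh, ow))
    for n, k in enumerate(kernels):
        for i in range(oh):
            for j in range(ow):
                out[n, i, j] = (img[i:i+kh, j:j+kw] * k).sum()
    return out

def max_pool(maps, s=2):
    """Non-overlapping s x s max pooling of each feature map."""
    c, h, w = maps.shape
    return maps[:, :h - h % s, :w - w % s].reshape(c, h // s, s, w // s, s).max(axis=(2, 4))

# Toy pipeline: 28x28 character image -> conv -> ReLU -> pool -> softmax over 10 classes.
img = rng.normal(size=(28, 28))
kernels = rng.normal(size=(8, 5, 5)) * 0.1
fc = rng.normal(size=(10, 8 * 12 * 12)) * 0.01

h = np.maximum(conv2d(img, kernels), 0)     # 8 x 24 x 24 feature maps
h = max_pool(h)                             # 8 x 12 x 12 after 2x2 pooling
logits = fc @ h.ravel()
probs = np.exp(logits - logits.max())
probs /= probs.sum()                        # class posteriors
print(probs.shape)                          # (10,)
```

The point of the architecture is that the convolution kernels play the role of the learned features: in training they are fitted by backpropagation, so no gradient features need to be designed by hand.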
