Pornographic Image Detection Based on Image Content (基于图像内容的成人图像检测)

【Author】 Wang Yushi (王宇石)

【Advisor】 Gao Wen (高文)

【Author Information】 Harbin Institute of Technology, Computer Application Technology, 2009, Ph.D.

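【Abstract, translated from Chinese】 The Internet has become an inseparable part of daily life. At the same time, a large number of harmful adult images have appeared among the massive volume of images on the Internet. Detecting and filtering these images has become a pressing problem that draws growing attention from researchers worldwide. Most researchers identify adult images by analyzing image content. Most recognition algorithms proposed in recent years are built on low-level image features such as color, texture, and skin-color regions. These methods produce many false positives, especially when an image contains large regions of skin-like color; misclassification of images of people, for example, is common. This dissertation therefore analyzes image content in greater depth, aiming to reduce the dependence on skin detection results and to lower the false-positive rate. It takes local image features as the point of entry. In the system, local features are quantized into visual words, which allow the contextual semantics of an image to be analyzed efficiently; combined with other low-level features, they yield an overall judgment of the image's category. The specific research content is as follows.

First, a range of important low-level features is studied and improved. The features examined include color, skin-color distribution, local features, and edge-line features. For each feature, the effectiveness of different schemes in adult image detection is explored, and targeted improvements are proposed to address the shortcomings of traditional methods. A new feature describing skin-color distribution is proposed, based on statistics of the occurrence of local patterns of skin blocks. For local features, both local shape and texture information are used, and the algorithm for collecting local feature points is moderately simplified. The arbitrariness introduced by unsupervised quantization of local features is addressed by adjusting and limiting the radius of the quantization clusters, which improves the quality of the quantization results (the "visual words"). A rotation-invariant description of the distribution of lines in an image is built on local short line segments. Experiments show that these features have better discriminative power.

Next, on the basis of ordinary visual words, a multi-level description of the visual-word context of adult images is built. It has three levels: besides ordinary visual words, there are mid-level phrases and higher-level region of interest (ROI) topics. A phrase models the local adjacency relations of visual words, and this dissertation proposes a simple and efficient algorithm for generating local phrases. ROI topics describe the contextual relations of visual words in adult images at a larger scale (the ROI). The experiments show that the higher-level visual words reduce the ambiguity of ordinary words and improve recognition performance on adult images. In addition, distribution features of sensitive words are extracted to supplement the description of the global distribution of visual words. Finally, subspace learning is incorporated into the algorithm: through vector mapping, the dimensionality of the image feature vectors is significantly reduced, and the semantic distance and spatial distance between images become more consistent. This multi-level analysis of the occurrence patterns of visual words effectively improves the accuracy of adult image recognition. Experimental results show that, compared with traditional methods, the visual-word-based method no longer depends fundamentally on skin detection and therefore markedly reduces the false-positive rate, especially on images of people.

Based on the multi-level description above, an image kernel function that fuses visual-word context is proposed. The kernel uses multi-granularity histogram pyramids of words and phrases as its basic framework, computes image similarity with histogram intersection, and incorporates the context category of each word. Experimental results show that this kernel improves the recognition performance of support vector machines (SVM), both in general image recognition and in the adult image recognition discussed here.

Because detection based on this kernel has high computational complexity, a recognition algorithm that combines the kernel with local learning is also proposed. The algorithm makes the analysis of adult image patterns in feature space as local as possible, so that an image can be classified using only the training data near it. First, images are divided into several groups using some ordinary features. Then, within the training data of each group, a portion of representative data points is selected as representative points. Next, a sub-SVM classifier is built in the neighborhood of each representative point and weighted according to its recognition performance. Finally, the k nearest sub-SVMs of a test image jointly decide its category. Experiments show that this strategy based on local-space analysis not only controls computational complexity effectively but also accurately recognizes adult images scattered across different local spaces.

This dissertation makes full use of the various types of information in adult images, analyzes image semantics comprehensively, and develops a complete recognition strategy based on visual words. The detection performance of the system clearly exceeds that of traditional adult image detection methods, with far fewer misclassifications on images that were previously difficult to recognize.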
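The quantization of local features into visual words, with a limit on cluster radius, can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the record does not say how the codebook is built or how the radius is applied, so the toy cluster centers, the two-dimensional descriptors, the `max_radius` parameter, and the rule of discarding descriptors that fall outside every cluster are all assumptions made for illustration.

```python
import math
from collections import Counter


def quantize(descriptor, centers, max_radius):
    """Return the index of the nearest center, or None if it is farther than max_radius."""
    best_idx, best_dist = None, math.inf
    for idx, center in enumerate(centers):
        dist = math.dist(descriptor, center)  # Euclidean distance
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    # Assumed rule: a descriptor outside every cluster radius gets no visual word.
    return best_idx if best_dist <= max_radius else None


def word_histogram(descriptors, centers, max_radius):
    """Count how often each visual word occurs among an image's descriptors."""
    counts = Counter()
    for descriptor in descriptors:
        word = quantize(descriptor, centers, max_radius)
        if word is not None:
            counts[word] += 1
    return [counts[i] for i in range(len(centers))]


# Hypothetical codebook of three visual words and four local descriptors.
centers = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (10.0, 10.0)]

hist = word_histogram(descriptors, centers, max_radius=0.5)
print(hist)  # [1, 2, 0]
```

The last descriptor lies far from every center, so it contributes to no bin. In a real system the centers would come from clustering descriptors extracted from training images rather than being written by hand.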

【Abstract】 The Internet has become an important part of our lives. However, in the sea of web images, there are a large number of pornographic ones. The urgent task of detecting and filtering those harmful images attracts more and more attention from researchers throughout the world. Most researchers try to find pornographic images by analyzing image content. Traditional approaches proposed in recent years are based on simple, low-level visual features such as color, texture, and skin regions. Those systems generate many false positives when they process benign images with large regions of skin-like colors, for example, images of people. The dissertation aims to provide a deeper understanding of image content and to create detection systems that depend less on skin detection and generate far fewer false positives.

This dissertation analyzes pornographic images mainly based on local features. Local features can be coded as visual words, by which the context of images can be conveniently expressed and analyzed. Images are classified based on both visual words and other low-level features. The methods are described in detail as follows.

First, a comprehensive study is made of different types of image features, including color, edge, skin region distribution, and local features. Discriminative features are selected or proposed for each feature type to reflect the characteristics of pornographic images. The dissertation proposes to represent the skin region distribution with the occurrence statistics of local uniform patterns of skin blocks. For the visual words, local appearances and textures are extracted as local features. The extraction of local features is simplified to speed up the computation. Moreover, the randomness in the construction of visual words is reduced by adjusting the clusters' radii in the unsupervised coding of local features. Finally, a group of rotation-invariant features of edge distribution is developed based on short line segments. Experimental results show that all of these features are more discriminative for pornographic image detection.

Next, a multi-level image representation is constructed based on visual words. The model comprises three levels: word, phrase, and ROI (region of interest) topic. An effective method is proposed to construct phrases, which code the co-occurrence patterns of neighboring words. ROI topics reveal the context of words at a larger scale. The experiments show that the higher-level representation reduces the ambiguity of common words, so pornographic images can be detected more accurately. To describe the global distribution of words, the author also develops distribution features of pornography-related words. Finally, by means of subspace learning, the multi-level representation is projected into a low-dimensional space to bridge the gap between the semantic similarities and geometric distances of image pairs. After the multi-level analysis of visual words, the results of skin detection are no longer crucial to the detection. The proposed method outperforms traditional methods, especially on images of people.

Based on the above multi-level representation of visual words, the dissertation proposes a novel kernel that fuses the multi-level context of visual words. The author constructs multi-resolution histogram pyramids of words, phrases, and the classes of the ROIs in which they are located. The similarity of an image pair is then evaluated by the intersection of the histograms. Experimental results demonstrate that support vector machines (SVM) using the kernel perform better not only in pornographic image detection but also in general image classification problems.

Considering the high computational cost of image classification based on the novel kernel, the author proposes to integrate the kernel into local learning. The pattern analysis of pornographic images is then limited to the corresponding local feature subspaces. In other words, an image can be classified based only on a small part of the training images close to it. First, images are grouped according to low-level features. Second, in each group the algorithm selects a set of representative training data points, around each of which a local SVM is constructed. Then, these SVMs are weighted by their classification performance. Finally, a test image is classified jointly by its k nearest SVMs. The experiments show that the system has a lower computational cost and accurately recognizes pornographic images distributed in different subspaces.

In this dissertation the author makes use of the various kinds of information in pornographic images and gives a comprehensive semantic analysis. Based on visual words, a complete detection strategy is developed. Compared with the baselines, the proposed systems perform better, particularly on images that are difficult for traditional methods to classify.
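The histogram-intersection similarity mentioned in the abstract can be illustrated with a small sketch. It covers only the generic intersection step. The way the dissertation builds its pyramids and folds in ROI context classes is not given in this record, so the two-level toy histograms, the per-level `weights`, and the function names are assumptions for illustration.

```python
def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two equal-length histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def multilevel_kernel(levels_a, levels_b, weights):
    """Weighted sum of histogram intersections, one term per representation level."""
    return sum(
        w * histogram_intersection(ha, hb)
        for w, ha, hb in zip(weights, levels_a, levels_b)
    )


# Hypothetical images, each described by a word histogram and a phrase histogram.
image_a = [[3, 0, 2], [1, 1]]
image_b = [[1, 4, 2], [0, 2]]
weights = [0.5, 0.5]  # assumed equal weighting of the two levels

value = multilevel_kernel(image_a, image_b, weights)
print(value)  # 2.0
```

A matrix of such values over all training pairs could be passed to an SVM implementation that accepts precomputed kernels; that step is omitted to keep the sketch free of external libraries.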
