Research on the Technologies of Content-Based Erotic Image Filtering

【Author】 Sun Jingyuan (孙竞媛)

【Advisor】 Shen Xuanjing (申铉京)

【Author information】 Jilin University, Computer Application Technology, 2007, Master's thesis

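【Summary】 As the Internet in China has grown rapidly, harmful content online has become a discordant note in the virtual world. Research on curbing the spread of online pornography is no longer satisfied with URL blocking and sensitive-keyword matching, and this has driven increasingly deep research into content-based image filtering. This thesis is supported by the 2004 Zhuhai science and technology project (PC20041101), "Research on Content-Based Erotic Image Filtering Technique and Its Realization in IE". It focuses on skin-color detection, the key technique in previously proposed content-based sensitive image filters; improves the Bayesian classification model based on statistical histograms; extracts five efficient classification features; and builds a sensitive image classifier on that basis. First, a base database was built to record the RGB color distribution of skin and non-skin pixels, and the subsequent research rests on it. To address the shortcomings of earlier histogram-based Bayesian skin classification algorithms, an improved statistical histogram model was built from a large body of statistics, prior and conditional probability formulas were then derived, and finally a new Bayesian classification model was implemented. The conditional probability data used for pixel-level skin classification were then filtered repeatedly to obtain more accurate skin detection. To classify sensitive images effectively, ten reasonably efficient classification features were extracted, and after testing five were selected as the classifier's input vector. Fast face detection based on the AdaBoost algorithm was added to reduce the false detection rate on photo-sticker-style portrait images. On the standard mask set, the model achieves a correct detection rate of 80.83% and a false detection rate of 13.20%. On a test set of 4,624 images, the overall correct detection rate is 88.51% (71.15% for the sensitive class and 91.23% for the normal class).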

【Abstract】 The Internet is currently flooded with erotic and pornographic material, and its rapid growth badly harms the cleanliness and harmony of the virtual world. Traditional techniques such as IP-based blocking and sensitive-keyword matching are no longer effective at restraining the spread of this material, and research on image filtering technology has therefore developed rapidly. Supported by the 2004 Zhuhai Science and Technology Planning Project "Research on Content-Based Erotic Image Filtering technique and its realization in IE", this thesis studies skin-color detection, the key technology of content-based erotic image filtering, constructs a Bayes classifier model based on a skin-color statistical histogram, and then extracts five features for classifying erotic images.

Erotic images are characterized by exposed skin. We therefore use skin detection and texture models to detect skin-colored areas and build a binary image, extract a feature vector from it, and apply a classification algorithm to filter images. We construct a more complete image database, containing a labeled skin-mask bank of 1442 images and a test image bank of 15890 images, and label the images according to the classification strategy. All the work in this thesis is based on this image bank.

The main work of the dissertation is as follows:

(1) Construct a database with two types of tables: the statistics table, which stores statistical data, and the tests table, which stores the conditional probabilities of the skin-color detection model. The data in both tables are derived from the pixels of the standard skin-mask image bank, which contains 1442 images and 0.75 billion pixels.
We use Microsoft SQL Server 2000 with ADO (ActiveX Data Objects) to build the database, which contains 16,777,216 (256 × 256 × 256) records. These records support the study of the distribution of RGB values of skin and non-skin pixels, and they also provide data for further tests of the model.

(2) Research on the skin-color detection model. Skin-color detection seems simple but is complicated by factors such as race, illumination, and noise. Three skin-color detection methods are in common use at present: the Chroma Space Algorithm, the Bayes Classifier Algorithm based on a skin-color statistical histogram, and the Seed Diffusion Algorithm based on neighboring information. This thesis improves on three weaknesses of the histogram-based Bayes Classifier Algorithm described by Jones and Rehg: building the skin and non-skin models from images that contain skin and images that do not, the way pixel statistics are gathered, and the use of 32 bins per channel in RGB color space.

First, we construct two kinds of RGB histogram models from images containing skin, using 256 bins per channel in RGB color space. On this more suitable model we improve the Bayes Classifier Algorithm. After comparing the automatically generated mask images with the hand-made mask images, we collect and analyze the Omission Rate and False Positive Rate, derive the required prior probability and conditional probability formulas, and finally build the Bayes Classifier Model. Then, by comparing the cnt values in the statistics table, we select the relatively valuable records and insert them into the tests table as the conditional probabilities. The cnt value is the number of times skin pixels appear for a given record.
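The histogram-based Bayes classification described above can be illustrated with a minimal sketch. It is not the thesis's model, whose prior and conditional probability formulas are not given in the abstract; it is the generic form in which skin and non-skin counts per RGB value (256 bins per channel) are turned into a posterior by Bayes' rule. The function names and the threshold parameter `theta` are hypothetical, and the per-color counts only play a role loosely similar to the `cnt` column.

```python
from collections import Counter


def build_histograms(pixels, masks):
    """Count skin and non-skin occurrences of each (r, g, b) value.

    pixels: iterable of (r, g, b) tuples, 0..255 per channel (256 bins each).
    masks:  iterable of booleans, True where a hand-made mask marks skin.
    """
    skin, non_skin = Counter(), Counter()
    for rgb, is_skin in zip(pixels, masks):
        (skin if is_skin else non_skin)[rgb] += 1
    return skin, non_skin


def skin_posterior(rgb, skin, non_skin):
    """P(skin | rgb) by Bayes' rule, with priors taken from the pixel counts.

    With P(rgb | skin) = skin[rgb] / total_skin and P(skin) = total_skin / total,
    the posterior simplifies to skin[rgb] / (skin[rgb] + non_skin[rgb]).
    """
    s, n = skin[rgb], non_skin[rgb]
    total = s + n
    return s / total if total else 0.0  # unseen colors are treated as non-skin


def classify(rgb, skin, non_skin, theta):
    """Label a pixel as skin when its posterior reaches the threshold theta."""
    return skin_posterior(rgb, skin, non_skin) >= theta
```

In this form, a color seen three times as skin and once as non-skin gets a posterior of 0.75, and a color never seen in training is rejected. A real system over 16,777,216 bins would store the counts in a dense array or a database table rather than a `Counter`.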
To check the correctness and completeness of this selection, we collect the Omission Rate and False Positive Rate from the generated mask images and build a check table of Omission Rates. By adding the omitted RGB values of skin pixels we complete the tests table, and finally obtain the actual conditional probabilities of the tests table in the Bayes Classifier Model based on the RGB histogram.

After skin detection, we adopt first-order gray-level statistics as the texture model. Skin-colored areas that are not skin (such as a yellow sofa or a yellow woolen blanket) are then masked as non-skin. This lowers the false positive rate and provides the subsequent classification algorithms with valid features.

Compared with the model described by Jones and Rehg, our model reduces the influence of non-skin pixels in skin-marked images. We determine the optimal threshold by estimating the Equal Error Rate and choose the threshold θ = 0.07 on our training set. Compared with the 76.55% correct detection rate and 14.59% omission rate of Jones and Rehg's model, our model achieves a correct detection rate of 80.83% and an omission rate of 13.20% on the test set of 1442 images.

(3) Feature extraction and evaluation for classifying erotic images. Before classification, we extract ten features that are relatively well suited to classification from the mask images and their corresponding original images, evaluate each feature's capability for classification, and finally select five features to form the classification feature set. To effectively reduce the false positive rate on portrait images, a face detection mechanism is used in the filter. Taking both precision and computing speed into account, we use the face detection mechanism proposed by P. Viola, which combines AdaBoost with a cascade structure, as implemented in OpenCV.
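As a rough illustration of cascade-based face detection, the sketch below uses the Haar cascade detector in the current OpenCV Python bindings, which is not necessarily the OpenCV interface the thesis used. The abstract does not say how the detection result enters the classifier, so `largest_face_ratio` is a hypothetical feature, and the `scaleFactor` and `minNeighbors` values are common defaults chosen here as assumptions.

```python
def detect_faces(image_path):
    """Return face boxes (x, y, w, h) found by OpenCV's frontal-face Haar cascade."""
    import cv2  # imported lazily so the helper below works without OpenCV installed

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Assumed parameter values; tune them for the image set at hand.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(int(v) for v in box) for box in faces]


def largest_face_ratio(faces, width, height):
    """Area of the largest face box divided by the image area.

    Hypothetical feature: a large value suggests a close-up portrait, where
    much of the detected skin belongs to the face.
    """
    if len(faces) == 0 or width <= 0 or height <= 0:
        return 0.0
    return max(w * h for (_x, _y, w, h) in faces) / (width * height)
```

A filter could, for example, treat an image with a high skin ratio but a large `largest_face_ratio` as a likely portrait rather than an erotic image; where to place that cut-off would have to be determined experimentally.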
The results show that adding the face detection mechanism to our erotic image classifier improves the precision of the system considerably (by about 10% on our test set).

(4) Experiments and analysis show that our erotic image classifier can effectively distinguish benign images from erotic images, with a precision of about 88.51% on our test set of 4624 images (71.15% for erotic images and 91.23% for benign images).

Many aspects of our filtering system still need improvement, such as a more efficient skin-color pixel detection model, the accuracy of the face detection mechanism, and the real-time performance of the system. These are our future work.
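The Equal Error Rate criterion mentioned in item (2) for choosing the threshold θ can be sketched as follows. This is a generic illustration under stated assumptions, not the thesis's procedure: each pixel is assumed to have a score where higher means more skin-like, the candidate thresholds are supplied by the caller, and the function names are hypothetical.

```python
def error_rates(scores, labels, theta):
    """Return (false_positive_rate, omission_rate) when score >= theta means skin.

    scores: per-pixel skin scores; labels: True for ground-truth skin pixels.
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= theta)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < theta)
    return (fp / neg if neg else 0.0, fn / pos if pos else 0.0)


def equal_error_threshold(scores, labels, candidates):
    """Pick the candidate threshold at which the two error rates are closest."""
    def gap(theta):
        fpr, fnr = error_rates(scores, labels, theta)
        return abs(fpr - fnr)

    return min(candidates, key=gap)
```

Raising the threshold trades false positives for omissions, so the point where the two rates meet is a common compromise when neither error type is clearly more costly.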

  • 【Online publication submitter】 Jilin University
  • 【Online publication issue】 2007, No. 04
  • 【Classification number】 TP391.41
  • 【Times cited】 8
  • 【Downloads】 316