基于内容的敏感网页过滤器的研究与实现

Research and Implementation of Filter for Erotic Webpage Based on Content

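【Author】 Sun Jian (孙健)

【Advisors】 Shen Xuanjing (申铉京); Chen Haipeng (陈海鹏)

【Author Information】 Jilin University, Software Engineering, 2012, Master's thesis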

【Abstract】 The rapid development of the Internet lets people transfer and share vast amounts of information with ease. This brings great convenience to production, daily life, and communication, and strongly promotes global economic and cultural exchange. However, it also gives lawbreakers an opportunity to publish and spread sensitive information such as pornographic, violent, and reactionary content. The amount of information on the Internet is growing exponentially, and its types have become richer, shifting from plain text to predominantly multimedia forms such as images and video. Because of their strong visual impact, pornographic, violent, and other sensitive videos are widely spread by lawbreakers. Carried by the Internet, a cross-regional, cross-border, and open communication medium, their harmful influence reaches every corner of the world and seriously harms social stability and people's daily lives. The design and development of a sensitive web page filter is therefore of great significance for building a green Internet environment in China, maintaining a stable society, and protecting the physical and mental health of Internet users, especially young people.

On this basis, this thesis uses BHO (Browser Helper Object) technology to design and implement a sensitive web page filter composed of three sub-filters: a URL filter, a web page text filter, and a sensitive web image filter. First, the URL is filtered. BHO technology makes it possible to obtain the URL of the visited page from the address bar of the IE browser. This URL is compared against a database of sensitive URLs; if it is sensitive, a blank page is returned, and otherwise the page text and images are checked. Second, the page text is filtered. If the URL is not sensitive, the browser downloads the page resources, and the DocumentComplete event signals when the page content has finished downloading. The text content is then obtained through the DHTML document model and matched against a sensitive-word database using the maximum jump (SMA) algorithm. Finally, sensitive images in the page are filtered with a detection algorithm that combines face detection, skin color detection, skin texture detection, and classifier-based recognition. Face detection confirms that the image contains people. The Sobel operator and a statistical histogram model are used for texture-based skin color detection to locate the skin-colored regions of the image, Gabor filtering is applied to those regions for skin texture detection, and a classifier distinguishes sensitive images from non-sensitive ones.

Experimental results show that the filter designed in this thesis can effectively intercept and filter sensitive web pages and largely achieves control over access to sensitive sites.
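
The three-stage cascade described in the abstract (URL check, then text check, then image check) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the thesis's implementation: the `Page` type, the function names, and the returned status strings are hypothetical, plain substring counting stands in for the SMA matching algorithm, and the image detector is a caller-supplied placeholder for the face, skin color, skin texture, and classifier steps.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List
from urllib.parse import urlsplit


@dataclass
class Page:
    """Hypothetical container for what the filter sees of one web page."""
    url: str
    text: str
    images: List[Any] = field(default_factory=list)


def url_is_sensitive(url: str, blocked_hosts: Iterable[str]) -> bool:
    # Stage 1: compare the requested host with a set of sensitive hosts
    # (a stand-in for the sensitive URL database).
    host = (urlsplit(url).hostname or "").lower()
    return host in set(blocked_hosts)


def text_is_sensitive(text: str, sensitive_words: Iterable[str],
                      threshold: int = 1) -> bool:
    # Stage 2: count occurrences of sensitive words in the page text.
    # Assumption: simple substring counting replaces the SMA algorithm.
    lowered = text.lower()
    hits = sum(lowered.count(word.lower()) for word in sensitive_words)
    return hits >= threshold


def filter_page(page: Page,
                blocked_hosts: Iterable[str],
                sensitive_words: Iterable[str],
                image_is_sensitive: Callable[[Any], bool]) -> str:
    # Run the cheap checks first and stop at the first stage that fires.
    if url_is_sensitive(page.url, blocked_hosts):
        return "blocked:url"
    if text_is_sensitive(page.text, sensitive_words):
        return "blocked:text"
    # Stage 3: the image detector is supplied by the caller (placeholder).
    if any(image_is_sensitive(image) for image in page.images):
        return "blocked:image"
    return "allowed"
```

Ordering the stages from cheapest to most expensive means the image detector only runs on pages that pass both the URL and text checks.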

  • 【Online Publication Contributor】 Jilin University
  • 【Online Publication Issue】 2012, Issue 09
  • 【Classification Number】 TP391.41
  • 【Downloads】 106