

【Author】 Wang Ying (王影)

【Advisor】 Lu Xianliang (卢显良)

【Author Information】 University of Electronic Science and Technology of China, Computer System Architecture, 2005, Master's thesis

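【Abstract (Chinese)】 With the spread of the Internet, e-mail, being fast, convenient, and low-cost, has come into ever wider use and has become one of the most popular communication tools. However, its by-product, spam, causes endless trouble for Internet users, network administrators, and Internet service providers (ISPs): recipients' time, bandwidth, and storage are wasted, network links become congested, and spam is used as a vehicle for spreading objectionable content everywhere. The spam filtering approach in mature use today combines automatic software filtering with manual administration, but it does not adapt well to the diversity of spam and filters out only about 50% of it. More intelligent spam filtering techniques are therefore urgently needed to deal with the increasingly rampant spam problem.

The main goal of this thesis is to explore a concrete spam filtering model, and to implement and test it. The research must determine whether the chosen model is appropriate and how tuning the model's own parameters and the environmental parameters affects filtering performance; the experiments therefore have to test the model's effectiveness and feasibility thoroughly. The author accomplished these goals during the project.

This thesis proposes an LVQ mail filtering model and an improved BW (black/white list) mail filtering model, describes the design principles of both in detail, discusses how they relate to each other and to the mail server, and presents the key implementation framework and code. The LVQ model addresses two problems of Boolean mail filtering models: discrete feature terms and a vaguely defined boundary between spam and legitimate mail. The improved BW model refines the traditional black/white list model and reduces the losses users suffer when they misclassify borderline addresses.

Although many spam filtering methods already exist, many spam-related problems still lack good solutions; this considerably limits the performance of mail filtering systems and leaves the harm done by spam undiminished. The new filtering models proposed in this thesis solve some of these problems and, under certain conditions, can improve the filtering performance of a mail filtering system. This research is therefore worthwhile.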

【Abstract】 With the spread of the Internet, e-mail has come into ever wider use thanks to its efficiency, convenience, and low cost. At the same time, however, its by-product, spam, is causing endless trouble for Internet users, network administrators, and Internet service providers. Spam carries objectionable or useless content; it wastes users' time, consumes bandwidth and storage space, and can even congest the network. The mature spam filtering methods in use today combine automatic software filtering with manual administration. This approach has proved unable to adapt to the variety of spam, and it is estimated to catch only about 50% of spam. More intelligent filtering techniques are therefore required.

The main goal of this thesis is to explore a specific spam filtering model and to implement and test it. During the research, we needed to examine carefully whether the model is a suitable one and to observe how the model's own parameters and the environmental parameters influence filtering performance, so the tests had to reveal the model's feasibility and effectiveness thoroughly. The author achieved these goals.

This thesis puts forward two filtering models, a Learning Vector Quantization (LVQ) model and an improved Black & White List (BW) model. It describes their design principles, discusses their relationship to each other and to the mail server, and provides the key implementation framework and code. The LVQ model addresses the discreteness of feature terms in the Boolean filtering model and the difficulty of drawing a boundary between spam and legitimate mail. The improved BW model refines the traditional blacklist/whitelist model and reduces the losses users incur when borderline addresses are misclassified.

Although many filtering methods exist, a large number of spam filtering problems remain unsolved, which keeps filtering performance from improving. The new filtering models proposed in this thesis provide solutions to some of these problems, and they have been shown to improve filtering performance in certain environments. This research is therefore of value.
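The abstract names Learning Vector Quantization but gives no details of the thesis's model. As a rough illustration only, here is a generic LVQ1 sketch in Python, not the author's implementation. It assumes each message is a real-valued vector of term weights (rather than Boolean term presence), that each class (spam, ham) has one prototype vector, and that a message is labeled by its nearest prototype. The vocabulary, weights, learning rate, and epoch count are all hypothetical.

```python
import math
import random


def distance(a, b):
    # Euclidean distance between two equal-length feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(protos, x):
    # Index of the prototype closest to x (the "winner")
    return min(range(len(protos)), key=lambda i: distance(protos[i][0], x))


def train_lvq1(samples, prototypes, lr=0.1, epochs=30, seed=0):
    """Generic LVQ1: pull the winning prototype toward a sample of its own
    class and push it away from a sample of a different class."""
    rng = random.Random(seed)
    protos = [(list(v), label) for v, label in prototypes]
    data = list(samples)
    for epoch in range(epochs):
        rng.shuffle(data)
        alpha = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for x, label in data:
            w, w_label = protos[nearest(protos, x)]
            sign = 1.0 if w_label == label else -1.0
            for k in range(len(w)):
                w[k] += sign * alpha * (x[k] - w[k])  # updates the list in place
    return protos


def classify(protos, x):
    # Label of the nearest prototype
    return protos[nearest(protos, x)][1]


# Hypothetical 4-term vocabulary: ["free", "winner", "meeting", "report"]
spam = [[0.6, 0.4, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0], [0.7, 0.2, 0.1, 0.0]]
ham = [[0.0, 0.0, 0.6, 0.4], [0.1, 0.0, 0.5, 0.4], [0.0, 0.1, 0.4, 0.5]]
samples = [(v, "spam") for v in spam] + [(v, "ham") for v in ham]

# One prototype per class, seeded from the first example of each class
protos = train_lvq1(samples, [(spam[0], "spam"), (ham[0], "ham")])
print(classify(protos, [0.55, 0.35, 0.1, 0.0]))  # spam
print(classify(protos, [0.05, 0.0, 0.5, 0.45]))  # ham
```

Because the features are continuous weights and the spam/ham boundary is placed by learned prototypes rather than fixed rules, this shows one way an LVQ-style filter can move beyond purely binary features. How the thesis extracts features, initializes prototypes, or tunes its parameters is not stated in the abstract.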

  • 【Classification Number】 TP393.098

