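Research on Spam Filtering Technology Based on Fingerprint Analysis

【Author】 Chen Shuang

【Supervisor】 Zhang Fengli

【Author information】 University of Electronic Science and Technology of China, Software Engineering, 2011, Master's thesis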

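【Abstract (Chinese)】 E-mail has become an indispensable tool in everyday life and work, but the unchecked spread of spam causes great harm. Anti-spam technology has therefore long been a research focus both in China and abroad. This thesis analyzes the characteristics of spam in detail and examines methods for analyzing and controlling it. First, it analyzes current spam filtering technologies in China and abroad, including their detection methods, effectiveness, and strengths and weaknesses. This analysis shows that although these techniques can reach fairly high identification accuracy, most of them begin analyzing spam only after it has been completely delivered, while techniques that can operate during mail transmission, such as blacklists and reverse domain-name lookups, are easily evaded by spam. The main work of this thesis is therefore to find a better analysis technique that can identify spam accurately while it is still being transmitted. Second, because spammers usually forge mail headers, the information in some header fields is ignored by mainstream filtering techniques. Extensive comparison and analysis showed that spam sent from the same source within a given period carries identical features in certain header fields. To describe these features better, this thesis proposes and implements a header-based fingerprint analysis technique. It generates a fingerprint from 5 key fields of the mail header and compares it against a spam fingerprint database, so that spam sent in bulk can be analyzed and identified accurately while the mail is still in transit. To make fingerprint extraction and comparison efficient, the design and implementation use the MD5 hash algorithm and a binary tree structure. Finally, because current filtering techniques only identify spam and lack measures to restrain the spammer's sending behavior, this thesis, building on an in-depth analysis of TCP reliable transmission, designs and implements three sending-behavior control mechanisms: increasing response delay, discarding packets, and a mixed mechanism. Driven by the result of fingerprint analysis, these mechanisms obstruct delivery of the spam's remaining packets to different degrees, limiting the sender's transmission efficiency and thereby reducing its throughput. Experiments demonstrate the usability and effectiveness of the technique. The proposed fingerprint analysis and sending-behavior control techniques have been integrated as key modules into an in-house enterprise-level comprehensive spam reporting system.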
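The abstract does not say which five header fields are used, how their values are normalized, or how the binary tree is organized. The Python sketch below illustrates the general idea under those unknowns: the field list `FINGERPRINT_FIELDS` and the whitespace and case normalization are hypothetical choices, and the tree is a plain unbalanced binary search tree keyed by the MD5 hex digest, not necessarily the structure used in the thesis.

```python
import hashlib
from email import message_from_string
from email.message import Message
from typing import Optional

# Hypothetical field list: the thesis uses five header fields, but the abstract does not name them.
FINGERPRINT_FIELDS = ("From", "Reply-To", "X-Mailer", "MIME-Version", "Subject")


def header_fingerprint(msg: Message) -> str:
    """Return the MD5 hex digest of the selected, normalized header values."""
    parts = []
    for name in FINGERPRINT_FIELDS:
        value = str(msg.get(name, ""))  # missing fields contribute an empty value
        normalized = " ".join(value.split()).lower()  # collapse whitespace, ignore case
        parts.append(f"{name.lower()}:{normalized}")
    return hashlib.md5("\n".join(parts).encode("utf-8")).hexdigest()


class _Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key: str) -> None:
        self.key = key
        self.left: Optional["_Node"] = None
        self.right: Optional["_Node"] = None


class FingerprintTree:
    """Unbalanced binary search tree of known spam fingerprints (hex digests)."""

    def __init__(self) -> None:
        self.root: Optional[_Node] = None

    def insert(self, key: str) -> None:
        if self.root is None:
            self.root = _Node(key)
            return
        node = self.root
        while True:
            if key == node.key:
                return  # fingerprint already stored
            if key < node.key:
                if node.left is None:
                    node.left = _Node(key)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = _Node(key)
                    return
                node = node.right

    def contains(self, key: str) -> bool:
        node = self.root
        while node is not None:
            if key == node.key:
                return True
            node = node.left if key < node.key else node.right
        return False


# Usage: register one known spam message, then check a new message from the same bulk run.
spam_db = FingerprintTree()
known = message_from_string("From: a@x.test\nX-Mailer: BulkMailer 1.0\nSubject: Cheap pills\n\nbody A")
spam_db.insert(header_fingerprint(known))

# Same headers (apart from extra spacing) and a different body: still matches.
incoming = message_from_string("From: a@x.test\nX-Mailer: BulkMailer 1.0\nSubject: Cheap  pills\n\nbody B")
print(spam_db.contains(header_fingerprint(incoming)))  # True
```

Because the body is not part of the fingerprint, the check can run as soon as the headers have arrived, before the rest of the message is transferred. A plain binary search tree gives logarithmic lookups only on average; a balanced tree or a hash set would be natural alternatives if the fingerprint database grows large.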

【Abstract】 E-mail has become an indispensable tool in people's lives and work, but at the same time the flood of spam has caused great harm. Anti-spam technology has therefore long been a research focus both in China and abroad. This thesis analyzes the characteristics of spam in detail and explores methods for detecting and controlling it. First, current spam filtering technologies in China and abroad are analyzed in detail, including their detection methods, effectiveness, and advantages and disadvantages. The analysis shows that although these filtering technologies can achieve high identification accuracy, most of them begin to identify spam only after a message has been completely received, while techniques that can act during mail transfer, such as blacklists and reverse-domain lookups, are easily evaded by spam. The main goal of this thesis is therefore to find a better filtering technology that can accurately identify spam while it is still in transmission. Second, because spammers usually forge mail headers, the information in some header fields is ignored by mainstream filtering technologies. Extensive comparison and analysis showed that spam sent from the same source within a period of time shares the same features in certain header fields. To describe these characteristics more effectively, this thesis proposes and implements a header-based fingerprint analysis technology. The technology generates a specific fingerprint from 5 fields of the mail header and compares it against a spam fingerprint database, so that spam can be analyzed and identified accurately during mail transmission. To improve the efficiency of fingerprint extraction and comparison, the design and implementation use the MD5 hash algorithm and a binary tree. Finally, current filtering techniques only identify spam and lack measures to inhibit the sender. Based on an analysis of TCP reliable transmission, this thesis designs and implements three sender behavior control mechanisms: increasing response delay, discarding packets, and a mixed mechanism. Using the result of fingerprint analysis, these mechanisms block the transmission of spam data to different degrees, limiting the sender's transmission efficiency and thereby reducing its throughput. Experiments show that the technology is usable and effective. The fingerprint analysis and sender behavior control technology discussed in this thesis has been integrated into an enterprise-class comprehensive spam reporting system.
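The abstract names the three control mechanisms but not their parameters or exact behavior. The toy Python model below, which is not the thesis implementation, shows why each one lowers a spammer's throughput. It assumes a stop-and-wait sender, a hypothetical fixed extra delay added to every response, and a retransmission timeout that doubles after each consecutive loss (the exponential backoff TCP senders apply). The `MIXED` mode simply combines delaying and dropping, which is only one possible reading of "mixed mechanism"; all names and numbers in `LinkModel` are illustrative assumptions.

```python
import random
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NONE = "none"    # no control: respond normally
    DELAY = "delay"  # hold each response for an extra interval
    DROP = "drop"    # silently discard some segments
    MIXED = "mixed"  # apply both delay and drop (hypothetical interpretation)


@dataclass
class LinkModel:
    # All values are hypothetical, chosen only for illustration.
    rtt: float = 0.05         # base round-trip time in seconds
    extra_delay: float = 2.0  # added to each response in DELAY/MIXED mode
    initial_rto: float = 1.0  # first retransmission timeout after a loss
    drop_prob: float = 0.3    # chance a segment is discarded in DROP/MIXED mode


def transfer_time(segments: int, mode: Mode, model: LinkModel, rng: random.Random) -> float:
    """Seconds a stop-and-wait sender needs to get `segments` segments acknowledged."""
    if not 0.0 <= model.drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1) or the transfer never completes")
    total = 0.0
    for _ in range(segments):
        rto = model.initial_rto
        # Each discarded copy of the segment costs the sender a full timeout.
        while mode in (Mode.DROP, Mode.MIXED) and rng.random() < model.drop_prob:
            total += rto  # no ACK arrives; the sender waits out the timeout
            rto *= 2      # and backs off before retransmitting again
        total += model.rtt  # the segment finally gets through
        if mode in (Mode.DELAY, Mode.MIXED):
            total += model.extra_delay  # the receiver holds its response
    return total


if __name__ == "__main__":
    model = LinkModel()
    for mode in Mode:
        seconds = transfer_time(20, mode, model, random.Random(42))
        print(f"{mode.value:>5}: {20 / seconds:6.2f} segments/s")
```

In this model, delaying slows every segment by a fixed amount, while dropping makes the sender pay its own timeout and backoff, so a handful of discarded segments can cost far more time than their number suggests.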