
一种基于数据挖掘技术的垃圾短信用户预识别方法

A Pre-recognition Method for Junk Message Filtering Based on Data Mining

【Author】 Zhang Ting (张婷)

【Advisors】 Ma Yide (马义德); Zhang Guoxiong (张国雄)

【Author information】 Lanzhou University (兰州大学), Electronics and Communication Engineering (professional degree), 2013, Master's

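【Abstract, translated from the Chinese】 In recent years, with the rapid growth of mobile communication services, SMS has increasingly become a primary means of communication. As the SMS business has grown, however, large volumes of junk messages have also emerged and disrupt the daily lives of mobile phone users. Junk messages are well concealed, and their content, wording, and sending frequency vary constantly, which makes them hard to identify; relying on keyword matching or traffic monitoring alone no longer meets operators' filtering requirements. Today's mainstream filtering approaches rely on text analysis, using black and white lists or machine learning methods, and all of them analyze the junk messages themselves. Whether triggered by user feedback or by active interception, they amount to interception during sending or restriction after the fact. By the time the operator acts, a large number of junk messages have already reached users. If junk-message sending behavior could be predicted and stopped at an early stage, the volume of junk messages sent could be reduced substantially. This thesis adopts a training model based on decision tree splitting. The model is built from several dimensions, including customers' network-entry attributes, communication behavior, and billing information; it identifies numbers that send junk messages and produces a high-risk list of such numbers. Compared with traditional in-process controls based on message content recognition or sending-volume limits, the system can predict junk-message sending behavior and, working with a junk-message interception system, intercept junk messages before large-scale sending takes place. Experimental results show that the model identifies junk-message numbers effectively and is a useful aid to the monitoring system in intercepting junk messages. Chapter 1 outlines the research background and the current mainstream filtering methods and explains the significance of the work; Chapters 2 and 3 describe the preparation and processing of the data sets the algorithm requires; Chapter 4 presents the principle of the model and how it is built; Chapter 5 analyzes the experimental results and evaluates the accuracy and efficiency of the algorithm; the final chapter summarizes the work, points out the algorithm's shortcomings, and looks ahead to future work.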

【Abstract】 With the rapid development of wireless communication services, SMS has become more and more important in everyday life. Meanwhile, a large volume of junk messages has emerged alongside the growth of the short message service, disturbing the daily lives of mobile phone users. Junk messages are highly concealed, and their content, written form, and sending frequency keep changing, so they are very difficult to detect. The traditional methods of monitoring keywords and traffic can no longer meet operators' requirements. Text analysis, black and white list filtering, and machine learning are currently the mainstream methods of junk message filtering. In contrast to these methods, which analyze message content, this thesis adopts a new approach based on a decision tree: it constructs a model from customer attributes, customer behavior, and other dimensions, identifies the telephone numbers that send junk messages, and forms a list of high-risk telephone numbers. The results show that the method can effectively identify the telephone numbers that send junk messages, and that the model helps support the monitoring system. Chapter 1 introduces the background of junk message filtering and the main filtering methods; Chapters 2 and 3 describe the preparation and processing of the data required by the algorithm; Chapter 4 introduces the principle of the model and how it is constructed; Chapter 5 analyzes the results and evaluates the efficiency and accuracy of the method; the last chapter summarizes the work, points out the shortcomings of the algorithm, and discusses prospects for future work.
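
The abstract does not say which decision tree algorithm or which input variables the thesis uses. As a rough illustration only, the sketch below grows a small binary tree with Gini-impurity splits (an assumed choice) over four hypothetical per-number features, then lists the numbers whose leaf risk reaches a cutoff. The feature names, the toy training rows, the `number_A`/`number_B` labels, and the 0.5 cutoff are all invented for the example and are not taken from the thesis.

```python
from collections import Counter

# Hypothetical features (not from the thesis), one row per phone number:
# [days_on_network, sms_per_day, distinct_recipient_ratio, monthly_bill]


def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = known junk-message sender)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow the tree recursively; each leaf stores the share of junk senders."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return {"risk": sum(labels) / len(labels)}
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }


def risk(tree, row):
    """Walk from the root to a leaf and return that leaf's junk-sender share."""
    while "risk" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["risk"]


def high_risk_list(tree, numbers, cutoff=0.5):
    """Return the numbers whose predicted risk is at least `cutoff`."""
    return [num for num, row in numbers.items() if risk(tree, row) >= cutoff]


# Toy training data, invented for illustration.
train_rows = [
    [3, 400, 0.98, 10], [5, 650, 0.99, 15], [2, 300, 0.95, 8],
    [700, 12, 0.20, 88], [400, 30, 0.35, 120], [1200, 5, 0.10, 58],
]
train_labels = [1, 1, 1, 0, 0, 0]
tree = build_tree(train_rows, train_labels)

# Score numbers that have not yet been flagged by any content filter.
candidates = {
    "number_A": [4, 500, 0.97, 12],
    "number_B": [900, 8, 0.15, 70],
}
flagged = high_risk_list(tree, candidates)
```

In a setup like this, the flagged list would be handed to the interception system as a watch list, which matches the supporting role the abstract describes; a real model would need far more data, pruning, and held-out evaluation.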

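  • 【Online publication contributor】 Lanzhou University
  • 【Online publication year and issue】 2013, Issue 11
  • 【Classification number】 TP311.13; TN929.53
  • 【Download count】 155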