
Research and Application of Web Mining in Detecting Online Advertising Fraud (Web挖掘在检测网络广告欺诈行为中的研究与应用)

Application of Web Mining Technology to Click Fraud Detection

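【Author】 Li Aichun (李爱春)

【Advisor】 Teng Shaohua (滕少华)

【Author information】 Guangdong University of Technology, Computer Application Technology, 2011, Master's thesis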

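【Abstract, translated from the Chinese】 With the development of the Internet, online advertising has become a new means of marketing. Marketers in every industry promote their products and brands through a wide variety of online ads and pay advertising fees for them. Pay-per-click advertising is currently a simple and popular billing method on the Internet: the advertiser is charged each time an ad on a web page is clicked and the visitor is taken to the related website or detail page. Click fraud exists in this pay-per-click model. It occurs when someone who has no interest in the ad itself, but who stands to gain in some way, clicks the ad manually or with a computer program in imitation of a normal user. The emergence and spread of click fraud has seriously harmed the healthy development of the Internet. This thesis studies the application of Web mining to click fraud in online advertising. After an in-depth review of click fraud detection methods in China and abroad, it combines Web mining algorithms such as outlier mining, multiple linear analysis, and time-series analysis to design a Web-mining-based model for detecting fraudulent ad clicks, and it systematically describes the model's detection system. The detection system works in two main steps: preliminary assessment and assessment correction. Preliminary assessment analyzes the current click stream together with the click stream over a short preceding period, assigns the click a preliminary assessment score, and feeds that score back to the front end. Assessment correction uses Web mining techniques to correct the preliminary assessment and to make predictions. For data processing, the data are first preprocessed. Because the attributes of the collected data are clearly labeled, the required operations are data cleaning, session identification, attribute selection, format conversion, and normalization. Because the collected data set consists of two parts, server logs and script-captured click streams, data integration is also required, along with data supplementation and verification. On the algorithm side, outliers are first separated out and analyzed on their own. Newly arriving data are analyzed together with the historical data set by multiple linear regression to predict which records are likely to be click fraud, and the prediction is fed back to the front end by correcting the preliminary assessment score. The front end is defined relative to the server and includes website owners, advertisers, and ad networks. The detection model described in this thesis can effectively detect or block all kinds of click fraud, effectively block unintentional invalid clicks, and significantly improve the efficiency of click fraud detection without affecting ad display speed. The model was tested in several groups of experiments, and the results were compared and analyzed. The results show that the proposed solution can effectively detect click fraud carried out manually or with automatic clicking programs that imitate normal users, which demonstrates the feasibility of the model and the effectiveness of the approach. Finally, the thesis briefly summarizes its content, looks ahead to the trends and directions of click fraud detection, and discusses the shortcomings of its detection scripts, user identification, mining algorithms, and follow-up analysis, all of which will be the focus of further research.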

【Abstract】 With the development of the Internet, online advertising has become a new marketing tool. Marketers in all walks of life promote their products and brands through colorful online advertising, and they pay for these ads. Cost-per-click advertising is a simple and popular billing method in which a charge is recorded each time an ad on a web page is clicked and the visitor is linked to the relevant website or to the ad's detail page. Click fraud exists in the cost-per-click model of the online advertising industry. It occurs when a person who is not interested in the ad itself, but merely wants to gain some benefit, clicks on the ad's link manually or uses a computer program to imitate a legitimate user doing so. The emergence and proliferation of click fraud have greatly hindered the healthy development of the Internet advertising industry. The purpose of this paper is to study the application of Web mining technology to click fraud in online advertising. Drawing on domestic and foreign research on detection methods, and combining Web mining algorithms such as outlier mining, multivariate linear analysis, and time-series analysis, the paper designs a click fraud detection model for online advertising and then systematically introduces the model's detection system. The detection system is divided into two steps: preliminary assessment and assessment correction. The preliminary assessment analyzes the current click stream and the click stream over a short period of time, assigns the click a preliminary assessment score, and feeds that score back to the front end. The main work of assessment correction is to use Web mining algorithms to correct the preliminary assessment and to make predictions.

In data processing, the data first need to be preprocessed. Because the attributes of the collected data are clearly labeled, the required operations are data cleaning, session identification, attribute selection, format conversion, and normalization. Because the collected data consist of server logs and script-captured click streams, the data sets must also be integrated, completed, and verified. In the algorithm, outliers are first isolated and then analyzed separately. Newly arriving data are run through multiple linear regression together with the historical data sets to predict which records may be click fraud, and the prediction is fed back to the front end by correcting the preliminary assessment score. The front end is defined relative to the server and includes site owners, advertisers, and ad networks. The detection model can effectively detect or block various types of click fraud, effectively block unintentional invalid clicks, and significantly improve the efficiency of click fraud detection without affecting the speed at which ads are displayed. Several groups of experiments were run on the detection model, and the results were compared and analyzed. The results show that the proposed scheme can effectively detect click fraud by people who click manually or use a computer program to imitate a legitimate user, which demonstrates the feasibility of the model and the effectiveness of the scheme. Finally, the paper gives a brief summary of its content, looks ahead to the development trends of click fraud detection, and discusses the deficiencies of its detection scripts, user identification, mining algorithms, and follow-up analysis, which will be the focus of future work.
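
The abstract does not give the model's features, thresholds, or formulas, so the following is only a minimal sketch of the two-step flow it describes (a preliminary score from the recent click stream, then a correction from outlier separation and multiple linear regression). Every function name, the 60-second window, the z-score cutoff, and the blending weight are illustrative assumptions, not the thesis's actual method; in particular, the z-score rule stands in for an outlier mining algorithm that the abstract does not name.

```python
from statistics import mean, pstdev


def preliminary_score(click_times, now, window=60.0, limit=5):
    """Score in [0, 1] from how many clicks one visitor made in the last
    `window` seconds. `window` and `limit` are assumed values."""
    recent = [t for t in click_times if 0 <= now - t <= window]
    return min(1.0, len(recent) / limit)


def split_outliers(values, z_cut=2.0):
    """Return (outlier_indices, inlier_indices) using a z-score cutoff
    (an assumed stand-in for the unnamed outlier mining step)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [], list(range(len(values)))
    outliers = [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_cut]
    flagged = set(outliers)
    return outliers, [i for i in range(len(values)) if i not in flagged]


def fit_linear(X, y):
    """Multiple linear regression by solving the normal equations
    (X^T X) b = X^T y with an intercept column. Returns [b0, b1, ...].
    Raises ZeroDivisionError if the system is singular."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def predict(coef, features):
    """Evaluate the fitted linear model on one feature vector."""
    return coef[0] + sum(c * f for c, f in zip(coef[1:], features))


def corrected_score(prelim, predicted, weight=0.5):
    """Blend the preliminary score with the regression prediction and clamp
    the result to [0, 1]. The blending rule is an assumption."""
    blended = (1 - weight) * prelim + weight * predicted
    return max(0.0, min(1.0, blended))
```

In such a sketch, `preliminary_score` would run at click time so that it never delays ad display, while `split_outliers`, `fit_linear`, and `corrected_score` would run afterwards on the historical data, mirroring the split between preliminary assessment and assessment correction.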
