面向协同过滤的推荐攻击特征提取及集成检测方法研究
Research on Feature Extraction and Ensemble Detection Approaches for Recommendation Attacks in Collaborative Filtering
【Author】 Zhou Quanqiang (周全强)
【Supervisor】 Zhang Fuzhi (张付志)
【Author information】 Yanshan University (燕山大学), Computer Application Technology, 2013, PhD
【摘要】 协同过滤推荐系统能够依据建立的用户概貌,过滤出用户感兴趣的信息并主动推荐给用户,可以有效解决互联网上出现的“信息过载”问题,已经被广泛应用在电子商务等诸多领域。然而,由于协同过滤推荐系统自身所具有的开放性,攻击者出于商业竞争等目的,人为地向系统注入大量虚假的用户概貌,企图使系统产生对他们有利的推荐结果。这种“托”攻击或推荐攻击给协同过滤推荐系统带来了极大的安全隐患。为了消除推荐攻击产生的安全隐患,关于推荐攻击检测方法的研究受到广泛关注。本文在对国内外研究现状综合分析的基础上,进一步对推荐攻击特征提取及检测方法进行了深入探讨。首先,针对已有专用特征提取方法不能有效描述已知类型推荐攻击的问题,通过引入Hilbert-Huang变换、词频-逆向文档频率和互信息,提出一种推荐攻击专用特征提取方法。在分析已知类型推荐攻击的基础上,利用Hilbert-Huang变换、词频-逆向文档频率和互信息,提取已知类型推荐攻击的专用特征,作为检测已知类型推荐攻击的基础。其次,针对已有通用特征提取方法不能有效描述未知类型推荐攻击的问题,通过引入信息熵,提出一种推荐攻击通用特征提取方法。从用户评分分布的角度,利用信息熵提取未知类型推荐攻击的通用特征,作为检测未知类型推荐攻击的基础。再次,针对已有有监督检测方法误报率太高的问题,提出一种基于支持向量机的推荐攻击集成检测方法。利用上述提出的专用特征提取方法提取用户概貌的特征,利用随机采样技术生成有差异的基训练集,利用生成的基训练集训练支持向量机生成基分类器,对测试数据进行检测,采用多数投票机制融合基分类器的检测结果。然后,针对已有检测方法不能有效检测未知推荐攻击的问题,提出一种基于仿生模式识别的未知推荐攻击集成检测方法,利用上述提出的通用特征提取方法提取用户概貌的特征,利用仿生模式识别技术覆盖真实概貌样本,将覆盖范围之外的用户概貌判断为攻击概貌,在此基础上,通过调整覆盖范围的大小生成基分类器,检测测试数据,采用多数投票机制融合基分类器的检测结果。最后,在MovieLens数据集上与相关工作进行了实验对比,验证了所提方法的有效性。
【Abstract】 Collaborative filtering recommender systems filter out information that matches users' interests according to established user profiles and actively recommend it to users. They effectively alleviate the information overload problem on the Internet and have been widely used in many fields, such as e-commerce. Because of their inherent openness, however, attackers inject large numbers of fake user profiles into these systems in order to bias the recommendation results in their favor. These "shilling" attacks, or recommendation attacks, pose a serious security risk to collaborative filtering recommender systems. To reduce this risk, approaches for detecting recommendation attacks have attracted widespread attention. Building on a comprehensive analysis of current research in this area, this thesis further investigates feature extraction methods and detection approaches for recommendation attacks.

Firstly, to address the problem that existing special feature extraction methods cannot effectively describe known recommendation attacks, a special feature extraction method is proposed that introduces the Hilbert-Huang transform, term frequency-inverse document frequency, and mutual information. Based on an analysis of known recommendation attacks, these three techniques are used to extract special features for such attacks. The extracted special features serve as the basis for detecting known recommendation attacks.

Then, to address the problem that existing general feature extraction methods cannot effectively describe unknown recommendation attacks, a general feature extraction method is proposed that introduces entropy. From the perspective of the user rating distribution, entropy is used to extract general features for unknown recommendation attacks. The extracted general features serve as the basis for detecting unknown recommendation attacks.

Next, to address the problem that existing supervised detection approaches suffer from a high false alarm rate, an ensemble detection approach based on support vector machines is proposed. The special feature extraction method proposed above is used to extract features from user profiles. The bootstrap technique is used to generate diverse base training sets, and each base training set is used to train a support vector machine as a base classifier. The base classifiers are applied to the test data, and their detection results are combined by majority voting.

After that, to address the problem that existing detection approaches cannot effectively detect unknown recommendation attacks, an ensemble detection approach based on bionic pattern recognition is proposed. The general feature extraction method proposed above is used to extract features from user profiles. Bionic pattern recognition is used to cover the samples of genuine profiles, and user profiles outside the coverage are judged to be attack profiles. On this basis, base classifiers are generated by adjusting the size of the coverage and applied to the test data, and their detection results are combined by majority voting.

Finally, comparative experiments against related work are conducted on the MovieLens dataset, and the results verify the effectiveness of the proposed approaches.
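To make the idea of an entropy-based general feature concrete, the following is a minimal sketch and not the thesis's actual feature definition, which the abstract does not give. It assumes that a user profile is simply a list of rating values on a discrete scale and computes the Shannon entropy of the empirical distribution of those values. The function name `rating_entropy` and the two example profiles are hypothetical.

```python
import math
from collections import Counter


def rating_entropy(ratings):
    """Shannon entropy (in bits) of the empirical distribution of rating values.

    `ratings` is a list of discrete rating values from one user profile.
    This is an illustrative feature, not the definition used in the thesis.
    """
    if not ratings:
        return 0.0
    counts = Counter(ratings)
    total = len(ratings)
    # Sum -p * log2(p) over the distinct rating values that occur.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical profiles on a 1-5 rating scale (invented for illustration).
varied_profile = [1, 2, 3, 4, 5, 3, 4, 2]   # ratings spread over the scale
uniform_profile = [5, 5, 5, 5, 5, 5, 5, 5]  # every item given the same rating

features = {
    "varied": rating_entropy(varied_profile),
    "uniform": rating_entropy(uniform_profile),
}
```

A profile that gives every item the same rating has zero entropy, while a profile that spreads its ratings over the scale has a higher value. Whether and how such a value separates genuine from attack profiles depends on the attack model and is not claimed here.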
【Key words】 collaborative filtering; recommendation attacks; attack detection; ensemble learning; support vector machine; bionic pattern recognition
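The ensemble scheme described in the abstract (bootstrap resampling to build diverse base training sets, one base classifier per set, majority voting over their outputs) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: a trivial one-dimensional threshold learner stands in for the support vector machine so that the example needs only the standard library, and the feature values, labels (0 = genuine, 1 = attack), and all function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from `data` uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


def train_threshold_classifier(samples):
    """Stand-in base learner (NOT an SVM).

    `samples` is a list of (feature, label) pairs with a 1-D feature.
    The decision threshold is the midpoint between the two class means.
    """
    genuine = [x for x, y in samples if y == 0]
    attack = [x for x, y in samples if y == 1]
    if not genuine or not attack:
        # Degenerate bootstrap sample: always predict the only class seen.
        only_label = 0 if genuine else 1
        return lambda x: only_label
    mean_genuine = sum(genuine) / len(genuine)
    mean_attack = sum(attack) / len(attack)
    threshold = (mean_genuine + mean_attack) / 2
    attack_is_low = mean_attack < mean_genuine
    # Predict 1 (attack) when x falls on the attack side of the threshold.
    return lambda x: int((x < threshold) == attack_is_low)


def train_ensemble(samples, n_classifiers=11, seed=0):
    """Train one base classifier per bootstrap sample of `samples`."""
    rng = random.Random(seed)
    return [
        train_threshold_classifier(bootstrap_sample(samples, rng))
        for _ in range(n_classifiers)
    ]


def majority_vote(classifiers, x):
    """Return the label predicted by most base classifiers for input `x`."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical 1-D feature values with labels (0 = genuine, 1 = attack).
training = [(1.8, 0), (2.0, 0), (2.1, 0), (1.9, 0), (2.2, 0),
            (0.1, 1), (0.0, 1), (0.2, 1), (0.1, 1), (0.3, 1)]
ensemble = train_ensemble(training)
prediction_low = majority_vote(ensemble, 0.05)
prediction_high = majority_vote(ensemble, 2.0)
```

An odd number of base classifiers avoids ties in a two-class vote. In a real setting the stand-in learner would be replaced by a support vector machine trained on the extracted profile features.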