
Research on Privacy Preserving Association Rules Mining

【Author】 Sheng Ronghua (盛荣华)

【Supervisor】 Zheng Jianguo (郑建国)

【Author Information】 Donghua University, Management Science and Engineering, 2012, Master's thesis

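【Chinese Abstract】 Data mining is currently one of the most active branches of database research, and it has achieved encouraging results in both scientific research and commercial applications. At the same time, it faces many challenges, among which personal privacy and information security have attracted particular attention. Misuse or abuse of data mining can leak user data, especially sensitive information; a growing number of people are worried about this, and some even refuse to provide truthful data. How to mine data without exposing users' privacy has therefore become a topic of great interest. This thesis studies privacy preservation in association rule mining. It first introduces the relevant background and reviews existing privacy preserving association rule mining methods. It then describes and analyzes the classic Apriori algorithm in detail, presents the privacy preserving association rule mining algorithm MASK, compares the running times of MASK and Apriori, and analyzes the problems of MASK and their causes. Building on this, and treating the original data set as the object to be protected, the thesis addresses how to protect private data in association rule mining in two ways. First, it improves the data storage structure: using set theory, it changes how the data are stored, which reduces the number of database scans needed to reconstruct the supports of the original data and eliminates the exponential complexity of reconstructing the supports of the original itemsets; a description of the improved method is given. Second, from the perspective of the probability transformation matrix, it distorts the data with a random parameter perturbation method, transforms the probability matrix, and then mines association rules; the privacy preserving degree of the transformation is evaluated by combining the traditional privacy evaluation method with a directional privacy preserving degree of the matrix transformation. This effectively resolves two problems: under the usual evaluation method some special values disagree with the actual values, and the amount of computation becomes large when the data set is large. Theoretical analysis and experiments show that the method offers good privacy, efficiency, and applicability. Finally, the thesis applies the improved privacy preserving association rule mining algorithm to knowledge sharing in collaborative commerce: it analyzes the application background, describes the application process in detail, and gives a preliminary evaluation of the results.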
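Both abstracts take Apriori as the baseline. As a point of reference, here is a minimal sketch of textbook Apriori (join, prune, count), not code from the thesis; the function name `apriori` and the use of an absolute count for `min_support` are assumptions made for illustration. Each level k costs one full pass over the transactions, which is the kind of repeated scanning the thesis sets out to reduce.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for every itemset whose count is
    at least min_support (an absolute count, assumed for this sketch)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items in one pass over the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in frequent for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: one more pass over the data for this level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "diaper", "beer", "eggs"},
        {"milk", "diaper", "beer", "cola"},
        {"bread", "milk", "diaper", "beer"},
        {"bread", "milk", "diaper", "cola"},
    ]
    for itemset, count in sorted(apriori(baskets, 3).items(), key=lambda x: -x[1]):
        print(sorted(itemset), count)
```

On these five baskets with a minimum count of 3, the sketch keeps four single items and four pairs; the candidate {bread, milk, diaper} occurs only twice and is dropped.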

【Abstract】 Data mining has long been an active area of database research. It has achieved encouraging results in both scientific research and business applications; however, these benefits are accompanied by concerns about information privacy. Because of these concerns, some people may give false information for fear of privacy breaches, or simply refuse to divulge any information at all. Privacy is therefore an important issue in data mining and knowledge discovery, and the design and analysis of privacy preserving data mining has attracted much interest in this field. This thesis studies privacy preserving association rule mining. It first introduces the relevant background and reviews existing typical approaches to privacy preserving association rule mining, and then analyzes the characteristics and limitations of the classic Apriori algorithm. It also describes the privacy preserving association rule mining algorithm MASK in detail, compares the running times of MASK and Apriori, and analyzes the problems of MASK and their causes. On this basis, treating the original data set as the object of privacy protection, the thesis addresses how to protect private data when mining association rules. First, it improves the data storage structure: using set theory, it changes how the data are stored, which reduces the number of database scans needed to reconstruct the supports of the original data and eliminates the exponential complexity of reconstructing the supports of the original itemsets; a description of the method is given.
Second, on the transformed data set, it first performs cluster analysis to obtain normalized data, then mines association rules, and evaluates the privacy preserving degree of the matrix transformation by combining the traditional evaluation method with the directional privacy preserving degree. This effectively solves two problems: under the traditional evaluation method some special values are inconsistent with the actual values, and the computation becomes heavy for large data sets. Theoretical analysis and experiments show that the method offers good privacy, efficiency, and applicability. The thesis concludes by applying the algorithm to knowledge sharing in collaborative commerce: it analyzes the application background, presents the application process in detail, and gives a preliminary evaluation of the results.
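The matrix-based support reconstruction that both abstracts refer to can be illustrated with the MASK scheme as it is usually described: every bit of a boolean transaction table is kept with probability p and flipped with probability 1 - p, and for a k-itemset the observed counts of the 2^k bit patterns satisfy C_D = M C_T, where M is the k-fold Kronecker power of [[p, 1-p], [1-p, p]]. The sketch below is an illustration under stated assumptions, not the thesis's improved algorithm. The function names are hypothetical, and the vertical tid-set layout (one set of row ids per item, built in a single scan, with pattern counts then obtained by set intersection and difference) is only one possible reading of the set-theoretic storage change, whose details the abstract does not give. The reconstruction still sums over all 2^k patterns, which shows where the exponential cost mentioned in the abstract comes from.

```python
import random
from itertools import product


def mask_distort(rows, p, rng):
    """MASK-style distortion: keep each bit with probability p, flip it otherwise."""
    return [[bit if rng.random() < p else 1 - bit for bit in row] for row in rows]


def build_tidsets(rows, n_items):
    """Hypothetical vertical layout: one scan maps each item to the set of
    row ids in which its bit is 1."""
    tids = [set() for _ in range(n_items)]
    for rid, row in enumerate(rows):
        for item, bit in enumerate(row):
            if bit:
                tids[item].add(rid)
    return tids


def pattern_counts(tids, n_rows, itemset):
    """Count the rows matching each of the 2^k bit patterns over `itemset`
    using set intersection/difference only (no further scans of the rows)."""
    all_rows = set(range(n_rows))
    counts = {}
    for pattern in product((0, 1), repeat=len(itemset)):
        matched = set(all_rows)
        for item, bit in zip(itemset, pattern):
            matched &= tids[item] if bit else (all_rows - tids[item])
        counts[pattern] = len(matched)
    return counts


def reconstruct_support(distorted_counts, p):
    """Estimate the true number of rows containing every item of the itemset.

    C_T = M^{-1} C_D, and M^{-1} is the k-fold Kronecker power of
    [[p, -(1-p)], [-(1-p), p]] / (2p - 1). Only the all-ones row of M^{-1}
    is needed, but it still touches all 2^k distorted pattern counts.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes M singular; support cannot be recovered")
    k = len(next(iter(distorted_counts)))
    estimate = 0.0
    for pattern, count in distorted_counts.items():
        weight = 1.0
        for bit in pattern:
            weight *= p if bit else -(1 - p)
        estimate += weight * count
    return estimate / (2 * p - 1) ** k


if __name__ == "__main__":
    rng = random.Random(0)
    n = 20000
    # Synthetic data (assumed): three independent items with fixed frequencies.
    true_rows = [[int(rng.random() < q) for q in (0.6, 0.5, 0.3)] for _ in range(n)]
    true_count = sum(1 for r in true_rows if r[0] and r[1])
    distorted = mask_distort(true_rows, 0.9, rng)
    counts = pattern_counts(build_tidsets(distorted, 3), n, (0, 1))
    print("true:", true_count, "estimated:", round(reconstruct_support(counts, 0.9)))
```

With p = 1 (no distortion) the estimate equals the exact count; for p close to 0.5 the factor 1/(2p - 1)^k grows quickly, so privacy and reconstruction accuracy pull in opposite directions.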

  • 【Online Publication Submitter】 Donghua University
  • 【Online Publication Issue】 2012, Issue 07