节点文献

多参数扰动的隐私保护关联规则挖掘算法研究

Research on Multi-parameters Perturbation Privacy Preserving Association Rules Mining Algorithm

【作者】 李威

【导师】 刘杰;

【作者基本信息】 哈尔滨工程大学 , 计算机软件与理论, 2010, 硕士

【摘要】 随着信息技术、网络技术、数据存储技术和高性能处理器技术的进步,数据资料的规模急速膨胀,从而促进了数据挖掘(Data Mining,DM)技术的产生和飞速发展。数据挖掘在不断的挖掘出知识和规律,为政府、企业和个人带来便利的同时,也不可避免的涉及到人们的隐私问题。同时,随着社会的进步,人们对隐私的重视程度也越来越高,这也给数据挖掘带来了新的困难。隐私保护数据挖掘就是为了解决隐私保护和数据挖掘之间的矛盾而产生和发展的。本文首先阐述了数据挖掘、关联规则挖掘的基本理论和隐私保护数据挖掘的主要技术。在此基础上,对隐私保护关联规则挖掘的经典算法MASK算法进行了深入浅出的介绍和分析,并对多参数扰动算法做了详细的研究。与MASK算法相比,多参数扰动算法提高了隐私保护度和数据挖掘的准确度,但多参数扰动算法的频繁项集还原部分仍存在时间效率不高的问题,尤其是随着项集的增大,这个问题变的越来越严重。针对这个问题,本文对多参数随机扰动算法进行了较深入的研究,并根据该算法频繁项集还原模型的特点提出了两个改进的方法。方法一把求解转换矩阵逆矩阵的过程由两步变为一步,从而提高了时间效率。方法二由要求出转换矩阵逆矩阵的所有元素变为只求出转换矩阵逆矩阵的首行元素,从而又进一步提高了时间效率。最后通过理论分析和实验结果,表明方法一改进后的算法的时间效率高于原算法的时间效率,方法二改进后的算法的时间效率高于方法一改进后的算法的时间效率。另外,方法二改进后的算法在空间效率方面比原算法也有一定的改进。因为各种多参数扰动算法的频繁项集还原模型是一样的,所以对多参数随机扰动算法的改进也可以应用到别的多参数扰动算法上。

【Abstract】 As information technology, network technology, data storage technology and high-performance processor technology advance, the size of data expands quickly, and all these contribute to the generation and the rapid development of the data mining technology. While data mining helps government, enterprises and individuals to get knowledge and rules, and brings benefits to them, it inevitably involves people’s privacy. Meanwhile, with the progress of society, people pay more and more attention to privacy, and new difficulties was brought to data mining. In order to get across the gap between data privacy and data mining, privacy preserving data mining emerged and has been developing quickly.Firstly, the basic theory of data mining, association rules mining and the main technology of privacy preserving association rules mining is described in this thesis. Then, one of the classic privacy preserving association rules mining algorithms-MASK algorithm is introduced simply, and multi-parameters perturbation algorithms are studied carefully.Compared with MASK algorithm, multi-parameters perturbation algorithms improve the degree of privacy preserving and data mining accuracy. However, the time-efficiency of restoring the frequent itemsets in multi-parameters perturbation algorithms is still not high, and the problem becomes more and more serious with the increase of itemset. Addressing this issue, multi-parameters randomized perturbation algorithm is dissected carefully. Two methods are proposed in this thesis to improve the time efficiency of multi-parameters randomized perturbation algorithm according to the characteristics of the model to restore frequent itemsets. The first method improves the time efficiency by merging the process of inversing the transformation matrix from two steps into one step. The second method improves the time efficiency further by getting the elements of the first line of the inversed matrix of transformation matrix, while the first method need get all the elements of the inversed matrix of transformation matrix. Finally, both theoretical analysis and experimental results indicate that the first improved algorithm is more efficient than the original algorithm and the second improved algorithm is more efficient than the first improved algorithm. In addition, the second improved algorithm is more space-efficient than the original algorithm. Because the models of restoring frequent itemsets for a variety of multi-parameters perturbation algorithms are same, so the improvements can also be applied to other multi-parameters perturbation algorithms.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络