
Research on Privacy Preserving Technology for Data Publishing (面向数据发布的隐私保护技术研究)

【Author】 Cheng Mian (程冕)

【Supervisor】 Su Jinshu (苏金树)

【Author Information】 National University of Defense Technology, Computer Science and Technology, 2018, PhD

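【Abstract (translated from the Chinese)】 We live in the era of big data. Big data brings new opportunities, but much of it exhibits distinctive behavioral patterns and contains sensitive information, so processing the raw data may violate the privacy of its owners. The privacy leak caused by America Online (AOL) publishing its search logs is a typical cautionary example: the identities of some users were quickly re-identified and made public by a news agency reporter. Processing data effectively before publication, so that the utility of the data set improves while privacy and security are preserved, is therefore of great importance to individuals and enterprises that need to protect the sensitive information in their data. Existing privacy protection research focuses mainly on the one-time publication of static data sets. On the one hand, however, real-world data increasingly changes over time, and the sensitive attributes of the data before and after a change may be linked by inference, which leads to leakage of sensitive information. On the other hand, as data volumes grow rapidly and data types multiply, more and more organizations want to share their data with others to obtain more accurate results, which exposes further security and privacy problems, such as how to secure communication and how to prevent attacks by colluding users. This thesis therefore studies privacy protection in dynamic data stream publishing and in multi-user collaborative data publishing. The main contributions and innovations are as follows.

(1) A privacy protection strategy for dynamic data set publishing is proposed and implemented. The rapid development of positioning technology and the wide adoption of location-based services have produced large trajectory data sets associated with mobile users. To avoid the privacy leakage risk of publishing such data sets, this thesis builds on the (K,C)L privacy model and a sliding window mechanism to design and implement TA-SA, a privacy protection mechanism for dynamic trajectory data streams with sensitive attributes. It anonymizes users' trajectory information in real time while protecting their sensitive attributes from disclosure. The mechanism has three features. First, it is an anonymous publishing method for dynamic trajectory data streams that effectively avoids the risk of privacy leakage caused by associations among users' trajectory information in dynamic data publishing. Second, it proposes an FP-tree-based sliding window model, which reduces the storage the sliding window requires and speeds up the anonymization algorithm's lookup of trajectory data. Third, it adds an anonymization method for sensitive attributes, so the data owner can protect the privacy of the associated sensitive attributes along with the trajectory information.

(2) A differential privacy protection strategy for publishing numerical dynamic data sets is proposed and implemented. Compared with directly publishing data sets of individual attributes, publishing numerical statistical data sets is the more common case. To further improve the privacy of published results and reduce the risk of sensitive information leakage, and given that an attacker's background knowledge is hard to quantify in large-scale data environments, this thesis exploits the strong guarantees of differential privacy to propose EG-Privacy, a privacy protection mechanism for publishing numerical dynamic data sets. The mechanism publishes aggregates of numerical dynamic data sets while ensuring that a malicious attacker cannot infer any identity-related information about users from the published results. It has three features. First, it makes no assumption about the range of background knowledge a malicious attacker may hold; differential privacy gives the published results strong anonymity. Second, based on the w-event sliding window model, it achieves differentially private publication of infinite data streams, solving the problem that differential privacy mechanisms in traditional algorithms are hard to apply to the protection of continuous data. Third, it implements a grouping optimization strategy for noise addition, which adds noise elastically according to how much the data changes and further improves the utility of the published results.

(3) A privacy protection strategy for collaborative data set publishing is proposed and implemented. As data volumes keep growing, publication by a single data owner sometimes fails to yield valuable results; a more effective approach is to publish collaboratively with other owners of similar data. Collaborative publishing among multiple users, however, raises additional privacy problems. Based on the m-privacy model, this thesis designs and implements MK-A, an anonymization-based privacy protection mechanism for multi-user collaborative data publishing without a trusted third party. It lets multiple data owners jointly share their data in the absence of a trusted third party while protecting the sensitive information in the data from disclosure. The mechanism has two features. First, it effectively addresses the threat of collusion among users: even if several data owners are also malicious attackers, the published result still satisfies the privacy requirement. Second, it designs and implements a secure multi-user data exchange protocol for untrusted environments, in which data providers transmit quasi-identifiers and sensitive information using different anonymized transmission methods, so that an attacker cannot link quasi-identifier content to sensitive information.

(4) A differential privacy protection strategy for publishing numerical collaborative data sets is proposed and implemented. Collaborative data set publishing in a multi-user environment can provide valuable information, but it also makes it more likely that users' sensitive information is identified by an attack, so it is desirable to avoid guessing what background knowledge an attacker may hold. To address this, the thesis uses distributed differential privacy and a secure multiparty computation protocol to design and implement DFTA, a multi-user collaborative publishing strategy for numerical data sets. The mechanism has three features. First, it applies differential privacy to collaborative data publishing, addressing the problem that attacker background knowledge is even harder to quantify in a multi-user environment. Second, it uses a secure multiparty computation protocol so that multiple users complete the computation jointly and securely, while no party's raw input is obtained by any other user. Third, to handle unexpected failures of data owners (such as suddenly going offline), it designs and implements a fault feedback mechanism that allows the data exchange to continue without restarting the protocol.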
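
The w-event sliding window idea mentioned under contribution (2) can be illustrated with a minimal sketch. This is not the thesis's EG-Privacy mechanism, whose grouping strategy is not specified in the abstract; it assumes the simplest uniform allocation, in which every timestamp receives a budget of epsilon / w so that any window of w consecutive timestamps spends at most epsilon in total. The function names `laplace_noise`, `publish_stream`, and `max_window_budget` are hypothetical, as is the default sensitivity of 1.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish_stream(counts, epsilon, w, sensitivity=1.0, seed=None):
    """Publish noisy counts with a uniform per-timestamp budget (assumed scheme)."""
    rng = random.Random(seed)
    eps_per_step = epsilon / w            # uniform share of the window budget
    scale = sensitivity / eps_per_step    # Laplace scale for one timestamp
    return [c + laplace_noise(scale, rng) for c in counts]


def max_window_budget(n_steps, epsilon, w):
    """Largest total budget spent inside any window of w consecutive timestamps."""
    per_step = [epsilon / w] * n_steps
    return max(sum(per_step[i:i + w]) for i in range(n_steps))


if __name__ == "__main__":
    noisy = publish_stream([10, 12, 11, 50, 52], epsilon=1.0, w=3, seed=0)
    print(noisy)
    print(max_window_budget(5, epsilon=1.0, w=3))
```

Uniform allocation wastes budget when the stream barely changes, which is presumably the motivation for adapting the noise to the degree of change, as the abstract describes.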

【Abstract】 We are living in an era of big data. Big data has brought new opportunities, but much of it exhibits distinctive behavioral patterns and contains sensitive information, so processing the raw data may infringe on the privacy of its owners. The privacy leak caused by America Online (AOL) releasing its search logs is a typical negative example: the identities of some users were quickly re-identified and made public by a news agency correspondent. How to process data effectively before it is released, increasing the utility of data sets while maintaining privacy and security, is therefore of great significance for individuals and enterprises that need to protect their sensitive data. Existing privacy protection research focuses mainly on the single release of static data sets. On the one hand, however, data generated in real life is increasingly dynamic, and inferences may be drawn between the sensitive attributes of the data before and after a change, which leads to leakage of sensitive information. On the other hand, because data volumes are growing rapidly and data types are multiplying, more and more organizations want to share their data with others to obtain more accurate results. This exposes further security and privacy issues, such as how to secure communications and how to prevent multi-user collusion attacks. This thesis therefore focuses on privacy protection in dynamic data stream publishing and multi-user collaborative data publishing. The main contributions and innovations include the following.

(1) We propose and implement a privacy protection strategy for the release of dynamic data sets. Based on the (K,C)L privacy model and a sliding window mechanism, this thesis designs and implements TTA, a privacy protection mechanism for dynamic trajectory data streams with sensitive attributes. It anonymizes users' trajectory information in real time while keeping their sensitive attributes from being leaked. The strategy has three characteristics. First, it is an anonymous publishing method for high-dimensional trajectory data streams, which effectively avoids the risk that the trajectory information of a large number of users is leaked through possible associations. Second, it uses a sliding window backtrack mechanism, which processes the data of the current window and combines it with historical data to make better-informed decisions, further widening the scope of privacy protection. Third, it adds an anonymization method for sensitive attributes, so the data owner can protect users' trajectory information and the privacy of their sensitive attributes at the same time.

(2) We propose and implement a differential privacy protection strategy for the release of numerical dynamic data sets. Compared with directly publishing data sets of individual attributes, publishing numerical statistical data sets is the more common case. To further improve the privacy of published results and reduce the risk of privacy leakage, this thesis addresses the difficulty of measuring attacker background knowledge in large-scale data environments and uses the strong guarantees of differential privacy to propose G-Privacy, a privacy-preserving publishing mechanism for dynamic data streams based on a grouping optimization strategy. It aggregates and publishes numerical dynamic data sets while ensuring that malicious attackers cannot infer any identity-related information about users from the published results. The mechanism has three characteristics. First, it makes no assumption about the range of background knowledge a malicious attacker may have; differential privacy gives the published results strong anonymity. Second, it uses the w-event privacy model to achieve private online release of unbounded data streams, avoiding the situation in which traditional algorithms cannot keep providing effective protection over time. Third, its conditional grouped noise-addition mechanism applies different levels of noise according to how much the data changes, further improving the utility of the published results.

(3) We propose and implement a privacy protection strategy for collaborative data set publishing. Based on the m-privacy model, this thesis designs and implements MK-A, an anonymization-based privacy protection mechanism for multi-user collaborative data publishing without a trusted third party. It allows multiple data owners to jointly share their data in the absence of a trusted third party while protecting the sensitive information in the data from being leaked. The mechanism has three characteristics. First, it effectively addresses the collusion threat among users: even if several data owners are also malicious attackers, the final released results still meet the privacy requirement. Second, it realizes a secure multi-user data exchange protocol for untrusted environments, in which data providers apply different anonymized transmission methods to quasi-identifiers and to sensitive information, ensuring that an attacker cannot associate quasi-identifier content with sensitive information. Third, it abandons the semi-honest model: the mechanism considers privacy protection in a fully untrusted environment, which is closer to reality.

(4) We propose and implement a privacy protection strategy for the release of numerical collaborative data sets. In view of the characteristics of the numerical statistical data sets described above, this thesis uses distributed differential privacy and a secure multiparty computation protocol to design and implement DFTA, a multi-user collaborative publishing strategy for aggregated data sets. The mechanism has three characteristics. First, it applies differential privacy to collaborative data publishing, addressing the problem that an attacker's background knowledge is even harder to measure in a multi-user environment. Second, it uses a secure multiparty computation protocol so that multiple users complete the computation together in a secure environment, while no party's original input is obtained by other users. Third, to handle unexpected failures of data owners (such as suddenly going offline), it designs and implements a fault feedback mechanism that can continue the data exchange without restarting the protocol.
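
The combination of distributed differential privacy and secure multiparty computation described under contribution (4) can be sketched as follows. This is an illustrative assumption rather than the DFTA protocol, whose details the abstract does not give: each party adds a partial noise term to its value, encodes the result in fixed point, and splits it into additive shares modulo a prime, so that only the noisy total is ever reconstructed. The partial noise uses the known fact that a Laplace(0, b) variable equals the sum of n differences of Gamma(1/n, b) variables. The names `share`, `noise_share`, `secure_noisy_sum`, `PRIME`, and `SCALE` are hypothetical, and the `random` module stands in for a cryptographically secure generator.

```python
import random

PRIME = 2**61 - 1   # modulus for additive shares (assumed large enough for the sum)
SCALE = 1000        # fixed-point factor so noisy values can be shared as integers


def share(value, n, rng):
    """Split an integer into n additive shares that sum to value mod PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def noise_share(n, b, rng):
    # Summing n such terms yields Laplace(0, b) noise on the total.
    return rng.gammavariate(1.0 / n, b) - rng.gammavariate(1.0 / n, b)


def secure_noisy_sum(values, epsilon, sensitivity=1.0, seed=None):
    """Simulate all parties in one process; a real protocol would use a network."""
    rng = random.Random(seed)   # NOT cryptographically secure; illustration only
    n = len(values)
    b = sensitivity / epsilon
    # Each party perturbs its own value locally and encodes it in fixed point.
    encoded = [round((v + noise_share(n, b, rng)) * SCALE) % PRIME for v in values]
    # Party i sends its j-th share to party j.
    all_shares = [share(x, n, rng) for x in encoded]
    # Party j publishes only the sum of the shares it received.
    partials = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    total = sum(partials) % PRIME
    if total > PRIME // 2:      # map back to a signed value
        total -= PRIME
    return total / SCALE


if __name__ == "__main__":
    print(secure_noisy_sum([10.0, 20.0, 30.0], epsilon=1.0, seed=1))
```

The sketch does not model parties going offline mid-protocol; handling that case without a restart is what the fault feedback mechanism in the abstract is said to address.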
