Research on Techniques for Enhancing the Dependability of Cloud Computing Platforms

Improving the Dependability of Cloud Computing Systems

【Author】 陈海波 (Chen Haibo)

【Advisor】 臧斌宇 (Zang Binyu)

【Author Information】 Fudan University, Computer Architecture, 2008, Ph.D.

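【Abstract (Chinese version)】 Cloud computing is a major innovation in today's computing model. By integrating interconnected computing resources and virtualizing and abstracting them at multiple levels, cloud computing delivers large-scale computing resources to users as reliable services, freeing users from the complexity of the underlying hardware logic, software stacks, and network protocols. Major IT companies such as Google, Microsoft, IBM, EMC, and Amazon have released their own cloud computing solutions, and academia continues to study cloud computing platforms in depth.

A key requirement of cloud computing is the dependability of its infrastructure and services. This includes: (1) security: the platform must provide reliable security mechanisms to defend against attacks from the network; (2) maintainability: the platform must handle hardware and software maintenance needs effectively, so that maintenance has little impact on the availability of cloud services; (3) availability: the platform must mask hardware and software faults so that cloud services are available 24x7; (4) trustworthiness: the platform must ensure the trustworthiness of users' applications, so that private user data and code containing business secrets are not leaked.

Cloud computing is still on the rise, and research on the dependability of cloud platforms is ongoing. Existing work falls into three categories: (1) Computer architecture level: extending current processor architectures, devices, and buses to provide hardware-level security and fault tolerance. For example, XOM adds an encryption module to the CPU and extends the caches and registers to improve the security and privacy of critical applications running on the processor. (2) Operating system level: strengthening the security and reliability of current operating systems, including designing new secure operating systems, to improve the security, privacy, and fault tolerance of critical applications. For example, Asbestos and HiStar, which are based on mandatory access control, improve application security by controlling the applications' information flow. (3) Application level: providing support in compilers, binary translators, and programming languages to enhance the dependability of applications. For example, Ginseng extends the compiler to support dynamic software updating and thereby improves software availability, while LIFT tracks and controls data flow during dynamic binary translation to improve application security.

However, current research on the dependability of computer systems, and of cloud systems in particular, still has several problems. Most work focuses on a single aspect of the system while largely neglecting or sacrificing the others. Some work improves security but significantly restricts functionality, backward compatibility, and usability, which makes it hard to apply to real systems: Asbestos and Eros do not support existing applications, but require users to port their applications and restrict what those applications can do. Other systems improve dependability at a large performance cost, which limits their use in performance-sensitive settings: LIFT imposes an average 3.6x performance loss on SPECINT-2000. Still other techniques need special hardware or software support and are therefore hard to apply to general-purpose systems: XOM needs a considerable number of processor extensions, and no existing processor provides them.

Based on a thorough analysis of the dependability requirements of the hardware, software, and services of current cloud platforms, this dissertation proposes a systematic solution for improving the dependability of cloud platforms. It studies dependability at three levels (computer hardware, operating system, and application) to improve the availability, maintainability, trustworthiness, security, and fault tolerance of cloud platforms. Compared with prior work, it aims to provide a practical, effective, and high-performance solution, so that existing applications and systems can benefit from the results with minimal impact on performance. The proposed techniques matter not only for cloud computing but are also promising in other settings such as desktop applications. The main contributions are as follows.

1. The dissertation is the first to propose information security techniques based on speculative-execution hardware, and it designs and implements the SHIFT and BOSH systems to improve software security on cloud platforms. SHIFT uses the processor's support for deferred exceptions to implement efficient dynamic information flow tracking. It detects low-level attacks such as buffer attacks and also defends against high-level semantic attacks such as SQL injection and directory traversal. SHIFT has the best performance among all practical systems based on dynamic information flow tracking: only 1% performance loss for Apache Web Server and less than 1.27X for SPECINT-2000. BOSH uses information flow tracking to obfuscate a program's control flow and data flow, which defends against code injection attacks and protects software copyright. It can obfuscate the entire control flow of a program with an average performance loss of less than 28% on SPECINT-2000.

2. The dissertation is the first to propose dynamic software updating based on bidirectional write-through synchronization, which supports dynamic updates of complex systems such as operating systems and multithreaded applications. It designs and implements the LUCOS and POLUS dynamic updating systems to improve the maintainability and availability of systems on cloud platforms. LUCOS is the first to use a virtual machine monitor to update an operating system dynamically, so that the operating system can be upgraded and patched without a reboot; this improves the availability and maintainability of the operating system and of what runs on it. LUCOS is fully transparent to the operating system and incurs a performance loss of less than 1%. POLUS is the first system to support dynamic updates to the data structures of multithreaded applications. It can switch mainstream server software (such as vsftpd, sshd, and Apache Web Server) between versions at run time, and it provides a dedicated compiler that generates dynamic hot patches automatically. Its average performance loss is less than 1%.

3. The dissertation is the first to propose and implement software-based dynamic virtualization, which lets an operating system switch at run time between real hardware and a full-featured virtual machine monitor, gaining the high availability of system virtualization while preserving performance. The Mercury system self-virtualizes a computing platform on demand, that is, it inserts a virtual machine monitor beneath a running operating system. A physical machine thereby gains capabilities such as migration and online maintenance while avoiding the performance cost of virtualization. Experiments show that Mercury's average performance loss is less than 1%.

4. The dissertation is the first to propose a bidirectional behavior-constraint mechanism (codenamed Talos) for cloud platforms and their applications. It designs and implements the CHAOS and Shepherd systems. CHAOS constrains the behavior of the operating system to prevent a malicious operating system from stealing application data; Shepherd constrains the behavior of applications to prevent cloud applications from damaging the operating system and the runtime environment. CHAOS uses a virtual machine monitor to protect the privacy of cloud applications, so that an application's private data is not maliciously leaked even when the operating system and other applications are untrusted. Shepherd audits the permissions of system calls made by cloud application processes, performs anomaly detection, and isolates the applications' modifications to critical system resources, which prevents a malicious cloud application from attacking the platform. The Talos trusted computing framework has been adopted by Daoli, EMC's trusted cloud computing platform, as its trust foundation.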
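To make the idea of dynamic information flow tracking concrete, here is a minimal software sketch. It is not SHIFT's implementation and does not model the speculative-execution hardware that SHIFT relies on. The names `Labeled`, `check_sql`, and `check_path`, and the two sink policies, are hypothetical and chosen only for illustration. Each character carries a taint bit, concatenation propagates the bits, and a sink rejects data in which untrusted characters play a structural role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """Text plus one taint bit per character (hypothetical helper type)."""
    text: str
    taint: tuple

    @staticmethod
    def trusted(s):
        # Data written by the program itself: no character is tainted.
        return Labeled(s, (False,) * len(s))

    @staticmethod
    def untrusted(s):
        # Data from the network or the user: every character is tainted.
        return Labeled(s, (True,) * len(s))

    def __add__(self, other):
        # Propagation rule: the result carries the taint bits of both operands.
        return Labeled(self.text + other.text, self.taint + other.taint)


# Illustrative policy only: tainted quotes or semicolons may change query structure.
SQL_META = set("'\";")


def check_sql(query):
    """Return True if no tainted character is a SQL metacharacter."""
    return not any(t and c in SQL_META for c, t in zip(query.text, query.taint))


def check_path(path):
    """Return True if no '..' component contains a tainted character."""
    i = path.text.find("..")
    while i != -1:
        if path.taint[i] or path.taint[i + 1]:
            return False
        i = path.text.find("..", i + 1)
    return True
```

A query built as `Labeled.trusted("... name = '") + Labeled.untrusted(user_input) + Labeled.trusted("'")` passes `check_sql` for the input `alice` and fails for an input that contains a quote, because the quote then arrives at the sink with its taint bit set.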

【Abstract】 Cloud computing is an appealing innovation in the modern computing model. By integrating networked computing resources and virtualizing them through different levels of abstraction, cloud computing provides users with massive computing resources through a common interface. It also hides from users the complexity of deploying and managing hardware resources, software stacks, and networking protocols. Aware of its importance, both researchers and industry have put significant effort into cloud computing.

One indispensable requirement of cloud computing systems is the dependability of the infrastructure and of the running services. Specifically, a cloud computing system should satisfy the following criteria: (1) security, which requires that the cloud system protect both the computing systems and the running services; (2) maintainability, which requires that the system be easy to maintain, mitigating the impact of inevitable hardware and software failures and of deploying new features to suit business needs; (3) availability, which demands that the system be constantly operable and provide correct services, even in the presence of hardware and service failures; (4) trustworthiness, which requires that the cloud system provide trustworthy services, so that users' code and data, which may contain business secrets, are not improperly divulged or abused.

Dependability has always been a major concern of computing systems and a focus of both researchers and industry. Emerging cloud computing poses even more challenges because of the scale of a cloud computing system: a larger scale means more complexity in management and a higher probability of failure, and thus a shorter MTTF (Mean Time To Failure). Previous research efforts can be categorized into three levels: (1) Computer architecture level, which investigates enhancements to existing processors, memory, and I/O systems to improve dependability. For example, the XOM system extends the instruction set architecture, bus, and registers to enhance the trustworthiness of mission-critical applications running on commodity operating systems. (2) Operating system level, which enhances the dependability of subsystems of existing operating systems or implements new operating systems. For example, Nooks improves the dependability of the driver subsystem in Linux, while Asbestos is an OS built from scratch to improve the security of applications. (3) Application level, which uses advances in language, compiler, and binary translator technologies to improve the security and availability of applications. For example, Ginseng is a dynamic update system that extends an existing compiler, while LIFT uses a binary translator to do taint tracking to defeat software attacks.

However, existing research on dependability still has several problems. Most current research focuses on only one aspect while neglecting or sacrificing others. Some architecture-level solutions, such as the XOM trust system, require non-trivial changes to the processor architecture, memory, and bus, which are unlikely to become commercially available quickly. Some operating system solutions require reconstructing the existing software stack, which breaks backward compatibility: Singularity and Asbestos provide completely new operating system abstractions, making it hard for existing applications to benefit from them. Some existing security systems bring prohibitive performance overhead, which prevents their use in production runs: the TaintCheck system incurs up to 36X performance degradation, while LIFT, currently the best-performing taint tracking system, brings about a 3.6X slowdown.

Based on a detailed analysis of the dependability requirements of current cloud computing systems, this dissertation proposes a practical and systematic solution that improves various aspects of current cloud computing systems at different levels, without sacrificing performance and without mandating design changes to existing architectures, OS abstractions, or applications. Specifically, the proposed solution is composed of the following key techniques and systems, which solve different problems at different levels.

1. Practical and efficient security enhancement that combines speculative execution and dynamic information flow tracking, and the design and implementation of the SHIFT and BOSH secure systems based on this idea. SHIFT leverages existing hardware support for deferred exception handling to implement a practical, efficient taint tracking system. SHIFT has the best performance among real-world taint tracking systems, with only 1% performance overhead for server applications and about 1.27X performance overhead for SPECINT-2000. Building on the idea and design of SHIFT, BOSH further leverages the hardware support for taint tracking to implement a low-overhead binary obfuscation scheme. BOSH can obfuscate the whole control and data flow of a program to defeat attacks that alter the control and data flow, and to protect software copyright, with only 27% performance overhead.

2. A bi-directional write-through synchronization scheme for dynamically updating operating system kernels and multi-threaded software, and the LUCOS and POLUS dynamic updating systems that embody the idea. LUCOS is the first system that uses a virtual machine monitor to dynamically update the operating system running on it, with less than 1% performance slowdown. It is the first system that supports updating Linux with changes to data structures without modifying the Linux kernel. POLUS is the first system that supports online switching of multi-threaded applications among different versions, both backward and forward, with less than 5% performance degradation.

3. The Mercury on-demand virtualization system, which improves the availability of the cloud by tolerating hardware failures. Mercury supports dynamically inserting a virtual machine monitor (VMM) beneath a running operating system and then uses the VMM to migrate the whole operating system environment to another node upon a machine failure.

4. The Talos trust system infrastructure, which provides behavior conformity for both the cloud computing platform and cloud applications. Two systems implement Talos behavior conformity: CHAOS uses a VMM to protect an application running in a commodity (and untrusted) operating system, preventing the code and data of a cloud application from being divulged or abused; Shepherd is a process shepherding system that prevents cloud services from attacking the cloud platform. The performance overhead of CHAOS and Shepherd is also modest.
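As a rough illustration of what bi-directional write-through synchronization means for a data structure that changes layout between versions, the sketch below keeps an old-layout copy and a new-layout copy coherent by converting on every write, in either direction. It is an assumption-laden toy, not the LUCOS or POLUS mechanism: the class `SyncedPair`, the two conversion functions, and the example record layout are all hypothetical, and how real systems intercept writes is outside what the abstract states.

```python
class SyncedPair:
    """Old- and new-layout copies of one record, kept coherent in both directions."""

    def __init__(self, old, to_new, to_old):
        self._to_new = to_new          # state transfer: old layout -> new layout
        self._to_old = to_old          # state transfer: new layout -> old layout
        self.old = dict(old)
        self.new = to_new(self.old)

    def write_old(self, key, value):
        # Code from the old version writes its copy; the write goes through to the new copy.
        self.old[key] = value
        self.new = self._to_new(self.old)

    def write_new(self, key, value):
        # Code from the new version writes its copy; the write goes through to the old copy.
        self.new[key] = value
        self.old = self._to_old(self.new)


def name_to_parts(old):
    # Hypothetical update: version 2 splits a single "name" field in two.
    first, _, last = old["name"].partition(" ")
    return {"first": first, "last": last}


def parts_to_name(new):
    # Inverse transfer, so that old-version code keeps seeing a consistent record.
    return {"name": f"{new['first']} {new['last']}"}
```

Because both directions are synchronized, threads still executing old-version code and threads already executing new-version code can share the record during the transition, which is one plausible reading of why the scheme suits multi-threaded software.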

  • 【Online Publication Contributor】 Fudan University
  • 【Online Publication Issue】 2009, Issue 08