Research on Energy-Saving Optimization Techniques for Hadoop Storage Systems
Research on Optimizing Energy Efficiency for Hadoop Storage System
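【Author】 刘晨光 (Liu Chenguang);
【Supervisor】 黄建忠 (Huang Jianzhong);
【Author information】 Huazhong University of Science and Technology, Computer System Architecture, 2012, Master's thesis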
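【Abstract (Chinese)】 In recent years, cloud-based Internet services have proliferated, and the MapReduce computing paradigm together with the HDFS distributed file system has become the model of choice for developing large-scale data-intensive applications. Because these applications are deployed at such a large scale, reducing the power consumption of the service cluster can, from the hardware provider's perspective, both significantly lower operating costs and reduce carbon emissions, thereby improving overall energy efficiency. Building on traditional server energy-saving strategies, this thesis proposes an energy-saving optimization algorithm for clusters that serve MapReduce jobs. The algorithm dynamically reconfigures nodes, or the data on them, according to the cluster's current overall and local workload, and a controlled data placement strategy supports these operations. The algorithm delivers clear energy savings, strong real-time behavior, and low load-balancing overhead, and can be applied to data-intensive computing clusters, enterprise data centers, and similar environments. Concretely, the power controller that implements the optimization consists of three components: the data locator, the failure controller, and the power tuner. The data locator modifies how HDFS distributes data blocks and places their replicas, so that the mapping from block IDs to DataNodes can be controlled explicitly. The failure controller allows HDFS to tolerate nodes being added or going missing. The power tuner is the core of the efficiency gain and runs two threads, one for each of two algorithms: dilution and enrichment. When the utilization of a rack rises above an administrator-defined threshold, the power tuner applies the dilution algorithm to add new nodes in a timely manner and migrate data from nearby nodes onto them; when the utilization of a rack is low, it applies the enrichment algorithm to retire a target node and migrate that node's data to nearby nodes. In this way, system performance is dynamically matched to the current workload. For evaluation, the algorithm was tested with the GridSim Toolkit in two respects, functionality and energy savings. The functional tests verified whether the enrichment and dilution algorithms reconfigure nodes or data as the cluster load changes; the energy tests measured savings under average and low load and compared them with the traditional Covering Set technique. The results show that, for MapReduce computation, the proposed algorithm achieves energy savings of 30.32% under high load and 69.77% under low load.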
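The abstract describes the power tuner only in terms of thresholds, node addition or retirement, and data migration. The Python sketch below illustrates one way a single tuning step could work for one rack. Everything concrete in it is an assumption rather than the thesis's implementation (which modifies HDFS itself): the threshold values `HIGH` and `LOW`, the utilization model `demand / (capacity * nodes)`, retiring the last active node, round-robin migration, and all names (`Rack`, `dilute`, `enrich`, `tune`) are hypothetical. `block_map` stands in for the data locator's controlled block-ID-to-DataNode mapping.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical administrator thresholds (the abstract does not give values).
HIGH = 0.8  # above this rack utilization, run dilution
LOW = 0.3   # below this rack utilization, run enrichment


@dataclass
class Rack:
    """A rack with active DataNodes, powered-down spares, and a block map.

    `block_map` is a stand-in for the data locator's controlled mapping from
    block IDs to DataNodes. `demand` is the rack's current workload and
    `capacity` the work one node can absorb (arbitrary units, assumed model).
    """
    active: list[str]
    spares: list[str]
    demand: float
    capacity: float = 1.0
    block_map: dict[int, str] = field(default_factory=dict)

    def utilization(self, nodes: int | None = None) -> float:
        # Utilization if `nodes` nodes shared the demand (default: current count).
        n = len(self.active) if nodes is None else nodes
        return self.demand / (self.capacity * n)


def dilute(rack: Rack) -> str | None:
    """Dilution: power on a spare node and move roughly 1/k of the blocks to it."""
    if not rack.spares:
        return None
    new = rack.spares.pop()
    rack.active.append(new)
    k = len(rack.active)
    for i, block in enumerate(sorted(rack.block_map)):
        if i % k == 0:  # every k-th block migrates to the new node
            rack.block_map[block] = new
    return new


def enrich(rack: Rack) -> str | None:
    """Enrichment: retire the last active node and spread its blocks over the rest."""
    if len(rack.active) <= 1:
        return None
    victim = rack.active.pop()
    rack.spares.append(victim)
    moved = sorted(b for b, node in rack.block_map.items() if node == victim)
    for i, block in enumerate(moved):
        rack.block_map[block] = rack.active[i % len(rack.active)]  # round-robin
    return victim


def tune(rack: Rack, high: float = HIGH, low: float = LOW) -> tuple[str, str | None]:
    """One power-tuner step: dilution above `high`, enrichment below `low`."""
    u = rack.utilization()
    if u > high:
        return "dilution", dilute(rack)
    # Enrich only if the surviving nodes would stay at or below `high`,
    # so the next step does not immediately add the node back.
    if u < low and len(rack.active) > 1 and rack.utilization(len(rack.active) - 1) <= high:
        return "enrichment", enrich(rack)
    return "none", None
```

For example, a four-node rack with `demand=0.8` sits at utilization 0.2, so `tune` retires one node and reassigns its blocks; if the demand then rises to 2.7 (0.9 on three nodes), the next call powers that node back on and moves a quarter of the blocks onto it. The guard in `tune` is one simple way to keep dilution and enrichment from undoing each other on consecutive steps; the thesis may handle this differently.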
【Abstract】 With the recent emergence of cloud-computing-based services on the Internet, MapReduce and HDFS have become the paradigm of choice for developing large-scale data-intensive applications. Given the scale at which these applications are deployed, minimizing the power consumption of these clusters can significantly cut operational costs and reduce their carbon footprint, thereby increasing their utility from a provider's point of view. This thesis addresses energy conservation for clusters of nodes that run MapReduce jobs. The proposed algorithm dynamically reconfigures the cluster, and the data on it, based on both overall and local cluster utilization, and it benefits from controlling the data layout. The algorithm saves power significantly, responds in real time, and incurs little load-balancing cost, so it can be applied to data-intensive clusters and enterprise data centers. The entity that implements the algorithm, called the power controller, comprises three modules: the data locator, the failure controller, and the power tuner. First, the data locator modifies the data distribution and replication process of HDFS to customize the data layout. Second, the failure controller allows the NameNode in HDFS to tolerate DataNode failures, so that nodes can be removed from or added to the cluster. Finally, the power tuner, the core of the power controller, contains two threads that implement the dilution and enrichment methods respectively: dilution adds nodes to HDFS when the utilization of a rack rises above a threshold, and enrichment retires nodes from HDFS when the utilization of a rack falls below a threshold. The thesis uses the GridSim toolkit to simulate the HDFS architecture and perform experiments.
To verify whether the algorithm can reconfigure the cluster according to the workload imposed on it, a functional test was conducted; to check the algorithm's energy-saving effect, a back-to-back comparison with the traditional Covering Set (CS) technique was carried out. The evaluation shows that the proposed algorithm achieves an energy reduction of 30.32% under high workloads and up to 69.77% under low workloads.
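The abstract reports savings as percentages without stating how they are computed. One common way to express such a figure, assumed here rather than taken from the thesis, is the relative energy reduction against a baseline run of the same workload, combined with a linear per-node power model. The symbols are hypothetical: $u_i(t)$ is the utilization of node $i$, $\mathcal{A}(t)$ the set of powered-on nodes, $T$ the run length, and $P_{\mathrm{idle}}$, $P_{\mathrm{peak}}$ the idle and peak power of one node.

```latex
\begin{aligned}
P_i(t) &= P_{\mathrm{idle}} + \left(P_{\mathrm{peak}} - P_{\mathrm{idle}}\right) u_i(t), \qquad 0 \le u_i(t) \le 1 \\
E &= \int_0^{T} \sum_{i \in \mathcal{A}(t)} P_i(t)\, dt \\
S &= 1 - \frac{E_{\mathrm{tuned}}}{E_{\mathrm{baseline}}}
\end{aligned}
```

Under this model, every node that enrichment powers down removes at least $P_{\mathrm{idle}}$ from the sum for as long as it stays off, and at low load more nodes can be retired, which is why savings would be expected to grow as the workload falls. Read this way, $S = 30.32\%$ would mean the tuned cluster consumed about 69.68% of the baseline energy.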
【Key words】 Energy-efficient storage; Hadoop system; Data layout; Node mapping strategy;