
图形处理器通用计算的功耗分析与优化研究

Power Analysis and Optimization of the General Purpose Computing of Graphics Processing Unit

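【Author】 Wang Haifeng

【Advisor】 Chen Qingkui

【Author information】 University of Shanghai for Science and Technology, Systems Analysis and Integration, 2013, PhD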

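【Abstract, translated from the Chinese】 With the arrival of the big data era, many areas of computing research need to process large-scale data in real time, for example real-time data analysis in social computing, abnormal flow and content detection in network security, and massive video analysis in image processing. Because general-purpose computing on graphics processing units (GPUs) has developed rapidly and GPUs are well suited to data-intensive tasks, GPUs and GPU clusters have become an important solution for real-time processing of large-scale data. In such processing, energy becomes a computing resource that demands attention, because it constrains computational reliability and system scalability. Effective energy management and optimization are therefore urgent problems in GPU general-purpose computing, and also a requirement of green, energy-saving computing.

This thesis centers on energy management and optimization in GPU general-purpose computing and studies the problem level by level: from GPU energy measurement and power prediction, through energy analysis of single-node parallel processing strategies, to real-time control of GPU cluster energy consumption. GPU energy measurement and prediction are the foundation of energy management and optimization; energy optimization and real-time control are the focus of the research; and keeping the loss of computing performance and reliability to a minimum during energy optimization is the main difficulty. The thesis surveys the key technologies of GPU general-purpose computing and, following the development trend of GPU architecture, discusses in detail the research content, methods, and tools for programming models, memory models, communication models, and load balancing, laying the groundwork for research on the energy optimization and reliability of GPUs and GPU clusters. On this basis, the following original work was carried out:

1. Two energy prediction schemes for GPU general-purpose programs are proposed. The first analyzes the energy characteristics of instructions in the PTX intermediate language, unrolls loop instructions according to the structure of the program, and counts the PTX instructions to estimate the computing energy of an application. It is a simple, feasible, and widely applicable scheme. The second works at the source code level: it decomposes the program by program slicing and builds prediction models with nonlinear regression and a wavelet neural network. Its novelty is that it further distinguishes program structure, building separate models for branch-sparse and branch-dense applications, which improves prediction accuracy.

2. Against the background of large-scale real-time data processing, general-purpose parallel processing strategies are proposed for computation on a single GPU node. They can be applied to a variety of practical algorithms, and a complex network clustering algorithm serves as a typical application to verify their effectiveness. The computing energy of the two parallel processing strategies is analyzed, and the application scenarios that suit each are given. In addition, a fault detection and fault tolerance mechanism for single-GPU-node computation is proposed, so that energy is optimized while computational reliability is preserved.

3. An energy optimization control system for GPU clusters is proposed. It uses model prediction as its core control strategy, adapts to changes in dynamic computing load, and adjusts GPU energy states in real time to eliminate redundant energy consumption during computation. A network decoy (honeynet) system was built to collect real network intrusion data, which were then used as real engineering data to validate the energy control system.

4. A maximum entropy function is used to produce a composite control index that combines computing performance, reliability, and energy consumption. This index improves the energy optimization control system and reduces the reliability loss caused by the energy state adjustment mechanism. The method overcomes the limitations of traditional approaches that convert multiple objectives into a single objective, correctly distinguishes better candidate solutions from worse ones, and dynamically adjusts the working state of the GPU cluster toward the best computing performance, the best stability, and the lowest energy consumption.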

【Abstract】 With the arrival of the Big Data era, real-time processing of large-scale data streams is required in many application fields, such as social network analysis, abnormal stream and abnormal content detection in network security, and video quality analysis. Because general-purpose computing on GPUs has developed rapidly and GPUs are well suited to data-intensive computing, both single GPUs and GPU clusters have become important parallel computing solutions for processing large-scale real-time streams. Energy is an important computing resource in real-time processing, one that limits system reliability and scalability. Power consumption management and optimization are therefore problems in urgent need of a solution. This work belongs to the field of green computing.

This thesis focuses on power consumption management and optimization. We study computing power consumption and its optimization level by level: power measurement, power consumption prediction, power-aware parallel strategies, and GPU cluster power consumption control. Power measurement and prediction are the foundation of power consumption management and optimization. The main research work is power optimization and real-time control, and the difficult point is the tradeoff between computing performance and reliability. We first summarize the key techniques in GPGPU and, following the development of GPU architecture, discuss research methods and tools for the programming model, memory model, communication model, and load balancing. This survey supports the later work on power consumption optimization and system reliability. The contributions of this thesis are as follows.

1. We propose two power consumption prediction schemes. The first analyzes power consumption features at the PTX level and counts dynamic instructions by unrolling simple loop structures. This approach yields a simple and general prediction model. The second prediction model is based on program slicing at the source code level. It first decomposes a program into many slices and then builds a slice prediction model using nonlinear regression and a wavelet neural network. The contribution of the second model is that it distinguishes program control structure: separate branch-sparse and branch-dense models are built to improve prediction accuracy.

2. Aiming at large-scale real-time data processing, we propose two general parallel processing strategies for a single GPU that can be applied to various algorithms. A complex network clustering algorithm is used to verify these strategies. Additionally, we analyze the power consumption of the two strategies and describe the application scenarios that suit each. Finally, a fault detection and recovery mechanism is proposed to guarantee system reliability.

3. A power consumption optimization control system is designed based on model predictive control theory, so that it can adapt to varying workloads. The controller reduces redundant power consumption in real-time computing. We build a honeynet to capture abnormal network packets and use them to verify the validity of the power consumption control system.

4. A reliability-aware power consumption controller is proposed that uses the maximum entropy method to combine performance, reliability, and power consumption into a comprehensive control variable. This control system reduces the reliability cost caused by the power state adjustment mechanism. The method overcomes the limitation of the traditional approach that transforms a multi-objective function into a single-objective function, and it can distinguish the quality of candidate solutions. The controller dynamically adjusts the power state of the GPU cluster to achieve the best status in performance, reliability, and power consumption.
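The first prediction scheme in contribution 1 (unroll simple loops, count the PTX instructions executed, and weight the counts by per-instruction energy) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the per-opcode energy values are invented placeholders rather than measurements, and the tuple-based loop representation and the function names `count_dynamic` and `estimate_energy_nj` are hypothetical. The abstract does not describe the thesis's actual cost table or tooling.

```python
from collections import Counter

# Hypothetical per-instruction energy costs in nanojoules.
# Placeholder values for illustration only, not measured data.
ENERGY_NJ = {
    "ld.global": 2.0,
    "st.global": 2.0,
    "add.f32": 0.5,
    "mul.f32": 0.5,
    "bra": 0.25,
}
DEFAULT_NJ = 0.5  # assumed cost for opcodes missing from the table


def count_dynamic(body):
    """Count executed opcodes, unrolling loops with a known trip count.

    `body` is a list whose items are either an opcode string or a
    ("loop", trip_count, inner_body) tuple (an assumed representation).
    """
    counts = Counter()
    for item in body:
        if isinstance(item, tuple):
            _, trip_count, inner = item
            # Unroll: every inner instruction executes trip_count times.
            for op, n in count_dynamic(inner).items():
                counts[op] += n * trip_count
        else:
            counts[item] += 1
    return counts


def estimate_energy_nj(body, threads=1):
    """Estimate energy as sum(count * per-opcode cost), scaled by thread count."""
    counts = count_dynamic(body)
    per_thread = sum(ENERGY_NJ.get(op, DEFAULT_NJ) * n for op, n in counts.items())
    return per_thread * threads


# A toy kernel: one load, a 4-iteration multiply-add loop, one store.
kernel = [
    "ld.global",
    ("loop", 4, ["ld.global", "mul.f32", "add.f32", "bra"]),
    "st.global",
]

print(count_dynamic(kernel))
print(estimate_energy_nj(kernel, threads=256))
```

A sketch like this only covers loops whose trip counts are known statically, which matches the abstract's mention of unrolling simple loop structures; data-dependent branches are the case the second, slice-based scheme is described as addressing.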
