
基于KUSU的超大规模Linux集群系统的设计与实现

Design and Implementation of an Ultra-Large-Scale Linux Cluster System Based on KUSU

【Author】 Wang Weigao (王维高)

【Advisor】 Wu Jiang (吴江)

【Author information】 Northwest University (西北大学), Computer Technology, 2011, Master's thesis

【摘要】 随着计算机技术进入高速的网络时代,网络业务量、数据流量和计算强度的爆炸性增长,单一的服务器已经难以满足高性能计算的要求。而大型机由于非常昂贵的价格,同时由于硬件和系统软件的专用性,软硬件系统的维护费用也非常高,致使一般的企业无能力购买。集群是利用许多普通的PC机以某种拓扑结构组织起来,成为一个具有高性能计算能力的服务器。而对于超大规模集群的搭建耗时和难以管理成为企业使用集群的瓶颈。本文研究的内容是优化KUSU,从而满足KUSU的用户富士通公司在云计算时代对超大规模集群的快速搭建、容易管理、高可用性和高可靠性的需求,主要完成的工作包括:1)系统架构优化。以减轻超大规模集群中主节点的负载为目标,采用树的思想提出分层结构对KUSU的系统架构进行优化。2)高可用性集群的研究。对于超大规模的集群处理一些关键业务时,要求集群不间断地处理作业,为了防止主节点停机导致集群的崩溃,本文采用高可用性集群的基础技术——心跳机制,实现Linux集群的高可用性。利用PostgreSQL 9.0的流式复制解决由KUSU搭建的超大规模集群主备节点之间的数据同步。从而解决超大规模集群的高可用性。3)高可靠性集群的研究。超大规模集群节点之间的通信和数据传输对网络的依赖性很大,网络的畅通是高可靠性集群的保障,本文采用Linux网卡绑定的方法,把多个网卡组合成一个逻辑"bonded"接口,从而防止了集群节点之间的通信故障,提高了集群节点之间的数据传输。设置bond的工作模式为0,实现数据传输的负载均衡。从而实现高可靠性集群。4)集群的实现。文章最后给出了如何使用KUSU部署集群,展现了一个超大规模集群的原型,并给出了对超大规模集群的实验数据和用户在真实环境下的实验数据。本文的研究是建立在自动化搭建集群——KUSU的基础之上,首先对小型集群的架构优化,从而适应于大规模集群,其次从高可用性和高可靠性集群等方面研究以达到集群的稳定性,使其能满足企业对大规模集群的需求。

【Abstract】 As computing enters the era of high-speed networks, network traffic, data volume, and computational intensity are growing explosively, and a single server can no longer meet the demands of high-performance computing. Mainframes are very expensive, and because their hardware and system software are proprietary, their maintenance costs are also very high, so most enterprises cannot afford them. A cluster organizes many ordinary PCs in some topology so that together they act as a server with high-performance computing capability. For ultra-large-scale clusters, however, long build times and difficult management have become the bottleneck for enterprises that want to use them. This thesis optimizes KUSU to meet the needs of Fujitsu, a KUSU user, for rapid deployment, easy management, high availability, and high reliability of ultra-large-scale clusters in the cloud computing era. The main work is as follows:
1) System architecture optimization. To reduce the load on the master node of an ultra-large-scale cluster, a hierarchical structure based on the idea of a tree is proposed to optimize the KUSU system architecture.
2) High-availability cluster research. When an ultra-large-scale cluster handles critical workloads, it must process jobs without interruption. To prevent a master-node outage from bringing down the whole cluster, the thesis uses the heartbeat mechanism, a basic high-availability technique, to make the Linux cluster highly available. PostgreSQL 9.0 streaming replication is used to synchronize data between the master and standby nodes of ultra-large-scale clusters built with KUSU.
3) High-reliability cluster research. Communication and data transfer between the nodes of an ultra-large-scale cluster depend heavily on the network, so an unobstructed network is the foundation of a highly reliable cluster. The thesis uses Linux NIC bonding to combine multiple network interfaces into a single logical "bonded" interface, which guards against communication failures between cluster nodes and improves data transfer between them. The bond is set to mode 0 to load-balance data transmission.
4) Cluster implementation. Finally, the thesis shows how to deploy a cluster with KUSU, presents a prototype of an ultra-large-scale cluster, and reports experimental data for ultra-large-scale clusters together with experimental data from users in a real environment.
The research builds on KUSU, a tool for automated cluster deployment. It first optimizes the small-cluster architecture so that it suits ultra-large-scale clusters, and then studies high availability and high reliability to make the cluster stable, so that it can meet enterprise needs for ultra-large-scale clusters.
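The mode-0 bonding mentioned in item 3) can be illustrated with a small simulation. In the Linux bonding driver, mode 0 is the round-robin policy (balance-rr): frames are transmitted over the slave interfaces in turn, which spreads load and lets traffic continue if one link fails. The sketch below is a toy model of that policy only, not the configuration used in the thesis; the names `Nic`, `RoundRobinBond`, `eth0`, and `eth1` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Nic:
    """A simulated slave interface (hypothetical, for illustration only)."""
    name: str
    link_up: bool = True
    sent: list = field(default_factory=list)


class RoundRobinBond:
    """Toy model of a mode-0 (balance-rr) bond: packets go to slaves in turn."""

    def __init__(self, slaves):
        self.slaves = list(slaves)
        self._next = 0  # index of the slave that gets the next packet

    def send(self, packet):
        # Try each slave at most once, starting from the round-robin cursor,
        # and skip any slave whose link is down.
        for _ in range(len(self.slaves)):
            nic = self.slaves[self._next]
            self._next = (self._next + 1) % len(self.slaves)
            if nic.link_up:
                nic.sent.append(packet)
                return nic.name
        raise ConnectionError("no slave interface has link")


if __name__ == "__main__":
    bond = RoundRobinBond([Nic("eth0"), Nic("eth1")])
    print([bond.send(i) for i in range(4)])  # alternates between the slaves
    bond.slaves[0].link_up = False           # simulate a failed link on eth0
    print([bond.send(i) for i in range(3)])  # traffic continues on eth1
```

With both links up, packets alternate between the two interfaces; once one link is marked down, every packet goes out through the remaining interface, which is the fault-tolerance behavior the abstract relies on.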

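  • 【Online publication contributor】 Northwest University (西北大学)
  • 【Online publication year and issue】 2012, Issue 05
  • 【Classification number】 TP311.52
  • 【Citation count】 1
  • 【Download count】 63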