基于地震资料处理的计算网格技术的研究

Computing Grid Platform Technology Based on Seismic Data Processing

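【Author】 梁鸿 (Liang Hong)

【Advisor】 仝兆岐 (Tong Zhaoqi)

【Author information】 中国石油大学 (China University of Petroleum), Geological Resources and Geological Engineering, 2008, PhD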

【Abstract】 Grid technology is a supporting technology that aims to integrate the entire Internet into a single virtual supercomputing platform. As an important branch of high-performance computing, grid computing provides an ideal platform for large-scale computing problems in science and engineering. A computing grid designed for seismic data processing would greatly lower the barrier to setting up a seismic data processing environment and provide a new platform for the wider adoption and application of seismic data processing. The main contributions of this thesis are as follows.

1. Based on the characteristics of seismic data processing and using the latest grid technology, the thesis designs, for the first time, a computing grid and data grid platform architecture suited to seismic data processing. The architecture consists of six modules, including resource monitoring, job scheduling management, and data management. Compared with the traditional grid structure, it adds a port module, a data management module, and a process migration function to meet the needs of seismic data processing.

2. For the resource monitoring module, the thesis analyzes the theory and architecture of low-level grid resource monitoring systems and, to address the problems of current grid monitoring, proposes a grid monitoring model based on the GMA (Grid Monitoring Architecture) specification together with a method for dynamically adjusting the monitoring interval. Because the model uses a three-tier architecture with middleware, it avoids the security problems caused by consumers accessing monitored objects directly. It also supports mutual access among heterogeneous resources: through data organization, a layered structure, and function interfaces, it gives users a uniform resource object interface, which resolves the heterogeneity problem. To reduce system overhead and improve performance, the model combines dynamic adjustment of the monitoring interval with a trend prediction mechanism.

3. The distributed, heterogeneous, autonomous, and dynamic nature of computing grid resources makes grid resource scheduling complex. The thesis studies a grid resource scheduling system suited to large-scale task processing, whose goal is to allocate grid resources to applications sensibly so that the applications achieve the best performance. It proposes a hierarchical scheduling model based on a tree topology, which is highly scalable and adapts well to dynamic changes in grid resources. To balance system load, shorten the minimum task completion time, and improve scheduling efficiency, it also proposes the Divided Min-Min (Dmm) algorithm.

4. Because seismic data processing involves massive data volumes, the thesis also studies a data grid for managing such data. To address the bottleneck that neither the speed nor the stability of data transfer can be guaranteed in a grid environment, it analyzes grid data management and replica technology and studies grid data transfer mechanisms. It proposes and defines MRT, a multi-replica data transfer model that integrates replica technology with data transfer, and specifies the model's components, the mapping relationships among its elements, and its workflow. Using a heuristic approach, it then proposes a heuristic dynamic task allocation algorithm and analyzes the algorithm's complexity.

5. To keep the grid highly available and reliable despite frequent resource failures, the thesis studies process migration and checkpointing in grid computing in depth and proposes the Process Migration Model based on Checkpoint (PMMC). PMMC effectively balances node load and improves node utilization and throughput. The thesis then analyzes process migration algorithms and proposes a strategy for them.

6. Finally, on a grid platform built by the author, the thesis analyzes various algorithms for 3D pre-stack depth migration. The aim is to apply automatic parallelization to the large computing capacity of the grid, so that the many serial applications accumulated in engineering practice can run efficiently in parallel in a grid environment. From a study of a large number of serial 3D pre-stack depth migration programs, it summarizes their parallelization characteristics: in Kirchhoff integral pre-stack depth migration, traveltime computation is suited to parallelization by shot point, and imaging output is suited to automatic parallelization by offset; split-step Fourier pre-stack depth migration is suited to parallelization by single-shot record. On this basis, the thesis proposes an automatic parallelization model and describes the key techniques used in each of its modules. The difficult problem of data and loop distribution in the model is divided into two cases, according to whether processors need to communicate after distribution. Without communication, three algorithms are proposed to handle alignment between arrays and loops, between arrays of the same name, and between arrays of different names. With communication, the problem is abstracted as a mathematical model: the partitioning of an APDG (Automatic Parallel Distribution Graph) extracted from the multi-level nested loops, subject to minimizing the number of edges that connect different subsets after distribution.
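
The abstract names a Divided Min-Min algorithm in item 3 but does not describe how it works. As background only, the classic Min-Min heuristic that the name appears to build on can be sketched as below. This is not the thesis's algorithm: the function name `min_min`, the `etc` matrix of estimated run times, and the tuple layout of the result are all assumptions made for illustration.

```python
def min_min(etc):
    """Classic Min-Min heuristic (illustrative, not the thesis's Dmm algorithm).

    etc[t][m] is the estimated run time of task t on machine m.
    Returns (schedule, makespan), where schedule is a list of
    (task, machine, finish_time) tuples in the order they were assigned.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines            # time at which each machine becomes free
    unscheduled = set(range(len(etc)))
    schedule = []
    while unscheduled:
        best = None
        # Among all unscheduled tasks, find the (task, machine) pair with the
        # smallest completion time; that task is assigned first.
        for t in sorted(unscheduled):
            for m in range(n_machines):
                finish = ready[m] + etc[t][m]
                if best is None or finish < best[2]:
                    best = (t, m, finish)
        t, m, finish = best
        ready[m] = finish                 # the chosen machine is busy until then
        unscheduled.remove(t)
        schedule.append(best)
    return schedule, max(ready)


if __name__ == "__main__":
    # Three tasks, two machines; the run times are made-up numbers.
    plan, makespan = min_min([[3, 5], [2, 4], [6, 1]])
    print(plan, makespan)
```

Min-Min tends to place short tasks first, which can leave long tasks waiting on heavily loaded machines; the load-balancing goal stated in item 3 is presumably what the thesis's variant addresses.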
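
Item 2 mentions dynamically adjusting the monitoring interval to reduce overhead, without giving the adjustment rule. One simple way such a rule could look is sketched below: poll more often while a monitored value is changing quickly and less often while it is stable. The function name `next_interval`, the halving and doubling policy, the 10% threshold, and the interval bounds are hypothetical choices for illustration, not the method defined in the thesis.

```python
def next_interval(interval, previous, current, threshold=0.1,
                  min_interval=1.0, max_interval=60.0):
    """Return the next polling interval in seconds (hypothetical rule).

    interval  -- the interval used for the last poll
    previous  -- the metric value from the previous sample
    current   -- the metric value from the latest sample
    threshold -- relative change above which the metric counts as "changing"
    """
    # Relative change between the last two samples; guard against division by zero.
    scale = max(abs(previous), 1e-9)
    change = abs(current - previous) / scale
    if change > threshold:
        interval = interval / 2    # value is moving: sample more often
    else:
        interval = interval * 2    # value is stable: back off to save overhead
    # Keep the interval inside the configured bounds.
    return min(max(interval, min_interval), max_interval)


if __name__ == "__main__":
    print(next_interval(10.0, previous=0.50, current=0.80))  # load jumped
    print(next_interval(10.0, previous=0.50, current=0.51))  # load steady
```

A trend prediction mechanism, as the abstract describes, would replace the two-sample comparison with a forecast over a longer history; that part is not sketched here.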
