
Research and Implementation on Data Transfer and Cache Technology in Data Grid

【Author】 劳仲安

【Supervisor】 肖侬

【Author Information】 National University of Defense Technology, Computer Science and Technology, 2009, Master's

【Abstract (translated from the Chinese)】 The open Internet environment contains data resources that are enormous in volume, diverse in form, and stored in a scattered fashion; managing these resources effectively is a challenging problem. The Data Grid takes the massive, heterogeneous data resources of the wide-area environment as its object and, by combining high-performance computing facilities with large-scale storage devices, implements data storage, data transfer, data access, replica management, and high-performance data processing, providing users with an infrastructure for data management and processing. Because the Data Grid is inherently distributed over wide areas, efficient and reliable data transfer across the wide-area network is a necessary condition for data sharing. To address this, we designed and implemented a grid data transfer system that provides parallel transfer, striped transfer, ordinary third-party transfer, indirect third-party transfer, and routed data transfer, and that supports the mainstream transfer protocols FTP, HTTP, and HTTPS. In terms of efficiency, feasibility, stability, reliability, and security it meets the transfer requirements of the distributed, heterogeneous, massive data in the Data Grid and improves data-sharing performance.

In addition, as computer technology has advanced, the performance of CPUs and main memory has improved enormously, while I/O devices have lagged behind, so disk performance has gradually become the bottleneck of overall system performance. In memory-intensive and I/O-intensive applications in particular, the large latency of disk access severely degrades application performance, so data access in the Data Grid environment may suffer a sharp performance drop because of disk latency. To address this, our research group proposed the RAM Grid. Since files of different sizes exhibit different access characteristics in the Data Grid environment, and in order to further improve the usability of the RAM Grid, we draw on data-placement strategies from large-scale network storage systems and propose a file-classification caching service based on the RAM Grid: files in the RAM Grid system are cached by class while the fairness and high availability of the RAM Grid are preserved, extending its usability. Simulation experiments based on real applications show that file-classification caching effectively improves the performance of the existing RAM Grid.

The grid data transfer module opens a high-speed channel that connects the underlying data resources in all directions, so that data on different nodes of the Data Grid can be shared effectively; caching data with the RAM Grid, in turn, effectively improves data-access performance. Together, the two improve the data-access performance of the Data Grid from different angles.
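To make the parallel/striped transfer idea concrete, the following is a minimal sketch of a multi-stream download that splits a file into byte ranges and fetches them concurrently. It uses plain HTTP Range requests purely for illustration; the thesis's transfer system has its own implementation (and also covers FTP, HTTPS, third-party, and routed transfer), none of which is reproduced here, and the function names, stream count, and chunking scheme below are assumptions.

# A minimal sketch (not the thesis's implementation) of parallel, striped
# download over HTTP Range requests. The URL, stream count, and chunking
# are illustrative assumptions; it also assumes size > 0 and that the
# server honours Range requests.
import concurrent.futures
import urllib.request


def fetch_range(url: str, start: int, end: int) -> tuple[int, bytes]:
    # Fetch bytes [start, end] of the remote file with an HTTP Range request.
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return start, resp.read()


def parallel_download(url: str, size: int, streams: int = 4) -> bytes:
    # Split the file into `streams` contiguous stripes and fetch them concurrently.
    stripe = -(-size // streams)  # ceiling division
    ranges = [(i, min(i + stripe, size) - 1) for i in range(0, size, stripe)]
    buf = bytearray(size)
    with concurrent.futures.ThreadPoolExecutor(max_workers=streams) as pool:
        for start, data in pool.map(lambda r: fetch_range(url, *r), ranges):
            buf[start:start + len(data)] = data  # reassemble stripes in place
    return bytes(buf)

Each stripe travels on its own connection, so one transfer can use the aggregate bandwidth of several streams; in a striped transfer of the kind described above, the stripes would additionally be spread across several source or destination nodes.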

【Abstract】 In the Internet environment there are massive data resources of different types scattered across distributed locations. Managing this data is a challenging problem because of its high heterogeneity, decentralization, and the complexity of sharing it. Data Grid integrates high-performance computing facilities and massive storage equipment, and realizes many data-management functions such as data storage, data access, data transfer, and replica management. It can be regarded as a novel infrastructure, offering fairness, self-adaptability, and interactivity, for managing and sharing massive data.

Given the inherently wide-area, distributed character of the data grid, developing a data transfer module that supports high-speed, reliable transfer is a necessary but difficult job. To meet this challenge, we design a module that provides parallel data transfer, striped data transfer, third-party control of data transfer, reliable skip data transfer, and so on. The module also supports FTP, HTTP, and HTTPS, the popular network data transfer protocols. Our design and implementation pay particular attention to stability, reliability, and security in order to offer fast and reliable data transfer.

With the development of computer technology, great improvements have been achieved in CPUs and main memory. The magnetic disk, however, has become the performance bottleneck of the whole computer system because I/O devices have developed relatively slowly. Data-intensive applications with large, random disk access, such as web servers and DBMSs, access the disk frequently, which degrades application performance. To improve system performance, the RAM Grid was proposed to address this issue; it shares and utilizes the huge memory resources available across the network. The existing RAM Grid improves system performance greatly, but it still has shortcomings, such as load balancing. Moreover, files of different sizes exhibit different access characteristics, and we incorporate this observation into our design. Data-placement policies for large-scale network storage, an active research topic, are introduced into our RAM Grid design: our idea is to cache different classes of files (pages or data blocks) on designated remote caching nodes. This design benefits the system's load balance, scalability, and performance, and our experimental results show a substantial performance improvement over the existing RAM Grid.

The data transfer module is designed for fast, reliable data transfer, which makes data sharing in the data grid environment easier and more effective, especially for large volumes of data. The RAM Grid, in turn, is a good place to cache data swapped out of the local cache (memory), since fetching data blocks from remote caching nodes is faster than reading them back from disk. Both technologies help improve data-access performance.
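The classification-caching idea above, directing files of different size classes to designated remote memory nodes, can be illustrated with a small placement sketch. This is hypothetical only: the size thresholds, node names, capacities, and the simple "most free memory" selection rule are assumptions made for illustration, not the thesis's actual placement policy.

# A hypothetical sketch of size-based file classification for cache placement
# on RAM Grid nodes. Thresholds, node names, capacities, and the selection
# rule are illustrative assumptions, not the thesis's policy.
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheNode:
    name: str
    free_memory: int  # bytes of RAM the node offers to the grid


# Hypothetical size classes mapped to the nodes designated for them.
PLACEMENT = {
    "small":  [CacheNode("node-a", 2 << 30), CacheNode("node-b", 2 << 30)],
    "medium": [CacheNode("node-c", 4 << 30)],
    "large":  [CacheNode("node-d", 8 << 30)],
}


def classify(size: int) -> str:
    # Map a file size onto a class; the thresholds are illustrative only.
    if size < (1 << 20):       # under 1 MiB
        return "small"
    if size < (64 << 20):      # under 64 MiB
        return "medium"
    return "large"


def place(size: int) -> Optional[CacheNode]:
    # Pick the designated node with the most free memory for this class,
    # or return None to fall back to local disk when no node can hold it.
    candidates = [n for n in PLACEMENT[classify(size)] if n.free_memory >= size]
    return max(candidates, key=lambda n: n.free_memory, default=None)

Keeping each size class on its own group of nodes is one way to obtain the load-balancing benefit claimed above: small, frequently accessed files do not compete for remote memory with a few very large ones.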
