网络存储服务器缓存替换策略研究

The Buffer Cache Replacement Policies in the Networked Storage Servers

【Author】 赵英杰 (Zhao Yingjie)

【Advisor】 肖侬 (Xiao Nong)

【Author Information】 National University of Defense Technology (国防科学技术大学), Computer Science and Technology, 2010, Doctoral dissertation

【摘要 (Abstract)】 With the rapid growth of the Internet and the digital transformation of society, the explosive growth of data poses enormous challenges to storage systems. Traditional direct-attached storage can hardly meet the increasingly demanding requirements of data storage, which has driven the exploration of new storage architectures. Against this background, storage technology and network technology have gradually converged, giving rise to networked storage. Networked storage offers high performance, large capacity, scalability, and ease of management, and has been deployed more and more widely in high-performance computing fields such as finance, petroleum, electric power, and telecommunications. As the primary software approach to the I/O bottleneck, caching has long been a research focus in storage systems. By temporarily holding data in relatively fast memory, a cache speeds up subsequent repeated accesses. Because the cache is usually far smaller than the next level of storage on the I/O path, when the cache is full an appropriate block must be evicted at an appropriate time to reclaim space for new data. The basis for this choice is the cache replacement policy, a key factor in cache performance. At present, most replacement policies used in networked storage servers are ported from direct-attached storage; because they do not adequately account for the particular cache hierarchy and multi-client access patterns of networked storage, their cache performance is poor in practice. Starting from the cache hierarchy and multi-client access patterns of networked storage, this dissertation investigates cache replacement policies suited to networked storage servers. It studies three application architectures, from simple to complex and from centralized to distributed: the networked storage server itself, a server with a single client, and a server with multiple clients. A series of replacement policies with low miss penalty and high hit rate is proposed for different workloads and application modes, improving the performance of the whole networked storage system. The main contributions are as follows:

1. A sequential-detection-based cache replacement algorithm, SABER. Traditional replacement policies ignore the effect of disk I/O characteristics on cache performance. SABER analyzes the continuity of the disk addresses of cached blocks to determine how sequentially they are laid out on disk, and preferentially evicts blocks with high sequentiality. By exploiting the sequential locality of disk I/O, it reduces the number of random disk accesses caused by cache misses and avoids unnecessary head seeks and rotations, achieving high performance with a low miss penalty in networked storage, where temporal locality is weak.

2. A file-system-metadata-aware cache replacement algorithm, FIMAR. Traditional replacement policies work below the file system and above the block device driver, so file-system metadata is transparent to the cache. FIMAR brings inode metadata into the cache to identify the file each cached block belongs to, predicts how sequentially the blocks are laid out on disk, and then makes the replacement decision, so that the disk can work in sequential-access mode as much as possible. Since files are generally stored sequentially on disk and exhibit strong spatial locality, this replacement algorithm is effective.

3. A residency-time-based exclusive cache replacement algorithm, RED. The cache hierarchy in networked storage introduces data redundancy. Traditional replacement policies that eliminate this redundancy often introduce dedicated communication protocols and cannot be both client-friendly and efficient. RED uses the semantic information of the I/O interface to monitor the contents of the client cache, and uses the time a block has resided in the client cache as the eviction criterion in the server cache. RED delivers high performance without requiring any changes to client software, and is therefore highly practical.

4. A multi-client adaptive cache replacement algorithm, MACOR. In networked storage, the data correlation of multi-client workloads strongly affects server cache performance; traditional policies usually suit only workloads with very low or very high correlation, not both. MACOR enforces exclusive caching across different cache levels and cooperative caching within the same level, so it can dynamically adapt to changes in multi-client workload correlation. MACOR also includes a dynamic cache partitioning mechanism that resolves the contention among multiple clients for the server cache.
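
As a companion to the SABER contribution above, the following minimal Python sketch illustrates only the core heuristic: prefer to evict cached blocks whose logical block addresses (LBAs) form long sequential runs, because a later miss on them can be refilled by a cheap, mostly sequential disk read. All names here (SeqAwareCache, access, read_block) are hypothetical, and a real policy would also weigh recency; this is a sketch of the idea stated in the abstract, not the dissertation's algorithm.

```python
# Minimal sketch of a sequentiality-aware victim selection heuristic in the
# spirit of SABER. Names and structure are illustrative, not the thesis code.

class SeqAwareCache:
    def __init__(self, capacity):
        self.capacity = capacity          # maximum number of blocks held
        self.blocks = {}                  # LBA -> cached block data

    def _run_length(self, lba):
        """Length of the run of consecutive cached LBAs that contains `lba`."""
        length = 1
        lo = lba - 1
        while lo in self.blocks:
            length += 1
            lo -= 1
        hi = lba + 1
        while hi in self.blocks:
            length += 1
            hi += 1
        return length

    def _pick_victim(self):
        # Evict a block from the longest sequential run: if it is missed later,
        # the disk can refill it with little extra seek or rotation cost.
        return max(self.blocks, key=self._run_length)

    def access(self, lba, read_block):
        if lba in self.blocks:            # cache hit
            return self.blocks[lba]
        if len(self.blocks) >= self.capacity:
            del self.blocks[self._pick_victim()]
        data = read_block(lba)            # cache miss: fetch from disk
        self.blocks[lba] = data
        return data


if __name__ == "__main__":
    cache = SeqAwareCache(capacity=4)
    for lba in (10, 11, 12, 50, 13):
        cache.access(lba, read_block=lambda a: f"block-{a}")
    # The isolated block 50 survives; a block of the sequential 10..13 run is evicted.
    print(sorted(cache.blocks))           # [11, 12, 13, 50]
```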

【Abstract】 With the vigorous development of the Internet and the digitization of society, traditional direct-attached storage systems can hardly meet the challenge of the explosive growth of data, prompting continuing investigation of new storage architectures. Arising from the integration of storage and network technologies, networked storage systems offer high performance, high capacity, ease of management, and scalability, and are increasingly used in high-performance computing fields such as finance, oil, electricity, and telecommunications.

As a major software solution to I/O bottlenecks, caching is an important research issue for storage systems, and networked storage systems are no exception. A cache is a relatively fast buffer that holds data so that subsequent requests for the same data can be served quickly. Because the cache is usually much smaller than the storage at the next level of the I/O path, when the cache is full an appropriate block must be discarded at an appropriate time to reclaim space for new blocks. The replacement policy is the basis for this choice and a key factor in cache performance. Currently, most replacement policies derive from research on direct-attached storage systems; they are not aware of the multi-client cache hierarchy and access patterns, so they perform poorly in networked storage servers.

This thesis focuses on cache replacement policies that counter the negative impact of the multi-client cache hierarchy and access patterns in networked storage servers. The work covers three settings: an isolated server, a single server with a single client, and a single server with multiple clients. Four replacement policies with low miss penalty and high hit rate are proposed to enhance cache performance for various workloads and applications. Specifically, the thesis makes the following contributions:

1. Since disk access exhibits sequential locality, that is, sequential access is much faster than random access, we propose a Sequential Access detection Based cachE Replacement algorithm called SABER, which uses the LBAs of cached blocks to detect sequential blocks when making a replacement decision. SABER reduces the number of random disk accesses induced by cache misses and avoids unnecessary seeks and rotations, so it achieves high performance by reducing the miss penalty in networked storage servers, whose workloads have weak temporal locality.

2. Most cache replacement policies work at the block level, so file-system metadata is entirely transparent to the cache. In fact, because files are stored and accessed sequentially in most storage systems, metadata can help identify the sequentiality of cached blocks. We present a FIle-system Metadata Aware cache Replacement algorithm called FIMAR, in which an inode is attached to each cached block to estimate sequentiality. FIMAR assigns blocks of large inodes a high priority for eviction, so that disks can work in sequential mode as much as possible.

3. To eliminate data redundancy in the cache hierarchy of networked storage systems, traditional cache replacement policies often introduce a dedicated communication protocol and cannot be both client-friendly and efficient. We propose a REsident Distance based cache replacement algorithm for exclusive storage servers called RED, which tracks the contents of client caches using semantic information from the standard I/O interface and makes replacement decisions based on the time a block has stayed in the client cache, so it attains high performance without any changes to clients.

4. In networked storage systems, the data correlation of multi-client applications seriously affects server cache performance. Traditional cache replacement policies usually suit either low-correlation or high-correlation applications, but not both. We present a Multi-client Adaptive COoperative cache Replacement algorithm called MACOR, which enforces exclusive caching among different cache levels and cooperative caching at the same level, so it dynamically adapts to changes in data correlation. MACOR also includes a dynamic cache partitioning mechanism that resolves server cache contention induced by multiple clients.
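
To make the RED idea above concrete, here is a minimal, hedged Python sketch of a client-residency heuristic for a second-level (server) cache. It assumes the server sees every client READ at the standard block interface and that the client behaves roughly like an LRU cache; the names (ExclusiveServerCache, handle_read, read_from_disk) are invented for illustration, and the dissertation's actual RED algorithm and its bookkeeping are more elaborate.

```python
# Minimal sketch of a client-residency-time heuristic for a second-level
# (server) cache, loosely in the spirit of RED. All names are assumptions.

import itertools

class ExclusiveServerCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}            # LBA -> data held in the server cache
        self.client_fetch = {}      # LBA -> logical time the client last read it
        self.clock = itertools.count()

    def handle_read(self, lba, read_from_disk):
        """Serve one client READ observed at the standard block I/O interface."""
        now = next(self.clock)
        self.client_fetch[lba] = now          # after this reply the client caches the block

        if lba in self.blocks:                # server cache hit
            return self.blocks[lba]

        data = read_from_disk(lba)            # server cache miss
        if len(self.blocks) >= self.capacity:
            # Evict the block the client fetched most recently (shortest client
            # residency): the client still holds it, so the server copy is the
            # most redundant. Blocks resident in the client cache for a long time
            # are kept, since an LRU-like client will soon evict and re-request them.
            victim = max(self.blocks, key=lambda b: self.client_fetch.get(b, -1))
            del self.blocks[victim]
        self.blocks[lba] = data
        return data
```

Because the only input is the stream of ordinary read requests, a heuristic of this kind needs no demotion protocol or other client-side modification, which is the client-friendliness the abstract emphasizes.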
