Research on Multilevel Cache Technology of Mass Network Storage System

【Author】 Hou Fang (侯昉)

【Supervisor】 Zhao Yuelong (赵跃龙)

【Author Information】 South China University of Technology, Computer Application Technology, 2011, Ph.D.

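【Abstract (Chinese version)】 As understanding of the value of data applications deepens, the ability of computer storage systems to store, manage, and use massive amounts of data is receiving more and more attention. In recent years, mass network storage systems built on Direct Attached Storage (DAS), Network Attached Storage (NAS), Storage Area Network (SAN), and their derivative technologies have been a major research focus in computer science and technology. Two important problems remain to be solved in current research on mass network storage systems: (1) the layering of the mass network storage system architecture is not clearly defined, which makes implementing, integrating, and evolving different architectures difficult; (2) the hierarchical nature of the architecture has not been fully exploited to optimize access performance for massive data. The design of the mass storage network architecture and the optimization of mass data access performance are therefore of significant theoretical and practical importance for further research on mass network storage systems.

Taking a hierarchical view of storage architecture, this thesis systematically studies several key problems in mass network storage systems: a hierarchical architecture model, a method for quantifying the locality strength of each cache level, the periodicity of page accesses, cache management based on the miss cost of access latency, the distribution of file sizes and access frequencies in network application environments, and the dynamic patterns of file access. The main work and innovative results are as follows.

(1) A Hierarchical Mass Network Storage Architecture (HMNSA) model and the idea of multilevel cache acceleration are proposed. HMNSA consists of five layers: the Storage Application Layer, Storage Presentation Layer, Storage Connection Layer, Storage Network Layer, and Storage Physical Layer. Each layer provides services to the layer above and invokes services of the layer below, so that a mass network storage system can be built from multiple storage technologies. On this basis, a hierarchical mass network storage system based on the Intelligent Network Disk Storage System (INDSS) and its file system was designed and implemented, and experiments verified the feasibility and correctness of a mass network storage system with a hierarchical architecture. In addition, the necessity and feasibility of placing multiple cache levels in this multilayer architecture were studied, extending the multilevel cache structure from the traditional chain of CPU on-chip cache, main-memory file cache, and external-storage hardware cache upward to the storage service client, storage network, and storage service server of a mass network storage system.

(2) Based on the HMNSA model and multilevel acceleration, a cache scheduling algorithm based on the strength of local data locality (Locality Strength Algorithm, LSA) is proposed for the Storage Presentation Layer. In a mass network storage environment, data is read from the storage network into local memory, where cache space is provided to speed up processor access. How cache space is allocated among user processes in a multi-user, multi-process environment is an important factor in process execution efficiency. The thesis therefore studies a quantitative description of locality strength, gives quantitative indicators and a method for computing them, and on this basis proposes LSA, which reduces the thrashing caused by frequent adjustment of cache space.

(3) Based on the HMNSA model and multilevel acceleration, a cache scheduling algorithm based on access periodicity and latency cost (Periodicity and Miss Cost, PMC) is proposed for the Storage Network Layer. When an application accesses the network storage system, its request is first handled by storage nodes; in the INDSS system developed by the author's research group, these nodes can cache data. Research on cache management for these nodes shows that once improvements in hit rate approach their limit, the miss cost can be reduced instead, and the miss cost is determined by the access latency of each cache block. PMC exploits the periodicity of application accesses together with miss cost: it postpones evicting high-latency blocks as long as possible, so that the large cost is not paid again when such a block is re-accessed in the next period. As a result, the algorithm reduces the weighted response time of the system.

(4) Based on the HMNSA model and multilevel acceleration, a statistics-based cache scheduling algorithm for hot data (Reallocation based on Distribution and Visit Frequency, RDVF) is proposed for the Storage Physical Layer. The file size distribution and space usage of current mainstream operating systems, file read requests in network file service environments, and video download and on-demand requests were analyzed statistically, and the necessity and feasibility of optimizing swap files, small files, and frequently accessed files in mass network storage systems were studied. Building on several recently proposed storage device technologies, a statistics-based hybrid accelerated Storage Physical Layer structure and the hot-data cache scheduling algorithm RDVF are proposed. Experimental results show that RDVF shortens I/O response time and increases the data transfer rate, improving the storage access performance of mass network storage systems.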
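The layered request path and the multilevel cache idea in item (1) can be sketched as a chain of layers in which each layer serves the layer above and calls the layer below, with an optional cache in front of the next hop. This is a minimal illustration under stated assumptions: the `Layer` class, `build_stack`, the dict-based caches, and the choice of which layers carry a cache are all hypothetical, and the abstract does not describe the actual INDSS interfaces.

```python
from typing import Callable, Dict, List, Optional

# The five HMNSA layers, top to bottom (names taken from the abstract).
LAYERS = [
    "Storage Application Layer",
    "Storage Presentation Layer",
    "Storage Connection Layer",
    "Storage Network Layer",
    "Storage Physical Layer",
]


class Layer:
    """Hypothetical layer: serves reads for the layer above, optionally through
    its own cache, and forwards misses to the layer below."""

    def __init__(self, name: str, lower: Optional["Layer"], cached: bool,
                 backend: Optional[Callable[[int], bytes]] = None) -> None:
        self.name = name
        self.lower = lower
        self.backend = backend  # only the bottom (physical) layer reads the medium
        self.cache: Optional[Dict[int, bytes]] = {} if cached else None
        self.hits = 0

    def read(self, block: int) -> bytes:
        if self.cache is not None and block in self.cache:
            self.hits += 1
            return self.cache[block]
        if self.lower is not None:
            data = self.lower.read(block)  # invoke the service of the layer below
        else:
            assert self.backend is not None
            data = self.backend(block)
        if self.cache is not None:
            # Unbounded for brevity; a real layer cache needs a replacement policy.
            self.cache[block] = data
        return data


def build_stack(cached: List[str], backend: Callable[[int], bytes]) -> Layer:
    """Link the five layers bottom-up and return the top (application) layer."""
    lower: Optional[Layer] = None
    for name in reversed(LAYERS):
        lower = Layer(name, lower, name in cached,
                      backend if lower is None else None)
    assert lower is not None
    return lower


disk_reads: List[int] = []


def physical_read(block: int) -> bytes:
    disk_reads.append(block)  # record every access that reaches the medium
    return b"block-%d" % block


top = build_stack(["Storage Presentation Layer", "Storage Network Layer"], physical_read)
for b in [1, 2, 1, 1, 3, 2]:
    top.read(b)
print(disk_reads)  # [1, 2, 3]

presentation = top.lower
assert presentation is not None and presentation.cache is not None
presentation.cache.clear()  # e.g. the client's local cache was dropped
top.read(1)                 # now served by the network-layer cache
print(disk_reads)  # [1, 2, 3]
```

The second read of block 1 never reaches the physical layer, which is the point of placing a cache at more than one level: a miss in an upper cache can still be absorbed lower in the chain.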

【Abstract】 As the value of data applications has attracted more and more attention, the storage and management of mass data have come to play an increasingly important role in the development of computer system performance. Researchers in academia, engineering, and industry have made substantial efforts and contributions to the theory, technology, and application of storage architectures such as Direct Attached Storage (DAS), Network Attached Storage (NAS), Storage Area Network (SAN), and their derivatives. However, two fundamental problems remained to be solved in mass network storage system research: the architecture design of the storage network and the performance optimization of mass data access. Surveying the models and products mentioned above from the perspective of system architecture reveals several defects: (1) the interfaces between different architecture levels were not clearly defined, making implementation, compatibility, and evolution costly; (2) hierarchical optimization methods had not been applied in the mass network storage environment. To address these defects, research on the hierarchical architecture and access performance optimization of mass network storage is essential in both theory and practice.

From the perspective of hierarchical architecture, a series of principles were studied systematically and in detail in this thesis: a hierarchical architecture for mass storage systems and its implementation; a quantitative method for evaluating locality strength and its realization; the periodic pattern of page accesses and an application based on this pattern and on page miss cost; and the relationship between the static distribution of file sizes and the dynamic access pattern of data requests in the network environment, together with the use of this relationship.
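The abstract does not give the thesis's formula for the quantitative evaluation of locality strength listed above. As one hypothetical index, the sketch below takes the hit ratio that an LRU cache of `window` pages would achieve on a process's recent reference trace, then shares cache pages among processes in proportion to that index. A hysteresis `threshold` keeps the current allocation when the change would be small, mirroring the goal of avoiding thrashing from frequent re-adjustment. `locality_strength`, `allocate`, `window`, and `threshold` are illustrative assumptions, not LSA itself.

```python
from collections import OrderedDict
from typing import Dict, Sequence


def locality_strength(trace: Sequence[int], window: int) -> float:
    """Hypothetical locality index in [0, 1]: the hit ratio an LRU cache of
    `window` pages would achieve on this reference trace."""
    if not trace:
        return 0.0
    lru: "OrderedDict[int, None]" = OrderedDict()
    hits = 0
    for page in trace:
        if page in lru:
            hits += 1
            lru.move_to_end(page)        # mark as most recently used
        else:
            lru[page] = None
            if len(lru) > window:
                lru.popitem(last=False)  # drop the least recently used page
    return hits / len(trace)


def allocate(total: int, strengths: Dict[str, float],
             current: Dict[str, int], threshold: int) -> Dict[str, int]:
    """Split `total` cache pages in proportion to locality strength; keep
    `current` unless some share would change by at least `threshold` pages."""
    weight = sum(strengths.values())
    if weight > 0:
        raw = {p: total * s / weight for p, s in strengths.items()}
    else:
        raw = {p: total / len(strengths) for p in strengths}
    target = {p: int(r) for p, r in raw.items()}
    # Largest-remainder rounding so the shares add up to exactly `total`.
    leftover = total - sum(target.values())
    for p in sorted(raw, key=lambda q: raw[q] - target[q], reverse=True)[:leftover]:
        target[p] += 1
    # Hysteresis: small changes do not trigger a reallocation.
    if current and all(abs(target[p] - current.get(p, 0)) < threshold for p in target):
        return dict(current)
    return target


loop = [1, 2, 3, 4] * 25                        # tight loop over 4 pages
mixed = [p for i in range(50) for p in (0, i)]  # one hot page plus a scan
s = {"A": locality_strength(loop, 8), "B": locality_strength(mixed, 8)}
first = allocate(64, s, {}, threshold=4)
print(s, first)   # {'A': 0.96, 'B': 0.5} {'A': 42, 'B': 22}
second = allocate(64, {"A": 0.9, "B": 0.5}, first, threshold=4)
print(second)     # {'A': 42, 'B': 22}
```

In the second epoch the proportional target would move one page from A to B, which is below the threshold, so the allocation is left unchanged.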
The innovations based on the above research are as follows.

(1) A Hierarchical Mass Network Storage Architecture (HMNSA) with five levels and a multilevel acceleration method with caches between these levels were proposed. The five levels are the Storage Application Layer, Storage Presentation Layer, Storage Connection Layer, Storage Network Layer, and Storage Physical Layer. Through clearly defined interfaces between adjacent layers, various existing storage technologies can be integrated compatibly into a hybrid mass network storage system. Based on this architecture, the thesis designed and implemented a prototype mass network storage system named the Intelligent Network Disk Storage System (INDSS), which verified the feasibility and validity of the architecture. Furthermore, performance optimization was studied. Since traditional multilevel caching between adjacent levels of CPU, internal storage, and external storage had proved efficient and effective for system performance, the thesis extended this multilevel cache method to the adjacent layers of the hierarchical mass network storage architecture, so that the network storage service chain (storage service clients, storage network, and storage transaction servers) can be accelerated by multiple caches.

(2) A quantitative approach to locality strength and its application in the Storage Presentation Layer were studied, and a Locality Strength Algorithm (LSA) was proposed. Data in local memory is fetched from remote storage nodes, and cache buffer space is allocated for it to speed up access. How this cache space is allocated to and occupied by multiple processes in a modern multi-user operating system is a key factor in run-time efficiency. The thesis showed that locality strength is a suitable index for this allocation and that the locality strength indexes can be calculated from existing information. Based on these indexes, LSA reduced the peak memory occupation of some processes and decreased the frequency of paging thrashing in the Storage Presentation Layer.

(3) The periodic page access pattern and page miss costs in the Storage Network Layer were studied, and a Periodicity and Miss Cost (PMC) algorithm was proposed. When a data block request misses in local memory, it is satisfied by the storage network. In our INDSS system, the distributed intelligent nodes in the storage network are capable of data caching. Our research on their cache management showed that lowering the average page miss cost is a valid alternative to increasing the cache hit rate. The average miss cost is determined by the latencies of the individual blocks. Since many blocks are accessed repeatedly while access costs vary from block to block, PMC tries to keep high-cost pages in the cache as long as possible for future reuse, avoiding repeated expensive fetches. As a result, the average system response time was improved.

(4) The distribution of file sizes and access frequencies, especially in the network environment, was analyzed, and a Reallocation based on Distribution and Visit Frequency (RDVF) algorithm for the Storage Physical Layer was proposed. The necessity and feasibility of optimizing swap files, small files, and frequently visited files were studied. Combining these results with new device technologies, a hybrid storage structure for the Storage Physical Layer was presented. Experimental and simulation results showed that RDVF increased the transfer rate and reduced the average response time, enhancing the performance of the mass network storage system.
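The PMC idea in item (3), keeping expensive blocks that are likely to recur, can be illustrated with a small cost-aware replacement policy. This is a hypothetical sketch, not the thesis's algorithm: each block's period is estimated as the interval between its last two accesses, and on eviction the victim is the block with the lowest miss cost per unit of predicted wait until its next periodic access; blocks with no observed period are evicted first. `PeriodicityCostCache`, the priority formula, and the logical clock are assumptions, and a plain LRU baseline is included for comparison.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    cost: float                   # latency paid when this block misses
    last: int                     # logical time of the most recent access
    period: Optional[int] = None  # latest observed re-access interval


class PeriodicityCostCache:
    """Hypothetical PMC-style cache: evict the block with the least miss cost
    per unit of predicted wait until its next periodic access."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: Dict[int, Entry] = {}
        self.seen: Dict[int, int] = {}  # last access time, kept after eviction
        self.clock = 0
        self.total_miss_cost = 0.0

    def _priority(self, e: Entry) -> float:
        if e.period is None:
            return 0.0  # no period observed yet: first to go
        # Overdue blocks count as imminent (wait = 1); a real policy would age them.
        wait = max(1, e.last + e.period - self.clock)
        return e.cost / wait

    def access(self, block: int, miss_cost: float) -> bool:
        self.clock += 1
        prev = self.seen.get(block)
        period = None if prev is None else self.clock - prev
        self.seen[block] = self.clock
        entry = self.entries.get(block)
        if entry is not None:  # hit: refresh timing information
            entry.last, entry.period = self.clock, period
            return True
        self.total_miss_cost += miss_cost
        if len(self.entries) >= self.capacity:
            victim = min(self.entries, key=lambda b: self._priority(self.entries[b]))
            del self.entries[victim]
        self.entries[block] = Entry(miss_cost, self.clock, period)
        return False


def lru_miss_cost(trace: List[Tuple[int, float]], capacity: int) -> float:
    """Baseline: total miss cost of plain LRU on the same trace."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    total = 0.0
    for block, cost in trace:
        if block in cache:
            cache.move_to_end(block)
        else:
            total += cost
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)
    return total


# A slow remote block (cost 10) is re-read every 3 accesses, between
# fast blocks (cost 1) that are never reused.
trace: List[Tuple[int, float]] = []
for r in range(10):
    trace += [(100, 10.0), (2 * r, 1.0), (2 * r + 1, 1.0)]

pmc = PeriodicityCostCache(capacity=2)
for block, cost in trace:
    pmc.access(block, cost)
print(pmc.total_miss_cost, lru_miss_cost(trace, 2))  # 40.0 120.0
```

After two misses the policy learns the slow block's period and keeps it resident, while LRU evicts it every round; this is the "lower the miss cost rather than the miss count" trade-off the abstract describes.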
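The abstract does not spell out RDVF's reallocation rule. As a hedged illustration of item (4), the sketch decides which files to place on a small fast device within a hybrid Storage Physical Layer: swap files are pinned, and the remaining capacity is filled greedily by visits per unit of size, which favors small, frequently accessed files. `FileStat`, `plan_fast_tier`, the density ranking, and the example sizes and counts are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStat:
    path: str
    size: int             # in any consistent unit, e.g. MiB
    visits: int           # reads observed during the statistics window
    is_swap: bool = False


def plan_fast_tier(files: List[FileStat], capacity: int) -> List[str]:
    """Hypothetical RDVF-style placement: pin swap files, then fill the rest of
    the fast device greedily by visits per unit size (small, hot files first)."""
    swap = [f for f in files if f.is_swap]
    rest = sorted((f for f in files if not f.is_swap and f.visits > 0),
                  key=lambda f: f.visits / max(f.size, 1), reverse=True)
    chosen: List[str] = []
    used = 0
    for f in swap + rest:
        if used + f.size <= capacity:  # skip files that no longer fit
            chosen.append(f.path)
            used += f.size
    return chosen


files = [
    FileStat("/swapfile", 512, 0, is_swap=True),
    FileStat("movie.mp4", 4000, 300),
    FileStat("index.html", 2, 900),
    FileStat("logo.png", 1, 400),
    FileStat("backup.tar", 2000, 1),
]
print(plan_fast_tier(files, capacity=1024))
# ['/swapfile', 'index.html', 'logo.png']
```

Greedy selection by value density is a standard heuristic for this knapsack-style placement problem and is not guaranteed optimal; the thesis's RDVF may weigh size distribution and frequency differently.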
