
Research on Data Organization of Cluster Multimedia Storage System

【Author】 Wan Jiguang

【Supervisor】 Xie Changsheng

【Author Information】 Huazhong University of Science and Technology, Computer System Architecture, 2007, PhD

【摘要/Abstract】 With the explosive growth of multimedia data on the network, massive amounts of multimedia data are produced and shared around the world, driving rapidly growing demand for massive, scalable storage systems. In a distributed, shared network environment, large numbers of distributed clients access the servers simultaneously, placing even higher performance demands on them. Addressing the characteristics and requirements of multimedia, this dissertation designs a Cluster Multimedia Storage System (CMSS) based on an autonomous server-cluster architecture and focuses on its data-organization techniques, including metadata organization, data organization and migration, and multimedia cache algorithms.

CMSS adopts a two-level metadata organization, TLMS (Two-level Metadata Server). TLMS realizes the two-level structure by separating the logical view of stored data from the physical view: the logical view is managed by a Global Metadata Server (GMS), while the physical view is managed by a Local Metadata Server (LMS) on each storage server. GMS dual-machine hot standby provides a single namespace while avoiding a single point of failure; a global metadata cache shortens the request-handling path and reduces GMS load, improving system performance; with LMS, each storage server autonomously manages its own storage resources, metadata, and data, and can provide storage service independently. Moreover, the two-level metadata organization avoids the performance bottleneck of traditional centralized metadata servers and also resolves the metadata-consistency and synchronization-overhead problems of distributed metadata organization.

To achieve high performance and high scalability, an AutoData data organization was designed based on an analysis of traditional distributed and parallel data organizations. AutoData uses a multi-level structure that combines the advantages of distributed and parallel data organization while overcoming the drawbacks of both. It divides the system's storage space into three tiers: a memory parallel storage pool, a disk parallel storage pool, and a distributed storage pool. A portion of every storage server's memory forms the memory parallel storage pool; a small portion of every server's disks forms the disk parallel storage pool; the remaining disk space of all servers forms the distributed storage pool. The memory parallel pool has the best performance but the smallest capacity; the disk parallel pool has lower performance but relatively larger capacity; the distributed pool has the lowest performance but the largest capacity.

Analysis shows that when multiple clients access the servers simultaneously, even though each individual client's addresses may be sequential, the combined address stream appears random to the storage servers' disk schedulers. To reduce the number of random disk accesses, a CBP (Client-Based Prefetching) algorithm was designed: it adopts a per-client strategy, allocating a prefetch buffer for each client and using large prefetch blocks, which reduces disk accesses and improves system performance. For cache replacement, exploiting the predictability of multimedia request addresses, an FOPT (Forecast OPT) replacement algorithm was designed: it predicts the future access sequence from the address continuity of multimedia accesses, realizing a prediction-based OPT algorithm.

CMSS was tested and its performance analyzed on Gigabit Ethernet. With a single server, CMSS and NFS were compared: overall, for random reads CMSS is slightly slower than NFS, but for sequential reads CMSS is about 20% faster. In a multi-server parallel test environment, CMSS, Lustre, and PVFS were compared: for random reads CMSS outperforms Lustre but trails PVFS, while for sequential reads CMSS is 30-40% faster than both Lustre and PVFS, fully demonstrating the effectiveness of CMSS's sequential-read optimization for multimedia applications. Simulating different numbers of clients, the hit rates of FOPT and LRU were measured: when requests are smaller than 64KB, both algorithms achieve high hit rates, mainly because the server cache uses 64KB prefetching, which confirms the effectiveness of CBP. Once request sizes reach 64KB, regardless of the number of clients, FOPT's hit rate is 50-70% higher than LRU's, fully demonstrating the effectiveness of FOPT.

【Abstract】 With the explosive growth of multimedia data on the Internet, huge amounts of multimedia data have been and continue to be generated and shared by users around the globe. Accordingly, the demand for large-scale, expandable multimedia storage systems is dramatically increasing. Furthermore, multimedia data place unique demands on storage systems. In a distributed network environment, a large number of clients access the servers simultaneously, putting even more rigorous performance requirements on the servers. Aiming at the traits and requirements of multimedia data, and deploying an autonomous server-cluster architecture, we designed a Cluster Multimedia Storage System (CMSS). More specifically, we studied its data-organization algorithms, including CMSS metadata management, data organization and migration, and the multimedia caching algorithm. CMSS employs a TLMS (Two-level Metadata Server) scheme. TLMS separates the logical view of data from the physical view. The logical view is managed by a Global Metadata Server (GMS); the physical view is managed by Local Metadata Servers (LMSs) on individual storage servers. By using online fail-over technology, the GMS implements a single namespace without introducing a single point of failure. With the help of the global metadata caching technique, request handling is much simplified, reducing the load on the GMS and improving system performance. By adopting the LMS technique, each storage server can autonomously manage its private storage resources as well as its metadata and actual data, and each storage server can provide independent storage services. Furthermore, the CMSS two-level metadata management avoids the metadata performance bottleneck exhibited by traditional centralized metadata server solutions.
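The two-level lookup described above can be sketched as follows. This is a minimal illustration of the TLMS idea, not CMSS source code: all class and method names, and the dictionary-based maps, are our assumptions. The point it shows is that a client consults the GMS only on a cache miss, then talks directly to the owning server's LMS.

```python
# Hypothetical sketch of a TLMS-style two-level metadata lookup.
# Names and data structures are illustrative assumptions, not from CMSS.

class LocalMetadataServer:
    """Per-storage-server metadata: logical path -> physical location."""
    def __init__(self, server_id):
        self.server_id = server_id
        self.physical = {}                      # path -> (device, offset)

    def register(self, path, device, offset):
        self.physical[path] = (device, offset)

    def locate(self, path):
        return self.physical[path]

class GlobalMetadataServer:
    """Single namespace: logical path -> owning storage server."""
    def __init__(self):
        self.namespace = {}                     # path -> server_id

    def bind(self, path, server_id):
        self.namespace[path] = server_id

    def resolve(self, path):
        return self.namespace[path]

class Client:
    """Caches GMS answers so repeated lookups bypass the GMS entirely."""
    def __init__(self, gms, lms_by_id):
        self.gms, self.lms_by_id = gms, lms_by_id
        self.cache = {}                         # global-metadata cache

    def open(self, path):
        if path not in self.cache:              # miss: ask the GMS once
            self.cache[path] = self.gms.resolve(path)
        lms = self.lms_by_id[self.cache[path]]  # hit: go straight to the LMS
        return lms.locate(path)
```

With the global metadata cached at the client, the second `open` of the same path never touches the GMS, which is the load-reduction effect the abstract attributes to the global metadata cache.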
It also solves the consistency and synchronization-overhead problems of distributed metadata management solutions. For the sake of combined high performance and scalability, through an analysis of traditional distributed and parallel data organization, we designed an AutoData algorithm, which utilizes a multi-level data management architecture and shares the advantages of distributed and parallel data organizations without bearing their deficiencies. The entire storage hierarchy consists of three layers: an in-memory parallel storage pool, an on-disk parallel storage pool, and an on-disk distributed storage pool. The in-memory parallel storage pool is made up of a certain amount of memory from each and every storage server. Similarly, the on-disk parallel storage pool is made up of a small amount of disk space from each and every server. Finally, the rest of the disk space on the servers forms the on-disk distributed storage pool. The in-memory parallel storage pool is the smallest in size while achieving the best performance. The on-disk parallel storage pool has relatively weaker performance and larger capacity. The on-disk distributed storage pool has the lowest performance and the largest capacity. According to our analysis, even though the accesses from individual clients are sequential, when multiple clients access the servers at the same time, the mixed accesses exhibit a random access pattern. In order to reduce the number of random accesses to the disks, we designed a CBP (Client-Based Prefetching) algorithm. By reserving a certain amount of prefetch buffer per client and adopting large prefetch block sizes, we reduced the number of disk accesses and improved the system performance.
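The per-client prefetching just described can be sketched as follows. This is a simplified model under our own assumptions: the 64KB prefetch unit matches the size used in the evaluation, the `disk` callable stands in for a physical read, and requests are assumed not to cross prefetch-block boundaries. What it shows is the core CBP effect: many small sequential reads from one client collapse into a single large disk access.

```python
# Illustrative sketch of client-based prefetching (CBP): each client gets
# its own prefetch buffer, and a buffer miss triggers one large aligned
# disk read instead of many small ones. Sizes and names are assumptions.

PREFETCH_BLOCK = 64 * 1024          # large prefetch unit (64 KB, as tested)

class ClientPrefetchBuffer:
    def __init__(self, disk):
        self.disk = disk            # callable: disk(offset, length) -> bytes
        self.start = None           # offset of the buffered region
        self.data = b""
        self.disk_reads = 0         # count of physical disk accesses

    def read(self, offset, length):
        # Assumes a request never straddles two prefetch blocks.
        end = offset + length
        buffered = (self.start is not None and
                    self.start <= offset and
                    end <= self.start + len(self.data))
        if not buffered:            # miss: fetch one big aligned block
            self.start = (offset // PREFETCH_BLOCK) * PREFETCH_BLOCK
            self.data = self.disk(self.start, PREFETCH_BLOCK)
            self.disk_reads += 1
        lo = offset - self.start
        return self.data[lo:lo + length]
```

For a client issuing sixteen sequential 4KB reads, only the first read reaches the disk; the remaining fifteen are served from the prefetch buffer, so the disk sees one large sequential access instead of sixteen small ones.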
In terms of the cache replacement algorithm, based on the high predictability of multimedia accesses, we designed a forecast-based optimal cache replacement algorithm, namely Forecast OPT (FOPT). We implemented the forecast-based FOPT algorithm by predicting future access addresses from the contiguity of multimedia accesses. We evaluated and analyzed CMSS system performance in a Gigabit Ethernet environment. We compared single-server performance under CMSS and NFS. In general, for random reads, CMSS performance is slightly worse than that of the NFS server, but for sequential reads, CMSS performance is about 20% better than NFS. In addition, we tested the performance of CMSS, Lustre, and PVFS in a multi-server parallel storage environment. The results show that, for random reads, CMSS performance is better than Lustre but worse than PVFS. However, for sequential reads, CMSS outperforms Lustre and PVFS by 30-40%. These results reveal the effectiveness of the CMSS optimizations for sequential reads. We also compared the hit rates under the FOPT and LRU replacement algorithms by simulating a multiple-client system. The results show that the hit rates of FOPT and LRU are similar and relatively high when the request size is below 64KB. This demonstrates the effectiveness of our algorithm's 64KB prefetch block size. When the request size reaches 64KB, regardless of the number of clients, the hit rate of the FOPT algorithm is 50-70% higher than LRU's, further demonstrating the effectiveness of the FOPT algorithm.
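The FOPT idea can be sketched as follows. Belady's OPT rule evicts the block whose next reference lies farthest in the future; FOPT makes OPT realizable by forecasting that future from the sequential nature of multimedia streams. The sketch below is our own minimal model under stated assumptions: each stream reads block numbers strictly in order, and the "forecast" for a cached block is simply its distance ahead of the nearest stream position. All names and the distance model are illustrative, not the dissertation's implementation.

```python
# Hedged sketch of a Forecast-OPT (FOPT) style cache: forecast the future
# reference string from each sequential stream's position, then apply
# Belady's OPT eviction rule to that forecast. Model is an assumption.

class FOPTCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = set()           # cached block numbers
        self.positions = {}           # stream_id -> next block it will read

    def _forecast_distance(self, block):
        # Predicted steps until some stream reads `block`: a sequential
        # stream at position p reaches block b >= p in b - p steps; blocks
        # already behind every stream are forecast never to be re-read.
        dists = [block - p for p in self.positions.values() if block >= p]
        return min(dists) if dists else float("inf")

    def access(self, stream_id, block):
        self.positions[stream_id] = block + 1      # the stream advances
        if block in self.blocks:
            return True                            # cache hit
        if len(self.blocks) >= self.capacity:      # evict farthest forecast
            victim = max(self.blocks, key=self._forecast_distance)
            self.blocks.discard(victim)
        self.blocks.add(block)
        return False                               # cache miss
```

Because the forecast marks blocks behind every stream as never re-read, FOPT preferentially evicts data that all viewers have already passed, which is exactly where it gains over recency-based LRU when many clients stream sequentially.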

【Key words】 Storage system; Multimedia; Cluster; Distributed; Metadata