海量数据分布式存储技术的研究与应用

Research on Distributed Storage Technology Based on Mass Data

【Author】 李存琛

【Supervisor】 杨俊

【Author information】 Beijing University of Posts and Telecommunications, Computer Science and Technology, 2013, Master's

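【Summary】 In recent years, with the rapid development of information technology, Internet services have kept expanding, user numbers have kept rising, storage space has kept growing, and data volume has grown at an unprecedented rate. Storage capacity, however, tends to be inversely related to storage performance: traditional databases struggle with massive data and show low concurrency, poor scalability, and low efficiency. Massive data storage has therefore become a key research topic, and parallel-processing distributed databases based on the MPP (Massively Parallel Processing) architecture are one research direction. This thesis is an exploratory study of massive data storage technology. Its topic comes from a key project of the National Science and Technology Support Program under the Eleventh Five-Year Plan, "Research on Key Technologies of a Secure and Trusted Telecom-Grade Operation Support System for Reproductive Health Services". It mainly addresses the access-performance problems caused by the project's ever-growing data volume and provides the project with storage technology that offers high concurrency, high availability, and high scalability. The research work covers the following:
1. Drawing on the theory of massive data storage technology, the relational and NoSQL data models, distributed database storage, and the MPP-based parallel processing model, it summarizes the available schemes for massive data storage and the new technologies they apply.
2. It analyzes the characteristics of massive data storage technology, compares the strengths and weaknesses of the distributed massive data storage technologies commonly used in China and abroad, designs a distributed storage model for massive data, and describes in detail how the SQL parsing module, the data sharding module, the parallel query module, and the result module are implemented.
3. Building on this storage model design and on parallel query and storage techniques, it independently develops "DB Mapping", a storage system based on the MPP architecture, yielding a massive data storage solution with good scalability and the advantages of large-scale parallel processing.
The main contribution of the thesis is a massive data storage method based on MPP parallel processing that covers the whole path from the client's request to data persistence. It incorporates the Map/Reduce idea by distributing work to the individual data nodes, which achieves high scalability, high availability, and high concurrency. A simulation test on a set of distributed data nodes verifies the feasibility of this storage approach.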

【Abstract】 In recent years, with the rapid development of information technology, data on the Internet has been growing at an incredible speed. Internet business, the number of Internet users, and online storage space all continue to increase. However, storage capacity tends to be inversely related to storage performance. The traditional centralized database can hardly deal with such a huge amount of data and fails to meet the expanding demands for abundant information and high system performance. Therefore, mass data storage has become a key research topic, and the parallel-processing distributed database based on the MPP (Massively Parallel Processing) architecture is one of the related research directions. Based on the project "Research on key technologies of safety trusted telecom-level operation supporting architecture on reproductive health services", this thesis focuses on mass data storage technology. It aims to provide a storage solution with high concurrency, high availability, and high scalability. The present study:
1. Summarizes mass data storage schemes and the new technologies they apply, based on the theory of mass data storage technology, the relational and NoSQL data models, distributed database storage, and the MPP architecture-based parallel processing mode;
2. Analyzes the characteristics of mass data storage technology, compares the advantages and disadvantages of the distributed mass data storage technologies commonly used at home and abroad, and designs a distributed storage model for mass data. The system is composed of four modules: an SQL parsing module, a sharding module, a parallel query module, and a results summarizing module; and
3. Building on existing distributed database design methods, independently develops the "DB Mapping" storage system based on the MPP architecture, which has good scalability and the advantages of highly efficient processing.
The primary contributions of this thesis are summarized as follows. We propose a mass data storage solution based on MPP parallel processing that covers the complete data storage process from the client request to the database. By integrating the MapReduce idea, the system distributes work to the data nodes and satisfies the demands of high scalability, high availability, and high concurrency. The feasibility of this solution was verified by a simulation test.
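
The sharding, parallel query, and result summarizing steps named in the abstract can be illustrated with a minimal sketch. This is not the DB Mapping implementation, which the abstract does not describe in detail. Everything in it is an assumption made for illustration: the names `shard_for`, `Cluster`, and `parallel_query`, the choice of CRC32 hashing as the sharding rule, and the use of in-memory dictionaries in place of real database nodes. SQL parsing is omitted; a Python predicate stands in for a parsed WHERE clause.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical number of data nodes


def shard_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a row key to a node index with a stable hash (assumed sharding rule)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


class Cluster:
    """Toy cluster: each 'node' is an in-memory dict standing in for a database."""

    def __init__(self, num_nodes: int = NUM_NODES):
        self.nodes = [dict() for _ in range(num_nodes)]

    def insert(self, key: str, row: dict) -> None:
        # Sharding step: the key alone decides which node stores the row.
        self.nodes[shard_for(key, len(self.nodes))][key] = row

    def parallel_query(self, predicate) -> list:
        # "Map" step: every node filters its own shard concurrently.
        def scan(node: dict) -> list:
            return [row for row in node.values() if predicate(row)]

        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partials = list(pool.map(scan, self.nodes))

        # "Reduce" step: merge the per-node partial results into one list.
        return [row for part in partials for row in part]


# Usage: load 100 rows, then run one query across all nodes.
cluster = Cluster()
for i in range(100):
    cluster.insert(f"user-{i}", {"id": i, "age": 20 + i % 50})

matches = cluster.parallel_query(lambda row: row["age"] >= 65)
```

Because the hash depends only on the key, a point lookup touches a single node, while a scan such as the one above fans out to every node and merges the partial results. Ordering, aggregation, and node failure are left out of the sketch.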

  • 【Classification code】 TP333; TP311.13
  • 【Citations】 5
  • 【Downloads】 1208