Study on Fast Forestland Border Data Search Technology (林地落界数据快速查询技术研究)

【Author】 Li Xingying (李惺颖)

【Supervisor】 Tang Xiaoming (唐小明)

【Author information】 Chinese Academy of Forestry (中国林业科学研究院), Forest Management, 2014, PhD

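【Abstract, translated from the Chinese】 Forestland data reflect the current state of forest resources and how they change, and are an important basis for China's ecological construction and management decisions. The national forestland "One Map" (全国林地“一张图”) system brings together more than 500 TB of national basic geographic data, remote sensing imagery, terrain data, forestland border subcompartment (小班) data and various thematic maps. The forestland border subcompartment data alone comprise 67.3888 million records, making this the largest spatial database in the history of Chinese forestry. As applications deepen and update surveys are rolled out in full, the volume of forestland border data will grow rapidly and will soon be "big data" in the true sense. As spatial data grow, the limitations of traditional spatial data management methods become increasingly apparent, and no suitable GIS framework currently exists for large-scale spatial data problems in forestry. Against this background, this thesis studies in depth the application of a parallel GIS architecture to forestland border data queries and the key technologies needed to implement it. It analyzes the problems of the traditional management architecture and, drawing on the characteristics of forestland border data, designs a fast query architecture with a high-speed parallel GIS at its core, describes its key technologies, and validates them with a prototype system. The results show that the prototype has a substantial advantage over the traditional management architecture, can manage forestland border data at the current scale and at larger future scales, and has some value for wider application.

The research work is as follows:
(1) A theoretical framework for fast querying of forestland border data was established, with approaches to three key technologies: data storage layout, parallel allocation and scheduling, and collection and reduction of query results.
(2) Data storage layout: the data granularity problem was solved by choosing an administrative boundary level and running granularity tests, and the discrete layout problem was solved with graph vertex coloring theory. Tests show that the finer the granularity, the faster the query. In a large-result query returning about one third of the whole database, partitioning by county took 17,052 ms against 30,373 ms for partitioning by city, which is about 1.8 times as fast. With county partitioning, the discrete storage layout was 1.35 times as fast as the clustered layout.
(3) Parallel allocation and scheduling: a three-layer, linked parallel computing model was designed to suit the fast query architecture. The first layer provides an algorithm that allocates and schedules tasks according to CPU performance. The second layer provides a method that allocates and schedules according to each data node's concurrency and computing capacity and the available data replicas. The third layer provides a method that allocates and schedules by exploiting the different CPU demands of the different stages of thread execution. In the final tests, four data nodes made the system 3.7 times as fast as the traditional management mode, a near-linear speedup.
(4) Collection and reduction of query results: a data classification model that controls data transmission automatically solved the query response speed problem. Extracting the main information from attribute data and thinning the spatial data reduced the volume of results transmitted. Three cache levels (master-node memory, data-node memory and a cache table) reduced the cost of subsequent related queries. Tests show that handling query execution and response asynchronously greatly shortened the response time, which did not exceed 2 seconds in the tests.

The innovations of this thesis are as follows:
(1) Data layout was studied from the perspective of static load balancing. A layout method based on graph vertex coloring theory was applied, achieving a uniform, discrete distribution of data across the server cluster, which is novel.
(2) The influence of cluster structure on the parallel computing process was analyzed, a method for layering parallel computing tasks according to cluster structure was proposed, and the linkage and scheduling of multi-layer parallel tasks were studied, improving the dynamic load balance of computing tasks in the cluster.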
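The two headline ratios in the abstract can be checked directly from the figures it reports. The symbols below (T for query time, S for speedup, p for the number of data nodes, E for parallel efficiency) are notation introduced here for illustration and are not taken from the thesis. Whether the master node should be counted in p is not stated in the abstract, so p = 4 is an assumption.

```latex
\[
\frac{T_{\text{city}}}{T_{\text{county}}}
  = \frac{30373\ \text{ms}}{17052\ \text{ms}} \approx 1.78 \approx 1.8
\]
\[
E = \frac{S}{p} = \frac{3.7}{4} = 0.925
\]
```

An efficiency of about 0.93 on four data nodes is consistent with the abstract's description of the speedup as close to linear.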

【Abstract】 Forestland data reflect the status of forest resources and their changes, and are an important basis for China's ecological construction and management decisions. The national forestland "One Map" system has collected over 500 TB of national basic geographic data, remote sensing data, terrain data, forestland border data and various thematic map data. The forestland border data alone amount to 67.3888 million records, making this the largest spatial database in the history of Chinese forestry. As applications deepen and update surveys get fully under way, the scale of forestland border data will increase quickly and become true "big data". In the face of this growth, the limitations of traditional spatial data management methods are becoming more and more prominent, and no suitable GIS framework is available for massive spatial data problems in forestry. In this context, this thesis studies in depth the application of a parallel GIS system to forestland border data queries, together with its key technologies. It analyzes the problems of the traditional management system and, taking the characteristics of forestland border data into account, designs a fast query system whose core is a high-speed parallel GIS, explains the key technologies, and implements a prototype system to verify them. The results show that the prototype system has a clear advantage over the traditional management system, can meet current and future requirements for managing forestland border data, and has some value for wider adoption.

The research work of this thesis is as follows:
(1) A theoretical system for fast querying of forestland border data was established, and approaches to three key technologies were given: data storage layout, parallel allocation and scheduling, and collection and reduction of query results.
(2) Data storage layout: the data granularity problem was solved by selecting an administrative boundary level and testing granularity, and the discrete layout problem was solved with graph vertex coloring theory. The tests showed that the finer the granularity, the faster the query. In a large-result query returning about 1/3 of the whole database, partitioning by county took 17,052 milliseconds, about 1.8 times as fast as the 30,373 milliseconds for partitioning by city. With county partitioning, the discrete storage layout was 1.35 times as fast as the clustered layout.
(3) Parallel allocation and scheduling: based on the characteristics of the fast query system, a parallel computing model with three linked layers was designed. The first layer gives a task allocation and scheduling algorithm based on CPU performance. The second layer gives an allocation and scheduling method based on the concurrency and computing capacity of the data nodes and on data replicas. The third layer gives an allocation and scheduling method that exploits the different CPU demands of the different stages of thread execution. In the final test, the system with four data nodes ran 3.7 times as fast as the traditional management mode, a speedup close to linear.
(4) Collection and reduction of query results: a data classification model that controls data transmission automatically solved the query response speed problem. Extracting the main information from attribute data and thinning the spatial data reduced the volume of results transmitted. Three levels of cache (master-node memory, data-node memory and a cache table) reduced the cost of subsequent related queries. The test results show that handling query execution and response asynchronously greatly shortened the response time, which did not exceed 2 seconds in the tests.

The innovations of this thesis are as follows:
(1) Data layout was studied from the perspective of static load balancing. A layout method based on graph vertex coloring theory was applied to achieve a uniform, discrete layout of data across the server cluster, which is novel.
(2) The impact of cluster structure on the parallel computing process was analyzed, a method for layering parallel computing tasks according to cluster structure was proposed, and the linkage and scheduling of multi-layer parallel computing tasks were studied, improving the dynamic load balance of computing tasks in the cluster.
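The abstract names graph vertex coloring as the basis of the discrete layout but does not give the algorithm. The following is a minimal sketch of one way such a layout could work, under stated assumptions: partitions (for example, counties) are vertices, an edge joins two partitions that share a border, and each "color" is a server. The greedy, load-aware heuristic, the function names `assign_partitions` and `conflicts`, and the six-county example are all hypothetical and are not taken from the thesis.

```python
def assign_partitions(adjacency, sizes, n_servers):
    """Greedy coloring: place each partition on a server that none of its
    already-placed neighbours use, preferring the least-loaded server."""
    load = [0] * n_servers
    placement = {}
    # Visit high-degree partitions first, a common greedy-coloring heuristic.
    order = sorted(adjacency, key=lambda v: (-len(adjacency[v]), v))
    for v in order:
        taken = {placement[u] for u in adjacency[v] if u in placement}
        free = [s for s in range(n_servers) if s not in taken]
        # If neighbours already occupy every server, fall back to all servers.
        candidates = free or list(range(n_servers))
        server = min(candidates, key=lambda s: (load[s], s))
        placement[v] = server
        load[server] += sizes[v]
    return placement, load


def conflicts(adjacency, placement):
    """Count adjacent partition pairs stored on the same server."""
    return sum(
        1
        for v in adjacency
        for u in adjacency[v]
        if v < u and placement[v] == placement[u]
    )


# Hypothetical example: six counties whose borders form a ring.
adjacency = {
    "A": ["B", "F"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D", "F"], "F": ["E", "A"],
}
sizes = {name: 10 for name in adjacency}  # records per county (made up)

placement, load = assign_partitions(adjacency, sizes, n_servers=3)
print(placement)
print("load per server:", load)
print("adjacent pairs on the same server:", conflicts(adjacency, placement))
```

The intent of such a layout is that a query covering a contiguous region touches neighbouring partitions held on different servers, so the work spreads across the cluster instead of concentrating on one node. The least-loaded tie-break is an added assumption that keeps the static load roughly even.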
