Study of Data Warehouse and Data Mining Application on Geohazards in the Three Gorges Reservoir Area (三峡库区地质灾害数据仓库与数据挖掘应用研究)

【Author】 Zhu Chuanhua (朱传华)

【Supervisor】 Hu Guangdao (胡光道)

【Author information】 China University of Geosciences, Earth Exploration and Information Technology, 2010, PhD

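【Abstract (translation of the Chinese 摘要)】 Sudden geological hazards such as collapses, landslides and debris flows, together with the assessment of their risk, have become a widely shared concern. Landslides are one of the main types of geological hazard. In damage and impact they rank behind only earthquakes and volcanic eruptions, and they are widespread, frequent, fast-moving and costly. Research on landslide prediction and forecasting improves the ability to respond quickly to sudden geohazard events and so supports effective disaster prevention and mitigation.

The Three Gorges Reservoir Area of the Yangtze River is one of the areas of China most severely affected by landslides. The geology and landforms along the reservoir are complex, and the area lies in a subtropical climate zone with abundant rainfall and frequent rainstorms, so collapses, landslides and debris flows occur from time to time and ancient landslides are numerous. Almost all of the new towns built under the resettlement project sit on slopes. Under the combined effect of the rise in reservoir water level after impoundment, its fall after flood discharge, and the resettlement works, many ancient landslides will be reactivated and new landslides will occur. The new sites of some county towns have been forced to move several times because of landslides, and many resettlement projects have suffered or been seriously threatened by collapses and slides. As the Three Gorges Project progressed, geohazard prevention in the reservoir area drew close attention from the State Council, which launched a full programme of geohazard prevention and control in the Three Gorges Reservoir Area in July 2001.

With the advance of the informatization project for geohazard prevention and control in the reservoir area, a rich body of geohazard data has been accumulated. Because geohazards are sudden and destructive, early warning and command in the reservoir area requires information to be reorganized by command object, that is, by the warning levels of each hazard type and by the grades within each warning level. This calls not only for high-level method and model libraries to analyze, summarize and use the information, but also for tools that can quickly locate and mine useful information in very large databases, so that decisions are accurate and fast. Traditional operational databases can do this only with difficulty or not at all. The theory and techniques of data warehousing, on-line analytical processing and data mining, in contrast, make it possible to analyze and use massive data automatically and effectively, to discover relationships within the data, and to extract useful rules and knowledge for a decision support system.

The aim of this study is to integrate the geohazard data of the Three Gorges Reservoir Area effectively with data warehouse technology, and to apply data mining to historical landslide data in order to extract information useful for landslide prediction and forecasting, in support of the early warning and command system. The work covers two areas: construction of the geohazard data warehouse, and application of data mining to landslide prediction.

(1) The geohazard data warehouse was built with a "data-driven" system design method, in which the warehouse schema is derived mainly from analysis of the underlying source systems. The idea is to build the warehouse from databases that already exist, reusing existing data and code as far as possible instead of starting from scratch. The design starts from the existing database systems and re-examines the relationships among the data according to the requirements of the business domain, in order to organize the subjects of the warehouse. The design proceeded through four stages: requirements specification, conceptual model design, logical model design and physical model design.

The requirements specification stage identified the source databases, that is, the existing operational databases, including the geohazard attribute database and the spatial database. Analysis of the data sources yielded the subjects of the Three Gorges geohazard data warehouse: regional geohazard forecasting, geohazard forecasting for new resettlement towns, forecasting for individual geohazards, surge wave forecasting, assessment of control works, monitoring and forecasting, and early warning decision support and emergency command. Given the progress of data collection and the research the author had completed, the thesis concentrates on two subjects: regional geohazard forecasting and landslide monitoring and forecasting.

With the subjects fixed, the conceptual design stage analyzed the data hierarchies of the sources for these two subjects, derived and selected a landslide susceptibility fact and a landslide displacement monitoring fact, determined the fact measures, dimensions and hierarchies, and built conceptual multidimensional models for landslide susceptibility and landslide displacement monitoring. The measures of the landslide susceptibility fact are known landslides, engineering geological rock group, slope structure type, tectonic structure, slope gradient, elevation, surface rivers, vegetation, land cover, roads, aspect and surface curvature. Its dimensions are landslide type, map scale and region, with the hierarchies type -> kind -> model -> style -> stage -> character, scale -> size, and county/city -> province -> reservoir area. The measures of the landslide displacement monitoring fact are deformation displacement, rainfall, temperature, reservoir water level change, earthquakes, sudden rainstorms and human activity. Its dimensions are landslide type, time, monitoring point and monitoring type, with the hierarchies type -> kind -> model -> style -> stage -> character, date -> month -> quarter -> year, monitoring point -> landslide mass -> village -> town -> county, and monitoring content -> monitoring instrument -> monitoring method -> monitoring type.

The logical design stage converted the conceptual multidimensional models into logical multidimensional models and designed the ETL processes for both models; the ETL process for the landslide susceptibility model has two parts, spatial data ETL and attribute data ETL. The physical design stage used Oracle Warehouse Builder (OWB) to load the data from the sources into the target warehouse, built the geohazard data warehouse on an Oracle database, and tuned warehouse performance through partitioning, indexes, materialized views and storage structure design.

(2) The data mining work on landslide prediction is based on the multidimensional datasets of the geohazard data warehouse and uses the support vector machine regression algorithm of Oracle Data Mining (ODM), which is embedded in the Oracle database. The thesis describes the support vector machine as a next-generation algorithm that rests on statistical models rather than on loose analogies with natural learning systems, and that can in theory achieve optimal predictions. It handles practical problems such as small samples, nonlinearity, high dimensionality and local minima well, and is regarded as a good alternative to neural networks. The ODM implementation is convenient to use, easy to deploy, and needs little manual intervention in its model parameters.

① First, landslide susceptibility zoning was studied with Zhong County as the study area. Landslide susceptibility analysis uses the statistical relationship between the spatial distribution of past landslides and that of intrinsic causative factors to evaluate the likelihood of potential landslides within a given area. It supports land development and planning and reduces the threat of landslide hazards at the regional scale. The study adopts the widely accepted GIS raster model, builds a landslide susceptibility multidimensional dataset through multidimensional modeling in the data warehouse, and on that basis analyzes the susceptibility of the study area with ODM's support vector machine regression algorithm. To test the performance of that algorithm, two commonly used quantitative statistical models, weights of evidence and logistic regression, were built for comparison with exactly the same samples and predictor variables. Although no model predicted every known landslide, the areas that the support vector machine classified as very high and high susceptibility contained 88.02% of the known landslides, against 84.48% for weights of evidence and 58.94% for logistic regression. On this measure the support vector machine model predicts better than the other two.

② Second, a time series analysis of landslide displacement monitoring data was carried out with the Baishuihe landslide as the example. Time series analysis can predict the trend of a complex system and has long been a focus of research on dynamic forecasting of landslide displacement. Because most current prediction models work from flat files, the study introduces a framework in which time series analysis is performed on the multidimensional model of the data warehouse. The Baishuihe displacement series was processed following the principle of state-space reconstruction, and a support vector machine regression time series model was built with ODM's PL/SQL API to mine the processed data. In multi-step prediction, the error rate of the first five predicted values stayed within 8%. The error of the sixth step was larger, possibly because of the combined conditions of 355 mm of rainfall in April and May and a 4.68 m fall in reservoir water level in May, which had brought the landslide to the stage of imminent failure (on June 30, 2007, about 100,000 m³ of soil in the middle of the Baishuihe landslide collapsed and slumped). At that point the data are no longer indicative, but the accuracy of 84.1% still meets engineering requirements. ODM's support vector machine regression algorithm can therefore be applied to short-term prediction in landslide monitoring.

The main innovations and distinctive features of the thesis are as follows. (1) Based on the data warehouse concept and an in-depth analysis of the landslide susceptibility fact, a landslide susceptibility multidimensional dataset was designed and built on the GIS raster model of spatial data. Spatial data on landslides and causative factors in the Three Gorges Reservoir Area are stored in the geohazard data warehouse from three perspectives (map scale, region and landslide type), which integrates the spatial data by subject and allows a rapid response to the data needs of spatial landslide prediction. (2) Landslide displacement monitoring time series were analyzed in depth, and a landslide displacement monitoring multidimensional dataset was designed and built on four dimensions: time, monitoring point, monitoring type and landslide type. On this dataset, a support vector machine regression time series model was built with ODM's PL/SQL API for data mining.

The thesis also has shortcomings. (1) Building the data warehouse depends on an understanding of the business domain and on preprocessing of the business data. The preprocessing of spatial data with continuous attributes, namely the reclassification of causative factors, combined expert experience with verification by bivariate statistics and is therefore subjective to some degree. (2) The landslide displacement monitoring dataset was built and its time series were mined, but no cross prediction was made between displacement and reservoir water level or between displacement and rainfall, so the reliability and accuracy of the mining model in application need further verification. (3) Data mining and GIS mapping are not integrated: the mining results must be exported before GIS software can generate the landslide susceptibility map. In summary, integrating geohazard data into a data warehouse and forecasting landslide hazards with mining tools built on that warehouse is a feasible new approach.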
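The state-space reconstruction step in the time series study can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which used ODM's PL/SQL API. The function names `delay_embed` and `relative_error`, the embedding dimension, the lag, the error definition and the sample readings are all assumptions made for illustration. The sketch turns a displacement series into lagged input vectors paired with the next value, which is the form a regression model such as support vector regression can be trained on, and shows one plausible way to compute a per-step error rate.

```python
def delay_embed(series, dim, lag=1):
    """Build (input_vector, next_value) pairs from a 1-D series.

    Each input vector holds `dim` past values spaced `lag` samples apart,
    ordered oldest to newest; the target is the value one step ahead.
    """
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    span = (dim - 1) * lag  # samples needed before the first full window
    pairs = []
    for t in range(span, len(series) - 1):
        window = [series[t - k * lag] for k in range(dim - 1, -1, -1)]
        pairs.append((window, series[t + 1]))
    return pairs


def relative_error(predicted, observed):
    """Absolute relative error of one forecast, as a fraction (assumed metric)."""
    return abs(predicted - observed) / abs(observed)


# Hypothetical cumulative displacement readings in mm (not thesis data).
displacement = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]
pairs = delay_embed(displacement, dim=3, lag=1)

for window, target in pairs:
    print(window, "->", target)
```

Each pair can be fed to any regressor. For multi-step forecasting, one common choice is to append each prediction to the series and embed again, which also lets errors accumulate with the step count.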

【Abstract】 Sudden geological hazards such as collapses, landslides and debris flows, and the assessment of their risk, have become a major issue of common concern. Landslides are one of the main types of geological hazard. Their damage and impact rank behind only earthquakes and volcanoes, and they are characterized by wide distribution, high frequency, rapid movement and severe losses. The study of landslide forecasting can improve the capacity for rapid reaction to sudden geological hazards and so supports effective hazard prevention and mitigation, which is of great significance.

The Three Gorges Reservoir Area is one of the regions of China hit hardest by landslides. The banks of the Three Gorges Reservoir have complex geological and geomorphological conditions and lie in a subtropical climate zone with abundant rainfall and frequent rainstorms, so collapses, landslides and debris flows occur from time to time and many ancient landslides are present. The new towns of the resettlement project are almost all located on slopes. Under the influence of reservoir water level changes and of the resettlement works, many ancient landslides will be reactivated and new landslides will occur. Many resettlement construction projects have been damaged or threatened by landslides and collapses, and the new sites of some county towns have had to move several times because of landslides. As the Three Gorges Project progressed, the State Council paid great attention to the prevention of geohazards in the reservoir area, and in July 2001 it launched a comprehensive project for the prevention and control of geological hazards in the Three Gorges Reservoir Area.

With the progress of the informatization of geohazard prevention and control in the Three Gorges Reservoir Area, abundant geological hazard data have been accumulated. Because geological hazards are sudden and destructive, the early warning and command system must reorganize information by command object, that is, by hazard type and warning level and by grade within the same warning level. This requires not only high-level method and model libraries for analyzing, summarizing and using the data, but also tools able to dig useful information out of very large databases so that decisions are made correctly and quickly. Conventional operational databases cannot meet these requirements. With data warehouse OLAP (On-Line Analytical Processing) and data mining technology, however, massive data can be analyzed and used automatically and effectively to find internal connections, extract useful rules and knowledge, and serve a decision support system.

The purpose of this study is to use data warehouse technology to integrate the geological hazard data of the Three Gorges Reservoir Area effectively, and to apply data mining techniques to mine useful information from historical data for landslide prediction, in the service of the early warning and command system. The thesis addresses two aspects: the construction of a geohazard data warehouse and the application of data mining to landslide prediction.

(1) A data-driven method was adopted for the construction of the geohazard data warehouse, in which the data warehouse schema is obtained by analyzing the underlying source systems. The main idea of the data-driven method is that the data warehouse is built on the existing source systems, making full use of existing data and code rather than starting from scratch. The design proceeds from the existing database systems and re-examines the links between the data in accordance with the requirements of the business domain in order to organize the subjects of the data warehouse. The whole design has four stages: requirements specification, conceptual model design, logical model design and physical model design.

In the requirements specification stage, the source databases, that is, the existing operational databases, were identified; they include the geohazard attribute database and spatial database. The subjects of the geohazard data warehouse of the Three Gorges Reservoir Area were determined from the analysis of the source systems. They include regional geohazard forecasting, geohazard forecasting for new resettlement towns, forecasting for individual geohazards, surge wave forecasting, assessment of control works, monitoring and forecasting, and early warning decision support and emergency command. Given the progress of data collection and the research work the author had completed, the thesis focuses on two subjects: regional geohazard forecasting, and landslide monitoring and forecasting.

In the conceptual design stage, the data hierarchies of these two subjects were analyzed. From them the landslide susceptibility fact and the landslide displacement monitoring fact were derived, the fact measures, dimensions and hierarchies were determined, and the landslide susceptibility cube and landslide displacement monitoring cube were established. The measures of the landslide susceptibility fact are existing landslides, engineering geological rock group, slope structure, tectonic structure, slope angle, elevation, surface rivers, vegetation, land cover, roads, aspect and surface curvature. Its dimensions are landslide type, scale and region, and their corresponding hierarchies are "type->kind->model->style->stage->character", "scale->size" and "county->province->reservoir area" respectively. The measures of the landslide displacement monitoring fact are deformation displacement, rainfall, temperature, water level change, earthquakes, rainstorms and human activities. Its dimensions are landslide type, time, monitoring location and monitoring type, and their corresponding hierarchies are "type->kind->model->style->stage->character", "date->month->quarter->year", "monitoring location->landslide mass->village->town->county" and "monitoring content->monitoring instrument->monitoring method->monitoring type" respectively.

In the logical design stage, the conceptual multidimensional models built above were converted to logical models, and the ETL processes of the landslide susceptibility model and the landslide displacement monitoring model were designed; the ETL process of the landslide susceptibility model has two parts, spatial data ETL and attribute data ETL. In the physical design stage, loading from the sources to the target data warehouse was implemented with Oracle Warehouse Builder (OWB), and the geohazard data warehouse was built on an Oracle database. The performance of the data warehouse was then optimized through partitioning, indexes, materialized views and storage structure design.

(2) Based on the cubes of the geohazard data warehouse, data mining for landslide forecasting was carried out with the support vector machine regression algorithm of Oracle Data Mining (ODM), which is embedded in Oracle Database. The thesis describes the support vector machine as a next-generation algorithm that is based on statistical models rather than on loose analogies with natural learning systems and that can in theory obtain the best predictions. It also copes well with practical problems such as small samples, nonlinearity, high dimensionality and local minima, and is therefore regarded as a good alternative to neural networks. The support vector machine regression algorithm of ODM is convenient to use, easy to deploy, and requires little intervention in the algorithm parameters.

① First, landslide susceptibility zoning was studied with Zhong County as the study area. Landslide susceptibility analysis uses the statistical relationship between the spatial distribution of existing landslides and that of the causative factors to evaluate the likelihood of potential landslides within a particular region, which is conducive to land development and planning and reduces the threat of landslide hazards. The study uses the widely recognized raster GIS model, builds the landslide susceptibility cube through dimensional modeling in the data warehouse, and analyzes the susceptibility of the study area with ODM's support vector machine regression algorithm. To test the performance of this algorithm, two commonly used quantitative statistical models, weights of evidence and logistic regression, were used for comparison; both were built with the same samples and predictor variables as the support vector machine model. The results indicate that, although not all existing landslides were predicted, the high and very high susceptibility areas given by the support vector machine contained 88.02% of the existing landslides, while the proportions for weights of evidence and logistic regression were 84.48% and 58.94% respectively. This shows that the prediction capability of the support vector machine model is better than that of the weights-of-evidence and logistic regression models.

② Second, the Baishuihe landslide monitoring data were used as an example for time series analysis of landslide displacement monitoring data. Time series analysis can predict the trend of a complex system and is a focus of research on dynamic forecasting of landslide displacement. Because most current prediction models work from flat files, a framework for time series analysis based on the multidimensional model of the data warehouse was introduced. The Baishuihe displacement time series was preprocessed following the theory of state-space reconstruction, and ODM's PL/SQL API was then used to build a support vector machine regression time series model and carry out data mining on the data warehouse. Multi-step prediction results show that the error rate of the predicted values stayed within 8% for the first five steps. The error of the sixth step was larger, possibly because of the combined conditions of 355 mm of precipitation in April and May and a 4.68 m drop in water level in May, which had brought the landslide to the stage of imminent failure (about 100,000 m³ of soil in the middle of the Baishuihe landslide collapsed on June 30, 2007). The data were then no longer indicative, but the accuracy of 84.1% still meets engineering requirements. ODM's support vector machine regression algorithm can thus be used for short-term prediction in landslide monitoring.

The main innovations and features of the thesis are: (1) Based on the concept of the data warehouse and an in-depth analysis of the landslide susceptibility fact, the landslide susceptibility cube was designed and built on the raster GIS model of spatial data. Spatial data on landslides and causative factors are stored from three views (scale, region and landslide type), which integrates the spatial data by subject and meets the data needs of spatial landslide prediction with a rapid response. (2) After an in-depth analysis of landslide displacement monitoring time series data, the landslide displacement monitoring cube was designed and built on four dimensions: time, monitoring location, monitoring type and landslide type. ODM's PL/SQL API was then used to build a support vector machine regression time series model and to carry out data mining on the data warehouse.

The thesis also has some shortcomings: (1) The construction of the data warehouse depends on the understanding of the business domain and on data preprocessing. The preprocessing of spatial data with continuous attributes, that is, the reclassification of the causative factors of landslides, combined expert knowledge with verification by bivariate statistical methods and therefore has a certain degree of subjectivity. (2) The landslide displacement monitoring cube was designed and built and its time series data were mined, but cross prediction between displacement and reservoir level and between displacement and precipitation was not carried out, so the reliability and accuracy of the model in application need validation in further work. (3) Data mining and GIS mapping functions were not integrated: the results of data mining must be exported before the landslide susceptibility map is generated with GIS software. In summary, integrating geohazard data into a data warehouse and applying data mining tools based on that warehouse is a feasible new approach to landslide forecasting.
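The weights-of-evidence baseline and the "share of known landslides captured" comparison can be sketched in a few lines of Python. The abstract does not give the thesis's formulas, so this sketch assumes the standard weights-of-evidence definitions for one binary factor class on a raster. The function names `weights_of_evidence` and `capture_rate` and all cell counts are hypothetical, chosen only for illustration.

```python
import math


def weights_of_evidence(class_slide, class_total, slide_total, cell_total):
    """Return (W+, W-, contrast) for one factor class from raster cell counts.

    class_slide: landslide cells inside the class
    class_total: all cells inside the class
    slide_total: all landslide cells in the study area
    cell_total:  all cells in the study area
    """
    stable_total = cell_total - slide_total
    class_stable = class_total - class_slide       # in class, no landslide
    other_slide = slide_total - class_slide        # outside class, landslide
    other_stable = stable_total - class_stable     # outside class, no landslide
    if min(class_slide, class_stable, other_slide, other_stable) <= 0:
        raise ValueError("every cell count must be positive to take logarithms")
    w_plus = math.log((class_slide / slide_total) / (class_stable / stable_total))
    w_minus = math.log((other_slide / slide_total) / (other_stable / stable_total))
    return w_plus, w_minus, w_plus - w_minus


def capture_rate(slide_cell_classes, target_classes):
    """Fraction of known landslide cells that fall in the target classes."""
    hits = sum(1 for c in slide_cell_classes if c in target_classes)
    return hits / len(slide_cell_classes)


# Hypothetical counts: 1000 cells, 100 with landslides, 200 in the class,
# 60 of which contain landslides.
w_plus, w_minus, contrast = weights_of_evidence(60, 200, 100, 1000)

# Hypothetical susceptibility classes of five known landslide cells.
rate = capture_rate(
    ["very high", "high", "high", "moderate", "low"],
    {"very high", "high"},
)
print(w_plus, w_minus, contrast, rate)
```

A positive W+ with a negative W- indicates that the class is associated with landslides. A capture rate computed this way is the kind of figure the abstract reports as 88.02%, 84.48% and 58.94%, although the thesis's exact procedure may differ.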

  • 【Classification No.】TP311.13
  • 【Times cited】17
  • 【Downloads】2141