

Research on Earthquake Prediction Based on a Time Series Similarity Matching Algorithm

【Author】 郑华 (Zheng Hua)

【Advisor】 李炜 (Li Wei)

【Author Information】 Anhui University, Computer Software and Theory, 2010, Master's thesis

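【Summary】 China is an earthquake-prone country. Its seismic activity is frequent, strong, widely distributed, and shallow-focus, and its earthquake disasters are severe. Many factors can trigger an earthquake, and the relationships among them are nonlinear and highly uncertain. This thesis studies earthquake prediction through data mining on time-magnitude sequences. The aim is to combine classical time series data mining methods with high-performance computing, in line with the characteristics of earthquake data, in order to develop data mining algorithms suited to earthquake prediction, find the regularities behind earthquake data, and discover latent, valuable knowledge for prediction. The work on similarity matching of seismically related regions comprises the following parts: 1. Preprocessing of earthquake data. Earthquake catalog data consist of origin time, epicenter location, magnitude, and related information; mining association rules directly on these data yields only relationships between individual points. The thesis therefore applies discretization steps (denoising, partitioning into blocks, circle drawing, and clustering) together with main-class analysis to convert the catalog into the required data format. 2. Drawing on seismological domain knowledge, the thesis defines a similarity measure for time-magnitude sequences and proposes a sequence similarity matching algorithm based on earthquake similarity. The algorithm introduces two-dimensional threshold matching over time and magnitude, matches sequences quickly, and thereby identifies seismically related regions from earthquake sequences. 3. A constraint-rule metric model for sequential patterns is established, together with an associated similarity matching algorithm. Similarity matching is defined in two stages: coarse-grained matching, which finds in the earthquake source catalog those regions whose earthquake counts differ by no more than a threshold margin; and fine-grained matching, which builds on the coarse match, converts time, magnitude, location, and other information into earthquake sequences with two-dimensional threshold support counts, compares the query sequence with the sequence records in the earthquake data warehouse, and finds the sequences with high similarity. 4. A parallel data mining platform for earthquake prediction is implemented on a cluster system. On this platform, massive data are first preprocessed and filtered and then matched for temporal similarity, with added horizontal and vertical matching across multiple regions and multiple time periods, as well as matching under different time lags and thresholds. The model was validated repeatedly through extensive experiments, and matching experiments on several decades of historical earthquake data from China's seismically active regions produced results with high credibility, verifying the effectiveness and practicality of the proposed sequence similarity matching control strategy and the advantages of the algorithm.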
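The coarse-grained matching stage in part 3 above could be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function name `coarse_candidates`, the region labels, and the event counts are all hypothetical, and the only idea taken from the abstract is that regions are kept when their earthquake counts differ from the query's by no more than a threshold margin.

```python
from typing import Dict, List


def coarse_candidates(query_count: int,
                      region_counts: Dict[str, int],
                      margin: int) -> List[str]:
    """Return regions whose earthquake count is within `margin` of the query's.

    Hypothetical helper: the abstract only states that the count
    difference must stay under a threshold margin.
    """
    return sorted(
        name
        for name, count in region_counts.items()
        if abs(count - query_count) <= margin
    )


# Made-up counts for four regions over the same time window.
counts = {"A": 120, "B": 135, "C": 20000, "D": 98}
print(coarse_candidates(120, counts, margin=25))
```

A region with tens of thousands of events is discarded here before any sequence comparison takes place, which is the purpose of the coarse filter.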

【Abstract】 China is an earthquake-prone country. Its seismic activity is frequent, strong, widely distributed, and shallow-focus, so earthquake disasters are severe and bring huge losses to the country and its people. Many factors can trigger an earthquake, and the relationships among them are nonlinear and highly uncertain. Data mining techniques allow more systematic, in-depth, comprehensive, and detailed analysis, and can help advance earthquake prediction research. This thesis focuses on the theory, methods, and practical application of time-magnitude sequence data mining algorithms for earthquake prediction, and carries out a series of prediction studies based on them. The aim is to combine classical time series data mining methods with high-performance computing, in line with the characteristics of earthquake data, in order to develop data mining algorithms suited to earthquake prediction, find the regularities behind earthquake data, and discover latent, valuable knowledge for prediction. The work on similarity matching of seismically related regions comprises the following parts: 1. Preprocessing of earthquake data. Earthquake catalog data consist of origin time, epicenter location, magnitude, and related information; mining association rules directly on these data yields only relationships between individual points. The thesis therefore applies discretization steps (denoising, partitioning into blocks, circle drawing, and clustering) together with main-class analysis to convert the catalog into the required data format. 2. Drawing on seismological domain knowledge, the thesis defines a similarity measure for time-magnitude sequences and proposes a sequence similarity matching algorithm based on earthquake similarity. The algorithm introduces two-dimensional threshold matching over time and magnitude, matches sequences quickly, and thereby identifies seismically related regions from earthquake sequences. 3. A constraint-rule metric model for sequential patterns is established, together with an associated similarity matching algorithm. Similarity matching is defined in two stages. Coarse-grained matching finds in the earthquake source catalog those regions whose earthquake counts differ by no more than a threshold margin; put simply, if one region records a single earthquake over a given period while another records tens of thousands, the two regions are very unlikely to be similar. Fine-grained matching builds on the coarse match: it converts time, magnitude, location, and other information into earthquake sequences with two-dimensional threshold support counts, compares the query sequence with the sequence records in the earthquake data warehouse, and finds the sequences with high similarity. When the similarity is high, earthquake occurrence in the two regions is bound to reflect some regular relationship. 4. A parallel data mining platform for earthquake prediction is implemented on a cluster system. On this platform, massive data are first preprocessed and filtered and then matched for temporal similarity, with added horizontal and vertical matching across multiple regions and multiple time periods, as well as matching under different time lags and thresholds. The model was validated repeatedly through extensive experiments, and matching experiments on several decades of historical earthquake data from China's earthquake-prone regions produced credible results, verifying the effectiveness and practicality of the proposed sequence similarity matching control strategy and the advantages of the algorithm.
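To make the fine-grained stage more concrete, here is a small sketch of two-dimensional threshold matching on time-magnitude sequences. It is an illustration under stated assumptions rather than the algorithm from the thesis: the names `support_count` and `similarity`, the greedy one-to-one pairing, the `lag` parameter, and the normalization by the longer sequence are all choices made for this example. The abstract itself specifies only that events are matched under both a time threshold and a magnitude threshold, and that a support count is derived from the matches.

```python
from typing import List, Tuple

Event = Tuple[float, float]  # (time in days, magnitude); units are assumed


def support_count(query: List[Event], candidate: List[Event],
                  time_tol: float, mag_tol: float, lag: float = 0.0) -> int:
    """Count query events that pair with a distinct candidate event.

    Two events pair when, after shifting the candidate by `lag`, both the
    time gap and the magnitude gap are within their thresholds.
    The greedy pairing is an assumption, not taken from the thesis.
    """
    q = sorted(query)
    c = sorted((t - lag, m) for t, m in candidate)
    used = [False] * len(c)
    matched = 0
    for tq, mq in q:
        for j, (tc, mc) in enumerate(c):
            if used[j]:
                continue
            if tc > tq + time_tol:
                break  # candidates are time-sorted; none later can match
            if abs(tc - tq) <= time_tol and abs(mc - mq) <= mag_tol:
                used[j] = True
                matched += 1
                break
    return matched


def similarity(query: List[Event], candidate: List[Event],
               time_tol: float, mag_tol: float, lag: float = 0.0) -> float:
    """Support count normalized by the longer sequence (assumed measure)."""
    if not query or not candidate:
        return 0.0
    hits = support_count(query, candidate, time_tol, mag_tol, lag)
    return hits / max(len(query), len(candidate))


# Made-up sequences: the candidate region lags the query by about 3 days.
query_seq = [(0.0, 4.0), (10.0, 5.2), (25.0, 4.5)]
cand_seq = [(3.0, 4.1), (13.0, 5.0), (40.0, 4.4)]
print(similarity(query_seq, cand_seq, time_tol=2.0, mag_tol=0.3, lag=3.0))
```

Scanning several values of `lag` and both thresholds corresponds loosely to the matching under different time lags and thresholds mentioned in part 4; regions that pass the coarse count filter would be the only ones scored this way.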

  • 【Online Publication Contributor】 Anhui University
  • 【Online Publication Issue】 2010, Issue 11

