
Research on Manifold Learning Based Classifiers and Their Applications

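【Author】 Kang Li (康莉)

【Advisor】 Li Aiguo (李爱国)

【Author information】 Xi'an University of Science and Technology, Computer Software and Theory, 2010, Master's thesis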

【Abstract】 Applying data mining to earthquake prediction is a research area of both academic value and practical significance. This thesis explores new data mining approaches to predicting the time interval between aftershocks and the aftershock magnitude, and applies manifold learning dimensionality reduction to aftershock anomaly detection. Building on a study of several data mining algorithms, it looks for methods suited to aftershock time prediction, aftershock magnitude prediction, and dimensionality reduction of seismic feature attributes, and develops a corresponding software prototype system. The main contributions are as follows.

First, a method for predicting the time interval between aftershocks based on Adaptive Local Linearization (ALL) is proposed. ALL is an adaptive local linearization method based on singular value decomposition. It determines the current embedding dimension adaptively and thereby overcomes the effect of an ill-conditioned data matrix. The experimental data are the time intervals between aftershocks of magnitude 4.0 or greater following the Wenchuan earthquake. The evaluation criteria are MRMSE (Mean of Root Mean Square Errors), MMAE (Mean of Mean Absolute Errors) and AE (Absolute Error). Comparative experiments against the standard local linearization method and least-squares fitting show that ALL is an effective method for predicting the time interval between aftershocks.

Second, for prediction problems in which the decision attribute is real-valued, a modeling method named PR-KNN (Polynomial Regression and K Nearest Neighbor) is proposed, which combines the K Nearest Neighbor (KNN) algorithm with a polynomial regression model. The experimental data are the sequence of aftershocks of magnitude 4.0 or greater following the Wenchuan earthquake, with the time interval between aftershocks as the condition attribute and the aftershock magnitude as the decision attribute. The evaluation criteria are RE (Relative Error) and AE (Absolute Error). Comparative experiments against the traditional KNN regression algorithm and the distance-weighted KNN regression algorithm show that PR-KNN is a promising method for predicting aftershock magnitude.

Third, supervised manifold learning algorithms lack a function that maps test samples from the high-dimensional space to the low-dimensional space. To address this, a supervised manifold learning algorithm named PR-SLLE is proposed. It builds on supervised locally linear embedding (SLLE), combines it with ideas from KNN and polynomial regression, and is applied to aftershock anomaly detection. The experimental data are seismic feature attribute data recorded after the Wenchuan earthquake. The evaluation criteria are AR (Accuracy Rate), OR (Omission Rate) and FR (False alarm Rate). In comparative experiments against the standard SLLE algorithm, PR-SLLE combined with a naive Bayes classifier predicts better than SLLE, which indicates that PR-SLLE is a feasible and effective dimensionality reduction method.

Finally, based on these results, a prototype subsystem for data mining based aftershock trend prediction and analysis was designed and implemented. It is an important component of the software prototype system for data mining based earthquake trend forecasting and assessment developed by the author's project group. The subsystem comprises three modules: prediction of the time interval between aftershocks, prediction of aftershock magnitude, and aftershock anomaly detection. Test results show that the prototype runs correctly. The prototype is intended to lay the foundation for further research.
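
The abstract names the ingredients of PR-KNN (K nearest neighbours plus polynomial regression) but does not spell out the algorithm. The Python sketch below shows one plausible reading, which is an assumption and not the thesis's implementation: for each query, select the k nearest training points on the condition attribute and fit a low-degree polynomial to those points only. The function names `pr_knn_predict` and `knn_predict`, the parameters `k` and `degree`, and the toy data are all hypothetical; the toy data are synthetic and unrelated to the Wenchuan aftershock records.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=5):
    """Plain KNN regression: average the targets of the k nearest points."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Indices of the k training points closest to the query (1-D attribute).
    idx = np.argsort(np.abs(x_train - x_query))[:k]
    return float(np.mean(y_train[idx]))

def pr_knn_predict(x_train, y_train, x_query, k=5, degree=2):
    """Assumed PR-KNN reading: fit a polynomial to the k nearest points only."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    idx = np.argsort(np.abs(x_train - x_query))[:k]
    # Least-squares polynomial fit restricted to the neighbourhood.
    coeffs = np.polyfit(x_train[idx], y_train[idx], deg=degree)
    return float(np.polyval(coeffs, x_query))

def absolute_error(pred, true):
    """AE as named in the abstract: |prediction - true value|."""
    return abs(pred - true)

def relative_error(pred, true):
    """RE as named in the abstract: AE divided by |true value|."""
    return abs(pred - true) / abs(true)

# Synthetic toy data (not seismic data): a noise-free quadratic.
x = np.linspace(0.0, 10.0, 21)
y = x ** 2
query, truth = 3.3, 3.3 ** 2

for name, fn in [("KNN", knn_predict), ("PR-KNN", pr_knn_predict)]:
    pred = fn(x, y, query)
    print(name, pred, absolute_error(pred, truth), relative_error(pred, truth))
```

On this toy curve the local polynomial follows the trend inside the neighbourhood, while the plain average is pulled toward the neighbours' mean. Whether that carries over to real aftershock sequences depends on the data and on the choice of `k` and `degree`, which the abstract does not report.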
