
Application and Research of Data Mining in the Classification of Hypothyroidism

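【Author】 Long Tao (龙涛)

【Advisor】 Liu Xiaoxia (刘晓霞)

【Author information】 Northwest University (西北大学), Computer Application Technology, 2010, Master's thesis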

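【Summary】 As healthcare informatization advances and the volume of diagnostic data surges, data mining techniques are needed to analyze the data in depth and extract potentially meaningful knowledge. Classification-mining studies of hypothyroidism are currently few, and they approach the problem purely from the perspective of medical analysis, of statistical principles, or of a single data mining model. They neither combine statistical methods with data mining techniques nor compare multiple data mining models comprehensively in order to judge which hypothyroidism classification model is better. To address these gaps, this thesis mines different measurement data for hypothyroidism and studies several classification models in some depth, from the standpoint of both statistical principles and practical application. It compares the performance of three types of models across several factors (variable requirements, data robustness, running time, interpretability of results, classification accuracy, and scalability), which offers some reference and guidance for the clinical classification and diagnosis of hypothyroidism. The main work is as follows: 1) The thesis describes the basic concepts and main application areas of data mining, and analyzes in some depth each implementation phase of the CRISP-DM data mining process and the results each phase produces. Combining research and application, it develops a thorough business understanding of hypothyroidism classification. During data understanding, it explores the hypothyroidism attributes in depth so that the selection of the training and test sets is representative. During data preparation, it carries out fairly comprehensive data analysis and preprocessing to handle missing values, outliers, and useless or redundant attributes in the relevant fields. 2) Starting from the statistical principles behind the data models, the thesis examines the similarities, differences, and relationship between statistical methods and data mining, and studies the mathematical principles and application of discriminant analysis, Logistic regression, and the CHAID decision tree. By building the corresponding data mining models, it identifies the main discriminating indicators for hypothyroidism classification. Combining statistical principles with several data mining models, it carries out fairly comprehensive statistical analysis and algorithm research, finds the better-performing model, and compares the three models across different measurement factors. 3) In the Clementine12.0 development environment, the thesis follows the CRISP-DM standard process for systematic hypothyroidism mining research and development, covering the six phases of the process both as a whole and in detail, so that the data mining work proceeds through a structured, systematic, standardized, and visual workflow. The whole data mining process is developed in the Script scripting language, which reduces manual, repetitive, and time-consuming tasks and supports process automation and batch processing of objects in the user interface.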

【Abstract】 With the development of medical informatization and the growth of diagnostic data, in-depth analysis with data mining technology is needed to extract potentially significant knowledge. Current research on hypothyroidism classification mining is insufficient to determine the strengths and weaknesses of classification models, because it proceeds only from the perspective of medical analysis, statistical theory, or a single data mining model; it neither combines statistical methods with data mining nor compares multiple data mining models comprehensively. To address this deficiency, this paper studies different hypothyroidism measurement data from the standpoint of both statistical methods and practical application, and compares several classification models. It analyzes the performance of three models comprehensively in terms of variable requirements, data robustness, time consumption, result interpretation, classification accuracy, performance scalability, and other factors, and provides reference and guidance for the clinical diagnosis of hypothyroidism. The paper covers the following: 1) It introduces the concepts and major applications of data mining technology, and analyzes in depth each implementation stage of the CRISP-DM data mining process and the corresponding results. Combining research and application, it develops a deeper business understanding of hypothyroidism classification. In the data understanding phase, it explores the hypothyroidism attributes in depth so that the training and test sets are more general and representative. In the data preparation phase, it analyzes and preprocesses fields with missing values, outliers, and useless or redundant attributes. 2) Based on the statistical theory underlying the data models, it explores the similarities, differences, and relationship between statistical methods and data mining, and studies the mathematical principles and application of discriminant analysis, Logistic regression, and the CHAID decision tree. By building the corresponding data mining models, it determines the main indicators for hypothyroidism classification. Combining statistical principles with several data mining models, it carries out a fairly comprehensive statistical analysis and study of the mining algorithms to find the better model, and further compares the three models across different measurement factors. 3) In the Clementine12.0 development environment, it applies the CRISP-DM standard data mining process to systematic hypothyroidism mining research and development, covering the six stages of the implementation process both as a whole and in detail, so that the data mining work follows a structured, systematic, standardized, and visual process. It uses the Script language to develop the whole data mining process, which reduces manual, repetitive, time-consuming tasks and helps automate the process and batch-process objects in the user interface.

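  • 【Online publication contributor】 Northwest University (西北大学)
  • 【Online publication year and issue】 2010, Issue 09
  • 【Classification codes】 R581.2; TP311.13
  • 【Times cited】 1
  • 【Downloads】 176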