

Research and Implementation of Birth Defect Early Warning System Based on Association Rules

【Author】 Zhao Jialu (赵佳璐)

【Supervisor】 Yang Jun (杨俊)

【Author Information】 Beijing University of Posts and Telecommunications, Computer Science and Technology, 2013, Master's thesis

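【Chinese Abstract】 The incidence of birth defects in China is rising year by year and poses a growing threat to sustainable human development and to social and economic development. Association rule mining, a data mining technique, can identify pathogenic factors associated with birth defects and thereby support their prevention. However, traditional association rule mining algorithms are time-consuming and produce redundant rules, and they cannot be applied directly to distributed, numeric medical data. To address these two challenges, this thesis carries out exploratory research on association rule mining methods for medical data. The topic comes from the "Eleventh Five-Year" National Key Technology R&D Program project "Research on Key Technologies of a Secure and Trustworthy Carrier-Grade Reproductive Health Operation Support System". The thesis mainly addresses how to mine factors related to birth defects from the more than 1.6 million family archives collected, so as to achieve early warning. The main work of the thesis is as follows: 1. It studies the theory of association rule mining, including basic concepts and classifications, and focuses on and comparatively analyzes the two most influential algorithms, Apriori and FP-growth. 2. It proposes ACARMT, a new algorithm that introduces user-interest constraints into association rule mining, addressing the long running time and rule redundancy of existing algorithms. 3. It designs a preprocessing model for medical data that integrates distributed data and defines data transformation rules, converting the large volume of source data into intermediate data suitable for direct mining; this removes the obstacle that association rules cannot be mined from medical data directly. 4. It designs and implements a birth defect early warning system that mines the pathogenic factors of birth defects and issues real-time warnings for suspicious archives. The main contributions of the thesis are a new constraint-based association rule mining algorithm, ACARMT, which improves mining efficiency and the relevance of the mined results, and a data preprocessing model for medical data mining that allows massive medical datasets to be mined with the new algorithm. Finally, ACARMT and the data preprocessing model are applied in the design and implementation of the birth defect early warning system. Association rule mining on more than one million archives collected by the "National Free Pre-pregnancy Eugenic Health Examination Information Service Management Platform" verifies the effectiveness of the algorithm and the model and achieves birth defect early warning.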
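The abstract does not say how ACARMT encodes user-interest constraints, so the following is only a minimal sketch under one stated assumption: the user is interested solely in rules whose consequent is a single target item, such as a birth-defect outcome. It shows plain level-wise Apriori (FP-growth is not shown) and one common way such a constraint can cut work: only records that contain the target can support a rule X → target, so Apriori runs on that projected subset alone. The function names, item labels and thresholds are hypothetical; this is not the thesis's ACARMT algorithm.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple


def apriori(transactions: Iterable[Iterable[str]], min_count: float) -> Dict[FrozenSet[str], int]:
    """Level-wise Apriori: every itemset contained in >= min_count transactions."""
    data = [frozenset(t) for t in transactions]
    counts: Dict[FrozenSet[str], int] = {}
    for t in data:  # level 1: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]  # join step
                # Prune step: all (k-1)-subsets must be frequent (anti-monotonicity).
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        counts = {cand: 0 for cand in candidates}
        for t in data:  # one database scan per level
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


def rules_for_target(
    transactions: Iterable[Iterable[str]],
    target: str,
    min_support: float,
    min_confidence: float,
) -> List[Tuple[FrozenSet[str], str, float, float]]:
    """Mine only rules X -> {target} (a hypothetical user-interest constraint)."""
    data = [frozenset(t) for t in transactions]
    n = len(data)
    # Only records containing the target can support X -> target, so Apriori
    # runs on that projected subset with the target item removed.
    projected = [t - {target} for t in data if target in t]
    frequent = apriori(projected, min_support * n)
    rules = []
    for antecedent, joint_count in frequent.items():
        # Confidence needs the antecedent's count over the full database.
        antecedent_count = sum(1 for t in data if antecedent <= t)
        confidence = joint_count / antecedent_count
        if confidence >= min_confidence:
            rules.append((antecedent, target, joint_count / n, confidence))
    return rules


if __name__ == "__main__":
    # Hypothetical, already-discretized archives (one item set per archive).
    records = [
        {"age>=35", "folic_acid=no", "defect=yes"},
        {"age>=35", "folic_acid=no", "defect=yes"},
        {"age>=35", "defect=yes"},
        {"folic_acid=no"},
        {"age>=35"},
        {"smoking"},
    ]
    for antecedent, consequent, support, confidence in rules_for_target(
        records, "defect=yes", min_support=0.3, min_confidence=0.7
    ):
        print(sorted(antecedent), "->", consequent, round(support, 2), round(confidence, 2))
```

Because the constraint is applied to the data rather than only as a post-filter on the final rule list, itemsets that can never appear in a rule of interest are never counted, which is the kind of saving the abstract attributes to constraint-based mining.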

【Abstract】 The incidence of birth defects in China has increased year by year, threatening sustainable human development as well as social and economic development. Association rule mining, a data mining method, can find pathogenic factors by mining medical data and thus help prevent birth defects. However, traditional association rule mining algorithms are time-consuming and generate redundant rules, and they cannot be used to mine distributed, numeric medical data directly. In view of these two challenges, this thesis carries out exploratory research on association rule mining methods for medical data. The topic comes from the "Eleventh Five-Year" National Key Technology R&D Program project "Research on Key Technologies of a Secure and Trustworthy Carrier-Grade Reproductive Health Operation Support System". The thesis addresses how to mine factors related to birth defects from the 1.6 million family archives collected by the project and thereby achieve early warning. The work of this thesis covers the following aspects: 1. It studies the fundamentals of association rule mining, including basic concepts and types, and focuses on and compares the classical algorithms Apriori and FP-growth. 2. After studying existing algorithms, it proposes a new algorithm, ACARMT, which uses constraints based on user interests. 3. In view of the characteristics of medical data, it designs a data preprocessing model. The model integrates distributed data and defines data transformation rules that convert the source data into intermediate data to which the algorithm can be applied, solving the problem that association rules cannot be mined from the medical data directly. 4. Based on the new algorithm and model, it designs and implements a birth defect early warning system that mines the factors leading to birth defects and gives early warning for suspicious archives. The main contributions of the thesis are a constrained association rule mining algorithm, ACARMT, which improves mining efficiency and the pertinence of the results, and a data preprocessing model that allows massive medical data to be mined with the new algorithm. Finally, ACARMT and the data preprocessing model are applied in the design and implementation of the birth defect early warning system; mining the association rules in the 1.6 million family archives collected by the national pre-pregnancy information management platform verifies the effectiveness of the algorithm and model and realizes early warning.
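The abstract names two preprocessing steps, integration of distributed data and data transformation rules, plus early warning for suspicious archives, but gives no details. The sketch below assumes that each site exports records keyed by a shared archive ID, that a transformation rule is an ordered list of numeric cut points per field, and that mined rules arrive as (antecedent items, confidence) pairs for rules of the form X → defect. All field names, cut points and the rule format are hypothetical illustrations, not the thesis's preprocessing model.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical transformation rules: for each numeric field, ordered
# (exclusive upper bound, item label) pairs; the first matching bound wins.
TRANSFORM_RULES: Dict[str, List[Tuple[float, str]]] = {
    "maternal_age": [(25, "age<25"), (35, "age25-34"), (float("inf"), "age>=35")],
    "hemoglobin_g_per_L": [(110, "anemia"), (float("inf"), "hb_normal")],
}

# A mined rule X -> defect, stored as (antecedent items, confidence).
Rule = Tuple[FrozenSet[str], float]


def merge_sources(*sources: Dict[str, dict]) -> Dict[str, dict]:
    """Integrate per-site records that share an archive ID into one record each."""
    merged: Dict[str, dict] = {}
    for source in sources:
        for archive_id, fields in source.items():
            merged.setdefault(archive_id, {}).update(fields)
    return merged


def to_items(record: dict, rules: Dict[str, List[Tuple[float, str]]] = TRANSFORM_RULES) -> FrozenSet[str]:
    """Convert one merged record into a transaction of discrete items."""
    items = set()
    for field, value in record.items():
        if value is None:
            continue  # missing value: emit no item
        if field in rules:
            # Numeric field: map the value to the label of its interval.
            for upper, label in rules[field]:
                if value < upper:
                    items.add(label)
                    break
        elif isinstance(value, bool):
            if value:
                items.add(field)  # boolean flag present, e.g. "smoking"
        else:
            items.add(f"{field}={value}")  # categorical field
    return frozenset(items)


def warn(record: dict, mined_rules: List[Rule]) -> List[Rule]:
    """Return the mined rules whose antecedent is fully present in the record."""
    items = to_items(record)
    return [rule for rule in mined_rules if rule[0] <= items]


if __name__ == "__main__":
    site_a = {"A001": {"maternal_age": 37, "smoking": False}}
    site_b = {"A001": {"hemoglobin_g_per_L": 102, "folic_acid": "no"}}
    record = merge_sources(site_a, site_b)["A001"]
    print(sorted(to_items(record)))  # ['age>=35', 'anemia', 'folic_acid=no']
    mined = [(frozenset({"age>=35", "folic_acid=no"}), 0.8), (frozenset({"smoking"}), 0.6)]
    print(warn(record, mined))
```

Discretizing numeric fields into labeled intervals is what lets a Boolean-itemset miner such as Apriori consume numeric medical data at all; the same `to_items` conversion is reused at warning time so that new archives are described in the same item vocabulary as the mined rules.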
