

Research on Data Mining in the New Rural Cooperative Medical System

【Author】 Han Ying (韩颖)

【Supervisor】 Zheng Jianzhong (郑建中);

【Author information】 Shanxi Medical University, Epidemiology and Health Statistics, 2009, PhD

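【Summary】 Objectives. Overall objective: to improve the ability of New Rural Cooperative Medical System (NRCMS) administrators to make quantitative management decisions from historical data, this study examines the NRCMS data mining process, its mining topics, and their practical value. Specific objectives: ① standardize the NRCMS data mining process; ② define the business problems relevant to NRCMS management; ③ design an NRCMS data warehouse model; ④ use the compensation subject database of one county to study NRCMS risk factors and, based on the mining results, propose measures for improving the system.

Methods. First, a literature review was used to survey the development of the NRCMS and the general patterns of data mining, and to propose a basic framework for NRCMS data mining. Second, qualitative methods were used to analyze the operating mechanism of the NRCMS; public choice theory from policy science, stakeholder group analysis, and logical induction were applied to define the main business problems of NRCMS management. Third, following the design principles of conceptual, logical, and physical data warehouse models, and starting from the information needs of NRCMS management, design patterns for the conceptual and logical models of the NRCMS data warehouse were studied. Finally, data stream models for data mining were designed in SPSS Clementine 12.0. The GRI association rule algorithm was used to find strong association rules in the NRCMS compensation subject database, and clustering was used to identify abnormal medical expense payments in the NRCMS reimbursement database. On this basis, several classification and prediction models were trained to find the best classifier for enrolled farmers' inpatient service utilization and for abnormal medical expenditure.

Results. 1. From 2004 to 2008, the number of NRCMS pilot counties nationwide grew by an average of 72.3% per year and the enrolled population by 78.6% per year; the enrollment rate rose from 75.2% to 91.5%, and per capita funding grew 3- to 4-fold. At present, each staff member in county-level NRCMS agencies manages 65,000 enrolled farmers on average. After 2006, the Ministry of Health issued four policy documents on NRCMS information management and allocated RMB 180 million in dedicated funding for informatization. A management information system focused on integrated data management, data analysis, and data mining is expected to be in place by 2010. 2. Following the CRISP-DM methodology, NRCMS data mining should proceed through a four-part framework: first, understanding the business problem; second, data preparation; third, data mining; fourth, evaluation and deployment of the results. 3. Seven categories of NRCMS management business problems were identified: financing, disease risk, expense compensation, medical institution services, farmers' health, and benefit incidence analysis for enrolled farmers and for designated medical institutions. Their importance scores for impact on system performance were 8, 8, 8, 7, 5, 3, and 3, respectively. 4. NRCMS data warehouse model. The conceptual model takes farmers' enrollment, medical visits, and expense compensation as the system boundary, defines individual, (medical) institution, and compensation subject databases, and specifies an E-R model with the relationships individual-compensation (1-1), compensation-institution (1-m), and individual-institution (m-m). In the logical model, the outpatient pooling/family account reimbursement database uses dual granularity, while catastrophic illness pooling compensation uses a single granularity with the patient as the unit; the result is a star-schema logical model with compensation data as the fact table and individual, institution, disease, and date as dimension tables. 5. Disease risk association rules: "complications of pregnancy, childbirth, and the puerperium" had the highest share among reimbursement cases (26.56%), followed by digestive system diseases (18.37%), injury and poisoning (10.98%), circulatory system diseases (10.42%), and malignant tumors (6.01%). The rule "fracture" => "male" had a support of 6.32% and a confidence of 73.76%. Expense-level rules included: "pregnancy and childbirth case" AND "county-level hospitalization" => "low expense level"; "digestive system disease" AND "county- or township-level hospitalization" => "medium expense level"; and ("fracture" AND "county-level hospitalization") OR "hospitalization at prefecture/city level or above" => "high expense level". 6. Distribution of disease risk: among reimbursed cases, hospitalization at county-level institutions had the highest share (47.53%), followed by township level (28.88%). The leading condition among county and township inpatients was digestive system disease, followed by fracture; both were more common in men than in women. At prefecture/city level and above, the condition with the most reimbursed patients was malignant tumor. In addition, as the level of the admitting institution rose, mean length of stay, hospitalization expenses, and cooperative medical compensation all increased significantly (P<0.05). 7. Detection of abnormal data objects: within each institution level, abnormal objects were detected along three dimensions: length of stay, medical expenses, and disease type. Findings included a female patient hospitalized for 270 days in a township health center for bilateral inguinal lymph node enlargement, and a 75-year-old female patient hospitalized for 175 days in a county hospital for chronic bronchitis, among others. Of the abnormal records at county hospitals, 70% were male fracture/accidental injury patients. 8. Classification prediction of inpatient service utilization: with hospitalization (yes/no) as the output and a selected set of feature variables, a CHAID tree model showed a hospitalization rate of 22.97% in the group impoverished by illness; for "impoverished by illness" AND "has a chronic disease" the rate reached 58.82%, and for "poor" AND "has a chronic disease" it was 14.29%. 9. Prediction of abnormal inpatient utilization patterns: related attribute values were first used to flag 10% of records as abnormal data patterns, and the flag served as the output of the classifiers. From the selected feature variables, the two models with the highest predictive efficiency, C5.1 and a neural network, were chosen for prediction. The neural network's accuracy in identifying abnormal patterns was the best and was quite stable across the test, training, and validation sets. Rules exported from the C5.1 tree model were used to interpret the neural network's black box: abnormal medical consumption patterns tend to occur among patients over 65 hospitalized at township-level institutions, patients under 11 hospitalized at city-level institutions, and patients using private, military, or other hospitals. Patients older than 11 and younger than 58 were unlikely to show abnormal medical consumption at any institution level.

Conclusions. 1. Building an NRCMS data warehouse and data mining system, and raising the policy competence and management ability of NRCMS administrators, has become an unavoidable issue for the sound development of the rural medical security system. 2. The conceptual and logical models of the NRCMS data warehouse should be designed closely around the compensation subject, which helps improve the efficiency of data warehouse development. 3. Hospital delivery accounts for the highest share of reimbursement claims in the cooperative medical system, but digestive system diseases and fracture/accidental injury impose the heavier disease burden on local farmers, especially men. 4. Old age, chronic disease, low income, and difficult or complicated childhood illnesses are the main factors in abnormal medical consumption.

Recommendations. To improve farmers' health, maternal health care needs to be done well. To control farmers' disease risk, prevention should focus on accidental injury and digestive system diseases. To reduce the risk to NRCMS funds, disease management for chronically ill and elderly patients should be strengthened to prevent unreasonable medical expenditure.

Main innovations. (1) Policy science theory and a normative study were used to define the topics of NRCMS data mining, and design schemes for the conceptual and logical models of the NRCMS data warehouse were proposed. (2) Data stream models for NRCMS data mining were designed in the SPSS Clementine mining software. (3) The data mining results were used to propose NRCMS management measures suited to local conditions.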
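
The support and confidence figures quoted for the rule "fracture" => "male" can be illustrated with a minimal Python sketch. This is not the GRI algorithm and not Clementine output: the function `rule_metrics` and the ten toy records are hypothetical and exist only to show how the two measures are computed. The sketch uses the common textbook definition, in which rule support is the share of all records containing both sides of the rule. Some tools report the frequency of the antecedent as "support" instead, and the abstract does not say which definition its 6.32% figure follows, so the function returns both.

```python
from typing import FrozenSet, List, Tuple


def rule_metrics(
    records: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Tuple[float, float, float]:
    """Return (antecedent_support, rule_support, confidence) for antecedent => consequent."""
    n = len(records)
    if n == 0:
        raise ValueError("records must not be empty")
    # Records in which the left-hand side of the rule holds.
    n_antecedent = sum(1 for r in records if antecedent <= r)
    # Records in which both sides of the rule hold.
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    antecedent_support = n_antecedent / n
    rule_support = n_both / n
    # Confidence: of the records matching the antecedent, the share that
    # also match the consequent.
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return antecedent_support, rule_support, confidence


# Hypothetical toy claims (NOT data from the thesis): each record is the set
# of attribute values attached to one reimbursement case.
toy_records = [
    frozenset({"fracture", "male"}),
    frozenset({"fracture", "male"}),
    frozenset({"fracture", "male"}),
    frozenset({"fracture", "female"}),
    frozenset({"digestive", "male"}),
    frozenset({"digestive", "female"}),
    frozenset({"childbirth", "female"}),
    frozenset({"childbirth", "female"}),
    frozenset({"circulatory", "male"}),
    frozenset({"cancer", "female"}),
]

ant_sup, rule_sup, conf = rule_metrics(
    toy_records, frozenset({"fracture"}), frozenset({"male"})
)
print(f"antecedent support={ant_sup:.2f}, rule support={rule_sup:.2f}, confidence={conf:.2f}")
```

In the toy data, 4 of 10 records involve a fracture and 3 of those 4 are male, so the confidence is 0.75. The thesis figure of 73.76% would be read the same way: roughly three quarters of fracture claims came from men.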

【Abstract】 Research objectives. Overall objective: to improve the ability of New Rural Cooperative Medical System (NRCMS) administrators to make quantitative decisions from historical data, this study examines the NRCMS data mining (DM) process, the subject matter for DM, and its practical value. Specific objectives: ① standardize the NRCMS DM process; ② define the business problems of NRCMS management; ③ design NRCMS data warehouse (DW) models; ④ carry out DM on the data of one NRCMS county and put forward recommendations for the NRCMS.

Methods. First, a literature review was used to survey the general situation of the NRCMS and the common models of DM, and to build a framework for NRCMS DM. Second, the main business problems of the NRCMS were defined by analyzing how the NRCMS operates, using public choice theory. Third, following the design process for conceptual, logical, and physical DW models, the conceptual and logical models of the NRCMS DW were designed from the information needs of managers. Finally, DM data stream models were designed in SPSS Clementine 12.0. Strong association rules in the NRCMS compensation subject database were found with the GRI association rule algorithm. Abnormal medical consumption patterns among enrolled farmers were identified, and models were trained to predict who is likely to be hospitalized and which hospitalizations are likely to be abnormal.

Results. 1. The number of NRCMS counties grew by 72.3% per year from 2004 to 2008, coverage rose from 75.2% to 91.5%, and the NRCMS fund per person increased 3- to 4-fold. Each manager is responsible for 65,000 enrolled farmers on average. After 2006, the Ministry of Health issued four policy documents on NRCMS information management and allocated a dedicated fund of RMB 180 million. An NRCMS information system built around data management, data analysis, and data mining is expected by 2010. 2. According to CRISP-DM, data mining on the NRCMS should proceed in four steps: understanding the business problem, preparing the data, mining the data, and evaluating and deploying the results. 3. Seven categories of business problems were associated with the NRCMS: financing, disease risk, compensation for medical consumption, medical services, farmers' health, and benefit incidence for farmers and for hospitals. Their importance scores are 8, 8, 8, 7, 5, 3, and 3, respectively. 4. NRCMS DW model: the conceptual model is built on three subject areas (individual, hospital, and compensation), related as individual-compensation (1-1), compensation-hospital (1-m), and individual-hospital (m-m). The logical data model is designed as a star schema. Data granularity should be dual for the family account/outpatient database and single for the hospitalization database. 5. Disease risk association rules: "pregnancy, childbirth, and puerperium complications" had the highest share of compensation cases (26.56%), followed by digestive diseases (18.37%), "trauma and poisoning" (10.98%), circulatory system diseases (10.42%), and cancer (6.01%). The rule "fracture" => "male" was found with a support of 6.32% and a confidence of 73.76%. Rules on medical consumption were also found: "pregnancy, childbirth, etc." AND "in county hospital" => "low consumption"; "digestive disease" AND "in county or township hospital" => "medium consumption"; ("fracture" AND "in county hospital") OR "in city hospital" => "high consumption". 6. Distribution of farmers' disease risk: the share of patients in county hospitals was the highest (47.53%), followed by township hospitals (28.88%). The main diseases were digestive disease and fracture, and male patients outnumbered female patients for both. Cancer patients were the largest group in city hospitals. Mean length of stay, hospitalization fees, and NRCMS compensation all increased with the level of the hospital (P<0.05). 7. Abnormal data were detected by hospital level, length of stay, disease, and fee for service. Examples include a female patient who stayed 270 days in a township hospital for bilateral inguinal lymph node enlargement, and a 75-year-old female patient who stayed 175 days in a county hospital for chronic bronchitis. Most abnormal records at county hospitals were fracture patients. 8. Predicting hospitalization: with hospitalized "yes" or "no" as the output and a selected set of feature attributes, the CHAID tree model showed a hospitalization rate of 22.97% in the group impoverished by disease. Among people impoverished by disease who also have a chronic disease, the rate was 58.82%. Among poor people with a chronic disease, it was 14.29%. 9. Predicting abnormal hospitalization data: 10% of records were flagged as abnormal by clustering, and the flag ("T" or "F") was used as the output label. After feature selection, the two models with the highest predictive efficiency, C5.1 and a neural network, were chosen. The black box of the neural network could be interpreted through rules derived from the C5.1 tree model: people over 65 treated in township hospitals, or under 11 treated in city hospitals, tended to show abnormal consumption, as did patients treated in other hospitals such as private or military hospitals. People older than 11 and younger than 58 did not tend to show abnormal consumption at any hospital level.

Conclusions. 1. Enhancing the decision-making ability of NRCMS managers by building an NRCMS DW and DM system has become an important topic. 2. Compensation should be the primary subject area when building the NRCMS DW model. 3. Hospital childbirth accounts for the highest share of compensation claims, while the heaviest disease burden for people in the study county, especially men, comes from digestive disease and fracture/accidental injury. 4. Aging, chronic disease, low income, and difficult childhood illnesses are likely the main factors behind abnormal data objects.

Recommendations. To promote farmers' health, health care for pregnant and postpartum women should be strengthened. To control farmers' disease risk, more attention should be paid to preventing accidental injury and digestive disease. To reduce the risk to the NRCMS fund, disease management should be provided for chronically ill and elderly patients to avoid unreasonable medical expenditure.

Main innovations. 1. Using a normative study grounded in policy theory, the subject matter for NRCMS DM was defined, and the conceptual and logical data models of the NRCMS DW were designed from the NRCMS operational perspective. 2. Data stream models for NRCMS DM were designed in SPSS Clementine 12.0. 3. Several NRCMS DM questions were studied, and conclusions of practical interest were derived.
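
The star-schema logical model described in result 4 (compensation as the fact table, with individual, institution, disease, and date as dimension tables) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not the schema from the thesis: every table name, column name, and row below is hypothetical, since the abstract names only the fact table and the four dimensions.

```python
import sqlite3


def build_star_schema(conn: sqlite3.Connection) -> None:
    """Create one fact table and four dimension tables (hypothetical columns)."""
    conn.executescript(
        """
        CREATE TABLE dim_individual  (individual_id  INTEGER PRIMARY KEY, sex TEXT, age INTEGER);
        CREATE TABLE dim_institution (institution_id INTEGER PRIMARY KEY, level TEXT);
        CREATE TABLE dim_disease     (disease_id     INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date        (date_id        INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        -- Fact table: one row per compensated hospitalization, keyed to each dimension.
        CREATE TABLE fact_compensation (
            claim_id         INTEGER PRIMARY KEY,
            individual_id    INTEGER REFERENCES dim_individual(individual_id),
            institution_id   INTEGER REFERENCES dim_institution(institution_id),
            disease_id       INTEGER REFERENCES dim_disease(disease_id),
            date_id          INTEGER REFERENCES dim_date(date_id),
            days_in_hospital INTEGER,
            total_fee        REAL,
            compensation     REAL
        );
        """
    )


def load_toy_rows(conn: sqlite3.Connection) -> None:
    """Insert a few invented rows (NOT data from the thesis)."""
    conn.executemany("INSERT INTO dim_individual VALUES (?, ?, ?)",
                     [(1, "male", 45), (2, "female", 28)])
    conn.executemany("INSERT INTO dim_institution VALUES (?, ?)",
                     [(1, "township"), (2, "county")])
    conn.executemany("INSERT INTO dim_disease VALUES (?, ?)",
                     [(1, "fracture"), (2, "childbirth")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, 2008, 3), (2, 2008, 9)])
    conn.executemany("INSERT INTO fact_compensation VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, 1, 5, 800.0, 400.0),
                      (2, 2, 2, 2, 1, 10, 3000.0, 1200.0),
                      (3, 1, 2, 1, 2, 8, 2000.0, 800.0)])


def summary_by_level(conn: sqlite3.Connection) -> list:
    """Cases, mean length of stay, and mean compensation per institution level."""
    return conn.execute(
        """
        SELECT i.level, COUNT(*), AVG(f.days_in_hospital), AVG(f.compensation)
        FROM fact_compensation AS f
        JOIN dim_institution AS i ON i.institution_id = f.institution_id
        GROUP BY i.level
        ORDER BY i.level
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load_toy_rows(conn)
rows = summary_by_level(conn)
for level, n_cases, mean_days, mean_comp in rows:
    print(level, n_cases, mean_days, mean_comp)
```

The point of the layout is that questions like result 6 (length of stay and compensation by hospital level) become a single join from the fact table to one dimension, with the measures aggregated per group.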

  • 【Classification number】TP311.13
  • 【Times cited】8
  • 【Downloads】1791

