
新农合住院费用的分析及异常值筛检方法研究

The Analysis of Hospitalization Costs and the Study of Outlier Detection Methods for New Rural Cooperative Medical Scheme in China

【Author】 牛晓辉

【Advisor】 王增珍

【Author information】 Huazhong University of Science and Technology, Epidemiology and Health Statistics, 2012, PhD

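【Abstract (translated from the Chinese)】

I. Objectives

This study analyzes hospitalization costs under the New Rural Cooperative Medical Scheme (NCMS) comprehensively, covering the distribution of costs, the composition of inpatients, and the composition of disease categories, in order to identify the factors associated with hospitalization costs. It also compares several statistical methods to explore ways of screening for abnormal hospitalization cost data. Based on the results, and with reference to problems in the operation of the NCMS, it offers recommendations intended to provide a scientific basis for controlling unreasonable medical expenditure, to inform NCMS policy, and to promote the healthy and sustainable development of the scheme.

II. Methods

1. Analysis of NCMS hospitalization cost data

The study covers all NCMS hospitalization cost records from the eight districts surrounding a city in Hubei Province for January to June 2009, 71,982 cases in total. Each record includes the inpatient's demographic characteristics, length of stay, time of admission, disease code, total hospitalization cost, and the composition of that cost. Total cost consists of eight components: drug fees, hospital fees, general examination fees, large-scale examination fees, surgery fees, treatment fees, consultation fees, and other fees. Frequencies and proportions were used for descriptive analysis. The chi-square test, the Mantel-Haenszel test, and the Kruskal-Wallis H rank-sum test were used for univariate analysis of hospitalization costs. Generalized linear regression models and multiple stepwise regression were used for multivariate analysis. The bootstrap method was used to estimate the average cost per case at each hospital level. Statistical analysis was carried out with SAS 9.0, MATLAB 7.0, and Excel.

2. Screening for abnormal cost data

(1) Mining abnormal cost data across all diseases, with each district's NCMS management office as the unit

The data set consisted of the cost records of inpatients in secondary hospitals in District A from January to June 2009, 3,568 cases in total. Three methods were applied to it: outlier detection based on the studentized residuals of a generalized linear regression, outlier detection based on shared nearest neighbor (SNN) similarity, and outlier detection based on support vector regression. The Kappa test was used to assess the agreement among the results of the three methods.

(2) Mining abnormal cost data within a single disease

The data set consisted of the cost records of chronic bronchitis cases from all districts, 880 cases in total. Outliers were detected with support vector data description. For preprocessing, the cost composition data were either used directly or first transformed by kernel principal component analysis. The Kappa test was used to assess the agreement between the two sets of results. The data were processed and analyzed with MATLAB 7.0, SAS 8.0, R 2.3.0, and libsvm 3.1.

III. Results

1. Distribution of NCMS inpatients and costs

Hospitalization rates differed markedly among districts: the highest was 4.87% and the lowest only 1.63%. The average rate for the first half of the year was 2.63%, which is within the normal range. Overall, women accounted for 54.75% of cases and men for 45.25%, so female inpatients outnumbered male inpatients. The three disease categories with the most inpatients were respiratory diseases (17.08%), certain infectious and parasitic diseases (16.84%), and circulatory diseases (12.11%). The three categories with the highest average cost per case were congenital malformations, deformations and chromosomal abnormalities (11,211.44 yuan), neoplasms (7,664.82 yuan), and diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism (7,250.82 yuan). The mean length of stay was 10.9 days. Stays were concentrated in the 4-8 day and 8-12 day ranges, which accounted for 40.86% and 20.79% of cases respectively, and 78.83% of patients were discharged within 12 days. Drug fees were the largest cost component (50.72%), followed by hospital fees (22.89%); large-scale examination fees were the smallest (1.52%). There were 13,877 surgical cases (19.28%) and 58,105 non-surgical cases (80.72%).

2. Factors influencing hospitalization costs

In the univariate analysis, seven factors had a significant effect on hospitalization costs (P<0.0001): sex, age, disease category, time of admission, length of stay, surgery, and admission status. In the generalized linear models fitted for each hospital level, all of these factors were statistically significant (P<0.05), but the highly significant factors differed by level. For first-level hospitals they were age, disease category, surgery, and length of stay. Second-level hospitals added time of admission to this list, and third-level hospitals added admission status. Disease category, surgery, and length of stay are therefore the main factors influencing hospitalization costs.

3. Bootstrap-based control indicators for average cost per case at in-district institutions of different levels

The average cost per case should be kept below 1,000 yuan in general township health centers, between 1,235 and 1,365 yuan in central township health centers (where surgical cases exceed 30% of inpatients), and between 3,320 and 3,570 yuan in in-district secondary hospitals.

4. Costs of out-of-district referrals

The proportion of men among out-of-district referrals was higher than among patients treated within the district (P<0.0001), and the average cost per case for referred men was higher than for referred women (P<0.0001). Drug fees accounted for 52.07% of the hospitalization costs of referred cases, a lower share than in township health centers but a higher share than in secondary hospitals. Hospital fees accounted for 30.64%, clearly higher than at any level of in-district hospital. Among high-cost cases, the two most common disease categories were circulatory diseases (24.90%) and neoplasms (24.29%). For out-of-district referrals, length of stay was the most important cost factor; sex and admission status also influenced costs.

5. Screening for abnormal cost data

(1) Mining abnormal cost data across all diseases, with each district's NCMS management office as the unit

The method based on studentized residuals of a generalized linear regression and the method based on support vector regression showed a degree of agreement (Kappa=0.3414, ASE=0.0339, u=10.07, P<0.0001; agreement rate 93.8%). Both agreed poorly with the SNN similarity method, with Kappa values of -0.0058 and 0.0082 respectively. First, all three methods proved effective for mining abnormal cost data. Second, the agreement results indicate that the SNN-based method works by a different mechanism from the other two, so it detects outliers from another angle and can complement them.

(2) Mining abnormal cost data within a single disease

Both variants of outlier mining based on support vector data description were feasible. According to the literature, kernel principal component analysis helps improve the performance of the algorithm. The agreement between the two variants was Kappa=0.4135, ASE=0.1287, u=3.213, P=0.0013, with an agreement rate of 86.46%, so the two methods can be considered to agree moderately.

IV. Conclusions

The study shows that an information system for processing and analyzing NCMS hospitalization cost data is necessary for the healthy development of the scheme. Drug fees make up the largest share of total hospitalization costs, so curbing inflated drug costs is the key to controlling the rise in medical expenses. Length of stay is the most important factor influencing hospitalization costs, so shortening stays is an effective way to reduce them. Disease composition differs among districts, and each district should target its disease prevention work accordingly. Major diseases such as neoplasms, congenital malformations, and circulatory diseases have high average costs per case and account for a large share of spending, so a protection system for such diseases should be established. The bootstrap-based system of control indicators for average cost per case at each hospital level can restrain the growth of hospitalization costs at the macro level. A system for mining abnormal cost data makes it possible to monitor the cost of individual hospitalizations at the micro level, and using several detection methods together screens out more outliers from different angles. Together these measures help reduce unreasonable hospital expenditure, standardize NCMS procedures, and strengthen supervision of the scheme.

V. Innovations

1. The study applied four outlier detection methods to NCMS hospitalization cost data, based respectively on the studentized residuals of a generalized linear regression, SNN similarity, support vector regression, and support vector data description. Rapidly screening candidate outliers out of a large volume of data greatly reduces the workload of manual monitoring and makes micro-level monitoring of hospitalization costs possible. Agreement tests on the results showed that methods sharing a mechanism agree to some degree, while methods with different mechanisms find different outliers. Combining the methods can improve the sensitivity of abnormal data screening.

2. The bootstrap method was used for interval estimation of the average cost per case, yielding reasonable ranges for each level of in-district hospital and providing a scientific basis for implementing average cost controls.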
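As a rough illustration of the bootstrap interval estimation of average cost per case described above, the following Python sketch computes a percentile bootstrap interval for a mean. It is not the thesis's implementation (which used SAS and MATLAB): the function name `bootstrap_mean_interval`, the percentile variant of the bootstrap, the number of resamples, and the 95% level are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_mean_interval(costs, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `costs`.

    Hypothetical helper: resample the cases with replacement, record the
    mean of each resample, and take the alpha/2 and 1 - alpha/2 quantiles.
    """
    if not costs:
        raise ValueError("costs must not be empty")
    rng = random.Random(seed)
    n = len(costs)
    # Each resample has the same size as the original sample.
    means = sorted(
        statistics.fmean(rng.choices(costs, k=n)) for _ in range(n_resamples)
    )
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    # Synthetic, right-skewed costs in yuan; NOT data from the thesis.
    demo_rng = random.Random(42)
    demo_costs = [demo_rng.lognormvariate(7.0, 0.5) for _ in range(500)]
    low, high = bootstrap_mean_interval(demo_costs)
    print(f"mean = {statistics.fmean(demo_costs):.2f}")
    print(f"95% bootstrap interval = ({low:.2f}, {high:.2f})")
```

Resampling suits cost data because hospitalization costs are typically right-skewed, so an interval that does not rely on a normality assumption is a natural choice.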

【Abstract】

Objectives

This study aimed to analyze the distribution of NCMS hospitalization costs, the composition of inpatients, and the composition of disease types; to identify the factors affecting hospitalization costs on the basis of this comprehensive analysis; and to explore methods for detecting outliers in hospitalization costs by comparing a variety of statistical methods. In view of the problems in the operation of the NCMS, and based on the analysis, we intended to put forward proposals that help control unreasonable medical expenses, provide a scientific basis for formulating relevant policies, and promote the sustainable development of the NCMS.

Methods

1. Analysis of the NCMS hospitalization costs

The subject of this study is the NCMS hospitalization cost data from the eight districts (districts A-H) surrounding a city in Hubei Province from January to June 2009, a total of 71,982 cases. The data include inpatient demographic traits, days of hospitalization, admission time, disease coding, and the hospitalization cost and its composition. The hospitalization cost has eight components: medicine expense, hospital fee, general inspection fee, large-scale inspection fee, surgical fee, treatment fee, medical fee, and other costs. Frequencies and proportions were used to describe the data distribution. The chi-square test, the Mantel-Haenszel test, and the Kruskal-Wallis H rank-sum test were used in the univariate analysis of hospitalization costs. Generalized linear regression models and stepwise multiple linear regression were used for the multivariate analysis. The bootstrap method was used to estimate the average hospitalization cost at each hospital level. The software used included SAS 9.0, MATLAB 7.0, and Excel.

2. Outlier detection of hospitalization costs

(1) Outlier detection for all diseases, taking the district NCMS management office as the unit

The subject of this part is the hospitalization costs in the second-level hospitals of district A, a total of 3,568 cases. We used three outlier detection methods, based respectively on the studentized residuals of a generalized linear regression, shared nearest neighbour (SNN) similarity, and support vector regression. The Kappa test was used to test the consistency of the results of the three methods.

(2) Outlier detection for single-disease hospitalization costs

The subjects of this part are the chronic bronchitis cases in the eight districts, a total of 880 cases. We used support vector data description to detect outliers in the hospitalization costs. The first method used the hospitalization cost data directly; the second first applied kernel principal component analysis to the data. The Kappa test was used to test the consistency of the results. The software used included SAS 9.0, MATLAB 7.0, R 2.3.0, and LibSVM 3.1.

Results

1. Distribution characteristics of the NCMS inpatients and hospitalization costs

Hospitalization rates differed markedly among the districts: the highest was 4.87% and the lowest only 1.63%. The average hospitalization rate for the first half of the year was 2.63%, within the normal range. Among all inpatients, females accounted for 54.75% and males for 45.25%, so female inpatients outnumbered male inpatients. The three most common disease categories were respiratory diseases (17.08%), certain infectious and parasitic diseases (16.84%), and diseases of the circulatory system (12.11%). The three most costly categories, by average cost per case, were congenital malformations, deformations and chromosomal abnormalities (11,211.44 yuan); cancer (7,664.82 yuan); and diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism (7,250.82 yuan). The average length of hospitalization was 10.9 days. Stays were concentrated in the ranges of 4-8 days and 8-12 days, which accounted for 40.86% and 20.79% of cases respectively, and the cumulative proportion discharged within 12 days reached 78.83%. Within the hospitalization cost, the medicine expense accounted for the largest proportion (50.72%), followed by the hospital fee (22.89%), while the large-scale inspection fee accounted for the lowest proportion (1.52%). There were 13,877 surgery cases and 58,105 non-surgery cases; non-surgery cases accounted for 80.72%.

2. Factors related to hospitalization costs

The univariate analysis suggested that seven factors had significant effects on hospitalization costs (P<0.0001): gender, age, type of disease, admission time, days of hospitalization, surgery, and admission status. In the generalized linear regression models fitted for each hospital level, all seven factors were significant (P<0.05), but the highly significant factors differed among levels. For first-level hospitals they were age, type of disease, surgery, and days of hospitalization. Compared with first-level hospitals, second-level hospitals added admission time, and third-level hospitals added admission status. It can be seen that age, type of disease, surgery, and days of hospitalization were the main factors affecting hospitalization costs.

3. Average hospitalization cost regulation indices for all levels of medical institutions

The average hospitalization cost should be kept below 1,000 yuan in general township hospitals, in the range of 1,235-1,365 yuan in center township hospitals (where surgical cases exceed 30% of all hospitalization cases), and between 3,320 and 3,570 yuan in secondary hospitals within the districts.

4. Analysis of hospitalization costs for extra-regional referral cases

The proportion of males among extra-regional referral cases was higher than among intra-regional cases (P<0.0001), and the average hospitalization cost of male referral cases was significantly higher than that of female referral cases (P<0.0001). Within the hospitalization costs of extra-regional referral cases, the medicine expense accounted for 52.07%, lower than in township hospitals but higher than in secondary hospitals. The hospital fee accounted for 30.64%, clearly higher than in intra-regional hospitals of any level. Among high-cost hospitalization cases, the two most frequent disease categories were diseases of the circulatory system and cancers, accounting for 24.90% and 24.29% respectively. For extra-regional referral cases, days of hospitalization was the major cost factor; gender and admission status also affected costs.

5. Results of outlier detection

(1) Outlier detection for all diseases, taking the district NCMS management office as the unit

The method based on the studentized residuals of a generalized linear regression showed a certain consistency with the method based on support vector regression (Kappa=0.3414, ASE=0.0339, u=10.07, P<0.0001; agreement rate 93.8%). The consistency of these two methods with the SNN similarity method was poor, with Kappa values of -0.0058 and 0.0082 respectively. First, all three methods were effective for detecting outliers in hospitalization costs. Second, the Kappa tests indicated that the mechanism of the SNN-based method differs from that of the first two methods, so the SNN method can be used to complement them.

(2) Outlier detection for single-disease hospitalization costs

Both outlier detection methods based on support vector data description were feasible. According to the relevant literature, kernel principal component analysis can improve the performance of the algorithm. Between the two methods, Kappa=0.4135, ASE=0.1287, u=3.213, P=0.0013, and the agreement rate was 86.46%. The two outlier detection methods were therefore considered to have moderate consistency.

Conclusion

Our study shows that an information processing and analysis system is necessary for the healthy development of the NCMS. According to the results, drug charges account for the largest proportion of hospitalization costs, so controlling medicine expense is the key to controlling the growth of health care costs. Days of hospitalization is the most important factor for hospitalization costs, and shortening hospital stays is an effective way to reduce them. Disease composition differed among the districts, and prevention work should be strengthened according to each district's disease composition. The average costs of major diseases such as cancer, congenital malformations, and circulatory system diseases were high and their share of hospitalization costs was large, so a security system should be established to protect against these major diseases. The average hospitalization cost regulation indices for all levels of medical institutions, established with the bootstrap method, make it possible to control the growth of hospitalization costs at the macro level. The establishment of an outlier detection system makes it possible to monitor the hospitalization costs of individual inpatients at the micro level. Moreover, using a variety of outlier detection methods can screen out more outliers from different angles. All of these measures can reduce unreasonable expenditure, regulate the operation of the NCMS, and improve its supervision.

Innovation

1. Four methods were applied to detect outliers in the NCMS hospitalization costs: outlier detection algorithms based on the studentized residuals of a generalized linear regression, SNN similarity, support vector regression, and support vector data description. These methods can rapidly pick candidate outliers out of a large volume of data and greatly reduce the manual monitoring workload, which makes it possible to monitor hospitalization costs at the micro level. The consistency tests between each pair of methods demonstrated that methods with the same mechanism had some consistency, while methods with different mechanisms detected different outliers. Combining these methods can improve the sensitivity of outlier detection.

2. The bootstrap method was applied to the interval estimation of average hospitalization costs. We calculated reasonable ranges for the average hospitalization costs of all levels of intra-regional hospitals, which provide a scientific basis for regulating the average hospitalization cost.
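The Kappa statistics reported above compare which cases two detection methods flag as outliers. The Python sketch below shows one way such agreement figures can be computed from two lists of flags. It is an illustration under assumptions, not the thesis's SAS procedure: the function name `kappa_agreement` is hypothetical, and the standard error used is the large-sample value under the null hypothesis of chance agreement, which may differ from the ASE the author reported. In the reported results u equals Kappa divided by ASE (for example 0.3414 / 0.0339 ≈ 10.07), and the sketch forms u the same way.

```python
import math


def kappa_agreement(flags_a, flags_b):
    """Cohen's kappa for two binary outlier labelings of the same cases.

    Hypothetical helper. Returns the agreement rate, kappa, the standard
    error of kappa under the null hypothesis (an assumption about which
    standard error to use), and u = kappa / se0.
    """
    n = len(flags_a)
    if n == 0 or n != len(flags_b):
        raise ValueError("flag lists must be non-empty and of equal length")
    both = sum(1 for a, b in zip(flags_a, flags_b) if a and b)
    neither = sum(1 for a, b in zip(flags_a, flags_b) if not a and not b)
    p_a = sum(1 for a in flags_a if a) / n  # share flagged by method A
    p_b = sum(1 for b in flags_b if b) / n  # share flagged by method B
    p_obs = (both + neither) / n            # observed agreement rate
    p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_obs - p_exp) / (1 - p_exp)
    # Null-hypothesis variance term, summed over the two categories.
    s = p_a * p_b * (p_a + p_b) + (1 - p_a) * (1 - p_b) * (2 - p_a - p_b)
    se0 = math.sqrt(max(0.0, p_exp + p_exp**2 - s)) / ((1 - p_exp) * math.sqrt(n))
    u = kappa / se0 if se0 > 0 else float("nan")
    return {"agreement": p_obs, "kappa": kappa, "se0": se0, "u": u}


if __name__ == "__main__":
    # Made-up flags for 100 cases; NOT data from the thesis.
    a = [1] * 10 + [1] * 5 + [0] * 5 + [0] * 80
    b = [1] * 10 + [0] * 5 + [1] * 5 + [0] * 80
    print(kappa_agreement(a, b))
```

Because outliers are rare, two methods can share a high raw agreement rate while having a modest Kappa, which is why the abstract reports both figures.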
