A Study on Predicting Essential Hypertension Using an Artificial Neural Network Model

Prediction Using Artificial Neural Network Model of Essential Hypertension

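【Author】 Yang Yang (杨洋)

【Supervisor】 Shi Jingpu (时景璞)

【Author information】 China Medical University, Epidemiology and Health Statistics, 2010, Master's thesis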

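【Abstract (Chinese)】
Preface
Essential hypertension (EH) is one of the common cardiovascular diseases. In recent years, with China's economic development, the pace of life has quickened markedly and a series of unhealthy lifestyles has emerged, causing the mortality, incidence, and prevalence of cardiovascular and cerebrovascular disease in China to keep rising. Hypertension is both a disease in its own right and an important risk factor for cardiovascular and cerebrovascular disease, and serious complications such as hypertensive crisis and hypertensive encephalopathy can be life-threatening. The prevention and control of hypertension therefore cannot be neglected. Studies in China and abroad have shown that hypertension is a multifactorial disease; its many causal factors and the complex relationships among them are an important feature of hypertension research. Disease prediction currently relies mainly on traditional logistic regression (LR). The LR model requires the variables to be independent and cannot handle collinearity among them, so using it to predict a complex disease such as hypertension has certain limitations. Artificial neural networks (ANNs), or simply neural networks (NNs), are mathematical models that process information by simulating biological neural networks. They are well able to cope with collinearity and with interactions between variables, and are good at handling nonlinear, fuzzy, and noisy data. At present, however, ANNs are used far less widely in medicine than traditional logistic regression. The study site was rural Zhangwu County, Liaoning Province, where a survey found a standardized hypertension prevalence as high as 35%, a level rarely seen in China. Using these survey data, this study built a back-propagation (BP) ANN prediction model, compared it with a logistic regression model, and evaluated the ANN's predictive performance with the receiver operating characteristic (ROC) curve. The aim was to explore and evaluate the effectiveness and characteristics of ANNs for disease prediction, to seek new methods for predicting complex diseases such as hypertension, and to provide a reference for the prevention and control of hypertension in rural areas.
Subjects and Methods
1. Selection of study subjects
This study carried out statistical and predictive analysis of data from an earlier epidemiological survey of EH in rural Zhangwu County, Liaoning Province. The survey used multistage cluster random sampling and covered 5,208 people in total; 4,126 permanent residents aged over 30 were finally included in this study, 1,942 women and 2,184 men.
2. Survey content and measured indicators
Questionnaires were completed on site by interview and measurement. The survey mainly covered general characteristics, smoking history, drinking history, and so on, and blood pressure, weight, height, and other measures were taken. On site, 5 ml of blood was drawn from each participant after an overnight fast; the serum was separated by centrifugation, aliquoted, and stored frozen for serum assays.
3. Diagnostic criteria and measurement methods
Hypertension was diagnosed according to the 1999 WHO/ISH criteria: systolic blood pressure ≥ 140 mmHg and/or diastolic blood pressure ≥ 90 mmHg, or a previous diagnosis of essential hypertension. Blood pressure and the other serum biochemical indicators were measured by trained medical staff under standard conditions. Cholesterol, triglycerides, HDL, LDL, serum sodium, serum potassium, serum iron, serum calcium, and other indicators were analyzed colorimetrically on a model 7150 automatic biochemistry analyzer supplied by Daiichi (Japan). Blood glucose was measured from a drop of blood on a 稳捷 basic-model glucose meter made by Johnson & Johnson (USA).
4. Building the neural network model
The ANN was a three-layer BP network with one hidden layer. The input neurons were the factors associated with hypertension at P < 0.05 in univariate analysis; the output layer had one neuron (whether the subject had hypertension under the diagnostic criteria); and the number of hidden neurons was chosen experimentally as the value giving the best mean squared error. The hidden layer used the tansig activation function and the output layer used logsig. After balancing by sex and age, the 4,126 records were randomly split 3:1 into an overall training set (3,096 cases) and a test set (1,030 cases), used for model building and testing respectively. To prevent overfitting, during ANN training the overall training set was further split 3:1 at random into a training set (2,334 cases) and a check (validation) set (762 cases), which was used to monitor the training continuously.
5. Statistical analysis
The ANN prediction model was programmed in MATLAB 7.1. SPSS 13.0 was used to build the binary unconditional logistic regression prediction model and to plot the ROC curves of the model predictions. The cut-off for the predicted probability was 0.5: p ≥ 0.5 was predicted as hypertensive, otherwise as non-hypertensive. The significance level was set at α = 0.05.
Results
1. Univariate unconditional logistic regression
Univariate analysis of hypertension was performed on the survey data, and factors with p < 0.05 were selected as input variables for the prediction models; in total, 22 factors were associated with hypertension.
2. Multivariate unconditional logistic regression
(1) Building the model
Multivariate unconditional logistic regression was performed on the 3,096 cases of the overall training set, with the indicators selected by univariate analysis as independent variables (height and weight had been converted to BMI and so did not enter the model) and hypertension status as the dependent variable. Forward stepwise selection based on maximum likelihood estimation was used, with an entry criterion of p < 0.05 and a removal criterion of p > 0.10. After stepwise selection, 9 factors entered the model; the test of model improvement gave χ² = 4.335 and the test of the overall model gave χ² = 1439.457. On the overall training set, the classification agreement rate (proportion of subjects classified correctly) was 78.42%, specificity 80.45%, and sensitivity 76.62%.
(2) Prediction
The logistic regression model above was used to predict whether the subjects in the test set (1,030 cases) had hypertension. On the test set the agreement rate was 77.48%, specificity 80%, and sensitivity 74.85%.
3. BP neural network
(1) Building the model
A three-layer BP ANN was built with all 22 factors from the univariate screen as input variables, 22 neurons in the hidden layer, and one output neuron (EH or not). The target error was 0.01, the learning rate 0.1, and the maximum number of training epochs 2000. Training ended after 17 epochs because the check-set mean squared error had reached its minimum; at that point the training MSE was 0.126262 and the gradient was 137.276 (goal 1e-10). When the fit of the trained model was tested, the agreement rate was 81.06% on the training set and 77.95% on the check set; on the whole overall training set it was 80.30%, with specificity 84.48% and sensitivity 76.16%.
(2) Prediction
The BP ANN was used to predict whether the subjects in the test set (1,030 cases) had hypertension; the results are shown in Table 5. The test-set agreement rate was 78.83%, specificity 81.57%, and sensitivity 76.42%.
4. Comparison of the BP neural network and logistic regression models
(1) Prediction results
The neural network model had a higher agreement rate, sensitivity, and specificity than the logistic regression model.
(2) Area under the ROC curve
ROC curves for the multivariate logistic regression model and the BP ANN were plotted in SPSS 13.0. The area under the ROC curve (AUC) was 0.782 (95% CI 0.768 to 0.797) for the logistic regression model and 0.800 (95% CI 0.786 to 0.814) for the BP ANN.
Discussion
The etiology of hypertension is complex and its risk factors are many. Some risk factors may interact or be multicollinear, and these complex relationships affect the fit of prediction models and seriously hamper both hypertension prediction and etiological research. This study therefore used survey data from the rural population of Zhangwu County, Liaoning Province, to build a neural network model for predicting hypertension and compared it with the traditional logistic regression model, in order to explore how well a neural network can predict hypertension. There is no uniform standard for choosing the functions and parameters of a neural network; each problem must be analyzed on its own terms. The model built here is a BP neural network, named after its error back-propagation learning algorithm; it is the most widely used type of neural network in medicine and embodies the core ideas of neural networks. Because any continuous function on a closed interval can be approximated to arbitrary accuracy by a BP network with a single hidden layer (given enough hidden neurons), this study used a three-layer BP network with one hidden layer. Because a large number of input neurons demands a larger sample, only factors closely related to hypertension, namely those with p < 0.05 in univariate analysis, were used as inputs. Multi-category input variables (such as ethnicity) were coded as dummy variables so that the model could make better use of the information in the data. The number of hidden neurons and the training function were determined by experiment: compared with other settings, 22 hidden neurons with the trainlm training function gave a mean squared error that was both small and stable. The initial network weights were random numbers in the interval (0, 1); because different initial values yield different ANN models, the best model was selected after repeated trials. To avoid overfitting, the check set was used to monitor training throughout. In this study the neural network model outperformed the logistic regression model in agreement rate, sensitivity, and specificity; the agreement rate was 77.48% for logistic regression and 78.83% for the neural network, indicating that the neural network's predictive ability was slightly better. ROC analysis gave AUCs of 0.782 and 0.800 for the logistic regression and ANN models respectively, which likewise suggests that for a disease such as hypertension, with many causal factors and complex relationships among them, the neural network fits slightly better. Neural networks still have some unresolved problems. First, the network obtained depends on the choice of parameters, functions, initial values, and so on; these choices lack a theoretical basis and can only be made by experience and experiment. Second, unlike logistic regression, neural networks have no accepted rules for entering and removing input variables. Third, the medical interpretation of each factor's effect on the dependent variable remains unclear, and questions such as hypothesis testing methods and confidence intervals need further study.
Conclusion
The experiments show that for a complex disease such as hypertension, the neural network prediction model predicts slightly better than the logistic regression model. It can therefore serve as a necessary complement to logistic regression, and neural networks have broad application prospects for predicting complex diseases.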
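
The sampling scheme described in the methods (balance by sex and age, split 3:1 into an overall training set and a test set, then split the overall training set 3:1 again into training and check sets) can be sketched as a stratified random split. This is a minimal Python sketch under stated assumptions, not the thesis's MATLAB code: the abstract does not say exactly how the balancing was done, so stratifying on a hypothetical (sex, age group) key is our assumption, and the function name `stratified_split`, the record layout, and the synthetic records are illustrative only.

```python
import random
from collections import defaultdict


def stratified_split(records, key, train_fraction=0.75, seed=0):
    """Randomly split records into (train, test) so that every stratum
    defined by key(record) contributes about train_fraction of its members
    to the training part (a 3:1 split when train_fraction = 0.75)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    train, test = [], []
    for members in strata.values():
        rng.shuffle(members)
        cut = int(len(members) * train_fraction + 0.5)  # round half up
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def sex_age_key(rec):
    # Hypothetical record layout: (subject_id, sex, age_group)
    return rec[1], rec[2]


# Synthetic stand-in for the 4,126 survey records (not the real data).
people = [(i, "F" if i % 2 else "M", 30 + 10 * (i % 5)) for i in range(4126)]
training_total, test = stratified_split(people, sex_age_key, seed=1)
train, check = stratified_split(training_total, sex_age_key, seed=2)
print(len(training_total), len(test))  # → 3096 1030
print(len(train), len(check))          # → 2326 770
```

The second split gives 2,326/770 on these synthetic strata rather than the thesis's 2,334/762; the exact counts depend on the real stratum sizes, which the abstract does not report.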

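Both abstracts report a classification agreement rate (the proportion of subjects classified correctly), sensitivity, and specificity, each computed after labeling p ≥ 0.5 as hypertensive. A minimal Python sketch of that 2×2-table calculation follows; the function name and toy inputs are hypothetical, and the thesis itself obtained these figures from MATLAB 7.1 and SPSS 13.0.

```python
def classification_metrics(y_true, p_pred, cutoff=0.5):
    """Agreement rate, sensitivity and specificity at a probability cut-off.

    y_true: observed status (1 = hypertensive, 0 = not)
    p_pred: predicted probabilities of hypertension
    """
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, p_pred):
        predicted = 1 if p >= cutoff else 0  # p >= 0.5 -> predicted hypertensive
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return {
        "agreement": (tp + tn) / (tp + fn + tn + fp),  # overall proportion correct
        "sensitivity": tp / (tp + fn),                 # true-positive rate
        "specificity": tn / (tn + fp),                 # true-negative rate
    }


m = classification_metrics([1, 1, 1, 0, 0], [0.9, 0.6, 0.4, 0.2, 0.7])
print(m["agreement"], m["sensitivity"], m["specificity"])  # → 0.6 0.6666666666666666 0.5
```

For a set with prevalence w, the agreement rate equals w × sensitivity + (1 − w) × specificity, which gives a quick consistency check on reported figures.
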
【Abstract】
Preface
Essential hypertension (EH) is one of the common cardiovascular diseases. In recent years, as economic development in China has markedly quickened the pace of life, a series of unhealthy lifestyles has emerged, and the mortality, incidence, and prevalence of cardiovascular disease in China have continued to rise. Hypertension is not only an independent disease but also an important risk factor for cardiovascular disease, and serious complications such as hypertensive crisis and hypertensive encephalopathy may be life-threatening. The prevention and control of hypertension therefore cannot be ignored. Research has shown that hypertension is a multifactorial disease; the large number of risk factors and the complex relationships among them are an important feature of hypertension. Disease prediction currently relies mainly on traditional logistic regression (LR), but the LR model requires the variables to be independent and cannot deal with collinearity between them. Using LR to predict a complex disease such as hypertension therefore has some limitations. Artificial neural networks (ANNs), also called neural networks (NNs), are mathematical models that process information by simulating biological neural networks. Neural networks have a strong ability to handle collinearity and interactions between variables, and are good at handling nonlinear, fuzzy, and noisy data. At present, however, ANNs are far less widely applied in medicine than traditional logistic regression. The study site was Zhangwu County in Liaoning Province, where a survey found a standardized hypertension prevalence of 35%, a rate rarely seen in China.
In this study, we used these survey data to build a back-propagation ANN (BPANN) prediction model, compared it with a logistic regression model, and evaluated the predictive performance of the ANN with the receiver operating characteristic (ROC) curve. We also studied and evaluated the predictive effectiveness and characteristics of ANNs, in order to explore new ways of predicting complex diseases such as hypertension and to provide a reference for the prevention and treatment of hypertension in rural areas.
Subjects and Methods
1. Selection of study subjects
This study performed statistical and predictive analysis on data from an earlier epidemiological survey in Zhangwu County, Liaoning Province. Using multistage cluster sampling, 5,208 people were surveyed in total; 4,126 respondents aged over 30 were finally enrolled in this study, of whom 1,942 were women and 2,184 were men.
2. Survey content and measured indicators
Questionnaires were completed on site by interview and measurement. The survey covered general characteristics, smoking habits, alcohol intake, and so on, and blood pressure, height, weight, and other measures were taken. Five-milliliter blood samples were drawn after an overnight fast; after centrifugation, the serum was separated and frozen in aliquots until assayed.
3. Diagnostic criteria and measurement methods
Diagnostic criteria for EH: according to the 1999 WHO-ISH guidelines for the management of hypertension, hypertension was defined as a systolic blood pressure (SBP) ≥ 140 mmHg and/or a diastolic blood pressure (DBP) ≥ 90 mmHg.
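
The diagnostic rule is a simple disjunction: "and/or" means either threshold alone is enough. A minimal Python sketch, assuming blood pressure readings in mmHg; the optional `previously_diagnosed` flag reflects the additional criterion stated in the Chinese abstract (a previous diagnosis of essential hypertension), and the function and parameter names are hypothetical.

```python
def is_hypertensive(sbp_mmhg, dbp_mmhg, previously_diagnosed=False):
    """Hypertension per the 1999 WHO/ISH criteria as applied in this study:
    SBP >= 140 mmHg and/or DBP >= 90 mmHg, or an earlier diagnosis of EH."""
    return sbp_mmhg >= 140 or dbp_mmhg >= 90 or previously_diagnosed


print(is_hypertensive(138, 92))        # → True  (diastolic criterion alone)
print(is_hypertensive(139, 89))        # → False
print(is_hypertensive(120, 80, True))  # → True  (previously diagnosed)
```
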
Blood pressure was measured according to a unified protocol under standard conditions. Cholesterol, triglycerides, high-density lipoprotein (HDL), low-density lipoprotein (LDL), serum sodium, serum potassium, serum iron, and serum calcium were measured on an automatic biochemistry analyzer 7150 (HITACHI, Japan), and blood glucose was measured with a blood glucose meter (Johnson & Johnson, USA).
4. Building the ANN
The ANN was a three-layer BP neural network with one hidden layer. The input neurons were the factors associated with hypertension at P < 0.05 in univariate analysis; the output layer had one neuron (whether the subject had hypertension according to the diagnostic criteria); and the number of hidden neurons was chosen experimentally on the basis of the mean squared error. The hidden layer used the tansig activation function and the output layer used logsig. After balancing by sex and age, the data (4,126 cases) were randomly split 3:1 into an overall training set (3,096 cases) and a test set (1,030 cases), used for model building and testing respectively. To prevent overfitting, the overall training set was further split 3:1 at random into a training set (2,334 cases) and a check (validation) set (762 cases), and the check set was used throughout training to monitor the results.
5. Statistical methods
The ANN prediction model was built in MATLAB 7.1; the logistic regression prediction model was built and the ROC curves were drawn in SPSS 13.0. The cut-off for the predicted probability was 0.5: when p ≥ 0.5 the subject was predicted to have hypertension, otherwise not. A two-sided α = 0.05 was regarded as statistically significant.
Results
1. Prediction of hypertension using univariate unconditional logistic regression
Univariate analysis of hypertension was conducted on the survey data.
Factors with p < 0.05, 22 hypertension-related factors in total, were selected as input variables for the prediction models.
2. Prediction of hypertension using multivariate unconditional logistic regression
(1) Building the multivariate unconditional logistic regression model
Multivariate unconditional logistic regression analysis was performed on the overall training set (3,096 cases). The indicators selected by univariate analysis served as independent variables (height and weight had been transformed into BMI and so did not enter the model), and hypertension status served as the dependent variable. The model was fitted by maximum likelihood with forward stepwise selection; the entry criterion was p < 0.05 and the removal criterion p > 0.10. After stepwise selection, 9 factors entered the model; in the omnibus tests of model coefficients, the step test gave χ² = 4.335 and the model test gave χ² = 1439.457. On the overall training set, the classification agreement (consistency) rate was 78.42%, specificity 80.45%, and sensitivity 76.62%.
(2) Prediction using the multivariate unconditional logistic regression model
The logistic regression model was used to predict whether the subjects in the test set (1,030 cases) had hypertension. The agreement rate was 77.48%, specificity 80%, and sensitivity 74.85%.
3. Prediction of hypertension using the BPANN
(1) Building the BPANN
The BPANN was a three-layer model: the 22 indicators chosen by univariate analysis served as input variables, the hidden layer had 22 neurons, and the output layer had one neuron (whether the subject had EH). The target error was 0.01, the learning rate 0.1, and the maximum number of training epochs 2000.
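
The 22-22-1 architecture with tansig hidden units and a logsig output can be written out as a forward pass. The sketch below is a minimal pure-Python illustration, not the thesis's MATLAB 7.1 code: tansig and logsig are reimplemented from their standard formulas, the weights are random draws in [0, 1) echoing the (0, 1) initialization mentioned in the Discussion, and input scaling (not described in the abstract) is assumed to have been done beforehand. Training with trainlm (Levenberg-Marquardt) is not shown, so the printed probability from this untrained network carries no meaning.

```python
import math
import random


def tansig(n):
    # MATLAB's tansig, 2 / (1 + exp(-2n)) - 1, equals tanh(n)
    return math.tanh(n)


def logsig(n):
    # Logistic sigmoid 1 / (1 + exp(-n)), written to avoid overflow for large |n|
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    z = math.exp(n)
    return z / (1.0 + z)


def init_network(n_inputs=22, n_hidden=22, seed=0):
    """Random initial weights and biases in [0, 1) for an n_inputs-n_hidden-1 network."""
    rng = random.Random(seed)
    w_hidden = [[rng.random() for _ in range(n_inputs)] for _ in range(n_hidden)]
    b_hidden = [rng.random() for _ in range(n_hidden)]
    w_out = [rng.random() for _ in range(n_hidden)]
    b_out = rng.random()
    return w_hidden, b_hidden, w_out, b_out


def predict_probability(net, x):
    """Forward pass: tansig hidden layer, then a single logsig output."""
    w_hidden, b_hidden, w_out, b_out = net
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return logsig(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


net = init_network(seed=42)
rng = random.Random(7)
x = [rng.uniform(-1, 1) for _ in range(22)]  # one hypothetical, already-scaled subject
p = predict_probability(net, x)
print(p, "hypertensive" if p >= 0.5 else "not hypertensive")  # 0.5 cut-off as in the Methods
```
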
Training ended after 17 epochs, when the mean squared error on the check set reached its minimum; at that point the training MSE was 0.126262 and the gradient was 137.276 (against a goal of 1e-10). When the fit of the trained BPANN was tested, the agreement rates of the training set and the check set were 81.06% and 77.95% respectively, and the agreement rate, specificity, and sensitivity on the overall training set were 80.30%, 84.48%, and 76.16% respectively.
(2) Prediction using the BPANN
The BPANN was used to predict whether the subjects in the test set (1,030 cases) had hypertension. The agreement rate on the test set was 78.83%, specificity 81.57%, and sensitivity 76.42%.
4. Comparison of the BPANN and the logistic regression model in predicting hypertension
(1) Comparison of predicted results
The agreement rate, sensitivity, and specificity of the neural network model were all higher than those of the logistic regression model.
(2) Comparison of ROC curve areas
ROC curves were drawn for the BPANN and the logistic regression model. The area under the ROC curve (AUC) was 0.782 (95% CI 0.768 to 0.797) for the logistic regression model and 0.800 (95% CI 0.786 to 0.814) for the BPANN.
Discussion
The causes of hypertension are complex and its risk factors are many. Some risk factors may interact or be multicollinear; these complex relationships affect the fit of prediction models and seriously hamper hypertension prediction and research. This study therefore used survey data from Zhangwu County, Liaoning Province, to build a BPANN model for predicting EH, compared it with a logistic regression model, and evaluated the predictive effectiveness and characteristics of the ANN. There is no uniform standard for setting the functions and parameters of a neural network, so each problem has to be analyzed on its own terms.
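
The stopping rule reported in the Results (training ended after 17 epochs because the check-set MSE had reached its minimum) is validation-based early stopping. A minimal Python sketch of that rule follows. It assumes a patience parameter, named `max_fail` after the MATLAB training parameter of that name and set to an illustrative 6; the abstract does not report the value actually used, and the MSE sequence in the example is made up.

```python
def early_stopping_epoch(check_mse_per_epoch, max_fail=6):
    """Return (best_epoch, stop_epoch), both 1-based.

    best_epoch has the lowest check-set MSE seen so far; training stops once
    the check-set MSE has failed to improve for max_fail consecutive epochs,
    or when the sequence of epochs runs out.
    """
    best_mse = float("inf")
    best_epoch = 0
    fails = 0
    for epoch, mse in enumerate(check_mse_per_epoch, start=1):
        if mse < best_mse:
            best_mse, best_epoch, fails = mse, epoch, 0
        else:
            fails += 1
            if fails >= max_fail:
                return best_epoch, epoch
    return best_epoch, len(check_mse_per_epoch)


# Hypothetical check-set MSE per epoch (not the thesis's values)
check_mse = [0.30, 0.25, 0.21, 0.20, 0.22, 0.23, 0.21, 0.24, 0.25, 0.26, 0.27]
print(early_stopping_epoch(check_mse))  # → (4, 10)
```

A typical implementation then keeps the weights from `best_epoch` rather than those from the epoch at which training stopped.
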
The model in this study was a BP neural network (a feed-forward network trained by error back-propagation), the type most widely used in the medical field and one that embodies the essence of neural networks. Because any continuous function on a closed interval can be approximated by a BP network with a single hidden layer, this study used a three-layer BP neural network with one hidden layer. Because a large number of input neurons requires a larger sample, only factors closely related to hypertension, namely those with p < 0.05 in univariate analysis, were used as input variables. Multi-category input variables (such as ethnicity) were coded as dummy variables so that the model could make better use of the data. The number of hidden neurons and the training function were determined by experiment: compared with other settings, 22 hidden neurons with the training function trainlm gave a mean squared error that was both small and stable. The initial network weights were random numbers in the interval (0, 1). Because different initial values produce different ANN models, the best model was selected after numerous trials. To avoid overfitting, the check set was used to supervise training throughout. In this study, the agreement rate, sensitivity, and specificity of the neural network model were all higher than those of the logistic regression model. The agreement rates of the logistic regression and neural network models were 77.48% and 78.83% respectively, showing that the predictive ability of the neural network model was slightly better than that of the logistic regression model. When ROC curves were used to evaluate the two models, the AUCs of the logistic regression and ANN models were 0.782 and 0.800 respectively.
This also suggests that the neural network model fits slightly better than LR for diseases such as hypertension, which have many risk factors with complex relationships among them. Neural networks still have some issues to be resolved. First, the network obtained changes with the settings of parameters, functions, initial values, and so on; these settings lack a theoretical basis and can only be determined by experience and testing. Second, neural networks have no recognized rules, like those of logistic regression, for entering and removing input variables. Third, the medical interpretation of each factor's effect on the dependent variable is not clear, and hypothesis testing methods, confidence intervals, and other issues need further study.
Conclusion
Experiments showed that for a complex disease such as hypertension, the neural network prediction model performed slightly better than the logistic regression model. ANNs could therefore be used as a necessary complement to logistic regression, and neural network prediction has broad application prospects for complex diseases.
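
The AUC values compared in both abstracts (0.782 versus 0.800) have a direct probabilistic reading: the AUC is the probability that a randomly chosen hypertensive subject receives a higher predicted probability than a randomly chosen non-hypertensive one, with ties counted as one half (the Mann-Whitney form). A minimal Python sketch of that point estimate follows; it does not reproduce the 95% confidence intervals reported from SPSS 13.0, and the function name and toy data are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2. Pairwise O(n_pos * n_neg), fine for about a
    thousand subjects."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # → 0.75
```

Because both models are scored on the same subjects, a formal comparison of their AUCs would normally use a paired method such as DeLong's test rather than a side-by-side reading of the two separate confidence intervals.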

  • 【CLC classification number】R544.1
  • 【Times cited】7
  • 【Downloads】799