Application of Mathematical Modeling in Medicinal Chemistry and Metabolism in the Rat Brain

【Author】 Jie Wang

【Supervisors】 Zhide Hu; Graeme F. Mason

【Author information】 Lanzhou University, Analytical Chemistry, 2009, PhD

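【Abstract (translated from Chinese)】 The structure-activity/property relationship (SAR/SPR) approach is a very active international research field, and investment in it has grown year by year. SAR/SPR studies address a wide range of physicochemical property parameters, biological activity, toxicity, and drug bioavailability, and span chemistry, biology, pharmacy, and environmental chemistry. The approach starts from the molecular structure of a compound, derives a variety of physicochemical parameters by theoretical calculation, selects the parameters most closely related to the property under study, and builds linear or non-linear models to estimate properties and activities. Researchers can then use the resulting models to discuss, at the molecular level, the mechanisms underlying those properties and activities. The approach promotes interdisciplinary work, is of theoretical and practical importance, and has good application prospects.

This dissertation begins with the quantitative description of molecular structure and the construction of structure-activity/property relationships, and summarizes applications of SAR/SPR to the prediction of physicochemical properties and to drug screening. It focuses on an improved machine learning algorithm, the grid-search support vector machine (GS-SVM), which is used to build efficient and stable quantitative structure-property relationship (QSPR) and classification structure-activity relationship (CSAR) models. Finally, it studies the application of mathematical models to metabolism in the rat brain, examining the effect of nicotine on metabolic rates in different brain regions. The dissertation consists of four chapters.

Chapter 1 briefly introduces machine learning and the related statistical learning theory, then describes in detail the basic principles of the support vector machine, the main algorithm used in the dissertation, and summarizes other classification methods. It closes with an overview of the basic principles of QSAR, its main steps, and the criteria for judging model stability and reliability.

Chapter 2 discusses the application of QSPR to property prediction and covers two pieces of work. (a) QSPR was used to predict the specific rotation of 18 essential amino acids. The heuristic method was first used to screen the chemical descriptors generated by the CODESSA software and to build a linear regression model with a coefficient of determination (R²) of 0.918. After a chirality descriptor was introduced (+1 for levorotatory, -1 for dextrorotatory), R² rose to 0.970 and the predictions improved considerably. The model offers a previously unreported way to predict the specific rotation of chiral compounds. (b) The heuristic method and the support vector machine were used to build linear and non-linear models, respectively, to predict the surface tension of 196 compounds. The non-linear SVM model was clearly better than the linear model: R² was 0.9348 for the training set and 0.9097 for the test set, with error factors of 1.22 and 1.07, respectively. The model offers a new method for research in surface chemistry.

Chapter 3 describes the application of the improved support vector machine algorithm, GS-SVM, to classification, and covers three pieces of work. (a) GS-SVM was used to classify 141 nucleoside derivatives, a class of new anti-HIV drugs. Linear discriminant analysis was first applied to the descriptors generated by CODESSA to select those most closely related to anti-HIV activity and to build a linear classification model. Its prediction accuracy was 83.0% for the training set and 88.6% for the test set, which left room for improvement. To obtain a more accurate model, a non-linear GS-SVM model was built from the selected descriptors, giving 91.5% (training set) and 91.4% (test set). The work offers theoretical guidance for screening new anti-HIV nucleoside drugs. (b) The classification structure-activity relationship (CSAR) method was used to classify the genotoxicity of thiophene derivatives. Forward stepwise linear discriminant analysis was used to select the structural parameters most related to genotoxicity and to build a linear classification model. The selected parameters were then used as inputs to GS-SVM to build a non-linear model. The non-linear GS-SVM model gave more accurate predictions: 92.9% (training set) and 92.6% (test set). Analysis of the results identified structural factors related to genotoxicity. The model offers a simple, effective, and fast method for studying the genotoxicity of thiophene derivatives. (c) LDA and GS-SVM were combined to build linear and non-linear classification models for the bioavailability of 167 drugs. LDA was used to select the structural parameters most closely related to bioavailability, and linear and non-linear binary classification models were built from the selected parameters. The accuracy of the non-linear GS-SVM model was 85.82% (training set), 84.85% (test set), and 85.63% (whole data set), far higher than that of the LDA model. Compared with the original literature, this work offers an alternative means of studying drug bioavailability.

The first three chapters were completed under the supervision of Prof. Zhide Hu at the College of Chemistry and Chemical Engineering, Lanzhou University. Chapter 4 was completed mainly at the Yale University School of Medicine under the supervision of Prof. Graeme F. Mason. It uses mathematical modeling to study the effect of nicotine on total metabolite content and metabolic rates in different regions of the rat brain. First, the classical univariate t-test was used to compare total compound content in each brain region. Significant changes after nicotine injection were found in the striatum (γ-aminobutyric acid (GABA), glutamate, and N-acetylaspartate (NAA)), parietal cortex (creatine, glutamate, and NAA), frontal cortex (NAA), temporal cortex (alanine, choline), medulla (aspartate, glutamate), and olfactory bulb (NAA). Next, simple linear discriminant analysis was used to classify 38 rats: the set of variables formed by different regions and different metabolites was used to determine which injection each rat had received (saline or nicotine). Only one of the 38 rats was misclassified, which suggests that the effect of nicotine on rat brain metabolism may be predictable. Finally, a preliminary study of metabolic rates in different brain regions and the effect of nicotine on them was carried out using the 13C labeling of glutamate C4, glutamine C4, and GABA C2.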
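The abstract names the grid-search support vector machine (GS-SVM) but does not give its settings. The following is a minimal sketch of one common way to combine the two ideas, assuming an RBF kernel, a power-of-two grid over `C` and `gamma`, 5-fold cross-validation, and scikit-learn as the library; none of these choices is stated in the thesis. The data are synthetic stand-ins for a descriptor matrix (the sample count of 141 only echoes the size of the anti-HIV data set), and the LDA descriptor selection step is not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fit_grid_search_svm(X, y, seed=0):
    """Tune an RBF-kernel SVM by exhaustive grid search with cross-validation."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("svm", SVC(kernel="rbf")),
    ])
    # Hypothetical grid: powers of two for the penalty C and the kernel width gamma.
    param_grid = {
        "svm__C": [2.0 ** k for k in range(-2, 9, 2)],
        "svm__gamma": [2.0 ** k for k in range(-9, 2, 2)],
    }
    search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    # score() uses the best model refitted on the whole training set.
    return search, search.score(X_train, y_train), search.score(X_test, y_test)


# Synthetic two-class data standing in for molecular descriptors (not thesis data).
X, y = make_classification(
    n_samples=141, n_features=6, n_informative=4, n_redundant=0, random_state=0
)
search, train_acc, test_acc = fit_grid_search_svm(X, y)
print(search.best_params_, round(train_acc, 3), round(test_acc, 3))
```

Reporting accuracy separately for the training and test sets mirrors how the abstract quotes its results. The numbers printed here come from synthetic data and say nothing about the thesis models.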

【Abstract】 The structure-activity/property relationship (SAR/SPR) approach is now widely used across many research groups. Over the past twenty years a large number of papers have been published every year, and the number continues to rise. The targets of SAR/SPR studies are very broad, including the physicochemical properties of substances, biological activity, toxicity, and bioavailability, and the research area spans chemistry, biology, pharmacy, and environmental chemistry. The development of this approach will therefore advance interdisciplinary research.

In cheminformatics, the method uses only the information in the molecular structures, from which a wide range of physicochemical parameters are calculated by theoretical computation. With these parameters and a selected training set, mathematical methods such as the heuristic method, genetic algorithms, and linear discriminant analysis are used to select the most important descriptors and then to construct linear or non-linear models. Researchers can use these models to predict the properties and activities of compounds. The approach also yields information that can be used to discuss, at the molecular level, the mechanisms of activity and the factors that influence the properties.

In the first part of this dissertation, we discuss the application of the SAR/SPR method to the prediction of the physicochemical properties of substances and to drug screening. The focus of the dissertation is an improved machine learning method, the grid-search support vector machine (GS-SVM). Using this method, we build efficient and stable quantitative structure-property relationship (QSPR) and classification structure-activity relationship (CSAR) models. Finally, the dissertation covers the application of mathematical modeling to rat brain metabolism, in particular the influence of nicotine on metabolic rates in different brain regions.
This dissertation consists of four chapters.

The first chapter introduces machine learning and statistical learning theory, then describes the basic theory of the support vector machine algorithm and summarizes other classification methods. Finally, it outlines the basic theory of QSAR methods, their main steps, and the criteria for judging the stability and reliability of the models.

The second chapter investigates the application of the QSPR method to the prediction of the properties of substances. It consists of two parts. (a) A QSPR model was developed to predict the specific rotation of 18 essential amino acids. The heuristic method (HM) was used to select the most important descriptors, which were calculated from the molecular structures alone, and to build a linear regression model. The coefficient of determination (R²) of this model is 0.918. To build a more reliable model, a molecular chirality descriptor (+1 for levorotatory, -1 for dextrorotatory) was added to the pool of selected descriptors, which raised R² to 0.970. The work provides a new and efficient way to predict the specific rotation of chiral compounds. (b) The heuristic method and the support vector machine were used to construct linear and non-linear regression models to predict the surface tension of 196 compounds. The non-linear SVM model performs much better than the linear one: R² is 0.9348 for the training set and 0.9097 for the test set, with error factors of 1.22 and 1.07, respectively. This study provides a new method for research in surface chemistry.

The third chapter describes in detail an improved support vector machine method, the grid-search support vector machine (GS-SVM), and discusses its application to classification.
This chapter consists of three sections. (a) The GS-SVM method was used to classify the anti-HIV activity of 141 nucleoside derivatives. First, stepwise linear discriminant analysis was used to select the descriptors that most strongly influence anti-HIV activity and to build a two-class linear classification model. The prediction accuracy of this model is 83.0% for the training set and 88.6% for the test set. To obtain a more accurate model, a non-linear GS-SVM classification model was constructed from the same descriptors, giving better results: 91.5% (training set) and 91.4% (test set). This study offers guidance for research on the anti-HIV activity of nucleoside derivatives. (b) The genotoxicity of thiophene derivatives was investigated using the classification structure-activity relationship (CSAR) method. Stepwise LDA was used to select the descriptors most strongly correlated with genotoxicity and to build a linear classification model. A non-linear classification model was then built from the selected descriptors with GS-SVM. Comparing the two models, GS-SVM gives more accurate predictions: 92.9% for the training set and 92.6% for the test set. Interpreting the selected descriptors also provided information on the structural factors related to genotoxicity. (c) LDA and GS-SVM were again used to build linear and non-linear classification models for the bioavailability of 167 drugs. Turner and co-workers had used regression methods to study bioavailability, with results that were not promising. In this work, we took a classification approach instead and obtained better results.
Comparing the two models, the GS-SVM model gives much better predictions: 85.82% (training set), 84.85% (test set), and 85.63% (whole data set). This investigation thus provides a new approach to the study of bioavailability.

The first three chapters were completed at Lanzhou University under the supervision of Prof. Zhide Hu, and the last chapter was completed at the School of Medicine, Yale University, under the supervision of Dr. Graeme F. Mason. In this chapter, mathematical modeling was used to study the effect of nicotine on total metabolite concentrations and metabolic rates in different regions of the rat brain. First, the classical t-test was used to analyze the effect of nicotine on each individual parameter (a given metabolite in a given region) of total metabolite concentration. The results indicated that the following parameters changed significantly after a subcutaneous injection of nicotine: striatum (GABA, glutamate, and NAA), parietal cortex (creatine, glutamate, and NAA), frontal cortex (NAA), temporal cortex (alanine, choline), medulla (aspartate, glutamate), and olfactory bulb (NAA). Comparing the same compound across regions, we found that NAA was significantly decreased in every region. Next, LDA was used to separate the 38 rats into two groups (saline and nicotine), using as variables the combinations of regions (excluding the olfactory bulb) and compounds (excluding lactate). The classification model misclassified only one rat. The results indicated that nicotine affects the metabolism of the rat brain. Finally, the metabolic rates in different regions and the effect of nicotine on them were determined from the 13C labeling of glutamate C4, glutamine C4, and GABA C2.
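The per-metabolite, per-region comparison described above can be illustrated with a short sketch of a two-sample t statistic. The abstract says only "classical t-test", so the pooled-variance (Student) form used here is an assumption. The function name `pooled_t_statistic` is hypothetical and the concentrations are made-up numbers, not measurements from the thesis.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for Student's two-sample t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance, weighted by each group's degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Made-up concentrations (arbitrary units) for one metabolite in one region.
saline = [1.0, 2.0, 3.0, 4.0, 5.0]
nicotine = [2.0, 3.0, 4.0, 5.0, 6.0]
t, df = pooled_t_statistic(saline, nicotine)
print(t, df)
```

In an analysis of this kind the statistic would be compared against the t distribution with `df` degrees of freedom, once for each region and metabolite pair.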

  • 【Online publication contributor】 Lanzhou University
  • 【Online publication year/issue】 2009, Issue 12