
Molecular Structural Characterization and Further Quantitative Structural-Spectrum Relationships for Organic Compounds

【Author】 He Liu

【Advisor】 Li Zhiliang

【Author information】 Chongqing University, Analytical Chemistry, 2008, Master's thesis

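【Abstract, translated from Chinese】 Quantitative structure-activity relationship (QSAR) analysis is an important computational method and a common tool in drug design research. Its core is to examine and analyze the quantitative correlation between molecular structural features and physicochemical properties or biological activities. In recent years, QSAR techniques have greatly advanced organic synthetic chemistry, medicinal chemistry, and drug design, and have become a powerful tool for studying the physicochemical properties and biological activities of substances and for seeking molecular-level explanations. As an emerging branch of QSAR, quantitative structure-spectrum relationship (QSSR) analysis applies QSAR techniques to spectral data obtained from analytical instruments and simulates spectral features theoretically. Because the relationship between a compound's structure and its spectral data is often not simply linear, and because spectral data are themselves complex and diverse, simulating and predicting them correctly remains difficult. This thesis explores this area: by studying new parameterized representations of molecular structure, it analyzes several kinds of spectral behavior of organic compounds and biomolecules and obtains fairly satisfactory results.

Building on the laboratory's existing work, the thesis applies and develops three descriptors based on two-dimensional molecular structure information: the atomic electronegativity interaction vector (AEIV), the molecular electronegativity interaction vector (MEIV), and the molecular electronegativity interaction vector with hybridization (MEHIV). These are combined with variable selection methods and QSAR modeling techniques, including multiple linear regression (MLR), stepwise regression (SMR), genetic algorithm (GA), partial least squares (PLS), and support vector machine (SVM), to carry out QSSR studies on the spectral properties of organic compounds. For most systems the results are comparable to or better than those in the literature. The main work is as follows:

① Based on the two-dimensional topological structure of molecules, structural descriptors that characterize the local chemical microenvironment and the hybridization state are proposed: the atomic electronegativity interaction vector (AEIV) and the atomic hybridization state index (AHSI). These descriptors are used to characterize 42 acridone alkaloids and 24 quinolinone alkaloids, and multiple linear regression equations simulating carbon NMR chemical shifts are built on them. The multiple correlation coefficient R of the models and the leave-one-out cross-validation multiple correlation coefficient Rcv are 0.957, 0.956 and 0.983, 0.981, respectively. In addition, a total of 375 equivalent carbon atoms in 35 naphthalene derivatives are characterized; the resulting model has R, cross-validation Rcv, and root mean square error RMS of 0.951, 0.949, and 5.365, respectively. Rigorous statistical diagnostics and model validation show that the models have good stability and predictive ability.

② Considering the influence of hybridization state on atomic orbital electronegativity, a new structural parameterization method is established: the molecular electronegativity interaction vector with hybridization (MEHIV). This vector is used to characterize five classes of organic compounds, including alkanes, aromatic hydrocarbons, aliphatic alcohols, and aliphatic aldehydes and ketones, and multiple linear regression models are built against their reduced mobility constants K0 from ion mobility spectrometry. The multiple correlation coefficients R of the models are 0.915, 0.926, 0.978, 0.978, and 0.990, and the standard deviations SD are 0.044, 0.053, 0.042, 0.034, and 0.030, respectively. The model statistics indicate that MEHIV descriptors have a clear physicochemical meaning, a strong ability to express molecular structure, and an intuitive form.

③ The applicability of MEHIV is explored further. MEHIV is used in a systematic QSSR study of the collision cross sections of 190 doubly protonated peptide ions, with models built by a genetic algorithm combined with partial least squares. Rigorous statistical tests show that the resulting PLS regression model has good stability and generalization ability. The statistics for the internal training set and the external test set, R², RMS, R²cv, R²pred, and RMSpred, are 0.957, 8.378, 0.954, 0.978, and 10.298, respectively. The results indicate that MEHIV is significantly linearly correlated with peptide ion collision cross sections, while a few peptides involve some nonlinear factors.

④ Starting from the two-dimensional topological structure of molecules, the molecular electronegativity interaction vector (MEIV) is used in a quantitative structure-retention relationship (QSRR) study of the gas chromatographic retention behavior of 100 polycyclic aromatic hydrocarbons (PAHs), 62 polychlorinated naphthalenes (PCN), 117 nitrogen-containing polycyclic aromatic hydrocarbons (N-PAHs), and 90 sulfur-containing compounds. The multiple correlation coefficient R and the leave-one-out cross-validation coefficient Rcv of all models are above 0.98. The stability and generalization ability of the models are analyzed in depth by both internal and external validation. The results show that MEIV has a strong ability to express molecular structure and adapts well to various properties of compounds.

⑤ Seventy-two peptide sequences and their high-performance liquid chromatography (HPLC) retention times are selected from the literature, and the peptide structures are characterized with MEHIV. Using common QSAR modeling techniques and model validation methods, the samples are divided by a rigorous grouping procedure into a training set of 52 and a test set of 20. Models are built with stepwise regression, genetic algorithm, partial least squares, and support vector machine, and then compared. The results show that MEHIV is a good class of descriptors and that the resulting models are of good quality. Comparison of the modeling methods also shows that the relationship between peptide HPLC retention behavior and MEHIV descriptors is mainly linear, with some nonlinear factors.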

【Abstract】 Quantitative structure-activity relationship (QSAR) analysis, which investigates the quantitative relationship between molecular structural parameters and biological activities or related properties, is one of the most important computational methods and a common technique in drug design. In recent years, QSAR has given great impetus to the development of organic synthetic chemistry, medicinal chemistry, and drug design, and it has proved to be a powerful tool for correlating molecular structure with physicochemical properties and bioactivities and for seeking reasonable interpretations. As an emerging branch of QSAR, quantitative structure-spectrum relationship (QSSR) analysis refers to the theoretical simulation, by QSAR methods, of spectral data obtained from instrumental analysis. However, because spectral data are complex and diverse and are often not simply linearly related to structure, they are difficult to simulate and predict correctly. In this context, the thesis studies several kinds of spectral behavior of organic compounds and biomolecules using new molecular structural characterization methods.

Based on the 2D information of molecular structure, the atomic electronegativity interaction vector (AEIV), the molecular electronegativity interaction vector (MEIV), and the molecular electronegativity interaction vector with hybridization (MEHIV) are applied and extended. Several modeling methods, namely multiple linear regression (MLR), stepwise multiple linear regression (SMR), genetic algorithm (GA), partial least squares (PLS) regression, and support vector machine (SVM), are used to establish QSSR models for organic compounds, and most of the resulting models are comparable or superior in quality to those in the literature. The main contents are as follows:

① Based on molecular two-dimensional topological structures, the atomic electronegativity interaction vector (AEIV) and the atomic hybridization state index (AHSI) were developed to express the chemical microenvironment and the atomic hybridization state. AEIV and AHSI are applied to characterize the equivalent carbon atoms of 42 acridone alkaloids and 24 quinolinones, and multiple linear regression models are constructed to simulate the nuclear magnetic resonance chemical shifts of 13C atoms. The correlation coefficients R of model estimation and RCV of leave-one-out cross-validation are 0.957, 0.956 and 0.983, 0.981, respectively. When these descriptors are applied to 375 equivalent resonant carbon atoms of 35 naphthalene derivatives, the correlation coefficient R of model estimation, the leave-one-out cross-validation RCV, and the root mean square error RMS are 0.951, 0.949, and 5.365, respectively. Strict statistical diagnostics confirm that the models are stable and have predictive ability.

② Taking into account the effect of hybridization on atomic electronegativity, a novel electrotopological descriptor, the molecular electronegativity interaction vector with hybridization (MEHIV), has been developed to describe the atomic hybridization state in different molecular environments. Five quantitative models based on MEHIV characterization and multiple linear regression were established to predict the reduced ion mobility constants (K0) of alkanes, aromatic hydrocarbons, fatty alcohols, fatty aldehydes and ketones and carboxylic esters. The correlation coefficients R were 0.915, 0.926, 0.978, 0.978, and 0.990, and the standard deviations SD were 0.044, 0.053, 0.042, 0.034, and 0.030, respectively. These results suggest that MEHIV is a topological index descriptor with a straightforward physicochemical meaning, high characterization competence, convenient extensibility, and easy manipulation.

③ To explore further the fields in which MEHIV applies, it is also used to characterize the molecular structure of 190 doubly protonated peptides, which is then correlated with their ion mobility spectrometry collision cross sections. A quantitative model is developed by GA and PLS regression. The PLS regression model is subjected to rigorous internal and external validation, which indicates that it is robust and has predictive ability; the statistics on the training and test sets are R² = 0.957, RMS = 8.378, R²CV = 0.954, R²pred = 0.978, and RMSpred = 10.298. The results show that MEHIV correlates well with collision cross section; the relationship is mainly linear with a minor nonlinear component.

④ An electrotopological descriptor, the molecular electronegativity interaction vector (MEIV), is employed to study the retention behavior of 100 polycyclic aromatic hydrocarbons (PAHs), 62 polychlorinated naphthalenes (PCN), 117 nitrogen-containing polycyclic aromatic compounds (N-PACs), and 90 sulfur compounds. The resulting QSRR models all have R and RCV above 0.98. After the stability and generalization ability of the models are tested in depth by both internal and external validation, MEIV is judged to be adaptable to diverse molecular systems.

⑤ In view of recent developments in QSAR modeling and model validation methods, MEHIV is also used to characterize the molecular structure of 72 peptides, which is then correlated with their HPLC retention times (RT). The peptide data set is divided into a training set of 52 samples and a test set of 20 samples. Stepwise multiple linear regression (SMR), genetic algorithm (GA), partial least squares (PLS) regression, and support vector machine (SVM) are used to correlate molecular structure with retention time so that the methods can be compared. The results show that MEHIV expresses the structures of peptides well. The relationship between the retention behavior of peptides and the MEHIV vectors is found to be mainly linear, with a small nonlinear component.
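The validation statistics quoted throughout the abstract (R from model fitting, RCV from leave-one-out cross-validation) can be illustrated with a minimal sketch. It rests on several assumptions: the descriptor matrix `X` is random synthetic data standing in for AEIV/MEHIV descriptors, which are not reproduced here; RCV is taken as the Pearson correlation between observed and leave-one-out predicted values, which may differ from the exact definition used in the thesis; and all function names are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply the fitted intercept and slopes to new descriptor rows."""
    return coef[0] + X @ coef[1:]

def correlation_r(y, y_hat):
    """Pearson correlation between observed and calculated values."""
    return float(np.corrcoef(y, y_hat)[0, 1])

def loo_predictions(X, y):
    """Leave-one-out: refit without sample i, then predict sample i."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef_i = fit_mlr(X[keep], y[keep])
        preds[i] = predict_mlr(coef_i, X[i:i + 1])[0]
    return preds

# Synthetic stand-in data (assumption): 60 samples, 4 descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.5, -0.7, 0.3, 2.0]) + 0.2 * rng.normal(size=60)

coef = fit_mlr(X, y)
loo_pred = loo_predictions(X, y)

r_fit = correlation_r(y, predict_mlr(coef, X))       # "R" of model estimation
r_cv = correlation_r(y, loo_pred)                    # "RCV" under the assumption above
rms_cv = float(np.sqrt(np.mean((y - loo_pred) ** 2)))  # RMS error of LOO predictions

print(f"R = {r_fit:.3f}, Rcv = {r_cv:.3f}, RMScv = {rms_cv:.3f}")
```

A small gap between R and RCV, as reported for the models in the abstract, is the usual sign that a regression model is not overfitted to its training samples.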

  • 【Online publication contributor】 Chongqing University
  • 【Online publication issue】 2009, No. 06