
Study of the QSAR for Selected Organic Pollutants

【Author】 Cui Xiujun (崔秀君)

【Advisor】 Zhang Zhuoyong (张卓勇)

【Author information】 Northeast Normal University, Environmental Science, 2006, PhD

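【Abstract, translated from the Chinese】 In environmental chemistry, quantitative structure-activity/property relationships (QSAR/QSPR) of organic pollutants are of great importance for the ecological risk assessment of organic compounds and for pollution control and prevention. Quantum chemical calculation is an important means of obtaining molecular structural parameters. Quantum chemical parameters have a clear physicochemical meaning; in QSAR/QSPR studies of organic pollutants they can be used to explore the mode of action between a toxicant and its receptor, and to study the molecular structural features that influence the physicochemical properties of organic pollutants. Among quantum chemical methods, density functional theory (DFT) is theoretically rigorous and has become the mainstream of quantum chemical calculation. Molecular connectivity indices are another important class of structural parameters in QSAR/QSPR research; they give a quantitative description of molecular structure and have been widely used in QSAR/QSPR studies of organic compounds.

The temperature-constrained cascade correlation network (TCCCN) was designed to be fast, powerful and self-organizing; it introduces the concept of temperature constraints into the cascade correlation network, which addresses the problem of overtraining. Mark designed an improved radial basis function (RBF) neural network by introducing forward selection into the training of the RBF neural network, so that the widths of the radial basis functions can be optimized to control the complexity and performance of the model. The support vector machine (SVM) is a new machine learning method with a sound theoretical foundation and good generalization ability. This dissertation introduces these methods into environmental chemistry to build QSAR/QSPR models that predict the toxicity of organic pollutants and the physical properties of organic compounds.

Chapter 1 briefly reviews the development and current state of QSAR/QSPR research.

Chapter 2 introduces the parameters and research methods used in QSAR/QSPR and describes in detail the principles and applications of the improved RBF neural network, the TCCCN and the SVM. This theory and review of the state of the field provide the theoretical basis for the work in this dissertation.

In Chapter 3, we used the B3LYP hybrid density functional method to calculate quantum chemical structural parameters for 35 nitrobenzenes and their homologues, and obtained a statistically significant QSAR equation by stepwise regression; the correlation coefficient was 0.925 and the cross-validated correlation coefficient was 0.87. The TCCCN was applied to QSAR research for the first time. Principal component analysis was used to select parameters, and nonlinear BP network and TCCCN models were built; their MSE values were 0.095 and 0.067, respectively, for the training set, and 0.111 and 0.090, respectively, for the prediction set. The nonlinear TCCCN model has better predictive ability than the linear MLR model.

In Chapter 4, starting from molecular structure, we calculated the molecular connectivity indices of 25 phenolic compounds and used stepwise regression to establish the best equation, which has four parameters. With these four parameters as inputs, the leave-one-out (LOO) method was applied to the BP network, the RBF network and the novel machine learning method SVM to build QSAR models predicting the activity of phenolic compounds toward the fathead minnow. The nonlinear SVM model gave better results than the BP and RBF network models: the correlation coefficients of the SVM, BP and RBF predictions were 0.959, 0.940 and 0.945, respectively, which is a satisfactory result.

In Chapter 5, we used the density functional theory (DFT) method to calculate quantum chemical structural parameters for 60 alcohols and also calculated their molecular connectivity indices. The quantum chemical parameters and the molecular connectivity indices were used together in a QSPR study of the solubility and the octanol/water partition coefficient of alcohols, and stepwise regression gave statistically significant QSPR equations with four and five parameters, respectively. These four and five parameters were then used, respectively, as input para…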

【Abstract】 Quantitative structure-activity/property relationships (QSAR/QSPR) of organic pollutants are of great importance to the ecological risk assessment of organic compounds, pollution control, and pollution prevention.

Quantum chemical calculation is an important way to obtain structural parameters of specific molecules in QSAR/QSPR studies. Quantum chemical parameters have an explicit physicochemical interpretation, and they can be used both to discuss the mode of action between a toxicant and its receptor and to study the molecular characteristics that affect the physicochemical properties of organic pollutants. Because density functional theory (DFT) methods rest on a rigorous theoretical basis, they have become an effective tool in quantum chemical calculation worldwide. Molecular connectivity indices are another important class of structural parameters in QSAR/QSPR studies. Because they describe molecular structure quantitatively, they have come into wide use in QSAR/QSPR studies.

The temperature-constrained cascade correlation network (TCCCN) was devised as a fast, powerful and self-organizing architecture. The use of temperature constraints in the cascade correlation network can counter the effects of overfitting. Mark devised an improved radial basis function neural network (RBFNN) based on forward selection, which can optimize the RBF widths to control model complexity. The support vector machine (SVM) is a novel type of machine learning method; it has a rigorous theoretical background and remarkable generalization performance. This dissertation introduces these methods to environmental chemistry to build QSAR/QSPR models and to predict the toxicity of organic pollutants and the physical properties of organic compounds.

Chapter 1 gives a brief description of the development of QSAR/QSPR and its research status.

In Chapter 2, we first introduce the parameters and research methods used in QSAR/QSPR. We then describe the principles of the improved RBFNN, the TCCCN and the SVM in detail. Finally, we review the applications of each of these methods.

In Chapter 3, the TCCCN, a back-propagation (BP) neural network and multiple linear regression (MLR) were applied to QSAR modeling of a set of 35 nitrobenzene derivatives and their acute toxicities. The quantum chemical structural descriptors were obtained from density functional theory (DFT). Stepwise multiple regression analysis was performed to obtain a model. The conventional R was 0.925 and the cross-validated R was 0.87. Principal component analysis was used for parameter selection. The RMS values for the training set using the TCCCN and BP were 0.067 and 0.095, respectively, and the RMS values for the testing set were 0.090,
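
The distinction between the conventional R and the cross-validated R reported for the Chapter 3 regression model can be illustrated with a minimal sketch. It assumes leave-one-out cross-validation (the abstract does not say which scheme was used for that figure) and an ordinary least-squares MLR model. The data are synthetic and every function name is hypothetical; none of this reproduces the thesis's descriptors or results.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit of y = b0 + X @ b (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict with coefficients returned by fit_mlr."""
    return coef[0] + X @ coef[1:]

def loo_predictions(X, y):
    """Predict each sample from a model fitted on all the other samples."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # leave sample i out
        coef = fit_mlr(X[keep], y[keep])
        preds[i] = predict_mlr(coef, X[i:i + 1])[0]
    return preds

def corr(a, b):
    """Pearson correlation coefficient between two vectors."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-in data: 35 "compounds", 3 made-up descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(35, 3))
y = 1.0 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.3, size=35)

fitted = predict_mlr(fit_mlr(X, y), X)    # model sees every sample
held_out = loo_predictions(X, y)          # each sample predicted blind

r_fit = corr(y, fitted)                   # conventional R
r_cv = corr(y, held_out)                  # cross-validated R
rss = float(np.sum((y - fitted) ** 2))    # residual sum of squares
press = float(np.sum((y - held_out) ** 2))  # predictive residual sum of squares

print(f"conventional R = {r_fit:.3f}, cross-validated R = {r_cv:.3f}")
print(f"RSS = {rss:.3f}, PRESS = {press:.3f}")
```

For a least-squares model, PRESS is never smaller than RSS, which is why a cross-validated statistic is usually somewhat weaker than the conventional one and is a more honest estimate of predictive ability.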
