Accurate Prediction of Absorption Energies of Small Organic Molecules: Neural Network and Support Vector Machine Methods

【Author】 Gao Ting

【Supervisors】 Su Zhongmin; Lü Yinghua

【Author Information】 Northeast Normal University, Physical Chemistry, 2009, Ph.D.

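【Chinese Abstract】 The absorption energy of a molecule reflects its intrinsic structural information and electronic properties and is an important physical property. Accurately predicting absorption energies is therefore an important problem in computational chemistry. Quantum chemistry is the fundamental discipline that studies the microscopic structure and properties of molecules and the interactions between them, and over the past 20 years its basic theory and computational methods have advanced remarkably. A major advantage of quantum chemical calculation is that it can predict the properties of substances before experiments are carried out, as well as physical quantities that still cannot be measured and reaction processes that cannot be observed experimentally. Quantum chemical calculations have proved very successful in explaining and predicting experimental results for small and medium-sized molecules, but the approximations inherent in the methods make errors unavoidable, and the errors are larger for complex, structurally irregular large molecules. Over the past 10 years, many statistical correction methods have been used to improve the accuracy of quantum chemical calculations. The relevant physical parameters of the molecules are first computed with a quantum chemical method, and a statistical method is then used to determine the quantitative relationship between the calculated and experimental values; these methods mainly include multiple linear regression and artificial neural networks. In this thesis, artificial neural networks and support vector machines are used to correct the quantum chemical results for 160 small organic molecules, improving the accuracy of the calculated absorption energies. Using simple physical parameters, the combined methods reduce the systematic errors that theoretical calculations incur by neglecting electron correlation and using small basis sets, and they offer a new way to predict molecular properties accurately and quickly. The work consists of the following parts:
1. The UV-visible absorption spectra of small organic molecules are calculated with quantum chemical methods, and a back-propagation (BP) neural network optimized by a genetic algorithm (GABP) is used to improve the accuracy of the calculated absorption energies. GABP1 is applied separately to three quantum chemical methods: B3LYP/6-31G(d), B3LYP/STO-3G and ZINDO. Before correction, the root-mean-square (RMS) errors of the three methods are 0.32, 0.95 and 0.46 eV; after correction they fall to 0.14, 0.19 and 0.18 eV. The B3LYP/6-31G(d)-GABP1 results have the smallest error and essentially agree with experiment.
2. GABP2 is used to determine the quantitative relationship between the B3LYP/6-31G(d)-GABP1 values and those of the lower-level methods B3LYP/STO-3G and ZINDO. After GABP2 correction, the RMS errors of B3LYP/STO-3G and ZINDO fall to 0.20 and 0.19 eV. Comparing the two rounds of correction verifies the effectiveness and feasibility of the GABP approach. B3LYP/6-31G(d)-GABP1 thus effectively improves the accuracy of the absorption energies of small organic molecules, and where experimental results are unavailable, its corrected values can be regarded as approximations to experimental values.
3. The B3LYP/6-31G(d)-GABP method can be used to analyze the absorption energies of larger molecules. When experimental values are lacking for such molecules, or high-accuracy theoretical values cannot be obtained, it may become an effective computational tool for predicting experimental values.
4. Additional physical parameters are introduced according to the selected data set and then screened by multiple linear regression, which retains eight of them.
5. A least-squares support vector machine (LS-SVM) is introduced to improve the values calculated with density functional theory. After LS-SVM correction, the RMS error falls from 0.32 eV to 0.11 eV. Compared with multiple linear regression, LS-SVM shows better adaptability and correction performance. When neither experimental values nor high-accuracy quantum chemical results can be obtained, LS-SVM correction of B3LYP/6-31G(d) can effectively predict molecular absorption energies. LS-SVM extends the reliability and practical applicability of B3LYP/6-31G(d).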
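The GA-optimized BP correction (GABP) summarized above can be illustrated with a minimal pure-Python sketch. It is not the thesis's implementation: the network size (one tanh hidden layer with four units), the GA operators (elitism, uniform crossover, Gaussian mutation), the choice to let the network learn the deviation exp - calc, and the two-column toy data are all assumptions, and the data are synthetic. The GA searches for starting weights with a low training RMS, back-propagation then refines them, and the RMS deviation is compared before and after correction.

```python
import math
import random


def rms(pred, ref):
    """Root-mean-square deviation between two equal-length sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def standardize(X):
    """Scale every descriptor column to zero mean and unit variance."""
    cols = list(zip(*X))
    mu = [sum(c) / len(c) for c in cols]
    sd = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
          for c, m in zip(cols, mu)]
    return [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in X]


def forward(w, x, n_hid):
    """One tanh hidden layer and a linear output; w is a flat weight list."""
    n_in, out = len(x), n_hid * (len(x) + 1)
    hidden = []
    for j in range(n_hid):
        base = j * (n_in + 1)  # n_in weights followed by one bias per hidden unit
        s = w[base + n_in] + sum(w[base + i] * x[i] for i in range(n_in))
        hidden.append(math.tanh(s))
    return hidden, w[out + n_hid] + sum(w[out + j] * hidden[j] for j in range(n_hid))


def predict(w, X, n_hid):
    return [forward(w, x, n_hid)[1] for x in X]


def ga_init(X, y, n_hid, rng, pop=30, gens=40, p_mut=0.2, sigma=0.3):
    """Genetic algorithm searching for starting weights with low training RMS."""
    n_w = n_hid * (len(X[0]) + 1) + n_hid + 1
    population = [[rng.gauss(0.0, 0.5) for _ in range(n_w)] for _ in range(pop)]

    def error(w):
        return rms(predict(w, X, n_hid), y)

    for _ in range(gens):
        population.sort(key=error)
        next_gen = population[: pop // 5]  # elitism: keep the best 20 %
        while len(next_gen) < pop:
            a, b = rng.sample(population[: pop // 2], 2)  # parents from the better half
            child = []
            for ai, bi in zip(a, b):
                gene = ai if rng.random() < 0.5 else bi  # uniform crossover
                if rng.random() < p_mut:
                    gene += rng.gauss(0.0, sigma)  # Gaussian mutation
                child.append(gene)
            next_gen.append(child)
        population = next_gen
    return min(population, key=error)


def bp_train(w, X, y, n_hid, lr=0.05, epochs=1000):
    """Back-propagation: batch gradient descent on the mean of 0.5 * error**2."""
    w, n_in = list(w), len(X[0])
    out = n_hid * (n_in + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            hidden, pred = forward(w, x, n_hid)
            err = pred - t
            grad[out + n_hid] += err
            for j in range(n_hid):
                grad[out + j] += err * hidden[j]
                dh = err * w[out + j] * (1.0 - hidden[j] ** 2)  # chain rule through tanh
                base = j * (n_in + 1)
                grad[base + n_in] += dh
                for i in range(n_in):
                    grad[base + i] += dh * x[i]
        w = [wi - lr * gi / len(X) for wi, gi in zip(w, grad)]
    return w


def demo(seed=0, n_hid=4):
    rng = random.Random(seed)
    # SYNTHETIC toy data, not the thesis data set: column 0 stands in for a
    # calculated absorption energy (eV), column 1 for a hypothetical descriptor.
    raw = [[rng.uniform(2.0, 6.0), rng.uniform(0.0, 1.0)] for _ in range(40)]
    calc = [r[0] for r in raw]
    exp = [0.85 * e + 0.4 + 0.2 * s for e, s in raw]  # made-up systematic error
    X = standardize(raw)
    delta = [t - c for t, c in zip(exp, calc)]  # the network learns exp - calc
    w = bp_train(ga_init(X, delta, n_hid, rng), X, delta, n_hid)
    corrected = [c + d for c, d in zip(calc, predict(w, X, n_hid))]
    return rms(calc, exp), rms(corrected, exp)


before, after = demo()
print(f"training-set RMS: {before:.3f} eV -> {after:.3f} eV")
```

Here the GA only supplies the starting point and back-propagation does the fine fitting, which is one common way to combine the two. A real application would report the RMS deviation on molecules held out from training rather than on the training set used here.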

【Abstract】 Absorption energy is an important physical property of a molecule that reflects its intrinsic structural information and electronic properties. Accurate prediction of absorption energies is therefore an important topic in computational chemistry. Quantum chemistry is the fundamental discipline that studies the properties and interactions of molecules, and over the past decades its fundamental theories and methods have developed remarkably. One of the Holy Grails of quantum mechanical calculation is to predict the properties of matter before experiments are performed and to examine physical properties or processes that are inaccessible to experiment. Despite this success, the results of quantum mechanical calculations contain inherent numerical errors caused by various intrinsic approximations, in particular for complex systems. During the last 10 years, many statistical correction approaches have been employed to improve the results of quantum chemical methods: after the properties of the molecules have been calculated, a statistical approach is used to determine the quantitative relationship between the calculated and experimental results. These approaches mainly include multiple linear regression (MLR) and nonlinear methods such as neural networks. In the present work, a neural network and the least-squares support vector machine have been applied to improve the accuracy of quantum chemical calculations of the absorption energies of 160 small organic molecules. Using simple descriptors, these combined methods can largely eliminate the systematic errors that theoretical calculations incur by neglecting electron correlation and using small basis sets, and they provide a new tool for predicting molecular properties. Our work has focused on the following aspects:
1. A correction approach combining a genetic algorithm with a back-propagation neural network (GABP) improves the accuracy of the absorption energies obtained from quantum chemical calculations of the UV-visible absorption spectra of 160 small organic molecules. First, GABP1 is introduced to determine the quantitative relationship between the experimental results and the values calculated with quantum chemical methods. After GABP1 correction, the root-mean-square (RMS) deviations of the calculated absorption energies are reduced from 0.32, 0.95 and 0.46 eV to 0.14, 0.19 and 0.18 eV for the B3LYP/6-31G(d), B3LYP/STO-3G and ZINDO methods, respectively. The corrected B3LYP/6-31G(d)-GABP1 results are in good agreement with experiment.
2. GABP2 is introduced to determine the quantitative relationship between the B3LYP/6-31G(d)-GABP1 results and the values from the lower-accuracy methods (B3LYP/STO-3G and ZINDO). After GABP2 correction, the RMS deviations of the calculated absorption energies are reduced to 0.20 and 0.19 eV for B3LYP/STO-3G and ZINDO, respectively, close to the 0.19 and 0.18 eV obtained after GABP1 correction. Thus, B3LYP/6-31G(d)-GABP1 is a reliable method for predicting absorption energies, and its values can be used as approximations to experimental results where these are unknown or cannot be determined reliably by experiment.
3. The GABP approach may also be used to predict the absorption energies of larger organic molecules for which neither experimental values nor high-accuracy calculations with larger basis sets are available, which may make it a reliable tool for predicting absorption energies.
4. The set of descriptors is enlarged according to the data set, and multiple linear regression is then used to select eight of them as the appropriate physical descriptors.
5. The least-squares support vector machine (LS-SVM) is introduced to improve the accuracy of density functional theory calculations. With the LS-SVM approach, the RMS deviation of the B3LYP/6-31G(d) absorption energies of the 160 organic molecules is reduced from 0.32 eV to 0.11 eV. A comparison of the MLR and LS-SVM results demonstrates the feasibility and effectiveness of the LS-SVM approach. LS-SVM correction on top of B3LYP/6-31G(d) therefore outperforms MLR in predicting absorption energies, and its values can be used as approximations to experimental results when neither experimental values nor high-accuracy theoretical results are available. LS-SVM greatly extends the reliability and applicability of the B3LYP/6-31G(d) method.
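The LS-SVM correction in item 5 can be sketched briefly because LS-SVM regression (Suykens' formulation) reduces training to a single linear system, [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y], and predicts f(x) = sum_k alpha_k K(x, x_k) + b. The sketch below is an illustration under stated assumptions, not the thesis's model: the RBF kernel, the hyperparameters gamma = 10 and sigma2 = 1, the choice to learn the deviation exp - calc, and the two-column synthetic data are all hypothetical, and the descriptors the thesis actually used are not reproduced. The RMS deviation is evaluated on 20 held-out points.

```python
import math
import random


def rms(pred, ref):
    """Root-mean-square deviation between two equal-length sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def rbf(a, b, sigma2):
    """Gaussian (RBF) kernel exp(-||a - b||^2 / sigma2)."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / sigma2)


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented copy; A is untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def lssvm_fit(X, y, gamma=10.0, sigma2=1.0):
    """LS-SVM regression: solve [[0, 1^T], [1, K + I/gamma]] [b, alpha] = [0, y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma2) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization term on the diagonal
        A.append(row)
    b, *alpha = solve(A, [0.0] + list(y))
    # f(x) = sum_k alpha_k K(x, x_k) + b
    return lambda x: b + sum(a * rbf(x, xk, sigma2) for a, xk in zip(alpha, X))


def fit_scaler(X):
    """Return a function that standardizes rows with the training mean and std."""
    cols = list(zip(*X))
    mu = [sum(c) / len(c) for c in cols]
    sd = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
          for c, m in zip(cols, mu)]
    return lambda row: [(v - m) / s for v, m, s in zip(row, mu, sd)]


def demo(seed=1):
    rng = random.Random(seed)
    # SYNTHETIC data only: column 0 mimics a calculated absorption energy (eV),
    # column 1 a hypothetical extra descriptor; "exp" has a made-up nonlinear bias.
    raw = [[rng.uniform(2.0, 6.0), rng.uniform(0.0, 1.0)] for _ in range(60)]
    exp = [e - 0.3 - 0.1 * (e - 4.0) ** 2 + 0.2 * s for e, s in raw]
    train, test = range(40), range(40, 60)
    scale = fit_scaler([raw[i] for i in train])
    model = lssvm_fit([scale(raw[i]) for i in train],
                      [exp[i] - raw[i][0] for i in train])  # learn exp - calc
    calc_test = [raw[i][0] for i in test]
    exp_test = [exp[i] for i in test]
    corrected = [raw[i][0] + model(scale(raw[i])) for i in test]
    return rms(calc_test, exp_test), rms(corrected, exp_test)


before, after = demo()
print(f"held-out RMS: {before:.3f} eV -> {after:.3f} eV")
```

Because training is one dense linear solve, the cost grows with the cube of the number of molecules, which is negligible at the scale of 160 molecules. In practice gamma and sigma2 would be tuned, for example by cross-validation, rather than fixed as they are here.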
