The Identification and Prediction of Biological Activity and Toxicity for Some Small Organic Molecules

【Author】 Lu Jin (陆瑾)

【Advisor】 Lu Wencong (陆文聪)

【Author information】 Shanghai University, Materials Science, 2012, PhD

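【Abstract (translated from the Chinese)】 In recent years, with the continuing development of genomics for humans and other species, information technology, and biological detection methods, biological information resources have grown steadily richer, and bioinformatics has emerged as a new interdisciplinary field. Theoretical researchers can process and store experimentally obtained data and analyze them with machine learning methods to find hidden rules and patterns, deepening our understanding and revealing the biological meaning the data contain. This dissertation applies that approach to the identification and prediction of the biological activity and toxicity of some small organic molecules. The main work is divided into three parts.

Part I: Prediction of the biological function of small molecules based on ensemble learning algorithms

Determining the biological function of small molecules accurately and efficiently is a challenge, and research on predicting it is of considerable significance. In this part we apply ensemble learning algorithms to the problem. We build models with the AdaBoost-C4.5 algorithm, encode small molecules by their functional group composition, and predict the metabolic pathway type of each small molecule. Studying the biological function of small molecules helps us understand disease mechanisms and life phenomena. The model built in this part shows good predictive performance: the cross-validation prediction accuracy is 73.71%, and the prediction accuracy on an independent test set is 73.8%. Based on this model, we developed an online service for predicting the metabolic pathway type of small molecules; the web interface is at http://chemdata.shu.edu.cn/pathway/.

Part II: Prediction of interactions between enzymes and small molecules in metabolic processes based on ensemble learning algorithms

Information about interactions between enzymes and small molecules is very important for understanding their metabolic roles and other biological processes. In this work we combine different classifiers, including AdaBoost, Bagging, and KNN, and use a multi-classifier voting system to predict interactions between enzymes and small molecules in metabolic processes. The study shows that the predictions of the voting system are better than those of any single classifier. The prediction accuracies obtained on the training dataset and the independent test set are 82.8% and 84.8%, respectively. For interacting enzyme/small-molecule pairs (that is, positive samples) in the independent test set, the prediction accuracy is 75.5%, which is 4 percentage points higher than the accuracy previously reported in the literature. The proposed prediction method has been deployed on a web server at http://chemdata.shu.edu.cn/small-enz/.

Part III: Structure-toxicity relationship study of narcotics based on support vector regression

In this part we used support vector regression (SVR), multiple linear regression (MLR), partial least squares (PLS), and a back-propagation artificial neural network (BP-ANN) to study the quantitative structure-toxicity relationship of 39 narcotics. Molecular descriptors suitable for modeling were selected from a number of parameters obtained by quantum chemical calculation. The root-mean-square errors of the SVR, MLR, PLS, and BP-ANN models were 0.283, 0.385, 0.392, and 0.466, respectively. The results show that the prediction accuracy of the SVR model is higher than that of the MLR, PLS, and BP-ANN models. SVR is expected to become a useful chemometric tool in structure-toxicity relationship research.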

【Abstract】 With the development of genomics, information technology, and biological detection methods, the amount of biological information is increasing rapidly. These resources have given rise to a new interdisciplinary field, bioinformatics. Researchers explore biological knowledge by capturing, managing, depositing, retrieving, and analyzing biological information. Data mining is used to extract potentially useful information from databases and plays an increasingly important role in bioinformatics. In this dissertation, ensemble learning methods are used to investigate the identification and prediction of biological activities and toxicities of some small organic molecules. The main contributions of the dissertation are summarized as follows.

I. Prediction of biological function of small molecules based on ensemble learning algorithm

Studies on the biological functions of small molecules can help explain biological phenomena in molecular biology and disease mechanisms in medicine. Discovering these functions experimentally requires a great deal of manpower, materials, and financial resources. In this study, an ensemble learning approach is proposed. Using the AdaBoost method with functional group composition as the molecular encoding, small molecules are quickly mapped to the metabolic pathway to which they are likely to belong. The model reached an accuracy of 73.71% in 10-fold cross-validation and 73.8% on an independent test set. The proposed approach is therefore promising for mapping unknown molecules to their possible metabolic pathways. Based on the models for predicting small molecules' metabolic pathways, an online predictor developed in our laboratory is available at http://chemdata.shu.edu.cn/pathway.
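
To make the boosting idea in Part I concrete, the following is a minimal sketch of AdaBoost over functional-group count features. It is not the model from the dissertation: it uses one-feature decision stumps rather than C4.5 decision trees as the weak learner, it handles only two classes, and the feature columns (hydroxyl and carboxyl counts), the four toy molecules, their labels, and all function names are assumptions made for this example.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best one-feature rule."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # Rule: predict `sign` when feature j >= thr, otherwise -sign.
                preds = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adaboost_fit(X, y, n_rounds):
    """Train n_rounds weighted stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # vote weight of this stump
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical encoding: [hydroxyl count, carboxyl count] per molecule.
# Label +1 = "belongs to pathway class A", -1 = "does not" (made-up labels).
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [-1, -1, -1, 1]

model = adaboost_fit(X, y, n_rounds=3)
predictions = [adaboost_predict(model, row) for row in X]
```

On this toy set no single stump classifies all four molecules correctly, but the weighted vote of three stumps reproduces the training labels. A multi-class pathway predictor would need a multi-class boosting variant and a much richer functional-group vocabulary.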
II. Prediction of interaction between enzymes and small molecules in metabolic pathways with integrated multiple classifiers

Information about interactions between enzymes and small molecules is important for understanding various metabolic bioprocesses. We applied a majority voting system to predict the interaction between enzymes and small molecules in metabolic pathways by combining several classifiers, including AdaBoost, Bagging, and KNN. The strategy is motivated by the fact that a predictor based on majority voting can usually provide more reliable results than any single classifier. The prediction accuracies on the training dataset and the independent test dataset were 82.8% and 84.8%, respectively. The prediction accuracy for the interacting enzyme/small-molecule pairs (the positive samples) in the independent test dataset was 75.5%, about 4 percentage points higher than that reported in a previous study. An implementation of the proposed prediction method is available at http://chemdata.shu.edu.cn/small-enz.

III. Quantitative structure-property relationship based on support vector regression for narcotics toxicities

The quantitative structure-toxicity relationship of narcotics was studied using support vector regression (SVR), multiple linear regression (MLR), partial least squares (PLS), and a back-propagation artificial neural network (BP-ANN). The molecular descriptors contributing to toxicity were selected from various features obtained using quantum chemistry methods. The root-mean-square errors of the SVR, MLR, PLS, and BP-ANN models were 0.283, 0.385, 0.392, and 0.466, respectively. The results indicate that the prediction accuracy of the SVR model is higher than that of the MLR, PLS, and BP-ANN models. SVR is expected to be a useful chemometric tool in structure-toxicity relationship research.
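
The majority voting scheme described in Part II can be illustrated with a short sketch. It assumes each base classifier has already produced one label per enzyme/small-molecule pair; the three prediction lists, the ground-truth labels, and the function names below are made up for illustration and are not results or code from the dissertation. Ties are broken in favour of the first classifier's label, which is also an assumption.

```python
from collections import Counter

def majority_vote(votes):
    """Return the label chosen by the most classifiers for one sample.

    Ties go to the tied label that appears earliest in `votes`.
    """
    counts = Counter(votes)
    top = max(counts.values())
    for label in votes:
        if counts[label] == top:
            return label

def vote_all(per_classifier_predictions):
    """Combine one prediction list per classifier into one list per sample."""
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_predictions)]

def accuracy(predicted, actual):
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Illustrative labels for six pairs: 1 = interacting, 0 = non-interacting.
truth    = [1, 0, 1, 1, 0, 0]
adaboost = [1, 0, 0, 1, 0, 1]   # hypothetical output of an AdaBoost model
bagging  = [1, 1, 1, 1, 0, 0]   # hypothetical output of a Bagging model
knn      = [0, 0, 1, 1, 1, 0]   # hypothetical output of a KNN model

combined = vote_all([adaboost, bagging, knn])
combined_accuracy = accuracy(combined, truth)
```

In this toy case each base classifier makes at least one error but the errors fall on different samples, so the vote recovers every label. That outcome depends on the classifiers making largely uncorrelated mistakes and is not guaranteed in general.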

  • 【Online publication contributor】 Shanghai University
  • 【Online publication year and issue】 2012, Issue 07