新型氨基酸结构表征方法及其在定量构效关系中应用研究

Novel Structural Characterizations of Amino Acids and Their Applications in Quantitative Structure-Activity Relationships Studies

【Author】 Shu Mao (舒茂)

【Advisor】 Yang Li (杨力)

【Author information】 Chongqing University, Biomedical Engineering, 2009, PhD

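【Abstract, translated from the Chinese】 Structural characterization of peptides and proteins is a prerequisite for, and an important part of, their quantitative structure-activity relationship (QSAR) studies. Because the spatial structure and functional information of peptides and proteins are hidden in the primary structure, that is, the amino acid sequence, the structural information of amino acids is essential to QSAR studies of peptides and proteins. Starting from the structural features of amino acids, this dissertation constructs two new amino acid descriptor systems, VHESH and VSTPV. VHESH (principal component score vector of hydrophobic, electronic, steric, and hydrogen bond properties) is derived from 113 physicochemical properties of the 20 natural amino acids, by principal component extraction applied separately to 50 hydrophobic, 23 electronic, 35 steric, and 5 hydrogen bond properties. VHESH1 and VHESH2 represent the hydrophobic properties of an amino acid, VHESH3 to VHESH6 its electronic properties, VHESH7 and VHESH8 its steric properties, and VHESH9 and VHESH10 its hydrogen bond donor and acceptor properties. VSTPV (principal component score vector of structural and topological variables) is derived from 85 topological structure variables of 166 natural and non-natural amino acids, again by principal component extraction. Compared with z-scales and other amino acid descriptors, VHESH has a clear physicochemical meaning, strong descriptive power, and easily interpreted results, while VSTPV, being based on topological properties, is simple to compute, independent of experimental data, and readily extensible.

In the peptide QSAR studies, VHESH and VSTPV were applied to angiotensin-converting enzyme inhibitors, oxytocin analogues, peptides binding to the SH3 domain of human amphiphysin-1, cationic antimicrobial peptides, and cell-penetrating peptides, and good models were obtained in every case. The VHESH-based models show the following. For angiotensin-converting enzyme inhibitors, the electronic and hydrophobic properties of the 2nd residue and the steric properties of the 1st residue, among others, correlate positively with bioactivity, whereas the electronic properties of the 1st residue correlate negatively. For oxytocin analogues, the electronic and hydrophobic properties of the 1st residue and the steric and hydrogen bond properties of the 3rd residue correlate significantly and positively with bioactivity, whereas the hydrophobic, electronic, and steric properties of the 2nd residue correlate negatively. For the peptides binding to the human amphiphysin-1 SH3 domain, analysis of the key interactions shows that the properties of the residues from P-3 to P2 (inclusive) have a marked effect on affinity. For cationic antimicrobial peptides, the electronic properties of the 3rd residue, the steric properties of the 6th, 7th, and 12th residues, and the hydrophobicity of the 11th and 12th residues correlate positively with antibacterial potency, whereas the electronic properties of the 6th, 10th, and 12th residues correlate significantly and negatively. For cell-penetrating peptides, the physicochemical and topological properties of the relevant residues strongly affect penetration. The VSTPV-based models of the same systems also gave good fitting and prediction results, and the key amino acid positions they identified largely agree with those of the VHESH models. On this basis, a series of new molecules was designed for each system from the best QSAR model and within its applicability domain; their predicted activities exceed, to varying degrees, the highest predicted activity in each system.

VSTPV was also applied to QSAR studies of peptide derivatives containing non-natural amino acids, namely bradykinin-potentiating peptides, bovine lactoferricin peptides, and elastase substrate analogues, with good results. The topological information of the 2nd and 3rd residues of the bradykinin-potentiating peptides correlates strongly with bioactivity; the topological properties of the 6th and 8th residues of the bovine lactoferricin peptides are closely related to bioactivity; and the quadratic and interaction terms of some variables of residues A and B of the elastase substrate analogues strongly affect the enzyme-catalyzed reaction.

QSAR theory and methods were further applied to the prediction of protein properties and functions. Using the VHESH and VSTPV characterizations, human immunodeficiency virus protease (HIV PR) cleavage sites, protein phosphorylation sites, and protein-RNA interaction sites were predicted and analyzed for specificity, with prediction results better than those of other methods. The steric, hydrogen bond, electronic, and hydrophobic properties, or the corresponding topological properties, of the 1st, 2nd, 4th, 5th, and 6th residues are important factors in recognition by HIV PR. The physicochemical (VHESH) and topological (VSTPV) properties of the P-3 position of the phosphorylation-site sequence have the greatest influence on phosphorylation at S, T, and Y sites. The steric, hydrophobic, electronic, and topological information of the 2nd, 5th, and 6th residues of RNA-interacting protein sequences strongly affects the RNA-protein interaction sites.

Modeling methods and techniques are an important part of QSAR research. This dissertation compares multiple linear regression (MLR), partial least squares (PLS), linear discriminant analysis (LDA), and support vector machine (SVM) in the study of structure-function relationships of peptides and proteins. MLR usually gives good results when its preconditions are satisfied; PLS handles cases with many, multicollinear variables; LDA performs well in pattern recognition and yields easily interpreted models; SVM copes well with small samples, nonlinearity, high dimensionality, and local minima. To improve model quality, stepwise multiple regression (SMR) and a genetic algorithm (GA) were used for variable selection, and both were found to remove noise from the original variables effectively.

Model quality evaluation and the applicability domain have become key issues in modeling methodology. All samples were divided into a training set and a test set; QSAR models were built on the training set and evaluated by both internal and external validation. The internal validation methods were leave-one-out (LOO), leave-1/n-out (LNO), leave-many-out (LMO), and the Y random permutation test. Following internal validation, several evaluation functions were used to assess the external predictive power of the models and so confirm their validity. Finally, the applicability domain of each model was defined from the standardized model distance of the samples in X space, to avoid the large errors and uncertainty that extrapolation would bring to activity prediction.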
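
The descriptor construction summarized above (a separate PCA on each property family, with the leading scores concatenated into one vector per amino acid) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dissertation's code: the property tables are random placeholders rather than the 113 measured properties, autoscaling before PCA is assumed, and the names `pca_scores`, `blocks`, and `encode_peptide` are hypothetical. The split into 2, 4, 2, and 2 components follows the VHESH1 to VHESH10 assignment given in the abstract.

```python
import numpy as np

def pca_scores(block, n_components):
    """Autoscale the columns of one property block and return its leading PC scores."""
    centered = block - block.mean(axis=0)
    std = centered.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant columns
    scaled = centered / std
    # SVD of the scaled matrix: scores = U * S, loadings = rows of Vt
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)

# Placeholder data (assumption): 20 amino acids x number of properties per family.
# Real use would load the measured property tables instead.
blocks = {
    "hydrophobic":   (rng.normal(size=(20, 50)), 2),
    "electronic":    (rng.normal(size=(20, 23)), 4),
    "steric":        (rng.normal(size=(20, 35)), 2),
    "hydrogen_bond": (rng.normal(size=(20, 5)), 2),
}

# One 10-dimensional descriptor per amino acid (rows follow the alphabet below).
descriptor = np.hstack([pca_scores(x, k) for x, k in blocks.values()])

def encode_peptide(sequence, table, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Concatenate the per-residue descriptor rows into one feature vector."""
    return np.concatenate([table[alphabet.index(res)] for res in sequence])
```

A peptide of length n then becomes a vector of 10n features, which is the kind of X matrix that MLR, PLS, or SVM models would be fitted on.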

【Abstract】 Structural characterization is a prerequisite for, and a central part of, quantitative structure-activity relationship (QSAR) studies of peptides and proteins. Because most of the structural and functional information of peptides and proteins is contained in their amino acid sequences, the characteristics of the amino acid residues are of great significance to such studies. Two new kinds of amino acid descriptors were derived by principal component analysis (PCA): the principal component score vector of hydrophobic, electronic, steric, and hydrogen bond properties (VHESH) and the principal component score vector of structural and topological variables (VSTPV). VHESH was obtained by applying PCA separately to four families of properties of the 20 coded amino acids: 50 hydrophobic, 23 electronic, 35 steric, and 5 hydrogen bond properties, 113 physicochemical properties in total. For each amino acid, VHESH1 and VHESH2 describe hydrophobic properties, VHESH3 to VHESH6 electronic properties, VHESH7 and VHESH8 steric properties, and VHESH9 and VHESH10 hydrogen bond properties. VSTPV was derived from PCA of 85 structural and topological variables of 166 coded and non-coded amino acids. Compared with z-scales and other amino acid descriptors, VHESH is physicochemically interpretable and more informative, while VSTPV is easy to compute, independent of experimental data, and readily extended to other non-coded amino acids.

VHESH and VSTPV were used to characterize several families of functional peptides: angiotensin-converting enzyme inhibitors, oxytocin analogues, decapeptides binding to the SH3 domain of human amphiphysin-1, cationic antimicrobial peptides, and cell-penetrating peptides. Robust and predictive QSAR models were obtained with various modeling techniques. The VHESH models showed that the bioactivity of angiotensin-converting enzyme inhibitors could be enhanced by increasing the electronic and hydrophobic properties of the 2nd residue and the steric properties of the 1st residue, among others, and might be decreased by increasing the electronic properties of the 1st residue. The activities of oxytocin analogues appear to be strongly positively correlated with the electronic and hydrophobic properties of the 1st residue and the steric and hydrogen bond properties of the 3rd residue, and strongly negatively correlated with the hydrophobic, electronic, and steric properties of the 2nd residue. For the decapeptides (P4P3P2P1P0P-1P-2P-3P-4P-5), the properties of the residues between the P-3 and P2 sites may contribute markedly to the interaction with the human amphiphysin-1 SH3 domain. For the antimicrobial peptides, the electronic properties of the 3rd residue, the steric properties of the 6th, 7th, and 12th residues, and the hydrophobic properties of the 11th and 12th residues have strongly positive effects on antibacterial activity, whereas the electronic properties of the 6th, 10th, and 12th residues contribute negatively. The structural properties of the relevant residues of cell-penetrating peptides may be highly correlated with the penetration process. New peptide sequences can be designed from the structure-activity relationships found in these peptide panels. The VSTPV models identified sequence positions influencing bioactivity that were similar to those of the VHESH models.

VSTPV was also applied to several peptides and analogues: bradykinin-potentiating pentapeptides, bovine lactoferricin-(17–31)-pentadecapeptides, and elastase substrate analogues. Robust and predictive QSAR models were again developed with various modeling techniques. The activities of the bradykinin-potentiating pentapeptides were mainly related to the topological information of the 2nd and 3rd residues. The topological variables of the 6th and 8th residues contribute significantly to the bioactivity of the bovine lactoferricin-(17–31)-pentadecapeptides. The quadratic and interaction terms of the topological variables of residues A and B have the main effects on the catalytic activity toward the elastase substrate analogues.

The principles and methods of QSAR were further employed to investigate the relationship between protein sequence and property or function. VHESH and VSTPV were used to characterize protein sequence segments for the prediction of HIV-1 protease (HIV PR) cleavage sites, protein phosphorylation sites, and RNA-binding sites in proteins. The results suggest that HIV PR may recognize only a few key properties at particular positions of the octameric sequences: the steric, hydrogen bond, electronic, hydrophobic, and topological properties of the 1st, 2nd, 4th, 5th, and 6th residues, among others, may be important determinants of HIV PR cleavage. The physicochemical (VHESH) and topological (VSTPV) properties of the P-3 site near the S, T, and Y sites were significant for predicting phosphorylated S, T, and Y sites. In the 11-residue motifs of protein sequences, the steric, hydrophobic, electronic, and topological properties of the 2nd, 5th, and 6th positions had a marked influence and the other positions had little, which indicates that these properties may be key features for recognition of the RNA-binding region.

Modeling methods and related techniques are also important to the success of QSAR studies. Regression and pattern recognition methods, namely multiple linear regression (MLR), partial least squares (PLS), linear discriminant analysis (LDA), and support vector machine (SVM), were compared in this dissertation. MLR performed as well as the other methods when its application conditions were met. PLS avoids the harmful effects of multicollinearity and is particularly suited to regression when the sample size is smaller than the number of variables. LDA yields robust and interpretable models. As a newer machine learning algorithm, SVM copes well with small datasets, nonlinearity, high-dimensional feature spaces, and local minima. In addition, stepwise multiple regression (SMR) and a genetic algorithm (GA) were used to select variable subsets; the results indicated that variable selection effectively removes noise from the original variable set.

The QSAR models were then validated and evaluated. Each dataset was first divided into a training set and a test set, and the training set was used to build the QSAR models. Leave-one-out (LOO) cross-validation (CV), leave-1/n-out (LNO) CV, leave-many-out (LMO) CV, and the Y random permutation test were used for internal validation. Following internal validation, external validation was performed on the test set, and several evaluation functions were used to assess the predictive power of the models. In addition, the model applicability domain was used to assess the prediction error for the activities of the designed molecules.
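
Two of the internal validation steps named in the abstract, leave-one-out cross-validation and the Y random permutation test, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the dissertation's procedure: the data are synthetic, the model is plain MLR fitted with `numpy.linalg.lstsq`, q² is taken as 1 − PRESS/SS_total with the overall mean of y in SS_total (other conventions exist), and the function names are hypothetical.

```python
import numpy as np

def fit_mlr(x, y):
    """Least-squares coefficients for y ~ intercept + x."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_mlr(coef, x):
    """Predictions of a fitted MLR model for the rows of x."""
    return np.column_stack([np.ones(len(x)), x]) @ coef

def loo_q2(x, y):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_total."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i             # train on all samples except i
        coef = fit_mlr(x[keep], y[keep])
        press += (y[i] - predict_mlr(coef, x[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def y_randomization(x, y, n_rounds=50, seed=0):
    """LOO q2 values obtained after randomly permuting the activities."""
    rng = np.random.default_rng(seed)
    return np.array([loo_q2(x, rng.permutation(y)) for _ in range(n_rounds)])

# Synthetic example (assumption): 30 samples, 3 descriptors, low noise.
rng = np.random.default_rng(1)
x = rng.normal(size=(30, 3))
y = x @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)

q2 = loo_q2(x, y)
q2_scrambled = y_randomization(x, y)
```

A model whose q² lies well above the whole range of `q2_scrambled` is unlikely to owe its fit to chance correlation; external validation on a held-out test set remains a separate step.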

  • 【Online publication submitter】 Chongqing University
  • 【Online publication issue】 2009, No. 12