
The Research of Feature Selection Algorithm Based on Swarm Intelligence in SELDI Mass Spectral Data Analysis

【Author】 Zhang Rong (张蓉)

【Advisor】 Feng Bin (冯斌)

【Author information】 Jiangnan University, Computer Application Technology, 2009, Master's thesis

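【Summary】 Feature selection is a prerequisite for modeling tasks across the application areas of bioinformatics. Areas such as biological sequence analysis, microarray data analysis, and mass spectrometry data analysis all involve high-dimensional data with small sample sizes and sparse internal spaces. Because small-sample data carry inherent risks of imprecision and overfitting, data analysis faces great challenges. In response to these characteristics, new feature selection algorithms with good stability and robustness are continually being proposed. Mass spectrometry can examine biological samples (tissue and cell extracts, blood, urine, and so on) and obtain the molecular weights of the target proteins in a sample. It can therefore identify disease-related patterns, which provide important and direct clues for finding disease markers and specific therapeutic target molecules, and for drug development, diagnosis, and treatment. This thesis systematically studies the analysis of SELDI-TOF mass spectrometry data and applies swarm intelligence optimization algorithms combined with support vector machines (SVM) to biomarker feature selection in mass spectrometry data analysis. The main work covers the following aspects:

1) A theoretical study of SELDI-TOF mass spectrometry, a current international research frontier. The thesis summarizes and compares the preprocessing methods and biomarker selection methods used in SELDI-TOF data analysis, and reviews the open problems and development directions of the technology.

2) A study of the basic principles of swarm intelligence algorithms, in particular ant colony optimization (ACO), particle swarm optimization (PSO), and their improved variants, which provides a theoretical basis for later study and application.

3) Feature weight factors are used as prior information in the ACO search process and combined with SVM to screen serum protein biomarkers. The cancer diagnosis model built with this method achieved good classification performance in test simulations.

4) Quantum-behaved particle swarm optimization (QPSO), ACO, and PSO are each combined with SVM, and the resulting diagnostic models are used for biomarker selection. Experiments show that the QPSO-based model not only has good prediction accuracy but is also considerably faster, and therefore has some theoretical significance and practical value.

Finally, the main results of the thesis are summarized and directions for further research are outlined.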

【Abstract】 Feature selection (FS) has become a prerequisite for model building in bioinformatics. Many modeling tasks in the field, from sequence analysis and microarray analysis to spectral analysis and literature mining, involve high-dimensional data with small sample sizes, and this has given rise to a wealth of feature selection techniques. Small sample sizes carry an inherent risk of imprecision and overfitting, which poses a great challenge for many modeling problems in bioinformatics. The specific demands of these applications have led to many newly proposed techniques.

Mass spectrometry (MS) is used to measure the mixture of proteins and peptides in biological tissues or fluids such as serum or urine. Such measurements can be used to identify disease-related patterns, which hold potential for early diagnosis, prognosis, monitoring of disease progression and response to treatment, and drug target research.

This thesis studies the analysis of surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) data and the application of swarm intelligence algorithms combined with support vector machines (SVM) to biomarker selection. The main contents are as follows:

(1) The fundamental principles of SELDI-TOF-MS technology are studied, and the methods used in its two main analysis phases, pre-processing and biomarker selection, are summarized. The shortcomings of the technology and its likely development are also discussed.

(2) The fundamental principles of Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), and their improved variants are studied, providing a theoretical basis for further work.

(3) A new method is proposed that uses feature weighting factors as prior information in the ACO search process. Combined with SVM, it is applied to identify relevant serum proteomic biomarkers. Experiments show that the proposed method is effective at distinguishing cancer patients from healthy individuals.

(4) SVM is combined with QPSO, ACO, and PSO in turn, and the resulting models are used for biomarker selection. Experiments show that the model built with QPSO achieves not only high prediction accuracy but also a substantially faster running time, so the proposed QPSO-SVM method has some theoretical and practical value.

The main contributions of the thesis are summarized and directions for further research are suggested at the end.
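The wrapper scheme in item (4) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the function names are hypothetical, the data are synthetic, positions are thresholded at 0.5 to decide which features are kept, and a leave-one-out nearest-centroid classifier stands in for the SVM so that the example needs only the Python standard library. The position update follows the commonly published form of QPSO (quantum-behaved particle swarm optimization), in which each coordinate is resampled around an attractor between the personal and global best, with a spread proportional to the distance from the mean of the personal bests.

```python
import math
import random


def nearest_centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the kept features.

    Stand-in for the SVM named in the abstract (an assumption of this sketch).
    """
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    labels = sorted(set(y))
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for label in labels:
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            if not rows:
                continue
            dist = 0.0
            for j in cols:
                centroid_j = sum(r[j] for r in rows) / len(rows)
                dist += (X[i][j] - centroid_j) ** 2
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def qpso_select(X, y, n_particles=10, n_iter=30, beta=0.75, penalty=0.05, seed=0):
    """Wrapper feature selection with a QPSO-style search (illustrative only)."""
    rng = random.Random(seed)
    dim = len(X[0])

    def fitness(pos):
        mask = [v > 0.5 for v in pos]  # assumed encoding: keep feature if > 0.5
        # Reward accuracy, lightly penalise the fraction of features kept.
        return nearest_centroid_accuracy(X, y, mask) - penalty * sum(mask) / dim

    swarm = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pbest_fit = [fitness(p) for p in swarm]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        # Mean of all personal-best positions, per dimension.
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i, x in enumerate(swarm):
            for d in range(dim):
                phi = rng.random()
                attractor = phi * pbest[i][d] + (1 - phi) * gbest[d]
                u = 1.0 - rng.random()  # in (0, 1], so log(1/u) is finite
                step = beta * abs(mbest[d] - x[d]) * math.log(1.0 / u)
                x[d] = attractor + step if rng.random() < 0.5 else attractor - step
                x[d] = min(1.0, max(0.0, x[d]))  # keep positions in [0, 1]
            f = fitness(x)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[:], f
                if f > gbest_fit:
                    gbest, gbest_fit = x[:], f
    return [d for d in range(dim) if gbest[d] > 0.5], gbest_fit


def make_toy_data(seed=0, n_per_class=10, n_noise=5):
    """Synthetic two-class data: feature 0 is informative, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = rng.gauss(3.0 * label, 0.2)
            X.append([informative] + [rng.gauss(0.0, 1.0) for _ in range(n_noise)])
            y.append(label)
    return X, y
```

Calling `qpso_select(*make_toy_data())` returns the indices of the kept features and the best fitness found. On real SELDI-TOF spectra, the fitness function would presumably wrap a cross-validated SVM over selected peaks, and the penalty weight would need tuning.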

  • 【Online publication contributor】 Jiangnan University
  • 【Online publication year/issue】 2010, Issue 05