节点文献

并行遗传算法在DNA杂交测序中的应用研究

Research on Parallel GA for DNA Sequencing by Hybridization

【作者】 袁倩倩

【导师】 谢红薇;

【作者基本信息】 太原理工大学 , 计算机应用技术, 2010, 硕士

【摘要】 由基因序列解开一切生命的奥秘是生物信息学的研究目标,而DNA测序是了解基因结构与功能的基础,是生物信息学研究的起点。人类基因组计划的启动推进了生物信息学的飞速发展。2008年7月2日,美国能源部联合基因组研究所宣称其群体测序项目将在2009年支持DNA测序计划。上述表明,生物基因组序列的快速测序一直是一种迫切的需求。杂交测序是目前使用最广的一种测序方法,它分为杂交实验和序列重构两个步骤。杂交实验由于技术和条件的限制往往出现两类错误,由此得到的非理想谱集就给序列重构带来了极大的困难。已经证明含有错误的序列重构问题是NP-hard问题。国内外已有许多关于杂交问题的研究文献。求解杂交测序问题主要有精确算法(分支定界算法,动态规划方法等)和启发式搜索方法(禁忌搜索算法,遗传算法,蚁群算法,模拟退火算法等)两类。比较之下,精确算法只在序列长度较小时适用,而启发式搜索方法因不依赖于问题空间得到了更多的应用。遗传算法是基于达尔文生物进化论的自然选择学说和群体遗传学原理而开发出的一种随机全局搜索与优化的自适应智能算法。遗传算法不受搜索空间的限制性假设的约束。已验证在求解含有错误的DNA测序问题上优于其他算法。由于遗传算法固有的并行性非常适用于大规模并行计算,所以并行遗传算法往往能提高运算速度,改善求解性能。基于以上所述,本文的意图在于改进现有的并行遗传算法,将其应用到DNA杂交测序问题中,使得重构的DNA序列更加精确,花费的时间更少。本文首先对问题建模,利用现有的求解该问题的遗传算法模型,确定进化策略;然后改进并行遗传算法,对模型进行优化求解;最后搭建机群环境,并对计算结果进行分析讨论。本文的核心在于并行遗传算法框架的选择和迁移操作的改进两方面。本文分析了四种并行模型的通信方式,结合DNA杂交测序问题,选择使用粗粒度-主从式双层并行框架;分析了迁移操作对并行框架的影响,提出一种新的迁移策略。通过实验验证,新的迁移策略有效地减少了通信开销。本文的分析研究为使用并行遗传算法对复杂的生物工程实际问题进行计算提供了参照依据,对并行遗传算法的推广和更有效地应用具有实际价值。

【Abstract】 The research target of bioinformatics is to reveal the secret of life. DNA sequencing is the foundation of finding out the structure and function of genes, so it is the research starting point of bioinformatics. The startup of human genome project rapidly advanced bioinformatics. On July 2, 2008, U.S. Department of Energy Joint Genome Institute claimed that, their projects, CSP, would support DNA Sequencing plan in 2009. These show that rapid sequencing biological genome has been an urgent demand.Sequencing by hybridization is the most widely used method, which is divided into hybrid experiment and sequence reconstruction. Because of the technology limitation, hybrid experiment often includes two kinds of errors. Errors will bring a lot of difficulties in the sequence reconstruction. DNA sequencing problem with errors in the hybridization data has been proved to be strongly NP-hard.There is much research literature about SBH problem at home and abroad. Methods which solve the SBH problem are exact algorithms (branch and bound algorithm, dynamic programming method, etc.)and heuristic methods (tabu search algorithm, genetic algorithm, ant colony algorithm, simulated annealing algorithm, etc.). The exact algorithms are used only when the length of sequence is small, however, the heuristic methods got more applications because they don’t depend on the problem space.Genetic algorithm is an adaptive global optimization algorithm, which simulates biological evolution process in the natural environment. Genetic algorithm is not under the constraint of the restrictive assumption from search space. It has been proved that, genetic algorithm is superior to the other algorithms in solving the DNA sequencing problems with errors. Due to the intrinsic parallel property, genetic algorithm is suitable to solve large-scale parallel computation problems. Parallel genetic algorithm can often increase speed and improve solution performance.Based on the above mentioned, the intention of this paper is obvious, which is to improve existing parallel genetic algorithm, and apply it in DNA sequencing problem to achieve accurate solutions and spent less time. First, the SBH problem is modeled. Then, the evolutionary strategy is determined based on existing genetic algorithm model. Next, new parallel genetic algorithm framework to optimize the solution is designed and the calculation results are discussed. Experimental results show that, the parallel genetic algorithm obtains more optimal solution than genetic algorithm in solving the SBH problem. The core points of the paper are selection of the parallel genetic algorithm framework and improvements of the migration operation. The parallel genetic algorithm has four kinds of implementation framework. Combining with DNA sequencing problem, the communication mode of four models is analyzed, and the coarse grain and master-slave double-level parallel framework is chose to use. Then, the effect of the migration on the parallel operation is considered, and a new migration policy is proposed. Experimental results show that the new migration strategy effectively reduce the communication overhead.Parallel genetic algorithm should be applied in solving complex biological engineering problem. The research on the parallel genetic algorithm for DNA sequencing by hybridization provides theoretical basis for the application and spreads it greatly.

节点文献中: 

本文链接的文献网络图示:

本文的引文网络