Approximation for Multiple Sequence Alignments with optimization methods

Friday, October 28, 2011
Hall 1-2 (San Jose Convention Center)
Fernando Samano , Heritage University, Toppenish, WA
John Tsiligaridis, PhD , Heritage University, Yakima
One of the fundamental issues in Computational Biology is Multiple Sequence Alignment (MSA). Dynamic Programming (DP) is suitable for MSA only when the length and the number of sequences are small. We need heuristics to compute the MSA. The star-alignment focuses on finding a central sequence based on pairwise alignments. An algorithm for finding more than one center, the multiple center algorithm (MCA) has been developed. For this purpose, the chain alignment (CA) with its advantages is introduced. MCA can discover sets of closer subsequences that may be also used for other alignments. Moreover, a pilot method for one or more star alignment explorations using constraints (PMCA) is developed. The pilot method defines the directions for the discovery of stars’ sequences. A theorem shows that a group of star sequences remains the same no matter what the direction from the starting point is. The criterion of having more than one stars and upper bounds are examined. The MSA is also achieved by using the Traveling Salesman Problem (TSP). The TSP simply lists all the MSA, and picks out the best , and picks out the best in an exhaustive inefficient and impractical search. To obtain good solutions in reasonable time, avoiding exactly optimal solutions, a Genetic Algorithm (GA) is developed. GA can solve NP-complete problems without exploring all the possible solutions. A serial and a parallel algorithm for finding MSA are developed by solving TSP using GA. The complexity of all proposed algorithms is examined. Simulation results are provided.