Developing Profiles using a Hidden Markov Model and a Heuristic Method

Saturday, October 29, 2011
Hall 1-2 (San Jose Convention Center)
Jeremiah Schmidt, N/A, Heritage University, Toppenish, WA
John Tsiligaridis, PhD, Heritage University, Toppenish, WA
Multiple sequence alignment is among the most important tasks in computational biology. A profile is a standard topology for modeling sequence motifs, and the profile Hidden Markov Model (PHMM) formalism models a pattern shared by a set of biological sequences. To construct a PHMM for variable-length motifs, a heuristic algorithm (PHMMA) is developed that determines the match, insert, and delete states and estimates the transition and emission probabilities. After the PHMM is constructed via PHMMA, the model is trained using the Baum-Welch algorithm. The PHMM can also be used to evaluate whether a given sequence is a member of the family; this is achieved through a straightforward application of the Viterbi algorithm. A comparison with the maximum likelihood detector, supported by a theorem, is also provided. PHMMs are position-specific, which allows them to be applied to multiple sequences but also means that each PHMM must be trained on a given set of sequences. The behavior of various PHMMs in scoring a sequence against the model to find the most probable alignments is examined, and results are provided. Additionally, using the classical motif formulation for a set of sequences and applying the Pointer Move Motif with Pruning Search Method (PMP), a consensus can be found. The consensus can be used to evaluate whether a sequence is a member of a family of sequences. The PHMM can produce profiles that improve on traditionally constructed profiles. The advantages of the PHMM over standard profiles and PMP are examined. Simulation results are provided.
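
As an illustration of the profile construction step, the following is a minimal sketch and not the PHMMA heuristic itself, whose details are not given here. It assumes a common textbook rule: an alignment column is treated as a match state when fewer than half of its entries are gaps, and emission probabilities are estimated from column counts with a pseudocount. The function names, the DNA alphabet, the 0.5 gap threshold, the pseudocount of 1, and the toy alignment are all assumptions made for the example.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed alphabet for this toy example


def match_columns(alignment, gap="-", max_gap_fraction=0.5):
    """Return indices of columns treated as match states.

    Assumed rule: a column is a match column when its gap
    fraction is below max_gap_fraction.
    """
    n_seqs = len(alignment)
    cols = []
    for j in range(len(alignment[0])):
        gaps = sum(1 for seq in alignment if seq[j] == gap)
        if gaps / n_seqs < max_gap_fraction:
            cols.append(j)
    return cols


def emission_probabilities(alignment, cols, alphabet=ALPHABET, pseudocount=1.0):
    """Estimate per-match-state emission probabilities with pseudocounts."""
    probs = []
    for j in cols:
        # Count only residues; gaps in a match column correspond to deletes.
        counts = Counter(seq[j] for seq in alignment if seq[j] in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        probs.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return probs


def consensus(probs):
    """Most probable residue at each match state."""
    return "".join(max(p, key=p.get) for p in probs)


# Hypothetical aligned family; column 2 is mostly gaps.
alignment = [
    "AC-GT",
    "AC-GT",
    "ACAGT",
    "A--GT",
]

cols = match_columns(alignment)
probs = emission_probabilities(alignment, cols)
print(cols)
print(consensus(probs))
```

In this sketch the mostly-gap column is left out of the match states and would be modeled by an insert state. Transition probabilities could be estimated from counts in the same way, and the resulting parameters would serve only as a starting point for Baum-Welch training.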