Thursday, October 11, 2012: 6:55 PM
618 (WSCC)
Computational prediction of functional residues from protein 3D structure plays a significant role in studying mechanistic aspects of protein function, as well as in understanding the molecular causes of disease upon mutation. In this study, we propose two novel graph-based kernel methods, referred to as the label mismatch and edge mismatch graphlet kernels, for annotation of functional residues in protein structures. First, a protein contact graph is constructed from each protein structure deposited in the Protein Data Bank, where nodes represent residues and edges connect spatially neighboring residues. Next, our algorithm counts all labeled non-isomorphic subgraphs (graphlets) that are rooted at a pre-defined residue and satisfy specific label and edge mismatch criteria. Moreover, we incorporate evolutionary conservation information into the graphlet labeling. Finally, the similarity between two vertices is computed as the inner product of their respective graphlet count vectors and is subsequently used in a supervised learning framework to classify protein residues. We report experiments on four residue-level function prediction tasks: identification of catalytic residues, zinc-binding sites, DNA-binding sites, and phosphorylation sites. Our graphlet kernels performed as well as or better than established sequence-based and structure-based approaches. Additionally, we present evidence that the proposed graphlet-based methods account for structural flexibility while efficiently capturing neighborhood similarities in protein structures.
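The pipeline described above (contact graph, rooted graphlet counts, inner-product kernel) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It counts only exact-match graphlets of up to three nodes, so the label and edge mismatch extensions and the conservation-based labeling are left out. The function names, the 6.0 distance cutoff, and the toy coordinates and labels are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def contact_graph(coords, cutoff=6.0):
    """Link residues i and j when their representative points lie within
    `cutoff` of each other (the cutoff value is an assumption)."""
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def rooted_graphlet_counts(adj, labels, pivot):
    """Count labeled induced graphlets with 1 to 3 nodes that contain `pivot`.
    Keys record the graphlet type, the pivot's position, and the node labels."""
    counts = Counter()
    root = labels[pivot]
    counts[("node", root)] += 1
    neighbors = sorted(adj[pivot])
    for u in neighbors:
        counts[("edge", root, labels[u])] += 1
        # Induced 3-node path with the pivot at one end: pivot - u - w.
        for w in adj[u]:
            if w != pivot and w not in adj[pivot]:
                counts[("path_end", root, labels[u], labels[w])] += 1
    for u, w in combinations(neighbors, 2):
        # Two neighbors of the pivot form a triangle if they are adjacent,
        # otherwise a path with the pivot in the middle.
        kind = "triangle" if w in adj[u] else "path_mid"
        counts[(kind, root) + tuple(sorted((labels[u], labels[w])))] += 1
    return counts


def graphlet_kernel(counts_a, counts_b):
    """Inner product of two sparse graphlet count vectors."""
    return sum(value * counts_b[key] for key, value in counts_a.items())


# Toy example: four residues on a line, 4 units apart, with made-up labels.
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
labels = ["A", "C", "A", "C"]
adj = contact_graph(coords)
c1 = rooted_graphlet_counts(adj, labels, pivot=1)
c2 = rooted_graphlet_counts(adj, labels, pivot=2)
print(graphlet_kernel(c1, c1), graphlet_kernel(c1, c2))
```

In a supervised setting, the kernel values for all residue pairs would form a Gram matrix that a kernel classifier such as a support vector machine can consume directly. The mismatch variants would additionally credit graphlets that differ by a bounded number of labels or edges.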