Extension of Grammatical Evolution Decision Trees for Family Trio Data

Saturday, October 29, 2011
Hall 1-2 (San Jose Convention Center)
Holly Petruso, Statistics, North Carolina State University, Raleigh, NC
Alison Motsinger-Reif, Statistics, North Carolina State University, Raleigh, NC
With today's advanced genotyping technologies, the number of genetic variants per individual available for disease-mapping studies is increasing exponentially, posing an important computational problem. Current analytical methods become computationally infeasible in the face of the combinatorial explosion that arises when complex genetic models are considered in the high-dimensional datasets these technologies generate. Evolutionary computation approaches have shown promise in addressing such high-dimensional combinatorial problems. However, they have largely been applied only to genetic data on unrelated individuals (i.e., case-control data). In this study, an evolutionary computation method that uses grammatical evolution to evolve decision trees (GEDT) is extended to handle trios, in which disease cases and their respective parents are collected for gene mapping. Using previously developed simulation software and North Carolina State University’s supercomputing cluster, we evaluate the ability of GEDT to identify disease-associated loci in trio data. We then characterize its performance by comparing its output to a range of complex models. Finally, we extend and optimize the existing grammatical evolution algorithm in order to create distributable software for these cutting-edge methods.