Friday, October 28, 2011
Hall 1-2 (San Jose Convention Center)
As genetic association mapping rapidly evolves, it is becoming clear that complex human traits are shaped by higher-order models of disease risk more heavily than initially assumed, with dozens or even hundreds of genetic variants contributing to disease etiology. With advanced statistical and computational modeling, these genetic variants may be applied as clinical predictors. However, the high dimensionality of these models poses important challenges for both traditional and modern data-mining approaches with respect to variable selection and model identification. Whereas newer approaches such as Multifactor Dimensionality Reduction (MDR) have been tested with as many as five disease-associated genes, little is known about MDR’s performance with higher-dimensional risk models. Through simulations, MDR’s statistical integrity with high-dimensional risk models will be evaluated, and the sample sizes needed to model a range of these high-order effects will be empirically estimated. Other classifiers, including traditional statistical approaches such as logistic regression, will be evaluated in parallel and compared with MDR. This study will use the processing power of NCSU’s High Performance Computing (HPC) center to make these computationally intensive empirical comparisons feasible.
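For readers unfamiliar with MDR, its core pooling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names `mdr_fit` and `mdr_predict`, the integer genotype coding, and the toy data are all hypothetical. Full MDR also searches over combinations of loci and selects a model by cross-validation, both of which are omitted here.

```python
from collections import Counter


def mdr_fit(genotypes, labels, loci):
    """Pool multilocus genotype cells into high-risk (1) or low-risk (0).

    genotypes: list of tuples of integer genotype codes (assumed coding)
    labels:    list of 0 (control) / 1 (case)
    loci:      indices of the loci that define the multilocus cells
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in loci)
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1

    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")

    # Threshold: the overall case:control ratio in the data set.
    threshold = n_cases / n_controls

    risk = {}
    for cell in set(cases) | set(controls):
        # A cell is high-risk when its own case:control ratio exceeds the
        # threshold; written as a product to avoid dividing by zero.
        risk[cell] = 1 if cases[cell] > threshold * controls[cell] else 0
    return risk


def mdr_predict(risk, genotypes, loci, default=0):
    """Classify samples by the risk label of their multilocus cell."""
    return [risk.get(tuple(row[i] for i in loci), default) for row in genotypes]


# Toy two-locus interaction with no single-locus (marginal) effect.
toy_genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
toy_labels = [0, 0, 1, 1, 1, 1, 0, 0]

toy_risk = mdr_fit(toy_genotypes, toy_labels, loci=(0, 1))
toy_predictions = mdr_predict(toy_risk, toy_genotypes, loci=(0, 1))
```

Because each added locus multiplies the number of genotype cells, the per-cell counts shrink quickly as the model order grows, which is one reason the sample sizes required for high-order models need to be estimated empirically.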