Regression and data mining methods for analyses of multiple rare variants in the Genetic Analysis Workshop 17 mini-exome data

Joan E Bailey-Wilson; Jennifer S Brennan; Shelley B Bull; Robert Culverhouse; Yoonhee Kim; Yuan Jiang; Jeesun Jung; Qing Li; Claudia Lamina; Ying Liu; Reedik Mägi; Yue S Niu; Claire L Simpson; Libo Wang; Yildiz E Yilmaz; Heping Zhang; Zhaogong Zhang

doi:10.1002/gepi.20657

Genet Epidemiol. 2011;35 Suppl 1:S92-100. doi: 10.1002/gepi.20657.

Authors

Joan E Bailey-Wilson¹, Jennifer S Brennan, Shelley B Bull, Robert Culverhouse, Yoonhee Kim, Yuan Jiang, Jeesun Jung, Qing Li, Claudia Lamina, Ying Liu, Reedik Mägi, Yue S Niu, Claire L Simpson, Libo Wang, Yildiz E Yilmaz, Heping Zhang, Zhaogong Zhang

Affiliation

¹ Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, MD 21224, USA. jebw@mail.nih.gov

Abstract

Group 14 of Genetic Analysis Workshop 17 examined several issues related to the analysis of complex traits using DNA sequence data. These issues included novel methods for analyzing rare genetic variants in an aggregated manner (often termed collapsing rare variants), evaluation of various study designs to increase power to detect effects of rare variants, and the use of machine learning approaches to model highly complex heterogeneous traits. Various published and novel methods for analyzing traits with extreme locus and allelic heterogeneity were applied to the simulated quantitative and disease phenotypes. Overall, we draw six conclusions: (1) power depends, as expected, on locus-specific heritability or contribution to disease risk; (2) large samples will be required to detect rare causal variants with small effect sizes; (3) extreme phenotype sampling designs may increase power at lower laboratory cost; (4) methods that allow joint analysis of multiple variants per gene or pathway are in general more powerful than analyses of individual rare variants; (5) population-specific analyses can be optimal when different subpopulations harbor private causal mutations; and (6) machine learning methods may be useful for selecting subsets of predictors for follow-up in the presence of extreme locus heterogeneity and large numbers of potential predictors.
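
As an illustration of what collapsing rare variants can mean in practice, the following is a minimal sketch and not the method used by any of the contributing groups. It assumes genotypes coded as minor-allele counts (0, 1, or 2), sums each individual's rare alleles within one gene into a burden score, and regresses a quantitative trait on that score. The function names, the frequency threshold, and the toy data are all hypothetical.

```python
from statistics import mean

def collapse_rare_variants(genotypes, maf_threshold):
    """Sum rare minor alleles per individual (a simple burden score).

    genotypes: list of rows (one per individual), each a list of
    minor-allele counts (0/1/2), one entry per variant in the gene.
    Returns (burden scores, indices of variants treated as rare).
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    # Sample minor allele frequency: allele count / (2 * individuals).
    maf = [sum(row[j] for row in genotypes) / (2 * n) for j in range(n_variants)]
    # Keep variants that are observed but fall below the threshold.
    rare = [j for j in range(n_variants) if 0 < maf[j] < maf_threshold]
    burden = [sum(row[j] for j in rare) for row in genotypes]
    return burden, rare

def simple_slope(x, y):
    """Least-squares slope of y on x (single-predictor linear regression)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx

# Hypothetical toy data: 10 individuals, 3 variants in one gene.
# Variant 0 is common; variants 1 and 2 are singletons.
genotypes = [
    [1, 0, 0], [1, 1, 0], [0, 0, 0], [2, 0, 0], [0, 0, 1],
    [1, 0, 0], [0, 0, 0], [2, 0, 0], [1, 0, 0], [0, 0, 0],
]
trait = [1.0, 3.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0]

# With only 10 individuals a singleton has frequency 0.05, so the toy
# threshold is set to 0.1; real analyses typically use smaller values.
burden, rare = collapse_rare_variants(genotypes, maf_threshold=0.1)
slope = simple_slope(burden, trait)
print(rare, burden, slope)
```

Testing one aggregated score per gene rather than each singleton separately is the sense in which joint analysis of multiple variants can gain power over single-variant tests, provided most collapsed variants act in the same direction.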

Publication types

Research Support, N.I.H., Extramural
Research Support, N.I.H., Intramural

MeSH terms

Artificial Intelligence
Data Interpretation, Statistical
Data Mining
Exome
Genetic Predisposition to Disease / genetics*
Genetic Variation
Human Genome Project
Humans
Meta-Analysis as Topic
Molecular Epidemiology / methods*
Polymorphism, Single Nucleotide / genetics*
Regression Analysis*
Sequence Analysis
