Supplementary Material for: Alternative Methods for H1 Simulations in Genome-Wide Association Studies
Dataset posted on 28.03.2012 by Perduca V., Sinoquet C., Mourad R., Nuel G.
Objective: Assessing the statistical power to detect susceptibility variants plays a critical role in genome-wide association (GWA) studies, from both the prospective and the retrospective point of view. Power is estimated empirically by simulating phenotypes under a disease model H1. For this purpose, the gold standard consists of simulating genotypes given the phenotypes (e.g. Hapgen). We introduce here an alternative approach for simulating phenotypes under H1 that does not require generating new genotypes for each simulation.

Methods: To simulate phenotypes with a fixed total number of cases under a given disease model, we suggest three algorithms: (1) a simple rejection algorithm; (2) a numerical Markov chain Monte Carlo (MCMC) approach; and (3) an exact and efficient backward sampling algorithm. We validated the three algorithms both on a simulated dataset and by comparing them with Hapgen on a more realistic dataset. As an application, we then conducted a simulation study on a 1000 Genomes Project dataset consisting of 629 individuals (314 cases) and 8,048 SNPs from chromosome X. We arbitrarily defined an additive disease model with two susceptibility SNPs and an epistatic effect.

Results: The three algorithms are consistent, but backward sampling is dramatically faster than the other two. Our approach also gives results consistent with Hapgen. Using our application data, we showed that our limited design requires prior biological knowledge to restrict the investigated region. We also showed that epistatic effects can play a significant role even when simple marker statistics (e.g. the trend statistic) are used. Finally, we showed that the overall performance of a GWA study strongly depends on the prevalence of the disease: the larger the prevalence, the better the power.

Conclusions: Our approach is a valid alternative to Hapgen-type methods. Besides being dramatically faster, it has two main advantages: (1) there is no need for sophisticated genotype models (e.g. haplotype frequencies or recombination rates), and (2) the choice of the disease model is completely unconstrained (number of SNPs involved, gene-environment interactions, hybrid genetic models, etc.). Our three algorithms are available in an R package called ‘waffect’ (‘double-u affect’, for weighted affectations).
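The rejection algorithm can be illustrated with a short Python sketch. It is not taken from the waffect package (which is written in R), and the disease model is an assumption made only for illustration: a hypothetical logistic model, `case_probability`, in which two SNPs coded as minor-allele counts act additively on the log-odds scale, and a product term with a made-up coefficient `beta12` stands in for the epistatic effect. Given per-individual case probabilities p_i, phenotypes are drawn as independent Bernoulli variables, and a draw is kept only if it contains exactly the required number of cases.

```python
import math
import random
from typing import List, Sequence, Tuple


def case_probability(g1: int, g2: int,
                     beta0: float = -1.0, beta1: float = 0.4,
                     beta2: float = 0.3, beta12: float = 0.2) -> float:
    """Hypothetical H1 model: P(case | genotypes) on the logistic scale.

    g1, g2 are minor-allele counts (0, 1, 2) at two susceptibility SNPs;
    the beta12 * g1 * g2 term plays the role of an epistatic effect.
    All coefficients are illustrative assumptions.
    """
    eta = beta0 + beta1 * g1 + beta2 * g2 + beta12 * g1 * g2
    return 1.0 / (1.0 + math.exp(-eta))


def rejection_sample(p: Sequence[float], n_cases: int, rng: random.Random,
                     max_tries: int = 1_000_000) -> Tuple[List[int], int]:
    """Draw independent Y_i ~ Bernoulli(p_i) until sum(Y) == n_cases.

    Returns the accepted phenotype vector and the number of attempts used.
    """
    for attempt in range(1, max_tries + 1):
        y = [1 if rng.random() < pi else 0 for pi in p]
        if sum(y) == n_cases:
            return y, attempt
    raise RuntimeError("no draw with the requested number of cases")


if __name__ == "__main__":
    rng = random.Random(1)
    # Random toy genotypes for 200 individuals at the two SNPs.
    genotypes = [(rng.randint(0, 2), rng.randint(0, 2)) for _ in range(200)]
    p = [case_probability(g1, g2) for g1, g2 in genotypes]
    y, tries = rejection_sample(p, n_cases=100, rng=rng)
    print(sum(y), tries)
```

Each attempt succeeds with probability P(ΣY_i = n_cases), so the number of attempts is geometric with mean 1 / P(ΣY_i = n_cases). This grows quickly when the required number of cases is far from Σp_i, as in case-control designs that oversample cases relative to the prevalence.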
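One possible MCMC scheme for simulating phenotypes with a fixed number of cases is sketched below in Python. The abstract does not specify which chain the authors use, so this swap-based Metropolis sampler (the function `mcmc_sample` and its parameters are hypothetical) is only an illustration. The target is the distribution of independent Bernoulli(p_i) phenotypes conditioned on exactly n_cases cases, which is proportional to the product of the odds p_i / (1 - p_i) over the cases. Proposing to swap one random case with one random control keeps the number of cases fixed and is symmetric, so the Metropolis acceptance probability reduces to min(1, odds_j / odds_i).

```python
import math
import random
from typing import List, Sequence


def mcmc_sample(p: Sequence[float], n_cases: int, n_steps: int,
                rng: random.Random) -> List[int]:
    """Metropolis sampler for independent Bernoulli(p_i) phenotypes
    conditioned on exactly n_cases cases (hypothetical helper)."""
    n = len(p)
    if not 0 < n_cases < n:
        raise ValueError("need at least one case and one control")
    if any(not 0.0 < pi < 1.0 for pi in p):
        raise ValueError("this sketch assumes 0 < p_i < 1")
    # Target: P(y) is proportional to exp(sum_i y_i * logit(p_i)) when sum(y) == n_cases.
    logit = [math.log(pi / (1.0 - pi)) for pi in p]
    # Arbitrary starting state with the right number of cases.
    cases = list(range(n_cases))
    controls = list(range(n_cases, n))
    for _ in range(n_steps):
        a = rng.randrange(len(cases))
        b = rng.randrange(len(controls))
        i, j = cases[a], controls[b]
        # Proposal: i becomes a control and j a case. The proposal is
        # symmetric, so accept with probability min(1, odds_j / odds_i).
        log_ratio = logit[j] - logit[i]
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            cases[a], controls[b] = j, i
    y = [0] * n
    for i in cases:
        y[i] = 1
    return y
```

Each step is cheap and never leaves the set of vectors with n_cases cases, but successive states are correlated, so the number of steps needed to approach the target distribution has to be chosen by the user.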
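An exact backward sampler for phenotypes with a fixed number of cases can be written as a forward dynamic-programming pass followed by a backward sampling pass. The Python sketch below is one standard way to do this and is not necessarily the algorithm implemented in waffect; the function names are hypothetical. The forward pass computes F_i(k) = P(Y_1 + ... + Y_i = k) for independent Bernoulli(p_i) phenotypes. Going backward from individual n, Y_i is set to 1 with probability p_i F_{i-1}(k-1) / F_i(k), where k is the number of cases still to be placed among the first i individuals. Working with log-probabilities avoids underflow at study sizes such as 629 individuals.

```python
import math
import random
from typing import List, Sequence

NEG_INF = float("-inf")


def _log(x: float) -> float:
    """log(x), with log(0) mapped to -inf instead of raising."""
    return math.log(x) if x > 0.0 else NEG_INF


def _logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def backward_sample(p: Sequence[float], n_cases: int,
                    rng: random.Random) -> List[int]:
    """Exact draw of Y_1..Y_n ~ independent Bernoulli(p_i) given sum(Y) == n_cases."""
    n = len(p)
    if not 0 <= n_cases <= n:
        raise ValueError("n_cases must lie in [0, len(p)]")
    lp = [_log(pi) for pi in p]          # log P(Y_i = 1)
    lq = [_log(1.0 - pi) for pi in p]    # log P(Y_i = 0)
    # Forward pass: logF[i][k] = log P(Y_1 + ... + Y_i = k), for k <= n_cases.
    logF = [[NEG_INF] * (n_cases + 1) for _ in range(n + 1)]
    logF[0][0] = 0.0
    for i in range(1, n + 1):
        for k in range(min(i, n_cases) + 1):
            stay = logF[i - 1][k] + lq[i - 1]
            move = logF[i - 1][k - 1] + lp[i - 1] if k > 0 else NEG_INF
            logF[i][k] = _logaddexp(stay, move)
    if logF[n][n_cases] == NEG_INF:
        raise ValueError("n_cases has zero probability under these p_i")
    # Backward pass: decide Y_n, Y_{n-1}, ..., Y_1 given the cases left to place.
    y = [0] * n
    k = n_cases
    for i in range(n, 0, -1):
        if k == 0:
            break  # all remaining individuals are controls
        one = lp[i - 1] + logF[i - 1][k - 1]   # Y_i = 1, k-1 cases among first i-1
        zero = lq[i - 1] + logF[i - 1][k]      # Y_i = 0, k cases among first i-1
        if rng.random() < math.exp(one - _logaddexp(one, zero)):
            y[i - 1] = 1
            k -= 1
    return y
```

The forward table costs O(n × n_cases) time and memory and depends only on the p_i and n_cases, so in a power study it could be computed once and reused for many draws; the sketch recomputes it on each call for simplicity. Every draw is exact, with no rejection and no burn-in.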