Supplementary Material for: A Cost-Effective Statistical Method to Correct for Differential Genotype Misclassification When Performing Case-Control Genetic Association

dataset

posted on 2017-03-23, 14:44 authored by Londono D., Haynes C., De La Vega F.M., Finch S.J., Gordon D.

Background/Aims: There is growing interest in the effect of differential misclassification on power and type I error rate in genome-wide association studies. We present an extension of a previously published test statistic: the likelihood ratio test allowing for errors (LRT_AE). This test uses double-sample information on a subset of individuals to increase power for genetic association in the presence of nondifferential misclassification. Methods: We extend the original LRT_AE by allowing for differential genotype misclassification between case and control populations. We label this new statistic LRT_AE^DM. We test the performance of this statistic with data simulated under differential misclassification specifications and two types of genetic model: null and power. For simulations under the null model, we specify that there is no difference between case and control genotype frequencies before the introduction of errors. For simulations under the power model, we consider three modes of inheritance: dominant, multiplicative, and recessive. Results: We show that the LRT_AE^DM, with p values computed using permutation, maintains a correct type I error rate under the null model after the introduction of differential genotyping errors. We also find that as little as 10 to 15% of double-sampled genotype data is needed to achieve this effect. Aside from a few situations (particularly simulations under the recessive mode of inheritance), the version of the LRT_AE^DM that calculates p values through permutation requires 15 to 20% double sampling to maintain 80% power at a 0.05 significance level and approximately 20% double sampling at a 0.01 significance level.
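The abstract reports p values computed by permutation but does not describe the permutation scheme. The following is a minimal Python sketch of the general idea only, under stated assumptions: case/control labels are shuffled across individuals, and a plain Pearson chi-square on the status-by-genotype table stands in for the LRT_AE^DM, which is not implemented here. The function names `chi_square_stat` and `permutation_p_value` are hypothetical and do not come from the authors' software.

```python
import random
from collections import Counter


def chi_square_stat(genotypes, labels):
    """Pearson chi-square on the status-by-genotype table.

    Assumption: this is only a stand-in statistic, not the LRT_AE^DM.
    """
    n = len(genotypes)
    cell = Counter(zip(labels, genotypes))  # observed cell counts
    row = Counter(labels)                   # case/control totals
    col = Counter(genotypes)                # genotype totals
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    return stat


def permutation_p_value(genotypes, labels, statistic=chi_square_stat,
                        n_perm=1000, seed=0):
    """Estimate a p value by shuffling case/control labels.

    Assumption: labels are exchangeable across individuals under the null.
    """
    rng = random.Random(seed)
    observed = statistic(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if statistic(genotypes, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

With double-sampled data, each individual's repeated genotype calls would presumably need to stay attached to that individual while the labels are permuted; the sketch above does not model this.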