Multiple Hypothesis Testing in a Genome Wide Association Study of Bovine Tuberculosis
Genome-wide association studies (GWAS) can be used to identify single nucleotide polymorphisms (SNPs) associated with the performance of farm animals. The results obtained from GWAS depend on the quality of both the phenotypic and the genotypic data sets. Population stratification, multiple hypothesis correction methods, and quality control procedures can lead to false positive (or negative) association results. The aims of this study were to examine, on a bovine tuberculosis GWAS data set, the effects of 1) different quality measures and multiple hypothesis correction methods and 2) detecting and correcting ancestral stratification with several SNP regression methods. Without multiple hypothesis correction, with the SNP regression model, significant SNPs from chromosomes 2, 7, 8 and 13 (P
Sığır Tüberkülozu İçin Çoklu Hipotez Düzeltmesi İle Genom Tabanlı İlişki Analizi
Genome-wide association studies (GWAS) have been used to detect single nucleotide polymorphisms (SNPs) related to various animal traits. The outcome of a GWAS depends on the quality of both the phenotypic and the genotypic data sets. False positive (or negative) associations can arise from multiple hypothesis testing procedures, quality control (QC) measures, or undetected population structure. The objectives of this study were 1) to investigate different multiple hypothesis testing procedures with different quality measures and 2) to detect and correct ancestral stratification using different single-SNP models on the bovine tuberculosis GWA data set. Based on a regression model, SNPs from chromosomes 2, 7, 8 and 13 were detected at a significance level of P<0.001 without correction for multiple hypothesis testing. However, after multiple hypothesis correction with the Bonferroni correction, Hochberg's method, or a permutation test, these genomic signals became non-significant. Only a false discovery rate approach detected weak signals (at a level of 0.54) from chromosomes 2, 8, and 13. We used a model that took into account the effect of linkage disequilibrium on the multiple hypothesis testing procedures by combining the test statistics of adjacent SNPs with window sizes of 2, 4 and 6. We detected strong genomic signals from chromosomes 13, 8, 6 and 2 at window size 6. The results of this study showed that multiple hypothesis testing procedures are related to false positive genomic signals. It is difficult to suggest universally acceptable multiple hypothesis testing procedures and QC measures, or thresholds for them, because of the sources of variation between species and within populations.
However, additional analytical approaches and studies are needed to evaluate the effects of linkage disequilibrium on the multiple hypothesis testing procedures and QC measures (especially for minor allele frequencies) in GWAS under various scenarios including, but not limited to, level of heritability, linkage disequilibrium, population structure, and population size.
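The corrections named in the abstract can be illustrated with a minimal Python sketch. It is not the code used in the study. The function names are hypothetical, the adjusted p-values follow the standard textbook definitions of the Bonferroni, Hochberg, and Benjamini-Hochberg (false discovery rate) procedures, and `window_sums` assumes that adjacent test statistics are combined by simple summation, which the abstract does not specify.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def _step_up(pvals, factor):
    """Shared step-up logic (hypothetical helper).

    Walks from the largest p-value to the smallest, scaling each one by
    factor(rank, m) and carrying a running minimum so that the adjusted
    values stay monotone and never exceed 1.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # rank m is the largest p-value
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * factor(rank, m))
        adjusted[idx] = running_min
    return adjusted


def hochberg(pvals):
    """Hochberg step-up: the rank-i p-value is scaled by (m - i + 1)."""
    return _step_up(pvals, lambda rank, m: m - rank + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg FDR: the rank-i p-value is scaled by m / i."""
    return _step_up(pvals, lambda rank, m: m / rank)


def window_sums(stats, size):
    """Sum each run of `size` adjacent test statistics, sliding by one SNP.

    Assumption: summation is only one possible way to combine statistics.
    """
    return [sum(stats[i:i + size]) for i in range(len(stats) - size + 1)]


if __name__ == "__main__":
    # Made-up p-values for illustration only, not results from the study.
    example = [0.01, 0.04, 0.03, 0.005]
    print("Bonferroni:", bonferroni(example))
    print("Hochberg:  ", hochberg(example))
    print("BH (FDR):  ", benjamini_hochberg(example))
    print("Windows:   ", window_sums([1.0, 2.0, 3.0, 4.0], 2))
```

Bonferroni is the most conservative of the three procedures and Benjamini-Hochberg the least, which is consistent with the abstract's report that only the false discovery rate approach retained any signal. Summed statistics from SNPs in linkage disequilibrium are correlated, so their null distribution would need to be calibrated, for example by permutation, before any significance is assigned.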