Prognosis of muscular dystrophy with extrinsic and intrinsic descriptors through ensemble learning

Muscular dystrophy is a neuromuscular disorder that impairs the functioning of the locomotive muscles. Large deletion and duplication mutations in the gene sequences give rise to these muscular dystrophies. Any heritable change can be used as input to computational studies such as pattern recognition and classification models. Mutated gene sequences are generated by applying the positional cloning approach to the reference cDNA sequence, using mutational information from the Human Gene Mutation Database (HGMD). The extrinsic and intrinsic descriptors of the mutated gene sequence are indispensable for identifying the disease. This work describes a computational approach to building a disease classification model by extracting the exonic and intronic descriptors from the mutated gene sequences through a combined learning technique. An ensemble hybrid model is developed with the LibD3C classifier. The hybrid learned model achieved an accuracy of 98.3% in diagnosing the neuromuscular disorder based on deletion and insertion/duplication mutations. Furthermore, this paper analyzes ensemble-learning classifiers built on features related to synonymous and nonsynonymous mutations, in order to detect muscular dystrophy on the same data set. Experiments showed high accuracy for the models built using the LibD3C classifier, indicating that ensemble learning is effective for predicting the disease. To the best of our knowledge, the models established here are the first to explore a scheme of disease prediction through pattern recognition from the nucleic acid sequence and its associated mutations.
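As an illustration of the sequence-generation and descriptor-extraction steps described above, the following is a minimal Python sketch, not the authors' implementation. It assumes mutations are given as 0-based start positions and lengths, uses a toy reference sequence rather than a real cDNA, and computes a few hypothetical descriptors (length change, GC content, frameshift flag) that merely stand in for the exonic and intronic descriptors of the paper. The function names `apply_deletion`, `apply_duplication`, and `descriptors` are introduced here for illustration only; the LibD3C ensemble step is not reproduced.

```python
def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return ref with `length` bases removed, beginning at 0-based `start`."""
    if start < 0 or length < 0 or start + length > len(ref):
        raise ValueError("deletion falls outside the reference sequence")
    return ref[:start] + ref[start + length:]


def apply_duplication(ref: str, start: int, length: int) -> str:
    """Return ref with the segment ref[start:start+length] repeated in tandem."""
    if start < 0 or length < 0 or start + length > len(ref):
        raise ValueError("duplication falls outside the reference sequence")
    end = start + length
    return ref[:end] + ref[start:end] + ref[end:]


def descriptors(ref: str, mutated: str) -> dict:
    """Compute a few illustrative (hypothetical) features of a mutated sequence."""
    delta = len(mutated) - len(ref)
    gc = (mutated.count("G") + mutated.count("C")) / len(mutated) if mutated else 0.0
    return {
        "length": len(mutated),
        "length_change": delta,
        "gc_content": gc,
        # A length change that is not a multiple of 3 shifts the reading frame.
        "frameshift": delta % 3 != 0,
    }


# Toy reference sequence (an assumption, not real cDNA).
reference = "ATGGCCTTTAAAGGG"

deleted = apply_deletion(reference, start=3, length=4)
duplicated = apply_duplication(reference, start=3, length=3)

deletion_features = descriptors(reference, deleted)
duplication_features = descriptors(reference, duplicated)
```

Feature dictionaries of this kind, one per mutated sequence and labelled with the disease type, would form the training table passed to an ensemble classifier.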

___

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6
  • Publisher: TÜBİTAK