Classification of healthy controls and Covid-19 cases established on transcriptomic analysis using proposed ensemble model

COVID-19 is a highly contagious disease that presents with varying symptoms in humans. Therefore, the scientific and genetic status of the virus should be clarified as soon as possible. This study aims to classify COVID-19 and to determine the important genes related to the disease by applying ensemble learning techniques to a public COVID-19 dataset. The dataset consists of 579 genes from 32 individuals, of whom 10 do not have COVID-19 and 22 have COVID-19. Lasso was used as the feature selection method. Three ensemble learning methods (Bagging, Boosting, and Stacking) were applied to the dataset. Model performance was evaluated with accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score. Of the constructed ensemble models, Stacking produced better classification performance than Bagging and Boosting. The accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score obtained with Stacking were 99.85%, 99.91%, 99.82%, 99.64%, 99.95%, and 99.89%, respectively. According to the Stacking model, the genes most strongly related to COVID-19 were CD22, CD19, C4BPA, ARHGDIB, AICDA, CCR5, CCL7, CCL26, CCL22, and CCL16. The genes identified by the model may serve as determinants for the early diagnosis and treatment of COVID-19.
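The six performance measures named above all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' implementation (the study itself used RapidMiner Studio). The class `ConfusionCounts`, the function `classification_metrics`, and the example counts are hypothetical; only the class sizes (22 COVID-19 cases and 10 controls) come from the abstract, and the counts do not reproduce the reported results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix; COVID-19 is the positive class."""
    tp: int  # COVID-19 cases predicted as COVID-19
    fn: int  # COVID-19 cases predicted as healthy
    tn: int  # healthy controls predicted as healthy
    fp: int  # healthy controls predicted as COVID-19


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return the six measures as fractions in [0, 1].

    Assumes every denominator is non-zero, i.e. both classes are present
    and the model predicts each class at least once.
    """
    sensitivity = c.tp / (c.tp + c.fn)  # true positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)  # true negative rate
    ppv = c.tp / (c.tp + c.fp)          # positive predictive value (precision)
    npv = c.tn / (c.tn + c.fn)          # negative predictive value
    accuracy = (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical predictions for 22 cases and 10 controls (illustrative only).
counts = ConfusionCounts(tp=21, fn=1, tn=9, fp=1)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With only 32 individuals, a single misclassification shifts each measure by several percentage points, so such values are usually averaged over resampling or cross-validation folds rather than read from one split.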

Medicine Science
  • ISSN: 2147-0634
  • Publication frequency: 4
  • First published: 2012
  • Publisher: Effect Publishing Agency (EPA)