Adem Doganer, Seyma Yasar, Zeynep Kucukakcali

Ensemble learning-based prediction of COVID-19 positive patient groups determined by IL-6 levels and control individuals based on the proteomics data

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by a newly discovered coronavirus. COVID-19 has affected many people, and its effects vary widely between individuals. Proteomic analysis is therefore an important approach for developing early diagnosis and treatment strategies. This study aimed to classify COVID-19-positive patient groups, defined by interleukin-6 (IL-6) levels (low, medium, high), and control individuals from proteomic data using ensemble learning methods (AdaBoost, Bagging, Stacking, and Voting). The publicly available dataset comprises 49 subjects (31 COVID-19-positive patients and 18 controls) and 493 proteins measured in blood samples. The ensemble methods were applied to this dataset with ten-fold cross-validation to model the relationship between protein levels and disease severity. Predictions were evaluated with performance metrics including accuracy and sensitivity. AdaBoost (96.00%) achieved higher accuracy than Voting (93.88%) and Bagging (91.84%), but Stacking achieved the highest accuracy (97.92%). IL6, SERPINA3, SERPING1, SERPINA1, and GSN were the five most important proteins associated with disease severity. Compared with the other methods, the proposed Stacking model gave the best protein-based estimation of disease severity. The results suggest that changes in blood protein levels that correlate with COVID-19 severity may be useful for the early diagnosis and treatment of the disease.
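
The abstract summarizes the evaluation pipeline (ensemble classifiers scored by ten-fold cross-validation on accuracy and sensitivity) without showing how the pieces fit together. Below is a minimal, dependency-free Python sketch of that loop; it is not the authors' implementation. It rests on several assumptions that do not come from the study: labels are reduced to a binary positive-vs-control split (the study separates low, medium, and high IL-6 groups from controls); the base learners (decision stumps and a nearest-centroid classifier) and all function names are hypothetical choices for illustration; folds are stratified by class; and the demo runs on synthetic data rather than the 493-protein dataset. Bagging and Voting are shown; AdaBoost and Stacking are omitted for brevity.

```python
import random
from collections import Counter
from statistics import mean


def fit_stump(X, y):
    """Decision stump (hypothetical base learner): choose the (feature,
    threshold, direction) with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum((1 if sign * (row[j] - t) > 0 else 0) != yi
                          for row, yi in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    _, j, t, sign = best
    return lambda row: 1 if sign * (row[j] - t) > 0 else 0


def fit_centroid(X, y):
    """Nearest centroid (hypothetical): predict the class whose mean profile is closest."""
    cents = {}
    for c in set(y):
        rows = [r for r, yi in zip(X, y) if yi == c]
        cents[c] = [mean(col) for col in zip(*rows)]

    def predict(row):
        return min(cents, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(row, cents[c])))
    return predict


def majority(models):
    """Hard vote; with an odd number of 0/1 predictors there are no ties."""
    return lambda row: Counter(m(row) for m in models).most_common(1)[0][0]


def fit_bagging(X, y, n_estimators=11, seed=0):
    """Bagging: fit stumps on bootstrap resamples and combine by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return majority(models)


def fit_voting(X, y):
    """Voting: majority vote of three heterogeneous learners."""
    return majority([fit_stump(X, y), fit_centroid(X, y), fit_bagging(X, y)])


def stratified_folds(y, k=10, seed=0):
    """Deal each class's shuffled indices round-robin into k folds so every
    fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for c in sorted(set(y)):
        idx = [i for i, yi in enumerate(y) if yi == c]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_validate(fit, X, y, k=10, seed=0):
    """k-fold CV: each sample is predicted by a model that never saw it;
    metrics are computed on the pooled out-of-fold predictions."""
    preds = [None] * len(y)
    for test in stratified_folds(y, k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = model(X[i])
    pairs = list(zip(preds, y))
    tp = sum(p == 1 and t == 1 for p, t in pairs)
    tn = sum(p == 0 and t == 0 for p, t in pairs)
    fp = sum(p == 1 and t == 0 for p, t in pairs)
    fn = sum(p == 0 and t == 1 for p, t in pairs)
    return {"accuracy": (tp + tn) / len(y),
            "sensitivity": tp / (tp + fn),   # true-positive rate
            "specificity": tn / (tn + fp)}   # true-negative rate


if __name__ == "__main__":
    # Synthetic stand-in data, NOT the study's proteomics measurements:
    # 31 "positives" and 18 "controls", 5 features, feature 0 shifted upward.
    rng = random.Random(42)
    X, y = [], []
    for label, n in ((1, 31), (0, 18)):
        for _ in range(n):
            row = [rng.gauss(0, 1) for _ in range(5)]
            row[0] += 3.0 * label
            X.append(row)
            y.append(label)
    for name, fit in (("Bagging", fit_bagging), ("Voting", fit_voting)):
        print(name, cross_validate(fit, X, y, k=10))
```

Pooling out-of-fold predictions before computing the metrics is one common convention; averaging per-fold metrics is another. With 18 controls spread over ten folds, each fold holds only one or two controls, so per-fold specificity can only be 0, 0.5, or 1, and the two conventions can differ noticeably.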
