A k-mer based metaheuristic approach for detecting COVID-19 variants

The emergence of SARS-CoV-2 variants threatens public health and has markedly prolonged the COVID-19 pandemic. Rapid and accurate detection of SARS-CoV-2 variants is crucial for tracking mutations, monitoring changes, measuring the efficacy of current vaccines, assessing the evolution of SARS-CoV-2, and preventing its spread. In this paper, we propose a novel and efficient method to predict SARS-CoV-2 variants of concern from whole SARS-CoV-2 genome sequences from human hosts. We define 16 dinucleotide and 64 trinucleotide features, one per possible k-mer of length 2 and 3 over the four nucleotides, to differentiate the variants of concern. The efficacy of the proposed features is demonstrated with four classifiers: k-nearest neighbor, support vector machine (SVM), multilayer perceptron, and random forest. The method is evaluated on a dataset of 223,326 complete genome sequences covering the recently designated variants of concern Alpha, Beta, Gamma, Delta, and Omicron. Experimental results show that the overall accuracy of detecting variants of concern increases markedly when trinucleotide rather than dinucleotide features are used. Furthermore, we use the whale optimization algorithm, a state-of-the-art metaheuristic, to reduce the number of features and choose the most relevant ones. It selects 44 of the 64 trinucleotide features while still differentiating SARS-CoV-2 variants with acceptable accuracy. Experimental results indicate that the SVM classifier with the selected features achieves about 99% accuracy, sensitivity, specificity, and precision on average, which suggests that the proposed method detects SARS-CoV-2 variants of concern reliably.
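The dinucleotide and trinucleotide features described in the abstract can be illustrated with a minimal sketch. It assumes each feature is the relative frequency of one k-mer counted with a sliding window of step 1, and that windows containing ambiguous bases such as N are skipped; the abstract does not state the exact counting or normalization scheme, so these choices, the function name `kmer_frequencies`, and the toy sequence are hypothetical.

```python
from itertools import product


def kmer_frequencies(sequence, k):
    """Return relative frequencies of all 4**k k-mers over A, C, G, T.

    Assumption: overlapping windows (step 1), normalized by the number
    of valid windows. The source does not specify these details.
    """
    alphabet = "ACGT"
    # Fixed lexicographic order: AA, AC, AG, AT, CA, ...
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Toy sequence for illustration only, not real SARS-CoV-2 data.
toy = "ATGCGATNACG"
di = kmer_frequencies(toy, 2)   # 16 dinucleotide features
tri = kmer_frequencies(toy, 3)  # 64 trinucleotide features
print(len(di), len(tri))  # 16 64
```

Each genome then becomes a fixed-length vector (16 or 64 values) regardless of sequence length, which is what allows standard classifiers and a feature-selection step to operate on it.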

A k-mer based metaheuristic approach for detecting COVID-19 variants

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) belongs to the Coronaviridae family. A change in its genetic sequence is called a mutation, and mutations give rise to variants of SARS-CoV-2.
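The averaged accuracy, sensitivity, specificity, and precision reported for the variant classifier can be computed from a multiclass confusion matrix. The sketch below assumes one-vs-rest per-class metrics followed by an unweighted (macro) average; the abstract says only "on average", so the averaging scheme is an assumption, and the function names and the toy matrix are hypothetical rather than results from the paper.

```python
def per_class_metrics(confusion):
    """Compute one-vs-rest metrics for each class.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # missed samples of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp  # others predicted as c
        tn = total - tp - fn - fp
        results.append({
            "accuracy": (tp + tn) / total,
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
        })
    return results


def macro_average(results):
    """Unweighted mean of each metric across classes (assumed averaging scheme)."""
    keys = results[0].keys()
    return {k: sum(r[k] for r in results) / len(results) for k in keys}


# Toy 3-class confusion matrix for illustration only.
toy_confusion = [
    [8, 1, 1],
    [0, 9, 1],
    [1, 0, 9],
]
summary = macro_average(per_class_metrics(toy_confusion))
print(summary)
```

With imbalanced variant classes, a macro average weights every class equally, whereas a sample-weighted average would be dominated by the most frequent variant; which of the two the reported figures use is not stated in the abstract.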

Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi
  • ISSN: 1309-8640
  • Established: 2009
  • Publisher: DÜ Mühendislik Fakültesi / Dicle Üniversitesi