SARS-CoV-2 Virus RNA Sequence Classification and Geographical Analysis with Convolutional Neural Networks Approach

COVID-19, which emerged in December 2019, spread worldwide, and is still active, has caused more than 250 thousand deaths to date. Research on the subject has focused on analyzing the genetic structure of the virus, developing vaccines, the course of the disease, and its source. In this study, RNA sequences belonging to the SARS-CoV-2 virus are transformed into gene motifs with two basic image processing algorithms and classified with convolutional neural network (CNN) models. The CNN models achieved an average Area Under the Curve (AUC) of 98% on RNA sequences classified by region as Asia, Europe, America, or Oceania. The resulting artificial neural network model was used for phylogenetic analysis of the variant of the virus isolated in Turkey. The classification results were compared with gene alignment values in the GISAID database, where SARS-CoV-2 records from all over the world are kept. Our experimental results suggest that CNN models might serve as an efficient method for detecting the geographic distribution of the virus.
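The abstract does not state how the average AUC over the four regions was computed. One common reading is a macro-average of one-vs-rest AUC values, one per region. The following is a minimal sketch under that assumption; the function names `binary_auc` and `macro_ovr_auc` and the toy scores are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Sequence

# The four region labels named in the abstract.
REGIONS = ["Asia", "Europe", "America", "Oceania"]


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive sample is scored
    above a random negative sample (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(true_regions: Sequence[str],
                  class_scores: Dict[str, List[float]]) -> float:
    """Unweighted mean of one-vs-rest AUCs, one per region (assumed averaging scheme)."""
    aucs = []
    for region in REGIONS:
        labels = [1 if r == region else 0 for r in true_regions]
        aucs.append(binary_auc(labels, class_scores[region]))
    return sum(aucs) / len(aucs)


# Hypothetical toy data: six sequences with per-region classifier scores.
toy_true = ["Asia", "Europe", "America", "Oceania", "Asia", "Europe"]
toy_scores = {
    "Asia":    [0.90, 0.10, 0.20, 0.10, 0.40, 0.50],
    "Europe":  [0.05, 0.70, 0.20, 0.10, 0.30, 0.40],
    "America": [0.03, 0.10, 0.50, 0.20, 0.20, 0.05],
    "Oceania": [0.02, 0.10, 0.10, 0.60, 0.10, 0.05],
}

if __name__ == "__main__":
    print(f"macro one-vs-rest AUC: {macro_ovr_auc(toy_true, toy_scores):.5f}")
```

The pairwise comparison is quadratic in the number of samples, which is fine for illustration; a rank-based computation or an established library routine would be preferable for large evaluation sets.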
