Computational Prediction of Interactions Between SARS-CoV-2 and Human Protein Pairs by PSSM-Based Images

Identifying protein-protein interactions is essential for predicting the behavior of a virus and for designing antiviral drugs against infection. Like other viruses, SARS-CoV-2 must interact with a host cell in order to survive, and this interaction results in infection of the host organism. Knowing which human proteins interact with SARS-CoV-2 proteins is therefore an essential step in preventing viral infection. In silico approaches support protein-protein interaction studies by identifying candidate interacting protein pairs that can then be validated in vitro. The representation of proteins is one of the key steps in protein interaction network prediction. In this study, we propose an image representation of proteins based on position-specific scoring matrices (PSSMs). PSSMs are matrices obtained from multiple sequence alignments; each cell holds information about the probability of occurrence of an amino acid or nucleotide at a given position. The PSSMs were treated as gray-scale images, which we call PSSM images. The main motivation of the study is to investigate whether these PSSM images are a suitable protein representation. To determine an adequate image size, the conversion to gray-scale images was performed at several sizes. SARS-CoV-2-human protein interaction network prediction was then carried out as an image classification task, using a Siamese neural network and ResNet50 on PSSM image datasets of different sizes. The Siamese neural network reached an accuracy of 0.915 with 200x200 images, and ResNet50 reached an accuracy of 0.922 with 400x400 images, indicating that PSSM images can be used for protein representation.
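As an illustration only, the conversion of a PSSM into a square gray-scale image can be sketched as follows. The abstract does not specify how scores are scaled or how the matrix is resized, so the min-max scaling to the 0-255 range, the nearest-neighbour resampling, the function name `pssm_to_gray_image`, and the random toy matrix are all assumptions made for this sketch rather than the method used in the study.

```python
import numpy as np


def pssm_to_gray_image(pssm, size):
    """Turn a PSSM into a size x size gray-scale image (hypothetical helper).

    Assumptions: scores are min-max scaled to 0..255 and the matrix is
    resized by nearest-neighbour sampling of rows and columns.
    """
    pssm = np.asarray(pssm, dtype=np.float64)
    lo, hi = pssm.min(), pssm.max()
    if hi == lo:
        # A constant matrix carries no contrast; map it to black.
        scaled = np.zeros_like(pssm)
    else:
        scaled = (pssm - lo) / (hi - lo)
    gray = np.rint(scaled * 255).astype(np.uint8)

    # Pick, for every output pixel, the nearest source row and column.
    rows = np.arange(size) * gray.shape[0] // size
    cols = np.arange(size) * gray.shape[1] // size
    return gray[np.ix_(rows, cols)]


# Toy PSSM: 350 sequence positions x 20 amino acids, random integer scores.
rng = np.random.default_rng(0)
toy_pssm = rng.integers(-10, 11, size=(350, 20))
image = pssm_to_gray_image(toy_pssm, 200)
```

Calling the same helper with `size=400` would give the larger image size mentioned above; a real pipeline would read the PSSM from an alignment tool's output instead of generating random scores.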
