Classifying RNA Strands with A Novel Graph Representation Based on the Sequence Free Energy

ABSTRACT Ribonucleic acids (RNA) are macromolecules found in all living cells, where they act as mediators between DNA and protein. Structurally, RNA is similar to DNA. In this paper, we introduce a compact graph representation that uses the Minimum Free Energy (MFE) of an RNA molecule's secondary structure. The representation encodes the structural components of the secondary structure as graph edges and uses the MFE of each component as its edge weight. A labeling process determines these weights by considering both the MFE of the 2D RNA structure and the specific settings within that structure. This encoding makes the representation more compact by giving each secondary structural element a unique representation in the graph. Armed with this representation, we apply graph-based algorithms to categorize RNA molecules. We also present the results of cutting-edge graph-based methods (All Paths Cycle Embeddings (APC), Shortest Paths Kernel/Embedding (SP), and Weisfeiler-Lehman and Optimal Assignment Kernel (WLOA)) on our dataset [1] using this new graph representation. Finally, we compare the results of the graph-based algorithms with those of a standard bioinformatics algorithm (Needleman-Wunsch) used for DNA and RNA comparison.
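
For readers unfamiliar with the sequence-based baseline, the following is a minimal sketch of Needleman-Wunsch global alignment scoring. The function name and the scoring values (match, mismatch, gap) are illustrative assumptions, not the settings used in the experiments reported here.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative defaults (an assumption), with a
    linear gap penalty.
    """
    # prev[j] is the best score for aligning the already processed
    # prefix of a against b[:j]; initially that prefix is empty,
    # so every character of b[:j] is aligned to a gap.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] against an empty prefix of b costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap in b.
            up = prev[j] + gap
            # Align b[j-1] with a gap in a.
            left = curr[j - 1] + gap
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    # Two short RNA strands that differ by one deleted base.
    print(needleman_wunsch_score("GAUC", "GAC"))
```

A higher score indicates more similar sequences, so pairwise scores of this kind can serve as a similarity measure between two strands when comparing against graph-based methods.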

___

  • E. Algul and R. C. Wilson, “A database and evaluation for classification of RNA molecules using graph methods,” in Graph-Based Representations in Pattern Recognition, D. Conte, J.-Y. Ramel, and P. Foggia, Eds. Cham: Springer International Publishing, 2019, pp. 78–87.
  • D. Bechhofer and M. Deutscher, “Bacterial ribonucleases and their roles in RNA metabolism,” Critical Reviews in Biochemistry and Molecular Biology, vol. 54, pp. 242–300, 05 2019.
  • “3DNA: a suite of software programs for the analysis, rebuilding and visualization of 3-dimensional nucleic acid structures,” x3dna.org. [Online]. Available: http://x3dna.org/
  • M. S. Waterman, “Secondary structure of single-stranded nucleic acids,” Studies in Foundations and Combinatorics, Advances in Mathematics Supplementary Studies, vol. 1, pp. 167–212, 1978. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.15.4425&rep=rep1&type=pdf
  • D. Fera, N. Kim, N. Shiffeldrim, J. Zorn, U. Laserson, H. H. Gan, and T. Schlick, “RAG: RNA-as-graphs web resource,” BMC Bioinformatics, vol. 5, 07 2004. [Online]. Available: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-5-88
  • D. Knisley, J. Knisley, C. Ross, and A. Rockney, “Classifying multigraph models of secondary RNA structure using graph-theoretic descriptors,” ISRN Bioinformatics, International Scholarly Research Network, 11 2012. [Online]. Available: https://doi.org/10.5402/2012/157135
  • J. Huang, K. Li, and M. Gribskov, “Accurate classification of RNA structures using topological fingerprints,” PLOS ONE, vol. 11, no. 10, pp. 1–19, 10 2016. [Online]. Available: https://doi.org/10.1371/journal.pone.0164726
  • R. C. Wilson and E. Algul, “Categorization of RNA molecules using graph methods,” in Structural, Syntactic, and Statistical Pattern Recognition, X. Bai, E. R. Hancock, T. K. Ho, R. C. Wilson, B. Biggio, and A. Robles-Kelly, Eds. Cham: Springer International Publishing, 2018, pp. 439–448.
  • S. V. N. Vishwanathan, N. N. Schraudolph, R. Kondor, and K. M. Borgwardt, “Graph kernels,” Journal of Machine Learning Research, vol. 11, pp. 1201–1242, 2010.
  • G. M. Blackburn, M. J. Gait, D. Loakes, D. M. Williams, J. A. Grasby, M. Egli, A. Flavell, S. Allen, J. Fisher, A. M. Pyle, et al., Nucleic acids in chemistry and biology. Royal Society of Chemistry, 2006.
  • M. Zuker, “Mfold web server for nucleic acid folding and hybridization prediction,” Nucleic Acids Research, vol. 31, no. 13, pp. 3406–3415, 07 2003. [Online]. Available: https://doi.org/10.1093/nar/gkg595
  • H. Jabbari, I. Wark, and C. Montemagno, “RNA secondary structure prediction with pseudoknots: Contribution of algorithm versus energy model,” PLOS ONE, vol. 13, pp. 1–21, 04 2018.
  • Y. Wu, B. Shi, X. Ding, T. Liu, X. Hu, K. Y. Yip, Z. R. Yang, D. H. Mathews, and Z. J. Lu, “Improved prediction of RNA secondary structure by integrating the free energy model with restraints derived from experimental probing data,” Nucleic Acids Research, vol. 43, pp. 7247–7259, 07 2015.
  • K. Doshi, J. Cannone, C. Cobaugh, and R. Gutell, “Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction,” BMC Bioinformatics, vol. 5, p. 105, 09 2004.
  • M. Zuker and P. Stiegler, “Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information,” Nucleic Acids Research, vol. 9, no. 1, pp. 133–148, 01 1981. [Online]. Available: https://doi.org/10.1093/nar/9.1.133
  • I. L. Hofacker, “Vienna RNA secondary structure server,” Nucleic Acids Research, vol. 31, no. 13, pp. 3429–3431, 07 2003. [Online]. Available: https://doi.org/10.1093/nar/gkg599
  • L. Wang, Y. Liu, X. Zhong, H. Liu, C. Lu, C. Li, and H. Zhang, “DMfold: A novel method to predict RNA secondary structure with pseudoknots based on deep learning and improved base pair maximization principle,” Frontiers in Genetics, vol. 10, p. 143, 2019.
  • P. S. Klosterman, M. Tamura, S. R. Holbrook, and S. E. Brenner, “SCOR: a Structural Classification of RNA database,” Nucleic Acids Research, vol. 30, pp. 392–394, 01 2002.
  • X. Lu and W. K. Olson, “3DNA: a software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures,” Nucleic Acids Research, vol. 31, pp. 5108–5121, 09 2003.
  • F. Vendeix, A. Munoz, and P. Agris, “Free energy calculation of modified base-pair formation in explicit solvent: A predictive model,” RNA (New York, N.Y.), vol. 15, pp. 2278–87, 10 2009.
  • I. Tinoco, O. C. Uhlenbeck, and M. D. Levine, “Estimation of Secondary Structure in Ribonucleic Acids,” Nature, vol. 230, pp. 362–367, 04 1971. [Online]. Available: https://doi.org/10.1038/230362a0
  • N. Nicolo, “Learning with kernels on graphs: DAG-based kernels, data streams and RNA function prediction,” Alma Mater Studiorum-Università di Bologna, 2014. [Online]. Available: https://pdfs.semanticscholar.org/313b/7d182e81e021faed1cf650f480fdeaeeb3d6.pdf
  • G. K. D. de Vries, “A fast approximation of the Weisfeiler-Lehman graph kernel for RDF data,” in Machine Learning and Knowledge Discovery in Databases, H. Blockeel, K. Kersting, S. Nijssen, and F. Železný, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 606–621.
  • N. M. Kriege, P.-L. Giscard, and R. C. Wilson, “On valid optimal assignment kernels and applications to graph classification,” in Advances in Neural Information Processing Systems, 2016, pp. 1615–1623.
  • N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt, “Weisfeiler-Lehman graph kernels,” Journal of Machine Learning Research, vol. 12, pp. 2539–2561, 2011. [Online]. Available: http://dl.acm.org/citation.cfm?id=2078187
  • K. M. Borgwardt and H. Kriegel, “Shortest-path kernels on graphs,” in Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005), 27-30 November 2005, Houston, Texas, USA, 2005, pp. 74–81. [Online]. Available: http://dx.doi.org/10.1109/ICDM.2005.132
  • P.-L. Giscard and R. C. Wilson, “The all-paths and cycles graph kernel,” arXiv preprint arXiv:1708.01410, 2017.
  • S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” Journal of Molecular Biology, vol. 43, no. 3, pp. 443–453, 1970.
  • M. F. Schmidt, “DNA: Blueprint of the proteins,” in Chemical Biology: and Drug Discovery. Berlin, Heidelberg: Springer Berlin Heidelberg, 2022, pp. 33–47.
  • X. Ou et al., “Advances in RNA 3D structure prediction,” Journal of Chemical Information and Modeling, vol. 62, no. 23, pp. 5862–5874, 2022.
  • T. H. Schulz et al., “A generalized Weisfeiler-Lehman graph kernel,” Machine Learning, vol. 111, no. 7, pp. 2601–2629, 2022.
  • A. Salim, S. S. Shiju, and S. Sumitra, “Graph kernels based on optimal node assignment,” Knowledge-Based Systems, vol. 244, p. 108519, 2022.