MicroRNA prediction based on 3D graphical representation of RNA secondary structures

MicroRNAs (miRNAs) are posttranscriptional regulators of gene expression. A single miRNA can target hundreds of messenger RNAs (mRNAs), an mRNA can be targeted by several different miRNAs, and a single miRNA may have multiple binding sites within one mRNA sequence. This many-to-many relationship makes miRNAs difficult to investigate experimentally, so machine learning (ML) is frequently used to address the challenge. The outcome of an ML analysis depends largely on the quality of the input data and on how well the features describe the data. Previously, more than 1000 features were suggested for miRNAs. In this study, a new approach for ML-based miRNA prediction is proposed. It is shown that 36 features representing the RNA secondary structure and its dynamic 3D graphical representation yield accuracies of up to 98%. Thousands of models were generated by classifying known human miRNAs and pseudohairpins with 3 classifiers: decision tree, naïve Bayes, and random forest. Although the method is based on human data, the best model correctly assigned 96% of nonhuman hairpins from MirGeneDB, suggesting that this approach might be useful for the analysis of miRNAs from other species.
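
The idea of generating many models from one labeled dataset can be illustrated with a minimal sketch. It is not the pipeline used in the study: the data are synthetic, only a Gaussian naïve Bayes classifier is shown (standing in for one of the three classifiers named above), and the use of repeated random train/test splits is an assumption about how a large number of models could be produced. The function names and parameters (`make_synthetic`, `repeated_split_accuracy`, `n_models`, `test_fraction`) are hypothetical and chosen for illustration only.

```python
import math
import random
from statistics import mean, pvariance

N_FEATURES = 36  # feature count stated in the abstract; the values below are synthetic


def make_synthetic(n_per_class, rng):
    """Return (X, y) with Gaussian features; class 1 is shifted by +1 (assumption)."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(float(label), 1.0) for _ in range(N_FEATURES)])
            y.append(label)
    return X, y


def fit_gaussian_nb(X, y):
    """Estimate the log prior, feature means, and feature variances per class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        model[label] = (
            math.log(len(rows) / len(X)),
            [mean(c) for c in cols],
            [pvariance(c) + 1e-9 for c in cols],  # small floor avoids division by zero
        )
    return model


def predict(model, x):
    """Return the class with the highest log posterior for one feature vector."""
    def log_posterior(params):
        prior, mus, variances = params
        return prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, mus, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


def repeated_split_accuracy(X, y, n_models=20, test_fraction=0.3, seed=0):
    """Train one model per random split and return the held-out accuracies."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = int(len(X) * test_fraction)
    accuracies = []
    for _ in range(n_models):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        accuracies.append(hits / n_test)
    return accuracies


if __name__ == "__main__":
    X, y = make_synthetic(100, random.Random(42))
    accs = repeated_split_accuracy(X, y)
    print(f"models: {len(accs)}, mean accuracy: {mean(accs):.3f}")
```

In a real analysis, the synthetic matrix would be replaced by the 36 computed features for known miRNA hairpins (positive class) and pseudohairpins (negative class), and the same loop would be repeated for each classifier so that the best-performing model can be selected.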
