MicroRNA prediction based on 3D graphical representation of RNA secondary structures
MicroRNAs (miRNAs) are posttranscriptional regulators of gene expression. A single miRNA can target hundreds of messenger RNAs (mRNAs), an mRNA can be targeted by several different miRNAs, and one miRNA may have multiple binding sites within a single mRNA sequence. These many-to-many relationships make miRNAs difficult to investigate experimentally, so machine learning (ML) is frequently used to overcome such challenges. The outcome of an ML analysis depends largely on the quality of the input data and on how well the features describe those data. More than 1000 features have previously been proposed for miRNAs. Here, it is shown that 36 features representing the RNA secondary structure and its dynamic 3D graphical representation achieve accuracies of up to 98%. In this study, a new approach for ML-based miRNA prediction is proposed: thousands of models were generated by classifying known human miRNAs and pseudohairpins with three classifiers (decision tree, naïve Bayes, and random forest). Although the method is based on human data, the best model correctly classified 96% of nonhuman hairpins from MirGeneDB, suggesting that this approach might also be useful for analyzing miRNAs from other species.
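The abstract does not list the 36 features, so the following Python sketch only illustrates the general idea of turning a hairpin's secondary structure into a numeric feature vector. It derives a few descriptors from a dot-bracket string and builds a simple cumulative 3D curve from the sequence. The tetrahedral base mapping, the `UNPAIRED_SCALE` weighting, and every feature name are hypothetical assumptions chosen for illustration. None of them is the feature set or the 3D representation used in this study.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]

# Hypothetical mapping (an assumption for illustration): each base steps toward
# one vertex of a regular tetrahedron, so the four step directions are
# mutually equidistant.
BASE_VECTORS: Dict[str, Vec] = {
    "A": (1.0, 1.0, 1.0),
    "C": (1.0, -1.0, -1.0),
    "G": (-1.0, 1.0, -1.0),
    "U": (-1.0, -1.0, 1.0),
}
# Hypothetical weighting: unpaired bases take half-length steps, so loops and
# bulges shape the curve differently from stems.
UNPAIRED_SCALE = 0.5


def pair_table(structure: str) -> List[int]:
    """Map each position to its partner index (-1 if unpaired) from dot-bracket."""
    partner = [-1] * len(structure)
    stack: List[int] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def curve_3d(seq: str, partner: List[int]) -> List[Vec]:
    """Cumulative 3D walk with one point per nucleotide."""
    x = y = z = 0.0
    points: List[Vec] = []
    for base, p in zip(seq, partner):
        dx, dy, dz = BASE_VECTORS[base]
        scale = 1.0 if p >= 0 else UNPAIRED_SCALE
        x, y, z = x + scale * dx, y + scale * dy, z + scale * dz
        points.append((x, y, z))
    return points


def radius_of_gyration(points: List[Vec]) -> float:
    """Root-mean-square distance of the curve's points from their centroid."""
    n = len(points)
    centroid = [sum(p[k] for p in points) / n for k in range(3)]
    return math.sqrt(
        sum(sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in points) / n
    )


def features(seq: str, structure: str) -> Dict[str, float]:
    """Hypothetical structure-based feature vector for one hairpin."""
    seq = seq.upper().replace("T", "U")
    if not seq or len(seq) != len(structure):
        raise ValueError("sequence and structure must be non-empty and equal length")
    if set(seq) - set(BASE_VECTORS):
        raise ValueError("sequence contains bases other than A, C, G, U")
    partner = pair_table(structure)
    n = len(seq)
    pairs = [(i, j) for i, j in enumerate(partner) if j > i]
    # G-U wobble pairs are counted separately from canonical pairs.
    gu = sum(1 for i, j in pairs if {seq[i], seq[j]} == {"G", "U"})
    return {
        "length": float(n),
        "gc_content": (seq.count("G") + seq.count("C")) / n,
        "paired_fraction": 2 * len(pairs) / n,
        "gu_pair_fraction": gu / len(pairs) if pairs else 0.0,
        "radius_of_gyration": radius_of_gyration(curve_3d(seq, partner)),
    }
```

For the toy hairpin `features("GGGAAACCC", "(((...)))")`, the GC content and the paired fraction are both 6/9, and there are no G-U pairs. In practice, dot-bracket structures would typically come from a folding tool such as RNAfold from the ViennaRNA package. Vectors like these, computed for known miRNAs and pseudohairpins, could then be passed to a classifier such as a decision tree, naïve Bayes, or random forest.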