A proposal of a hybrid model to predict the secondary protein structures based on amino acid sequences

Aim: Predicting the secondary structure of proteins from their amino acid sequences is one of the most significant open problems in bioinformatics. High accuracy in determining the secondary structure is key to computationally uncovering the 3D structure of proteins and to individualized drug applications of programmable proteins. Success rates in predicting secondary structure (Q3 score) were around 0.60 when research in this area began and have now reached a limit of about 0.80.

Material and Methods: In this study, the secondary structure was predicted in three states (Helix, Strand and Turn). Artificial neural networks and machine learning algorithms were combined into a hybrid model, and a framework was developed. The probability of the paired presence of amino acids in sequences was used to digitize the amino acid sequences. Calculations were completed separately for each secondary structural element, and a cascade mean filter was used as a threshold method to sharpen the differences. The resulting matrices were used to digitize the protein sequences. Secondary structure was first predicted pairwise (Helix-Strand, Helix-Turn and Strand-Turn), and a final decision among Helix, Strand and Turn was then reached via machine learning models.

Results: The success rates in the pairwise estimation of secondary structural elements were 0.797 for helix-strand, 0.848 for helix-turn and 0.829 for strand-turn. The average success rate for pairwise estimation of secondary structural elements was calculated as 0.824. In the proposed model, accuracy was calculated as 0.742 for Helix, 0.703 for Strand and 0.880 for Turn. The Q3 score was 0.775.
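As a rough illustration of the metric reported above, the Q3 score is conventionally the fraction of residues whose three-state label is predicted correctly. The sketch below computes it together with a per-state hit rate. The function names, the one-letter labels (H for Helix, E for Strand, T for Turn) and the toy strings are assumptions made for illustration. The abstract does not state exactly how its per-state accuracies were computed, so this should not be read as the authors' evaluation code.

```python
from collections import Counter


def q3_score(true_states, predicted_states):
    """Fraction of residues whose three-state label is predicted correctly."""
    if len(true_states) != len(predicted_states):
        raise ValueError("sequences must have the same length")
    if not true_states:
        raise ValueError("sequences must not be empty")
    correct = sum(t == p for t, p in zip(true_states, predicted_states))
    return correct / len(true_states)


def per_state_recall(true_states, predicted_states, states="HET"):
    """For each state, the share of its true residues that were predicted correctly."""
    totals = Counter(true_states)
    hits = Counter(t for t, p in zip(true_states, predicted_states) if t == p)
    # States that never occur in the true labels are skipped to avoid dividing by zero.
    return {s: hits[s] / totals[s] for s in states if totals[s]}


# Hypothetical toy data: H = Helix, E = Strand, T = Turn (labels are an assumption).
true_ss = "HHHHEEETTT"
pred_ss = "HHHTEEETTH"

overall = q3_score(true_ss, pred_ss)
by_state = per_state_recall(true_ss, pred_ss)
```

On this toy input, 8 of the 10 residues match, so `overall` is 0.8, and `by_state` reports the share of correctly recovered residues separately for H, E and T.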

Annals of Medical Research
  • Publication frequency: Monthly
  • Publisher: İnönü University Faculty of Medicine