Protein fold classification with Grow-and-Learn network

Protein fold classification is an important subject in computational biology and a compelling problem from a machine learning perspective. To address this challenging problem, we propose a method for classifying protein folds that combines the Grow-and-Learn (GAL) neural network with the one-versus-others (OvO) method. To classify the 27 most common protein folds, 125-dimensional feature vectors derived from the physicochemical properties of amino acids are used. The study is conducted on a database of 694 proteins, of which 311 are used for training and 383 for testing. Overall, the classification system achieves 81.2% fold recognition accuracy on the test set, where most of the proteins have less than 25% sequence identity with those used during training. To show how the GAL network performs relative to other methods, it is also compared with several other approaches, and its accuracy is found to be higher than that of the existing methods for protein fold classification.
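The combination of a growing exemplar network with a one-versus-others scheme can be illustrated with a small sketch. This is not the authors' implementation. It assumes a simplified grow-only rule (a training sample is stored as a new unit whenever the nearest stored unit has the wrong label) and omits any pruning or forgetting phase. The class name `GrowOnlyGAL`, the helper functions, the margin-based rule for combining the binary classifiers, and the 2-D toy data are all hypothetical and chosen only for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class GrowOnlyGAL:
    """Binary exemplar network (hypothetical, simplified GAL-style learner).

    A unit is added only when the nearest stored unit misclassifies
    the current training sample. No pruning step is included.
    """

    def __init__(self):
        self.units = []  # list of (vector, label) pairs, label in {0, 1}

    def nearest(self, x, label=None):
        """Return the closest unit, optionally restricted to one label."""
        cands = [u for u in self.units if label is None or u[1] == label]
        if not cands:
            return None
        return min(cands, key=lambda u: dist(x, u[0]))

    def fit(self, X, y, max_epochs=10):
        for _ in range(max_epochs):
            grew = False
            for x, t in zip(X, y):
                winner = self.nearest(x)
                if winner is None or winner[1] != t:
                    self.units.append((tuple(x), t))  # grow the network
                    grew = True
            if not grew:  # every sample is already classified correctly
                break
        return self

    def margin(self, x):
        """Assumed score: distance to nearest 'other' unit minus
        distance to nearest 'target' unit (larger means more confident)."""
        pos = self.nearest(x, 1)
        neg = self.nearest(x, 0)
        if pos is None:
            return -math.inf
        if neg is None:
            return math.inf
        return dist(x, neg[0]) - dist(x, pos[0])


def fit_one_vs_others(X, y):
    """Train one binary network per class (that class versus all others)."""
    return {
        c: GrowOnlyGAL().fit(X, [1 if t == c else 0 for t in y])
        for c in sorted(set(y))
    }


def predict(models, x):
    """Pick the class whose binary network reports the largest margin."""
    return max(models, key=lambda c: models[c].margin(x))


# Toy 2-D data standing in for the 125-dimensional protein features.
X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
y = ["a", "a", "b", "b", "c", "c"]
models = fit_one_vs_others(X, y)
print(predict(models, (0.2, 0.3)))
```

In this sketch each binary network stays small because units are stored only on errors. How the per-class outputs are combined in the actual study is not stated in the abstract, so the margin rule above should be read as one plausible choice rather than the published procedure.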

___

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK