Classifying Protein Sequences Using Convolutional Neural Networks

One of the major challenges in bioinformatics is predicting and classifying protein structure and function. Large amounts of Ribonucleic Acid (RNA) data cannot be handled with traditional laboratory methods, so proteins need to be grouped by structure and family in order to identify their biological families and functions. Traditional machine learning approaches classify proteins using various feature extraction algorithms, and with manual feature extraction the choice of features directly affects performance. In the approach proposed in this study, protein sequences were instead converted to numerical form using the amino acid composition method. The numerically encoded sequences were converted to spectrograms, and features were extracted automatically using two-dimensional Convolutional Neural Network (CNN) models (VGG19, ResNet). The extracted features were classified with Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN). Classification of protein sequences using ResNet achieved an accuracy of 95.03%.

Keywords: Protein classification, bioinformatics, convolutional neural networks, machine learning.
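
The encoding and classification steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes "amino acid composition" means the usual fraction of each of the 20 standard residues, which the abstract does not define, and it omits the spectrogram and CNN feature extraction stages entirely, feeding the composition vectors straight into a small k-NN classifier. The sequences, the family labels, and the function names `amino_acid_composition` and `knn_predict` are hypothetical.

```python
from collections import Counter
import math

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def knn_predict(train_vectors, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training vectors."""
    distances = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up sequences and family labels (illustrative only, not real data).
train = [
    ("AAAAGAAAAL", "family_1"),
    ("AAGAAALAAA", "family_1"),
    ("KKKKEKKKKR", "family_2"),
    ("KKEKKRKKKK", "family_2"),
]
X = [amino_acid_composition(seq) for seq, _ in train]
y = [label for _, label in train]

prediction = knn_predict(X, y, amino_acid_composition("AAALAAGAAA"), k=3)
print(prediction)  # family_1
```

In the pipeline the abstract describes, the vectors passed to the classifier would be CNN features extracted from spectrogram images rather than raw composition vectors; the k-NN voting step itself would be the same.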

___

Bitlis Eren Üniversitesi Fen Bilimleri Dergisi
  • Publication frequency: 4 issues per year
  • First published: 2012
  • Publisher: Bitlis Eren Üniversitesi Rektörlüğü