Classification of DNA sequences using numerical mapping techniques and Fourier transformation

Combinations of bases in a DNA sequence correspond to genes, and RNA copies are transcribed from these genes. When an RNA copy is produced, the base sequence of the gene is not read in full from beginning to end. The sections of a gene that are not read or coded are called introns, and the coded sections are called exons. Where in a DNA sequence is a protein coded, and how much of it? Where are growth and development regulated? Where are stem cells converted into other cell types? Answering these questions, and investigating genetic diseases such as cancer, depends on classifying DNA sequences as exons or introns. The aim of this study is to compare the performance of different numerical mapping techniques in classifying DNA sequences as exon or intron. For this purpose, DNA sequences of the human MEFV gene were converted to numerical sequences using nine different mapping techniques. The Discrete Fourier Transform (DFT) method was used to classify the resulting numerical sequences. Four different windowing functions were applied within this method, and their classification performances were compared. The results of the Fourier-based method were also compared with those of machine learning methods, namely Support Vector Machines and the K-Nearest Neighbor algorithm. The integer mapping technique with the DFT method achieved the highest classification performance, 96.2%, exceeding that of the machine learning methods. Among the windowing functions, the Hamming window gave the highest classification performance.
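The pipeline summarized in the abstract (numerical mapping, windowing, DFT) can be illustrated with a minimal sketch. The abstract does not give the mapping values or the decision rule, so everything specific here is an assumption: the integer mapping T=0, C=1, A=2, G=3 is one convention found in the literature, and the score is the share of spectral power at bin N/3, the period-3 peak that DFT-based exon detectors commonly rely on. The function names are hypothetical and this is not the authors' implementation.

```python
import cmath
import math

# Assumed integer mapping (T=0, C=1, A=2, G=3); the abstract does not
# list the values used in the study.
INTEGER_MAP = {"T": 0, "C": 1, "A": 2, "G": 3}


def integer_mapping(sequence):
    """Convert a DNA string into a list of integers."""
    return [INTEGER_MAP[base] for base in sequence.upper()]


def hamming_window(n):
    """Return the n-point symmetric Hamming window."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def period3_score(sequence):
    """Share of non-DC spectral power at bin N/3 (hypothetical score).

    The sequence length should be a multiple of 3 so that N/3 is an
    exact DFT bin.
    """
    x = integer_mapping(sequence)
    n = len(x)
    mean = sum(x) / n
    window = hamming_window(n)
    # Remove the mean, then apply the window to reduce spectral leakage.
    xw = [(value - mean) * w for value, w in zip(x, window)]

    def dft_power(k):
        # Squared magnitude of the k-th DFT coefficient.
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(xw))
        return abs(coeff) ** 2

    # Bins 1 .. N/2 cover the non-DC half of the spectrum of a real signal.
    power = [dft_power(k) for k in range(1, n // 2 + 1)]
    total = sum(power)
    if total == 0:
        return 0.0
    return power[n // 3 - 1] / total


if __name__ == "__main__":
    print(period3_score("ATG" * 20))  # strongly period-3 toy sequence
    print(period3_score("AT" * 30))   # period-2 toy sequence
```

A sequence with a high score would be labeled exon-like and one with a low score intron-like; the cut-off between the two would have to be tuned on labeled data and is not specified in the abstract. Swapping `hamming_window` for another window function is how the four windowing functions mentioned above could be compared under this sketch.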

___

Journal of the Faculty of Engineering and Architecture of Gazi University (Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi)
  • ISSN: 1300-1884
  • Publication frequency: 4 issues per year
  • First published: 1986
  • Publisher: Oğuzhan YILMAZ