Analysis of Gene Expressions in an Ovarian Cancer Data Set Using Data Mining

Bioinformatics is a field that draws on disciplines such as computer science, mathematics, statistics, biology, and physics to understand and organize information about molecules. Its fundamental task is to understand the functions of an organism whose genome is given and to improve quality of life. Bioinformatics databases may contain different data types, such as nucleotide sequences, protein sequences, and three-dimensional (3D) macromolecular structures. Processing the very large volumes of data produced in bioinformatics with data mining methods is becoming increasingly important. This study briefly describes the current state of data mining in bioinformatics and reviews studies and applications that apply data mining to cancer data sets. In light of these applications, an ovarian cancer data set is modeled with several feature selection and classification methods, and the accuracy rates of the algorithms are examined and compared.
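The kind of comparison described above, feature selection followed by classification and scored by accuracy, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the algorithms used, so a signal-to-noise filter and a nearest-centroid classifier stand in for them here, and the data are synthetic rather than the ovarian cancer data set. All function names (`make_synthetic`, `snr_scores`, `select_top_k`, `fit_centroids`, `predict`, `cv_accuracy`) are hypothetical.

```python
import random
import statistics


def make_synthetic(n_per_class=30, n_features=200, n_informative=5, shift=3.0, seed=0):
    """Toy two-class data: only the first n_informative features differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in range(n_informative):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def snr_scores(X, y):
    """Signal-to-noise score per feature: |mean0 - mean1| / (std0 + std1)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.stdev(a) + statistics.stdev(b) + 1e-12
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return scores


def select_top_k(X, y, k):
    """Indices of the k features with the highest signal-to-noise score."""
    scores = snr_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_centroids(X, y, features):
    """Per-class mean vector restricted to the selected features."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.mean(r[j] for r in rows) for j in features]
    return centroids


def predict(centroids, row, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((row[j] - cj) ** 2 for j, cj in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cv_accuracy(X, y, k_features, n_folds=5, seed=0):
    """k-fold cross-validated accuracy; features are re-selected inside every fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        # Selection sees training samples only, so held-out samples do not leak in.
        features = select_top_k(X_tr, y_tr, k_features)
        centroids = fit_centroids(X_tr, y_tr, features)
        correct += sum(predict(centroids, X[i], features) == y[i] for i in test_idx)
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_synthetic()
    for k in (5, 50, 200):
        print(f"top-{k} features: accuracy = {cv_accuracy(X, y, k):.3f}")
```

Selecting features inside each fold, rather than once on the full data, is a deliberate choice in this sketch: with far more features than samples, selecting on all samples first tends to inflate the reported accuracy.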
