The Success of Logistic Regression with Feature Reduction Techniques in the Classification of Gene Sequences

The classification of gene sequences plays a very important role in predicting and diagnosing diseases. Because effective classification over the entire gene sequence is not feasible, it is important to use feature reduction algorithms to extract the genes (features) that carry the information needed for a reliable classification. In this study, different methods for reducing features, such as heuristic search techniques and feature reduction approaches (filter, wrapper, etc.), were analyzed with the aim of making the preprocessing step more effective, so that the resulting data sets can be classified more effectively with powerful classification tools such as LR (Logistic Regression) and SVM (Support Vector Machines). With feature reduction methods, the LR classifier, regarded as a strong classifier in machine learning, has become as valid and effective a classification tool as SVM for the classification of gene sequences.

The Success of Logistic Regression with Feature Reduction Techniques on Microarray Gene Classification

DNA microarray classification is important for discovering genes that are differentially expressed between normal and diseased patients, a central research problem in bioinformatics. Not all genes used in an expression profile are informative, and many of them are redundant. A preprocessing step that reduces the number of genes through feature selection while retaining the best class prediction accuracy for the classifier is crucial for precise tumor classification. In this study, the class prediction accuracy of two classifiers, LR (Logistic Regression) and SVM (Support Vector Machines), was compared using the best genes selected by wrapper and filter techniques that employ heuristic search methods. We conclude that LR combined with heuristic-search-based feature selection is as efficient as SVM for microarray gene prediction.
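As a rough illustration of the pipeline described above (reduce the genes with a filter, then classify with logistic regression), the following is a minimal sketch in plain Python. It is not the authors' implementation: the synthetic data, the signal-to-noise filter score, the function names, and all parameter values are assumptions chosen for illustration. The heuristic search, the wrapper approach, and the SVM comparison from the study are not reproduced here.

```python
import math
import random

def make_data(n_samples=60, n_genes=200, n_informative=5, shift=2.0, seed=0):
    """Synthetic 'expression' matrix (assumption): only the first
    n_informative genes differ between the two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2                      # alternate classes 0 and 1
        sign = 1.0 if label == 1 else -1.0
        row = [rng.gauss(sign * shift if g < n_informative else 0.0, 1.0)
               for g in range(n_genes)]
        X.append(row)
        y.append(label)
    return X, y

def snr_scores(X, y):
    """Filter score per gene: |mean1 - mean0| / (std0 + std1)."""
    scores = []
    for g in range(len(X[0])):
        groups = [[row[g] for row, lab in zip(X, y) if lab == c] for c in (0, 1)]
        means = [sum(v) / len(v) for v in groups]
        stds = [math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))
                for v, m in zip(groups, means)]
        scores.append(abs(means[1] - means[0]) / (stds[0] + stds[1] + 1e-12))
    return scores

def select_top_k(X, y, k):
    """Keep the indices of the k highest-scoring genes."""
    scores = snr_scores(X, y)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=200, l2=0.01):
    """L2-regularized logistic regression trained by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, X):
    """Predict class 1 when the linear score is non-negative."""
    return [1 if sum(wj * xj for wj, xj in zip(w, row)) + b >= 0 else 0
            for row in X]

X, y = make_data()
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]

# Select genes on the training split only, so the test split stays unseen.
selected = select_top_k(X_train, y_train, k=5)

def reduce_genes(data):
    """Project each sample onto the selected genes."""
    return [[row[g] for g in selected] for row in data]

w, b = fit_logistic(reduce_genes(X_train), y_train)
predictions = predict(w, b, reduce_genes(X_test))
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print("selected genes:", sorted(selected))
print("test accuracy:", accuracy)
```

Scoring the genes on the training split alone is a deliberate choice in this sketch: ranking genes on the full data set before splitting would leak test information into the selection step and inflate the reported accuracy.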
