Hybrid SPR algorithm to select predictive genes for effectual cancer classification

Designing an automated system for classifying DNA microarray data is extremely challenging because the data are high-dimensional while the number of available samples is small. In this paper, a hybrid statistical pattern recognition (SPR) algorithm is proposed to reduce the dimensionality and select predictive genes for cancer classification. The experiment used colon cancer gene expression profiles comprising 62 samples, each with 2000 genes. The algorithm selected a subset of 6 highly informative genes, which yielded a classification accuracy of 93.5%.
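
The abstract does not spell out the individual steps of the hybrid algorithm, so the following is only a minimal sketch of the general idea (rank genes, keep a small subset, classify on that subset), not the authors' method. It assumes a simple t-statistic filter, a nearest-centroid classifier, and leave-one-out validation, and it runs on synthetic data rather than the colon cancer profiles. All function names, the reduced gene count (200 instead of 2000), and the class shift are illustrative assumptions.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def make_synthetic(n_samples=62, n_genes=200, n_informative=6, shift=3.0, seed=0):
    """Synthetic expression matrix (assumption: NOT the real colon data).
    Only the first n_informative genes differ between the two classes."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    data = []
    for y in labels:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += shift * y
        data.append(row)
    return data, labels

def t_score(values, labels):
    """Absolute Welch-style t statistic of one gene between the two classes."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0

def select_genes(data, labels, k=6):
    """Return the indices of the k highest-scoring genes."""
    n_genes = len(data[0])
    scores = [t_score([row[g] for row in data], labels) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]

def nearest_centroid_predict(train, train_labels, sample, genes):
    """Assign the class whose mean profile on the selected genes is closest."""
    best, best_dist = None, float("inf")
    for cls in set(train_labels):
        rows = [r for r, y in zip(train, train_labels) if y == cls]
        centroid = [mean([r[g] for r in rows]) for g in genes]
        dist = sum((sample[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best, best_dist = cls, dist
    return best

def loocv_accuracy(data, labels, k=6):
    """Leave-one-out accuracy; genes are re-selected inside every fold
    so the held-out sample never influences the selection."""
    correct = 0
    for i in range(len(data)):
        train = data[:i] + data[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = select_genes(train, train_labels, k)
        correct += nearest_centroid_predict(train, train_labels, data[i], genes) == labels[i]
    return correct / len(data)

data, labels = make_synthetic()
selected = select_genes(data, labels, k=6)
accuracy = loocv_accuracy(data, labels, k=6)
print(sorted(selected), round(accuracy, 3))
```

Re-selecting the genes inside each fold is a deliberate choice in this sketch: selecting them once on all samples and then cross-validating tends to give optimistic accuracy estimates when samples are few. For scale, 58 correct predictions out of 62 samples would correspond to 93.5%; the abstract does not state how the reported figure was evaluated.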

Turkish Journal of Electrical Engineering and Computer Science
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK