Comparison of AIS and fuzzy c-means clustering methods on the classification of breast cancer and diabetes datasets

Data reduction is an indispensable part of many pattern classification processes. When the number of samples is excessive, sample reduction (data reduction) algorithms can be used to shorten processing time and keep subsequent results reliable. Many methods have been used for data reduction; fuzzy c-means, a widely used clustering algorithm, is one of them. In this study, we applied a different clustering algorithm, an artificial immune system (AIS), to the data reduction process. We carried out the performance evaluation experiments on the standard Chainlink and Iris datasets, while the main application was conducted on the Wisconsin Breast Cancer and Pima Indian Diabetes datasets, which were taken from the University of California, Irvine Machine Learning Repository. For these datasets, the performance of the AIS in data reduction was compared with that of the fuzzy c-means clustering algorithm, with a multilayer perceptron artificial neural network used as the classifier after data reduction. With the AIS, the maximum classification accuracies were 73.96% for the Pima Indian Diabetes dataset and 97.80% for the Wisconsin Breast Cancer dataset, obtained at compression rates of 80% and 40%, respectively. With fuzzy c-means clustering, the maximum accuracies were 63% and 86.69% for the Pima Indian Diabetes and Wisconsin Breast Cancer datasets, obtained at compression rates of 90% and 70%, respectively. Averaged over the tested compression rates, the AIS reached a mean classification accuracy of 70.07% for the Pima Indian Diabetes dataset, while fuzzy c-means reached 47.64%. For the Wisconsin Breast Cancer dataset, the mean classification accuracies of the AIS and fuzzy c-means methods were 94.90% and 75.43%, respectively.
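The fuzzy c-means baseline can be illustrated with a minimal sketch of clustering-based sample reduction. The abstract does not give the exact procedure, so several points here are assumptions: each class is clustered separately, the cluster centers replace the original samples as labelled prototypes, and the compression rate is taken to be the fraction of samples removed. The function names `fuzzy_c_means` and `reduce_samples` and the fuzzifier value `m = 2` are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns the cluster centers as lists of floats."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialise the centers with randomly chosen samples.
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    exponent = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
            )
        # Center update: mean of the samples weighted by u_ij^m.
        new_centers = []
        for j in range(n_clusters):
            weights = [u[j] ** m for u in memberships]
            total = sum(weights)
            new_centers.append(
                [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


def reduce_samples(samples, labels, compression_rate):
    """Assumed reduction scheme: cluster each class and keep only the centers.

    compression_rate is assumed to be the fraction of samples removed,
    so a rate of 0.8 keeps roughly 20% of each class as prototypes.
    """
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(tuple(x))
    reduced = []
    for y, pts in by_class.items():
        k = max(1, round(len(pts) * (1.0 - compression_rate)))
        for center in fuzzy_c_means(pts, k):
            reduced.append((tuple(center), y))
    return reduced
```

Under these assumptions, the reduced prototype set would then be used to train the classifier (a multilayer perceptron in the study) in place of the full training set.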

Turkish Journal of Electrical Engineering and Computer Science
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK