K-means algorithm with a novel distance measure

In this paper, we describe an essential problem in data clustering and present some solutions to it. We investigate the use of distance measures other than the Euclidean distance to improve clustering performance. We also develop an improved point symmetry-based distance measure and demonstrate its efficiency. We then develop a k-means algorithm with a novel distance measure that improves on the performance of the classical k-means algorithm. The proposed algorithm does not have the worst-case running-time bound found in many similar algorithms in the literature. Experimental results comparing the proposed algorithm with the classical k-means algorithm demonstrate its effectiveness. We present the proposed algorithm and its performance results in detail, along with avenues for future research.
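As a rough illustration of what swapping the distance measure in k-means means in practice, the following minimal sketch implements the classical assign-and-update loop with the distance passed in as a function. It is not the algorithm proposed in the paper: the improved point symmetry-based measure is not specified in this abstract and is not reproduced here. The function names (`kmeans`, `euclidean`, `manhattan`), the caller-supplied initial centers, and the use of the Manhattan distance as the non-Euclidean example are all assumptions made for illustration.

```python
import math


def euclidean(p, q):
    """Euclidean distance, the measure used by classical k-means."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Manhattan (L1) distance, used here only as a stand-in non-Euclidean measure."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kmeans(points, init_centers, distance=euclidean, max_iter=100):
    """Hypothetical helper: Lloyd-style k-means with a pluggable distance.

    points       -- list of equal-length numeric tuples
    init_centers -- initial centers, supplied by the caller (assumption)
    distance     -- any function distance(point, center) -> float
    """
    centers = [tuple(c) for c in init_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the nearest center under `distance`.
        new_labels = [
            min(range(len(centers)), key=lambda j: distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: move each center to the coordinate-wise mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    start = [(0, 0), (10, 10)]
    print(kmeans(data, start, distance=euclidean))
    print(kmeans(data, start, distance=manhattan))
```

Note that the mean-based update step is the natural choice for the squared Euclidean distance; with other measures it is a heuristic, and the loop is no longer guaranteed to decrease the clustering objective at every iteration.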

___

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK