Visual Assessment of Cluster Tendency and Its Implementation in R

Partitioning clustering algorithms take the number of clusters as an input parameter, and the success of the clustering depends largely on this value, which is chosen before the analysis. Cluster validity checks can be run after the analysis to find the optimal number of clusters, but this approach suffers from both its computational cost and the sensitivity of the validity measures to the structure of the data. Methods that estimate the number of clusters before the analysis are therefore needed. Visual assessment of cluster tendency (GÖKED, from the Turkish "görsel kümelenme eğilimi değerlendirmesi") is one of the pioneering algorithms used to find the number of clusters. This study introduces the visual assessment of cluster tendency algorithm and tests it with a GÖKED function developed in the R environment.

A Programmatic Implementation of the Visual Assessment of Cluster Tendency Algorithm in R

In cluster analysis, partitioning algorithms require an a priori estimate of the number of clusters (c) as an input parameter, so the success of the partitioning depends largely on this parameter. To find an optimal c, the results of successive clustering runs are checked against various cluster validity indices at the end of each run. Cluster validation is time-consuming, and it relies on validity indices that may not guarantee clustering quality, since their performance varies with the complexity of the data structure. Pre-analysis approaches that estimate the number of clusters before clustering can therefore be useful. The visual assessment of cluster tendency (VAT) is a pioneering algorithm that produces a grey-level image of the reordered distance matrix, in which existing clusters appear as dark blocks. This paper introduces the VAT algorithm and demonstrates it with a user-defined function developed in the R statistical computing environment.
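
To make the idea of a reordered distance matrix concrete, the following base R sketch shows one way a VAT-style ordering can be computed. It is not the function developed in this paper. The name `vat_order`, the simulated two-cluster data, and the plotting choices are all assumptions made for illustration. The ordering follows the Prim-like scheme usually described for VAT: start at one end of the largest dissimilarity, then repeatedly append the unselected object closest to any already selected object.

```r
# Minimal VAT-style reordering sketch (base R only).
# `vat_order` is a hypothetical helper name, not the paper's function.
vat_order <- function(d) {
  d <- as.matrix(d)
  n <- nrow(d)
  # Start from one endpoint of the largest pairwise dissimilarity.
  first <- as.integer(which(d == max(d), arr.ind = TRUE)[1, 1])
  ord <- integer(n)
  ord[1] <- first
  remaining <- setdiff(seq_len(n), first)
  # min_dist[k]: smallest dissimilarity from remaining[k] to any selected object.
  min_dist <- unname(d[first, remaining])
  for (r in seq_len(n)[-1]) {
    k <- which.min(min_dist)          # closest unselected object
    nxt <- remaining[k]
    ord[r] <- nxt
    remaining <- remaining[-k]
    # Update the nearest-selected distances with the newly added object.
    min_dist <- pmin(min_dist[-k], unname(d[nxt, remaining]))
  }
  ord
}

# Illustrative data (an assumption): two well-separated groups, shuffled.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))
x <- x[sample(nrow(x)), ]

d <- as.matrix(dist(x))               # Euclidean distance matrix
ord <- vat_order(d)
d_vat <- d[ord, ord]                  # reordered distance matrix

# Grey-level image: small distances are dark, so clusters show as dark blocks.
# Columns are reversed so that the main diagonal runs from top-left to bottom-right.
image(d_vat[, nrow(d_vat):1], col = gray(seq(0, 1, length.out = 256)),
      axes = FALSE, main = "Reordered distance matrix (sketch)")
```

With data like this, the image would be expected to show two dark square blocks on the diagonal, one per group, and counting such blocks is how the image suggests a value for c. The sketch recomputes nearest distances incrementally, so it runs in roughly quadratic time in the number of objects, which is adequate for small data sets.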
