Evaluation of Group Homogeneity in Gaussian Mixture Models Using Combined Cluster and Discriminant Analysis

Cluster analysis is widely used both in data mining, as an unsupervised learning method, and in statistics, as a multivariate statistical method that reveals the natural groups underlying a data set. However, determining the number of homogeneous groups in finite mixture models, which provide a natural representation of heterogeneity due to pairwise overlap, is a difficult process. In this study, Gaussian mixture models, one class of finite mixture models, are examined in terms of group homogeneity. To this end, combined cluster and linear discriminant analysis is compared with combined cluster and quadratic discriminant analysis in order to evaluate the correct classification rates of the Gaussian mixture components and to determine whether the components must be divided further to obtain homogeneous groups. The comparison is carried out in a simulation study covering 81 different scenarios, and an illustrative example is presented.
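The core quantity being compared can be sketched in a few lines. Given group labels produced by a clustering step, a linear discriminant rule (one covariance matrix pooled over groups) and a quadratic rule (a separate covariance matrix per group) are fitted, and each rule's correct classification rate is the share of points it assigns back to their own group. This is a minimal pure-Python sketch for two-dimensional data under stated assumptions: the labels come from two simulated Gaussian components standing in for the assignments of a fitted Gaussian mixture model, and the helper names (`fit`, `classify`, `correct_rate`) and simulation settings are hypothetical and illustrative. It is not the authors' implementation or one of their 81 scenarios.

```python
import math
import random

# Hypothetical helpers for 2-D data; a sketch, not the paper's implementation.

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, m):
    # Sum of outer products (x - m)(x - m)^T as a 2x2 matrix.
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return [[sxx, sxy], [sxy, syy]]

def inv_det(s):
    # Inverse and determinant of a 2x2 matrix.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    return inv, det

def quad_form(x, m, inv):
    # Squared Mahalanobis distance (x - m)^T inv (x - m).
    dx, dy = x[0] - m[0], x[1] - m[1]
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))

def fit(points, labels, pooled):
    groups = sorted(set(labels))
    n = len(points)
    stats = {}
    for g in groups:
        pts = [p for p, l in zip(points, labels) if l == g]
        m = mean(pts)
        stats[g] = (m, scatter(pts, m), len(pts))
    if pooled:
        # LDA: one covariance matrix pooled over all groups (divisor n - G).
        s = [[sum(stats[g][1][i][j] for g in groups) / (n - len(groups))
              for j in range(2)] for i in range(2)]
        covs = {g: s for g in groups}
    else:
        # QDA: each group keeps its own covariance matrix (divisor n_k - 1).
        covs = {g: [[v / (stats[g][2] - 1) for v in row] for row in stats[g][1]]
                for g in groups}
    model = {}
    for g in groups:
        inv, det = inv_det(covs[g])
        model[g] = (stats[g][0], inv, math.log(det), math.log(stats[g][2] / n))
    return model

def classify(model, x):
    # Gaussian discriminant score: log prior - 0.5 log|S| - 0.5 Mahalanobis^2.
    def score(g):
        m, inv, logdet, logprior = model[g]
        return logprior - 0.5 * logdet - 0.5 * quad_form(x, m, inv)
    return max(model, key=score)

def correct_rate(points, labels, pooled):
    # Apparent (resubstitution) correct classification rate.
    model = fit(points, labels, pooled)
    hits = sum(classify(model, p) == l for p, l in zip(points, labels))
    return hits / len(points)

random.seed(1)
# Stand-in for Gaussian mixture cluster labels: two simulated components
# with different spreads, so the equal-covariance assumption of LDA fails.
points, labels = [], []
for label, (mx, my, sd) in enumerate([(0.0, 0.0, 1.0), (5.0, 5.0, 2.0)]):
    for _ in range(100):
        points.append((random.gauss(mx, sd), random.gauss(my, sd)))
        labels.append(label)

lda_rate = correct_rate(points, labels, pooled=True)
qda_rate = correct_rate(points, labels, pooled=False)
print(f"LDA correct classification rate: {lda_rate:.3f}")
print(f"QDA correct classification rate: {qda_rate:.3f}")
```

The sketch reports the apparent (resubstitution) rate, which tends to be optimistic; a leave-one-out or held-out estimate would give a less biased comparison between the linear and quadratic rules.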
