Osman COREKCI, Ayla SAYLI

A New Clustering Algorithm of Hybrid Data According to Weights of Attributes

Partitioning large data sets into clusters of similar objects is one of the basic problems of data mining. The growing need to store large data in an organized way has increased the importance of clustering methods. Hierarchical clustering methods give effective results, but their computational complexity makes them inadequate for large data. Non-hierarchical clustering methods cannot be used for all data types, because their cost functions are not defined for categorical data. Recently, some non-hierarchical clustering methods have been extended to categorical and hybrid (mixed numeric and categorical) data. In addition, attributes may need different weights in clustering, depending on the nature of the data or the expected results. In this paper, we introduce an algorithm for clustering large hybrid data effectively that also takes the weights of attributes into account. The algorithm, which is based mainly on the K-Prototypes algorithm, is called "W-K-Prototypes". The computational results show that the algorithm can be used efficiently for clustering.
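To illustrate the idea of attribute weights in a K-Prototypes-style cost, the following is a minimal Python sketch, not the W-K-Prototypes formulation itself, which the abstract does not give. It starts from the standard K-Prototypes dissimilarity (squared Euclidean distance on numeric attributes plus gamma times the number of categorical mismatches) and assumes that each attribute's contribution is simply multiplied by its weight. The function names, the weight vectors `w_num` and `w_cat`, and the way the weights enter the sum are all assumptions made for illustration.

```python
from typing import Sequence


def weighted_mixed_distance(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    proto_num: Sequence[float],
    proto_cat: Sequence[str],
    w_num: Sequence[float],
    w_cat: Sequence[float],
    gamma: float = 1.0,
) -> float:
    """Dissimilarity between one object and one cluster prototype.

    Assumption: each attribute's term is scaled by its weight. With all
    weights equal to 1 this reduces to the K-Prototypes dissimilarity.
    """
    # Numeric part: weighted squared Euclidean distance.
    numeric = sum(w * (a - b) ** 2 for w, a, b in zip(w_num, x_num, proto_num))
    # Categorical part: weighted simple matching (a mismatch costs its weight).
    categorical = sum(w for w, a, b in zip(w_cat, x_cat, proto_cat) if a != b)
    # gamma balances the categorical part against the numeric part.
    return numeric + gamma * categorical


def nearest_prototype(x_num, x_cat, prototypes, w_num, w_cat, gamma=1.0) -> int:
    """Index of the closest prototype; each prototype is a (numeric, categorical) pair."""
    distances = [
        weighted_mixed_distance(x_num, x_cat, p_num, p_cat, w_num, w_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return distances.index(min(distances))
```

In an assignment step, each object would be moved to the cluster returned by `nearest_prototype`; raising the weight of an attribute makes disagreement on that attribute more expensive, so clusters tend to become more homogeneous in it.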

