M-FDBSCAN: A multicore density-based uncertain data clustering algorithm

Many data mining applications apply a clustering algorithm to large amounts of uncertain data. In this paper, we adapt an uncertain data clustering algorithm, fast density-based spatial clustering of applications with noise (FDBSCAN), to multicore systems in order to speed up processing. The new algorithm, which we call multicore FDBSCAN (M-FDBSCAN), splits the data domain into c rectangular regions, where c is the number of cores in the system, and applies FDBSCAN to all regions simultaneously. Once the regions have been clustered, the semiclusters created by the splitting are detected and merged to construct the final clusters. We test M-FDBSCAN for correctness and performance. The experiments show a significant performance increase that is not due to multicore usage alone.
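The split, cluster, and merge pipeline described above can be illustrated with a minimal Python sketch. It is not the paper's implementation and rests on several assumptions: plain DBSCAN on exact 2-D points stands in for FDBSCAN (which handles uncertain objects), the rectangular regions are taken to be equal-width vertical strips, a thread pool stands in for true multicore execution, and two semiclusters are merged whenever clustered points from different strips lie within eps of each other. The names `dbscan`, `split_into_strips`, and `m_dbscan` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on exact points (assumed stand-in for FDBSCAN)."""
    labels = [None] * len(points)  # None = unvisited, -1 = noise

    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reach)
        cluster += 1
    return labels


def split_into_strips(points, c):
    """Assign point indices to c equal-width vertical strips."""
    xs = [p[0] for p in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / c or 1.0  # avoid zero width if all x are equal
    strips = [[] for _ in range(c)]
    for idx, p in enumerate(points):
        strips[min(int((p[0] - lo) / width), c - 1)].append(idx)
    return strips


def m_dbscan(points, eps, min_pts, c):
    """Cluster each strip independently, then merge semiclusters."""
    strips = split_into_strips(points, c)
    # Threads keep the sketch simple; a process pool would be needed
    # for real multicore speedup in CPython.
    with ThreadPoolExecutor(max_workers=c) as pool:
        local = list(pool.map(
            lambda ids: dbscan([points[i] for i in ids], eps, min_pts),
            strips))

    # A semicluster is identified by (strip number, local label).
    semi = {}
    for k, ids in enumerate(strips):
        for i, lab in zip(ids, local[k]):
            if lab != -1:
                semi[i] = (k, lab)

    parent = {s: s for s in set(semi.values())}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    # Assumed merge rule: union semiclusters from different strips
    # that contain points within eps of each other.
    items = list(semi.items())
    for a, (i, si) in enumerate(items):
        for j, sj in items[a + 1:]:
            if si[0] != sj[0] and dist(points[i], points[j]) <= eps:
                parent[find(si)] = find(sj)

    roots = {}
    labels = [-1] * len(points)
    for i, s in semi.items():
        labels[i] = roots.setdefault(find(s), len(roots))
    return labels
```

In this simplified form, points that turn into noise only because a strip boundary cut through their neighborhood are not recovered; a faithful implementation would need to handle such boundary cases during the merge step.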
