A clustering approach using a combination of gravitational search algorithm and k-harmonic means and its application in text document clustering

Data clustering is one of the most popular techniques in information management and is used in many areas of science and engineering, such as machine learning, pattern recognition, image processing, data mining, and web mining. Researchers have proposed many clustering algorithms, among which evolutionary algorithms perform best, especially on large datasets. GSA-KM, a combination of the gravitational search algorithm (GSA) and K-means (KM), has been shown to outperform several comparable evolutionary methods. One drawback of this approach is its dependency on the initial seeds. In this paper, a method combining GSA and K-harmonic means, called GSA-KHM, is proposed, which reduces the dependency on initialization. The proposed GSA-KHM method is applied to data clustering and, as a special application, to text document clustering. The simulation results show that the proposed method performs better than GSA-KM and the other compared methods in both data clustering and text document clustering.
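To make the contrast between K-means and K-harmonic means concrete, the following minimal sketch computes both objectives for a given set of centers. It is an illustration based on the standard textbook definitions, not the authors' implementation: the function names, the default exponent `p=3.5`, and the `eps` guard are assumptions introduced here. Because the K-harmonic means objective takes a harmonic average over the distances to all centers, every center contributes to every point's cost, which is the usual explanation for its lower sensitivity to the initial seeds.

```python
import math


def khm_objective(points, centers, p=3.5, eps=1e-12):
    """K-harmonic means objective (standard form, assumed here):
    sum over points of k / sum_j (1 / d(x, c_j)**p).

    p and eps are illustrative choices, not values taken from the paper.
    """
    k = len(centers)
    total = 0.0
    for x in points:
        inv_sum = 0.0
        for c in centers:
            # Guard against division by zero when a point sits on a center.
            inv_sum += 1.0 / max(math.dist(x, c) ** p, eps)
        total += k / inv_sum
    return total


def kmeans_objective(points, centers):
    """K-means objective: sum of squared distances to the nearest center."""
    return sum(min(math.dist(x, c) ** 2 for c in centers) for x in points)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (2.0, 0.0), (2.1, 0.0)]
    good = [(0.05, 0.0), (2.05, 0.0)]   # one center per group
    poor = [(0.0, 0.0), (0.1, 0.0)]     # both centers in the same group
    print("KHM good/poor:", khm_objective(data, good), khm_objective(data, poor))
    print("KM  good/poor:", kmeans_objective(data, good), kmeans_objective(data, poor))
```

In a hybrid such as the one described, a search algorithm like GSA would presumably use an objective of this kind as the fitness of each candidate set of centers; the exact scheme is not stated in this abstract.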

___

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK