An applied comparison of the DBSCAN, OPTICS and K-means clustering algorithms

DBSCAN and OPTICS are two relatively recent clustering algorithms in data mining. In this study, they are compared with K-means, one of the oldest clustering algorithms. The comparison is based on the algorithms' cluster discovery performance on a synthetic database. The results show that DBSCAN and OPTICS, the two more recent algorithms, achieve higher accuracy and better cluster discovery ability than K-means.
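The kind of comparison summarized above can be illustrated with a minimal, dependency-free sketch. It is not the study's experimental setup: the two-ring data set, the parameter values (`eps`, `min_pts`, the initial centroids) and the helper names are all assumptions chosen for illustration, and OPTICS is left out for brevity. The sketch runs a textbook DBSCAN and a plain Lloyd-style K-means on the same points and reports which true rings end up in each discovered cluster.

```python
import math

def make_rings():
    """Deterministic synthetic data (assumed for illustration): two concentric rings."""
    points, truth = [], []
    for ring, (radius, count) in enumerate([(1.0, 40), (4.0, 80)]):
        for i in range(count):
            angle = 2 * math.pi * i / count
            points.append((radius * math.cos(angle), radius * math.sin(angle)))
            truth.append(ring)
    return points, truth

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)          # None means "not visited yet"

    def neighbours(i):
        # Brute-force region query; includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1                 # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels

def kmeans(points, initial_centroids, max_iterations=100):
    """Plain Lloyd iteration from the given starting centroids."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:           # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                    # keep the old centroid if a cluster empties
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels

def rings_per_cluster(labels, truth):
    """Map each discovered cluster id to the set of true rings it contains."""
    result = {}
    for lab, ring in zip(labels, truth):
        result.setdefault(lab, set()).add(ring)
    return result

points, truth = make_rings()
db_labels = dbscan(points, eps=0.5, min_pts=3)
km_labels = kmeans(points, [(-4.0, 0.0), (4.0, 0.0)])

print("DBSCAN :", rings_per_cluster(db_labels, truth))
print("K-means:", rings_per_cluster(km_labels, truth))
```

On this data the density-based method recovers each ring as its own cluster, whereas each K-means cluster mixes points from both rings, because K-means separates the plane with a straight boundary between centroids. This is only one favourable case for density-based clustering; on compact, well-separated blobs the two approaches would be expected to agree much more closely.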
