A New Approach to Determine Eps Parameter of DBSCAN Algorithm

In recent years, data analysis has become increasingly important as data volumes grow. Clustering, which groups objects according to their similarity, plays an important role in data analysis. DBSCAN is one of the most effective and popular density-based clustering algorithms and has been successfully applied in many areas. However, determining the values of its two input parameters, the neighborhood radius Eps and the minimum number of points MinPts, is a challenging task, and these values significantly affect the clustering performance of the algorithm. In this study, we propose the AE-DBSCAN algorithm, which includes a new method for determining the value of the neighborhood radius Eps automatically. The experimental evaluations showed that the proposed method outperformed the classical method.
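To make the roles of Eps and MinPts concrete, here is a minimal Python sketch of classical DBSCAN together with the sorted k-distance curve that the original DBSCAN paper suggests inspecting by eye to choose Eps. It is not the AE-DBSCAN method proposed in this study. The function names (`region_query`, `dbscan`, `sorted_k_distances`), the brute-force O(n²) neighbor search and the toy data are illustrative assumptions.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classical DBSCAN: label each point with a cluster id 0, 1, ... or -1 for noise.

    A point is a core point if its eps-neighborhood (itself included) holds at
    least min_pts points. Brute-force neighbor search, O(n^2) distance calls.
    """
    NOISE = -1
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # i is an unvisited core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def sorted_k_distances(points, k):
    """Distance from each point to its k-th nearest *other* point, sorted descending.

    Plotting this curve and picking Eps near its "knee" is the visual
    heuristic described in the original DBSCAN paper.
    """
    kdist = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kdist.append(others[k - 1])
    return sorted(kdist, reverse=True)


# Toy data (illustrative only): two unit squares plus one far outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
kd = sorted_k_distances(points, k=3)
labels = dbscan(points, eps=1.5, min_pts=4)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With k = 3, the toy curve drops from about 55.9 (the outlier) to about 1.41 for every other point, so an Eps just above the knee, such as 1.5, together with MinPts = k + 1 = 4 separates the two squares and marks the outlier as noise. The pairing MinPts = k + 1 is a choice made for this sketch: `region_query` counts a point in its own neighborhood, while `sorted_k_distances` measures the k-th nearest other point.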
