A Survey on Data Stream Clustering Techniques
As technology has advanced, the amount of data stored in digital form has grown to enormous proportions and continues to increase every day, and the methods used to process that data are changing accordingly. Classical clustering approaches assume that the data is static. Today, however, data arrives very rapidly, so applications are needed that cluster the data while it is streaming and can report results whenever the user requests them. The demand for data stream clustering approaches is therefore growing steadily. These approaches read each data point only once, run fast, and adapt themselves to newly arriving data, so results can be delivered to the user while the data is still streaming. This study reviews the work proposed in the field of data stream clustering and aims to guide researchers who are interested in this area.
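The single-pass, adaptive behaviour described above can be illustrated with a minimal sketch. It is not any specific surveyed algorithm. It is a hypothetical leader-style clusterer: each arriving point either joins the nearest micro-cluster within a fixed radius or starts a new one, and micro-cluster weights fade exponentially (a damped-window model) so that old data gradually loses influence. The class and parameter names (`OnlineLeaderClusterer`, `radius`, `decay`, `min_weight`) are illustrative assumptions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Compact summary of nearby points: linear sum, weight, last update time."""
    linear_sum: list[float]
    weight: float
    last_update: int

    def center(self) -> list[float]:
        # Fading scales sum and weight equally, so the center is unaffected by it.
        return [s / self.weight for s in self.linear_sum]

    def fade(self, now: int, decay: float) -> None:
        # Exponential fading: multiply by 2^(-decay * elapsed time steps).
        factor = 2.0 ** (-decay * (now - self.last_update))
        self.weight *= factor
        self.linear_sum = [s * factor for s in self.linear_sum]
        self.last_update = now


class OnlineLeaderClusterer:
    """Hypothetical single-pass clusterer: each point is seen once, then discarded."""

    def __init__(self, radius: float, decay: float = 0.01, min_weight: float = 0.1):
        self.radius = radius          # max distance for a point to join a micro-cluster
        self.decay = decay            # fading rate (assumed illustration value)
        self.min_weight = min_weight  # micro-clusters lighter than this are dropped
        self.clusters: list[MicroCluster] = []
        self.t = 0                    # logical clock: one tick per arriving point

    def update(self, point: list[float]) -> None:
        self.t += 1
        nearest, best = None, math.inf
        for mc in self.clusters:
            d = math.dist(point, mc.center())
            if d < best:
                nearest, best = mc, d
        if nearest is not None and best <= self.radius:
            # Absorb the point into the closest micro-cluster.
            nearest.fade(self.t, self.decay)
            nearest.linear_sum = [s + x for s, x in zip(nearest.linear_sum, point)]
            nearest.weight += 1.0
        else:
            # The point becomes the "leader" of a new micro-cluster.
            self.clusters.append(MicroCluster(list(point), 1.0, self.t))

    def snapshot(self) -> list[list[float]]:
        # Anytime query: fade every summary to "now", prune stale ones, report centers.
        for mc in self.clusters:
            mc.fade(self.t, self.decay)
        self.clusters = [mc for mc in self.clusters if mc.weight >= self.min_weight]
        return [mc.center() for mc in self.clusters]


if __name__ == "__main__":
    clu = OnlineLeaderClusterer(radius=1.0)
    for p in [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.1], [5.1, 5.0]]:
        clu.update(p)
    print(clu.snapshot())  # two centers, one near (0, 0) and one near (5, 5)
```

Only the micro-cluster summaries are kept in memory, and `snapshot()` can be called at any moment while points keep arriving. Many stream clustering methods add an offline step that merges such micro-clusters into final clusters on demand; this sketch omits that step.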