A performance comparison of tree data structures in the streaming data clustering problem

Advances in technology have made it possible to collect and analyze data produced by many different sources. Data generated by sensors, mobile devices, and the internet of things arrive as streaming data, and extracting useful information from such data is a hard problem. Clustering, one of the most frequently used methods for analyzing streaming data, divides the data into groups according to their distribution and analyzes them group by group. In this study, two new algorithms for the streaming data clustering problem were developed and compared with another method from the literature. Experiments on different datasets showed that the developed algorithms give good results.
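
As a rough illustration of what single-pass clustering of a stream involves, the sketch below assigns each arriving point to the nearest existing cluster if it lies within a fixed radius, and otherwise opens a new cluster. It is not one of the algorithms developed in the study, whose details are not given in this abstract; the class name `OnlineClusterer`, the `radius` parameter, and the toy stream are all assumptions made for illustration. The nearest-cluster lookup is written as a linear scan; this is the step where a spatial tree index could be substituted.

```python
import math


class OnlineClusterer:
    """Toy single-pass clusterer (hypothetical, for illustration only).

    Each point joins the nearest cluster within `radius`;
    otherwise it starts a new cluster.
    """

    def __init__(self, radius):
        self.radius = radius
        self.centers = []  # running mean of each cluster
        self.counts = []   # number of points absorbed by each cluster

    def _nearest(self, point):
        # Linear scan over cluster centers; a tree index could replace this.
        best, best_dist = None, math.inf
        for i, center in enumerate(self.centers):
            d = math.dist(point, center)
            if d < best_dist:
                best, best_dist = i, d
        return best, best_dist

    def learn_one(self, point):
        """Process one point and return the index of its cluster."""
        idx, dist = self._nearest(point)
        if idx is None or dist > self.radius:
            self.centers.append(list(point))
            self.counts.append(1)
            return len(self.centers) - 1
        # Incremental mean update of the matched cluster center.
        self.counts[idx] += 1
        n = self.counts[idx]
        self.centers[idx] = [c + (x - c) / n
                             for c, x in zip(self.centers[idx], point)]
        return idx


# Assumed toy stream with two well-separated groups.
stream = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.2)]
model = OnlineClusterer(radius=1.0)
labels = [model.learn_one(p) for p in stream]
print(labels)
```

Each point is seen once and then discarded, so memory grows with the number of clusters rather than with the length of the stream.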

Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi
  • ISSN: 1300-1884
  • Publication frequency: 4 issues per year
  • First published: 1986
  • Publisher: Oğuzhan YILMAZ