Designing a Platform for Tweet Collection, Analytics and Storage (TweetCASP)

In the era of big data, streaming, storing, and analyzing large volumes of data pose a variety of challenges that designers of data-intensive systems must address before useful information can be retrieved. Gaining meaningful insight from collected data requires a collection and analytics platform built from an appropriate choice of data processing and analytics technologies. In this paper, we report on TweetCASP (Tweet Collection, Analytics and Storage Platform), which gathers tweets matching user-entered keywords through Twitter's Streaming API, provides an environment for real-time analytics on the streaming data, and permanently stores the data in an Apache Cassandra NoSQL datastore to meet future batch-oriented data processing requirements. TweetCASP also serves as an example of a data-intensive system for data collection and analytics that software developers, designers, and researchers in this field can draw on.
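The collect, analyze, and store flow described in the abstract can be illustrated with a minimal sketch. It is not the TweetCASP implementation: every name in it (Tweet, filter_by_keywords, InMemoryStore, run_pipeline) is hypothetical, a plain Python list stands in for Twitter's Streaming API, a running keyword count stands in for the real-time analytics, and an in-memory dictionary stands in for the Apache Cassandra datastore.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List


@dataclass
class Tweet:
    """Hypothetical, simplified tweet record (real tweets carry many more fields)."""
    tweet_id: int
    text: str


def filter_by_keywords(stream: Iterable[Tweet], keywords: List[str]) -> Iterator[Tweet]:
    """Yield tweets whose text contains at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    for tweet in stream:
        text = tweet.text.lower()
        if any(k in text for k in lowered):
            yield tweet


@dataclass
class InMemoryStore:
    """Assumption: a dictionary standing in for the persistent datastore."""
    rows: Dict[int, str] = field(default_factory=dict)

    def save(self, tweet: Tweet) -> None:
        self.rows[tweet.tweet_id] = tweet.text


def run_pipeline(stream: Iterable[Tweet], keywords: List[str], store: InMemoryStore) -> Counter:
    """Collect matching tweets, update running keyword counts, and persist each tweet."""
    counts: Counter = Counter()
    lowered = [k.lower() for k in keywords]
    for tweet in filter_by_keywords(stream, keywords):
        text = tweet.text.lower()
        # Stand-in for real-time analytics: count keyword hits as tweets arrive.
        for k in lowered:
            if k in text:
                counts[k] += 1
        # Stand-in for permanent storage, kept for later batch processing.
        store.save(tweet)
    return counts


# Usage with a hand-written sample instead of a live stream.
sample = [
    Tweet(1, "Flood warning issued"),
    Tweet(2, "Nice weather today"),
    Tweet(3, "Earthquake and flood updates"),
]
store = InMemoryStore()
counts = run_pipeline(sample, ["flood", "earthquake"], store)
```

In this sketch, each matching tweet is analyzed as it arrives and is also persisted, so the same data remains available for batch-oriented processing later.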

___

Bilgisayar Bilimleri (Computer Science)
  • ISSN: 2548-1304
  • Publication frequency: 2 issues per year
  • First published: 2016
  • Publisher: Ali KARCI