Real-Time Big Data Processing and Analytics: Concepts, Technologies, and Domains

In the digital era, data is one of the most important assets because valuable information is hidden within it. Developers of data-intensive systems face new challenges at every level of streaming, storing, and processing large volumes of data that arrive in a variety of formats and at varying speeds. Obtaining useful information at the right time and place is also crucial. Because the value of information diminishes over time, real-time data processing and analytics are receiving growing attention. This study therefore presents real-time data processing concepts and terminology, popular technologies used in real-time data processing and analytics, popular NoSQL storage technologies used in real-time data processing, and real-time data processing application areas. The purpose of this paper is to give real-time analytics researchers and developers of data-intensive systems a comparative perspective on real-time data processing by highlighting the key characteristics of real-time data processing technologies, NoSQL storage technologies, and their application domains, together with selected examples from previous studies.
