Data division effect on machine learning performance for prediction of streamflow

Accurate streamflow estimation plays an important role in water resources management, disaster preparedness and early warning, reservoir operation, and the sizing of hydraulic structures. In this study, the extreme gradient boosting (XGBoost) and K-nearest neighbours (KNN) algorithms were used to model river flows. To identify the most suitable model, both the raw models and models with optimized parameters were evaluated. Various training-test split ratios were also tried to determine which data division gives better results: the data were divided in ratios of 60-40, 70-30, 80-20, and 90-10, and the model results were compared. The models were compared using the root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²). The analysis showed that the most suitable model for monthly flow estimation was the optimized XGBoost algorithm with a 60-40 training-test division. These results can support decision-makers in water resources planning and in flood and drought management.
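The comparison of data divisions described above can be illustrated with a minimal sketch. It is not the study's implementation: the monthly series is synthetic, the two-lag input features, the chronological (unshuffled) split, and k = 3 are all assumptions, and a small hand-written KNN regressor stands in for both models so that the example has no dependencies. R² is computed here as 1 − SS_res/SS_tot; the study may define it differently (for example, as the squared correlation coefficient).

```python
import math

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to the query."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    order = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k

def make_lagged(series, n_lags=2):
    """Build (previous n_lags flows) -> (current flow) pairs; a hypothetical feature choice."""
    x = [series[i - n_lags:i] for i in range(n_lags, len(series))]
    y = [series[i] for i in range(n_lags, len(series))]
    return x, y

def evaluate_split(x, y, train_pct, k=3):
    """Train on the first train_pct percent of samples, test on the remainder."""
    cut = len(x) * train_pct // 100
    train_x, train_y = x[:cut], y[:cut]
    test_x, test_y = x[cut:], y[cut:]
    pred = [knn_predict(train_x, train_y, q, k) for q in test_x]
    return {"rmse": rmse(test_y, pred), "mae": mae(test_y, pred), "r2": r2(test_y, pred)}

# Synthetic 20-year monthly series (assumption): a seasonal cycle plus a slower wobble.
flow = [50 + 30 * math.sin(2 * math.pi * m / 12) + 5 * math.sin(m) for m in range(240)]
x, y = make_lagged(flow)

# The four training-test divisions named in the abstract.
results = {pct: evaluate_split(x, y, pct) for pct in (60, 70, 80, 90)}
for pct, s in results.items():
    print(f"{pct}-{100 - pct}: RMSE={s['rmse']:.3f}  MAE={s['mae']:.3f}  R2={s['r2']:.3f}")
```

In practice the hand-written regressor would be replaced by library implementations of XGBoost and KNN, each run once with default settings and once with tuned parameters, and the same three metrics would be tabulated for every division.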

___

Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi
  • ISSN: 1309-8640
  • Founded: 2009
  • Publisher: DÜ Mühendislik Fakültesi / Dicle Üniversitesi

Okan Mert KATİPOĞLU