A Comparative Evaluation of Air Pollution Prediction with Machine Learning Algorithms

Air pollution is one of the most serious problems of our time. Its importance keeps growing with population growth, urban development and expansion, and industrial development. In general, the harmful effects of air pollutants on people, other living things, and the environment follow complex distribution patterns that depend on time, location, duration of exposure, concentration, and other characteristics. This complexity makes it difficult to model or measure pollutant patterns and trends, and to estimate the levels to which people are exposed. One of the most important steps in air pollution prevention is to evaluate the pollution process within a model. In this study, focusing on the province of Kastamonu, models were developed that predict air pollution from selected meteorological variables using various machine learning algorithms, which are fairly new to meteorological and environmental applications and have produced successful results there. The Minimum-Maximum (Min-Max) normalization technique was used together with the learning methods. The prediction models used Artificial Neural Networks (ANN), Random Forest, K-Nearest Neighbors (K-NN), Logistic Regression, Decision Tree, Linear Regression, and Naive Bayes. The performance values obtained were compared with similar studies in the literature, and the most suitable prediction algorithm for the problem was identified. 70% of the data set was used for training and 30% for testing. The ANN model reached a correct prediction rate of 87%, while among the other machine learning models Random Forest and Decision Tree gave the most successful predictions, each with an accuracy of 99%. Linear Regression performed quite poorly, with an accuracy of 30%. In the performance evaluation of the methods applied to KastamonuDataSet, statistically significant differences were found in terms of the coefficient of determination ($R^2$), Mean Squared Error (MSE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE).

A Comparative Assessment on Air Pollution Estimation by Machine Learning Algorithms

Air pollution is one of the biggest problems of today. It is becoming increasingly important with population growth, urban development and expansion, and the development of industry. Generally, the harmful effects of air pollutants on humans, other living things, and the environment show complex distribution patterns depending on time, space, duration of exposure, concentration, and other characteristics. This complexity makes it difficult to model or measure pollutant patterns and trends, and to predict the levels of pollution to which people are exposed. One of the most important steps in the prevention of air pollution is the evaluation of the pollution process in a model. In this study, air pollution in the province of Kastamonu is modeled from selected meteorological parameters using various machine learning algorithms, which have given new and successful results in meteorology and environmental applications. The Minimum-Maximum (Min-Max) normalization technique was used with the learning methods. Separate models were designed and analyzed using Artificial Neural Networks (ANN), Random Forest, K-Nearest Neighbors (K-NN), Logistic Regression, Decision Tree, Linear Regression, and Naive Bayes. The performance values obtained in the study were compared with similar studies in the literature, and the most appropriate estimation algorithm for the problem was determined. 70% of the data set was used for training and 30% for testing. The ANN model achieved a correct estimation rate of 87%, while among the other machine learning models Random Forest and Decision Tree gave the best results, each with an accuracy rate of 99%. The Linear Regression method performed poorly, with a 30% accuracy rate. In the performance evaluation of the methods used on KastamonuDataSet, statistically significant differences were found in terms of the coefficient of determination ($R^2$), Mean Squared Error (MSE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE).
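The preprocessing and evaluation steps named in the abstract (Min-Max normalization, a 70/30 train/test split, and the $R^2$, MSE, RMSE, and MAE metrics) can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the random shuffle before splitting, the fixed seed, and the sample numbers are all assumptions, and the abstract does not say whether the split was random or chronological.

```python
import math
import random


def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to zeros to avoid 0/0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle rows (an assumption) and split them, 70/30 by default."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def regression_metrics(y_true, y_pred):
    """Return R^2, MSE, RMSE and MAE for paired observations and predictions.

    R^2 is undefined (division by zero) if y_true is constant.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
    }


if __name__ == "__main__":
    # Hypothetical values, not taken from KastamonuDataSet.
    temperature = [10.0, 15.0, 20.0]
    print(min_max_normalize(temperature))

    train, test = split_train_test(range(10))
    print(len(train), len(test))

    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 2.0]))
```

In practice the minimum and maximum used for normalization would normally be computed on the training portion only and then reused for the test portion, so that no information from the test data leaks into training. The abstract does not state which order the authors used.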

