Performance Evaluation of Various Regression Models and Features for Prediction of Ozone Concentration

Ozone air pollution threatens human health, so predicting O3 concentration is important. In this work, the O3 concentration level for Adana, Turkey is predicted using five machine learning methods: support vector regression (SVR), multi-layer perceptron (MLP), gradient boosting decision trees (GBDT), K-nearest neighbors (KNN), and elastic net. The input parameters are hourly measurements of pollutant concentrations, namely particulate matter (PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), nitrogen oxides (NOx), and nitric oxide (NO), together with meteorological parameters: air temperature, wind speed, relative humidity, air pressure, and wind direction. Hour, day, and season information are also used as features. The SVR method achieves the best result, with an R2 value of 0.9697. Furthermore, backward elimination is applied for feature selection; according to the results, the current O3 concentration is the most important feature for predicting the concentration for the next hour.
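As a minimal sketch of the evaluation setup described above, the snippet below pairs each hourly O3 value with the value one hour later and computes R2 for a persistence baseline (next hour predicted as equal to the current hour). The readings are made-up numbers, and the helper names `make_next_hour_pairs` and `r2_score` are hypothetical; the persistence baseline is an assumption added for illustration and is not one of the models evaluated in the study.

```python
# Illustrative only: these readings are invented, not data from the study.
o3_hourly = [30.0, 34.0, 41.0, 50.0, 58.0, 61.0, 57.0, 49.0, 40.0, 33.0]


def make_next_hour_pairs(series):
    """Pair each hourly value with the value observed one hour later."""
    return list(zip(series[:-1], series[1:]))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


pairs = make_next_hour_pairs(o3_hourly)
current = [now for now, _ in pairs]
next_hour = [nxt for _, nxt in pairs]

# Persistence baseline (assumption): the next hour equals the current hour.
persistence_r2 = r2_score(next_hour, current)
print(f"persistence R2 = {persistence_r2:.4f}")
```

A trained regressor would replace the persistence prediction with its own output; the R2 computation stays the same.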

Performance Evaluation of Various Regression Models and Features in Ozone Concentration Prediction

Air pollution caused by ozone is a problem that threatens human health. Therefore, prediction of O3 concentration is important. In this study, the O3 concentration level for the province of Adana, Turkey was predicted using support vector regression (SVR), multi-layer perceptron (MLP), gradient boosted decision trees (GBDT), K-nearest neighbors (KNN), and elastic net machine learning methods. The parameters used for this prediction are hourly measurements of the concentrations of pollutants such as particulate matter (PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), nitrogen oxides (NOx), and nitric oxide (NO), as well as meteorological parameters such as air temperature, wind speed, relative humidity, air pressure, and wind direction. In addition, hour, day, and season information are used as parameters. The R2 value obtained with the SVR method is 0.9697, which is higher than the values obtained with the other methods. Furthermore, backward elimination was applied for feature selection, and the results show that the current O3 concentration level is the most important feature for predicting the O3 concentration of the next hour.
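Backward elimination can be sketched as a greedy loop: start from all features, repeatedly drop the one whose removal leaves the best score, and read importance from the elimination order. The snippet below is a sketch under stated assumptions, not the study's procedure: the data are invented, the feature names are hypothetical, and a leave-one-out 1-nearest-neighbor scorer stands in for the regression models named above. Features are left unscaled here to keep the example short; a real pipeline would normally scale them first.

```python
# Toy data invented for illustration; not measurements from the study.
# Columns: current O3, wind direction code, hour code (all unscaled).
FEATURES = ["o3_now", "wind_dir", "hour"]
ROWS = [
    (10, 5, 3),
    (20, 1, 6),
    (30, 4, 1),
    (40, 2, 5),
    (50, 6, 2),
    (60, 3, 4),
]
NEXT_HOUR_O3 = [12, 21, 33, 41, 52, 60]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def loo_1nn_r2(rows, targets, cols):
    """Leave-one-out R2 of a 1-nearest-neighbor regressor using only `cols`."""
    preds = []
    for i, row in enumerate(rows):
        others = [j for j in range(len(rows)) if j != i]
        nearest = min(
            others,
            key=lambda j: sum((row[c] - rows[j][c]) ** 2 for c in cols),
        )
        preds.append(targets[nearest])
    return r2_score(targets, preds)


def backward_elimination(rows, targets, names):
    """Return feature names ordered from first eliminated to last survivor."""
    remaining = list(range(len(names)))
    eliminated = []
    while len(remaining) > 1:
        # Score every subset obtained by dropping exactly one feature.
        score_without = {
            c: loo_1nn_r2(rows, targets, [k for k in remaining if k != c])
            for c in remaining
        }
        # Drop the feature whose removal leaves the highest score.
        drop = max(score_without, key=score_without.get)
        remaining.remove(drop)
        eliminated.append(names[drop])
    return eliminated + [names[remaining[0]]]


ranking = backward_elimination(ROWS, NEXT_HOUR_O3, FEATURES)
print(ranking)  # ordered from least to most important on this toy data
```

On this toy data the last surviving feature is the current O3 value, which mirrors the kind of conclusion the abstract reports; the ordering of the other features here carries no meaning beyond the invented numbers.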
